ORA-15137: The ASM cluster is in rolling patch state

SYMPTOM

While adding a new disk into an existing diskgroup, below errors occurred:

ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.

INVESTIGATION

1) On both nodes, it shows the cluster in “In Rolling Patch”, and the patch levels are all the same.

SQL> SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CLUSTER_STATE') 
     FROM DUAL;

SYS_CONTEXT('SYS_CLUSTER_PROPERTIES','CLUSTER_STATE')
--------------------------------------------------------------------
In Rolling Patch

SQL> SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CURRENT_PATCHLVL') 
     FROM DUAL;

SYS_CONTEXT('SYS_CLUSTER_PROPERTIES','CURRENT_PATCHLVL')
--------------------------------------------------------------------
3628626982

$ asmcmd
ASMCMD> showclusterstate
In Rolling Patch

ASMCMD> showpatches
---------------
List of Patches
===============
26609817
26609966
26839277
27105253
27128906
27144050
27335416
27458609
27464465
27674384

ASMCMD> showversion
ASM version         : 12.2.0.1.0

2) “crsctl query crs softwarepatch” shows the same results on both nodes:

$ crsctl query crs softwarepatch
Oracle Clusterware patch level on node racnode1 is [3628626982].

$ crsctl query crs softwarepatch
Oracle Clusterware patch level on node racnode2 is [3628626982].

3) “crsctl query crs releasepatch” shows the same results on both nodes:

— racnode1:

$crsctl query crs releasepatch
Oracle Clusterware release patch level is [3628626982] and the complete 
list of patches [26609817 26609966 26839277 27105253 27128906 27144050 
27335416 27458609 27464465 27674384 ] have been applied on the local node.

— racnode2:

$crsctl query crs releasepatch
Oracle Clusterware release patch level is [3628626982] and the complete 
list of patches [26609817 26609966 26839277 27105253 27128906 27144050 
27335416 27458609 27464465 27674384 ] have been applied on the local node.

4) kfod command shows the same results on both nodes:

— racnode1:

$ $ORACLE_HOME/bin/kfod op=patchlvl
-------------------
Current Patch level
===================
3628626982

$ $ORACLE_HOME/bin/kfod op=patches
---------------
List of Patches
===============
26609817
26609966
26839277
27105253
27128906
27144050
27335416
27458609
27464465
27674384

— racnode2:

$ $ORACLE_HOME/bin/kfod op=patchlvl
-------------------
Current Patch level
===================
3628626982

$ $ORACLE_HOME/bin/kfod op=patches
---------------
List of Patches
===============
26609817
26609966
26839277
27105253
27128906
27144050
27335416
27458609
27464465
27674384

5) lsinventory shows the same results on both nodes:

$ $ORACLE_HOME/OPatch/opatch lsinventory | grep -i desc
ARU platform description:: Linux x86-64
Patch description: "Database Apr 2018 Release Update : 12.2.0.1.180417 (27674384)"
Patch description: "OCW APR 2018 RELEASE UPDATE 12.2.0.1.0(180129) (27464465)"
Patch description: "ACFS APR 2018 RELEASE UPDATE 12.2.0.1.0(180129) (27458609)"
Patch description: "Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)"
Patch description: "DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)"

SOLUTIONS

$crsctl stop rollingpatch
CRS-1161: The cluster was successfully patched to patch level [3628626982].

recheck with above commands, the cluster status is changed from “In Rolling Patch” to “Normal” now.

SQL> SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CLUSTER_STATE') 
     FROM DUAL;

SYS_CONTEXT('SYS_CLUSTER_PROPERTIES','CLUSTER_STATE')
--------------------------------------------------------------------------------
Normal

Generally speaking, For “ORA-15137: The ASM cluster is in rolling patch state” issue, the below steps can be followed one after another, until the issue is resolved:

a) stop rolling patch status one node by another:

SQL>Alter system stop rolling patch;

b) stop rolling patch for whole cluster:

$crsctl stop rollingpatch

c) In case of  postpatch not complete successfully for some reason, which can also cause this issue:

— As super root user :

 $GRID_HOME/crs/install/rootcrs.sh -prepatch 
 $GRID_HOME/crs/install/rootcrs.sh -postpatch

d) For some reason, OCR is not updated with right patchlevel:

$GRID_HOME/crs/install/rootcrs.sh -prepatch 
$GI_HOME/bin/clscfg -patch
$GRID_HOME/crs/install/rootcrs.sh -postpatch

e) For some reason,  patches are available from “opatch lsinventory”, but they are missing from kfod output:

-- as super user
$GRID_HOME/crs/install/rootcrs.sh -prepatch 

-- as grid owner,
$GRID_HOME/bin/patchgen commit -pi 12345678 
$GRID_HOME/bin/patchgen commit -pi 23456789 

-- as super user
$GRID_HOME/crs/install/rootcrs.sh -postpatch

ORA-27140 ORA-27300 ORA-27301 ORA-27302 ORA-273003 When Start Up or Shut Down RAC Instance

ORACLE_HOME is shared by both RAC instance A and RAC instance B. Both instance A and B are shut down, then ORACLE_HOME is patched, the ORACLE_HOME/bin/oracle binary will be as below just after the patching:

$ id oracle
uid=122(oracle) gid=202(oinstall) groups=202(oinstall),201(dba)

$ id grid
uid=518(grid) gid=202(oinstall) groups=202(oinstall),201(dba)

$ ls -ltr /dev/oracleasm/disks/
total 0
brw-rw---- 1 grid dba 253, 16 Jul 30 16:38 OCR_VOTE01
brw-rw---- 1 grid dba 253, 50 Jul 30 16:38 ASM_FRA01
brw-rw---- 1 grid dba 253, 57 Jul 30 16:38 ASM_DISK01
..
.

$ ls -ltr /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle oinstall 228655023 May 17 11:11 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

Manually start up instance A by using sqlplus without using ‘srvctl start instance’, and check oracle binary gid( oinstall) still the same:

$ sqlplus / as sysdba

...
..
.
Connected to an idle instance.

SQL> startup
ORACLE instance started.
...
..
.
Database mounted.
Database opened.
SQL> exit
$
$ ls -ltr /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle oinstall 228655023 May 17 11:11 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

Then start up instance B through clusterware by using ‘srvctl start ‘, and noticed clusterware changed the group of the oracle binary:

$srvctl start database -d TESTB
$ ls -ltr /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle dba 228655023 May 17 11:11 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

The instance A got below errors in trace files and alert log:

Alert.log:

Process m000 died, see its trace file
Wed Jul 25 16:37:42 2018
Process m000 died, see its trace file
..
.
Process q000 died, see its trace file
Wed Jul 25 16:41:54 2018
Process W000 died, see its trace file
Process W000 died, see its trace file

Trace file:

$ cat TESTA1_m000_7813.trc
..
.
/u01/app/oracle/diag/rdbms/testa/TESTA1/trace/TESTA1_m000_7813.trc
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1

*** 2018-07-25 16:35:55.662
Died during process startup with error 27140 (seq=39770)
OPIRIP: Uncaught error 27140. Error stack:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 202 (oinstall), 
current egid = 201 (dba)

crsd_oraagent_oracle trace file shows clusterware is changed RAC $ORACLE_HOME/bin/oracle binary by using “setasmgidwrap” while instance B is started up by using ‘srvctl start ‘ by clusterware.

crsd_oraagent_oracle_42.trc:2018-07-25 16:34:31.584046 :CLSDYNAM:2785666816: [ora.testb.db]{1:44997:10712} [start] command = '/u01/app/12.1.0.2/grid/bin/setasmgidwrap oracle_binary_path=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle'
crsd_oraagent_oracle_42.trc:2018-07-25 16:34:31.584068 :CLSDYNAM:2785666816: [ora.testb.db]{1:44997:10712} [start] Utils:execCmd action = 1 flags = 6 ohome = /u01/app/12.1.0.2/grid cmdname = setasmgidwrap.
crsd_oraagent_oracle_42.trc:2018-07-25 16:43:38.726230 :CLSDYNAM:2128570112: [ora.testb.db]{1:44997:10835} [start] command = '/u01/app/12.1.0.2/grid/bin/setasmgidwrap oracle_binary_path=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle'
crsd_oraagent_oracle_42.trc:2018-07-25 16:43:38.726270 :CLSDYNAM:2128570112: [ora.testb.db]{1:44997:10835} [start] Utils:execCmd action = 1 flags = 6 ohome = /u01/app/12.1.0.2/grid cmdname = setasmgidwrap.

CONCLUSION:

Always shutdown or startup RAC instances by using ‘srvctl start” through clusterware. Otherwise you might get ORA-27140 ORA-27300 ORA-27301 ORA-27302 ORA-273003 errors.

CRS-2412: The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time

In crs alert.log, below message is seen every 30 minutes in 12.2.0.1 GI.

[OCTSSD(9932)]CRS-2412: The Cluster Time Synchronization Service 
detects that the local time is significantly different from the mean cluster time

ctss is running in observer mode, because NTP is running.

$ crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.

Subscribe to get access

Read more of this content when you subscribe today.

How to Apply GI APR 2018 RU Patches

Here is the GI environment to be patched:

GI HOME:  12.2.0.1
DB HOMES: 12.2.0.1
          12.1.0.2
          11.2.0.4

Patches downloaded :

GI HOME: 12.2.0.1 : Combo OJVM Update 12.2.0.1.180417 and 
                    GI Update 12.2.0.1.180417 Patch 27726454, or
- 27468969/
          27674384 DATABASE APR 2018 RU 12.2.0.1.180417
          27464465 OCW APR 2018 RU 12.2.0.1.0.180417
          27458609 ACFS APR 2018 RU 12.2.0.1.0.180417 Only Grid Home
          26839277 DBWLM RELEASE UPDATE 12.2.0.1.170913 Only Grid Home
          27144050 TOMCAT RELEASE UPDATE 12.2.0.1.171023.0830 Only Grid Home
- 27475613/Oracle JavaVM Component Release Update 12.2.0.1.180417

--
-- 12.1.0.2
Combo OJVM PSU 12.1.0.2.180417 and GI PSU 12.1.0.2.180417 Patch 27726478
- 27475603/
          Oracle JavaVM Component 12.1.0.2.180417 Database PSU
- 27468957/
          27338013
          26983807 
          27338020  OCW 
          27338041  Database PSU 12.1.0.2.180417 Patch 27726471

--
-- 11.2.0.4
Combo OJVM PSU 11.2.0.4.180417 and GI PSU 11.2.0.4.180417 Patch 27726505
- 27475913/
           27338049 Database PSU 11.2.0.4.180417 Patch 27338049
           27441052 OCW PSU 11.2.0.4.180417
           22502505 ACFS PSU 11.2.0.4.160419

-27475598/Oracle JavaVM (OJVM) Component Database PSU 11.2.0.4.180417

1) Apply Apr 2018 RU 12.2.0.1.180417 for 12.2.0.1 GI HOME and DB HOME.

$GI_HOME/OPatch/opatchauto apply /PATCHES/27726454/27468969

2) Apply April 2018 RU for 12.1.0.2 DB HOME.

-- DB   
$cd $PATCH_TOP_LOCATION/27726478/27468957/27338041
$ opatch apply

-- OCW
$ $PATCH_TOP_LOCATION/27726478/27468957/27338020/custom/scripts/prepatch.sh 
 -dbhome  /u01/app/oracle/product/12.1.0/dbhome_1
$ opatch napply $PATCH_TOP_LOCATION/27726478/27468957/27338020 -local 
 -oh /u01/app/oracle/product/12.1.0/dbhome_1 -id 27338020
$ $PATCH_TOP_LOCATION/27726478/27468957/27338020/custom/scripts/postpatch.sh 
  -dbhome -dbhome /u01/app/oracle/product/12.1.0/dbhome_1

--OJAVA
$cd $PATCH_TOP_LOCATION/27726478/27475603
$ opatch apply -local

3)Apply April 2018 RU for 11.2.04 DB HOME.

-- DB PSU
$cd 27338049
$opatch prereq CheckConflictAgainstOHWithDetail -ph ./
$opatch apply or opatch apply -local ( RAC )

-- OJVM
$cd <PATCH_TOP_DIR>/27475598
$opatch prereq CheckConflictAgainstOHWithDetail -ph ./
$opatch apply or  opatch apply -local (if RAC)

--OCW
--11g version 
--chmod 750 on both files below if permission is not correct.

 $ ls -ltr 27475913/27441052/custom/scripts/p*patch.sh
-rw-r--r--. 1 oracle oinstall 4566 Feb 5 20:05 27475913/27441052/custom/scripts/prepatch.sh
-rw-r--r--. 1 oracle oinstall 8417 Feb 5 20:05 27475913/27441052/custom/scripts/postpatch.sh

$ chmod 750 27475913/27441052/custom/scripts/prepatch.sh
$ chmod 750 27475913/27441052/custom/scripts/postpatch.sh
$ $PATCH_TOP_LOCATION/27726505/27475913/27441052/custom/scripts/prepatch.sh 
  -dbhome /u01/app/oracle/product/11.2.0/dbhome_1
$ opatch napply -oh /u01/app/oracle/product/11.2.0/dbhome_1 
 -local PATCH_TOP_LOCATION/27726505/27475913/27441052/custom/server/27441052
$ $PATCH_TOP_LOCATION/27726505/27475913/27441052/custom/scripts/postpatch.sh 
-dbhome /u01/app/oracle/product/11.2.0/dbhome_1

4)Run “datapatch” or post patch SQL as per instructions.

RAC Database Hanging with “gc buffer busy acquire” and “gc buffer busy request” Wait Events

SYMPTOM

RAC database hangs for a short period of time with high “gc buffer busy acquire” and “gc buffer busy request” wait events by sql id =”4vs91dcv7u1p6″:

insert into sys.aud$( sessionid, entryid, statement, ntimestamp#, 
userid, userhost, terminal, action#, returncode, obj$creator, 
obj$name, auth$privileges, auth$grantee, new$owner, new$name, 
ses$actions, ses$tid, logoff$pread, logoff$lwrite, logoff$dead, 
comment$text, spare1, spare2, priv$used, clientid, sessioncpu, 
proxy$sid, user$guid, instance#, process#, xid, scn, auditid, 
sqlbind, sqltext, obj$edition, dbid) values(:1, :2, :3, 
SYS_EXTRACT_UTC(SYSTIMESTAMP), :4, :5, :6, :7, :8, :9, :10, :11, 
:12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, 
:25, :26, :27, :28, :29, :30, :31, :32, :33, :34, :35, :36)

AWR report shows AUD$ table is the segment with the highest “Global Cache Buffer Busy” wait event:

CAUSE

With AUDIT_TRAIL set for database (AUDIT_TRAIL=db), and the AUD$ and FGA_LOG$ tables located in a dictionary segment space managed SYSTEM tablespace, “gc” wait events are sometimes observed during heavy periods of database logon activity.

SOLUTION

Move AUD$ and  FGA_LOG$ tables onto an ASSM tablespace like SYSAUX. On production, make sure to run it during off peak hours.

set pagesize 120
set linesize 180
SELECT T.TABLE_NAME, TS.SEGMENT_SPACE_MANAGEMENT
FROM DBA_TABLES T, DBA_TABLESPACES TS
WHERE TS.TABLESPACE_NAME = T.TABLESPACE_NAME
AND T.TABLE_NAME = 'AUD$';

TABLE_NAME SEGMEN
---------- ------
AUD$       MANUAL

select TABLESPACE_NAME,SEGMENT_SPACE_MANAGEMENT 
  from DBA_TABLESPACES 
 where TABLESPACE_NAME='SYSAUX'; 

TABLESPACE_NAME  SEGMEN 
---------------- ------ 
SYSAUX           AUTO

select TABLE_NAME,TABLESPACE_NAME,SEGMENT_NAME
  from dba_lobs
 where OWNER='SYS' and TABLE_NAME='AUD$';
      
TABLE_NAME TABLESPACE_NAME SEGMENT_NAME
---------- --------------- -------------------------
AUD$       SYSTEM          SYS_LOB0000000384C00040$$
AUD$       SYSTEM          SYS_LOB0000000384C00041$$

select SEGMENT_NAME,segment_type,tablespace_name
  from dba_segments 
 where segment_name in (
       select INDEX_NAME 
         from dba_indexes 
        where OWNER='SYS'
          and TABLE_NAME='AUD$');

SEGMENT_NAME             SEGMENT_TYPE TABLESPACE_NAME
------------------------ ------------ ---------------
SYS_IL0000000384C00041$$ LOBINDEX      SYSTEM
SYS_IL0000000384C00040$$ LOBINDEX      SYSTEM

Move AUD$ table:

SQL> EXEC DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION(AUDIT_TRAIL_TYPE 
=> DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
   AUDIT_TRAIL_LOCATION_VALUE => 'SYSAUX');

PL/SQL procedure successfully completed.

Move FGA_LOG$ table if needed:

SQL> EXEC DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION(AUDIT_TRAIL_TYPE 
=> DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD,
   AUDIT_TRAIL_LOCATION_VALUE => 'SYSAUX');

PL/SQL procedure successfully completed.

Check the results.

select TABLE_NAME,TABLESPACE_NAME,SEGMENT_NAME
from dba_lobs
where OWNER='SYS' and TABLE_NAME='AUD$';

TABLE_NAME TABLESPACE_NAME SEGMENT_NAME
---------- -------------- --------------------------
AUD$       SYSAUX          SYS_LOB0000000384C00040$$
AUD$       SYSAUX          SYS_LOB0000000384C00041$$

select SEGMENT_NAME,segment_type,tablespace_name
  from dba_segments 
  where segment_name in (
        select INDEX_NAME 
          from dba_indexes 
         where OWNER='SYS'
         and TABLE_NAME='AUD$');

SEGMENT_NAME             SEGMENT_TYPE TABLESPACE_NAME
------------------------ ------------ ---------------
SYS_IL0000000384C00041$$ LOBINDEX     SYSAUX
SYS_IL0000000384C00040$$ LOBINDEX     SYSAUX

select count(*) from dba_objects where status!='VALID';

COUNT(*)
--------
0