How to Upgrade Linux OS In RAC Environment

This post introduces how to upgrade the Linux OS from RHEL 5 to RHEL 7 with zero downtime for a two-node RAC.


CRS-2883: Resource ‘ora.cluster_interconnect.haip’ failed during Clusterware stack start

When starting CRS on the second node of a 12.1.0.2 RAC, it failed with a “CRS-2883” error.

[root@racnode2]# /u01/app/12.1.0/grid/bin/crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode2'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode2'
CRS-2883: Resource 'ora.cluster_interconnect.haip' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.

Check the ohasd_orarootagent_root.trc trace file:

...
2019-07-29 19:41:59.985295 : USRTHRD:3605423872: {0:9:3} HAIP: to set HAIP
2019-07-29 19:42:00.036464 : USRTHRD:3605423872: {0:9:3} HAIP: number of inf from clsinet -- 1
2019-07-29 19:42:00.037488 : CSSCLNT:3605423872: clssgsgrppridata: buffer too small - bufsize(4) < datasize(8)
2019-07-29 19:42:00.037795 : USRTHRD:3605423872: {0:9:3} CssGroup::getPrivateGroupData clssgsgrppridata() error, rc = 13
2019-07-29 19:42:00.037868 : USRTHRD:3605423872: {0:9:3} [NetHAMain] thread hit exception CssGroup::getPrivateGroupData clssgsgrppridata() error
2019-07-29 19:42:00.037881 : USRTHRD:3605423872: {0:9:3} [NetHAMain] thread stopping
...

CAUSE

Patch 29698592 (Grid Infrastructure Patch Set Update 12.1.0.2.190716) has been applied to the first node, but not to the second node.

RESOLUTION

All nodes must be patched with the same GI patches. Since CRS cannot be started on the second node, opatchauto cannot be used there; the GI patches must be applied manually on the second node.

On second node:

1) Kill all the cluster processes manually.
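The daemon names below are typical Clusterware processes and are assumptions here; confirm with `ps -ef` on your own system before killing anything. A dry-run sketch that prints the kill commands instead of executing them:

```shell
#!/bin/sh
# Dry run: print (do not execute) kill commands for typical Clusterware
# daemons. The process-name list is an assumption -- verify it against
# 'ps -ef' output on your own node first.
ps -eo pid=,comm= | awk '
  $2 ~ /^(ohasd|mdnsd|evmd|gpnpd|gipcd|cssdmonitor|cssdagent|ocssd|octssd|crsd|orarootagent|oraagent)/ {
    print "kill -9 " $1
  }'
```

Review the printed list, then run the kill commands (or pipe the output to `sh`) once you are sure only cluster processes are matched.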

2) As root, make GI_HOME readable and writable by the GI owner “grid”:

#chmod -R 775 $GI_HOME

3) Manually apply the GI patches as the “grid” user.

--OCW PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29509318

--ACFS PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29423125

--DBWLM PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/26983807

--DB PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29494060
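After the four manual applies, it is worth confirming that each sub-patch is recorded in the local inventory. A hedged sketch using `opatch lspatches`; the `check_patches` helper is invented here for illustration:

```shell
#!/bin/sh
# check_patches: read 'opatch lspatches' output from a file and report
# whether each expected sub-patch of PSU 29698592 is present.
check_patches() {
  for id in 29509318 29423125 26983807 29494060; do
    if grep -q "$id" "$1"; then
      echo "patch $id: applied"
    else
      echo "patch $id: MISSING"
    fi
  done
}

# Usage (as grid; GI_HOME assumed to be set):
#   $GI_HOME/OPatch/opatch lspatches -oh $GI_HOME > /tmp/lspatches.out
#   check_patches /tmp/lspatches.out
```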

4) Starting up CRS still fails.

[root@racnode2]# $GI_HOME/bin/crsctl start crs -wait
CRS-6706: Oracle Clusterware Release patch level ('3536172590') does 
not match Software patch level ('0'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.

Still on node 2.

For 12.2, execute “<GI_HOME>/crs/install/rootcrs.pl -prepatch” and “<GI_HOME>/crs/install/rootcrs.pl -postpatch” as root, and the patch level should be corrected.

For 12.1:

[root@racnode2 ]# $GI_HOME/crs/install/rootcrs.sh -patch
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2019/07/29 22:26:41 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2019/07/29 22:28:40 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2019/07/29 22:28:56 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'

CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode2'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode2'
CRS-2676: Start of 'ora.ctssd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode2' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'racnode2'
CRS-2681: Clean of 'ora.asm' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode2'
CRS-2676: Start of 'ora.asm' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'racnode2'
CRS-2676: Start of 'ora.storage' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'racnode2'
CRS-2676: Start of 'ora.crf' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode2'
CRS-2676: Start of 'ora.crsd' on 'racnode2' succeeded
CRS-6017: Processing resource auto-start for servers: racnode2
CRS-2672: Attempting to start 'ora.net1.network' on 'racnode2'
CRS-2676: Start of 'ora.net1.network' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.ons' on 'racnode2'
CRS-2673: Attempting to stop 'ora.racnode2.vip' on 'racnode1'
CRS-2677: Stop of 'ora.racnode2.vip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.racnode2.vip' on 'racnode2'
CRS-2676: Start of 'ora.ons' on 'racnode2' succeeded
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'racnode1'
CRS-2676: Start of 'ora.racnode2.vip' on 'racnode2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'racnode1'
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'racnode2'
CRS-2677: Stop of 'ora.scan1.vip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'racnode2'
CRS-2676: Start of 'ora.scan1.vip' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'racnode2'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'racnode2' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.ractest.db' on 'racnode2'
CRS-2676: Start of 'ora.ractest.db' on 'racnode2' succeeded
CRS-6016: Resource auto-start has completed for server racnode2
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. 
The cluster upgrade state is [NORMAL]. 
The cluster active patch level is [3536172590].
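The patch level printed above (and in the CRS-6706 message) can be extracted on each node to compare them quickly. An illustrative sketch; the `patch_level` filter name is invented:

```shell
#!/bin/sh
# patch_level: pull the numeric patch level out of crsctl output such as
# "The cluster active patch level is [3536172590]."
patch_level() {
  sed -n 's/.*patch level is \[\([0-9][0-9]*\)\].*/\1/p'
}

# Usage on each node (GI_HOME assumed to be set):
#   $GI_HOME/bin/crsctl query crs activeversion -f | patch_level
```

If the extracted value differs between nodes, the node with the lower level still needs patching or the rootcrs patch-level correction shown above.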

Finally, use opatchauto to apply the remaining non-GI patches; opatchauto will intelligently skip the GI patches that are already applied.

[root@racnode2]# $GI_HOME/OPatch/opatchauto apply /tmp/29698592

SUMMARY

  • All nodes should have the same patches applied, using the latest OPatch (patch 6880880).
  • opatchauto requires that CRS can be started up and shut down accordingly. If CRS cannot be started, first apply the GI patches manually, then use “rootcrs.sh” (12.1) or “rootcrs.pl” (12.2) to correct the patch level on the problematic node. After that, CRS can be started on all nodes.

How to Relink 12c Oracle GI / RAC Binaries after OS Upgrade

It is recommended to relink GI and RAC home binaries after an OS upgrade or patching.

This post demonstrates how to relink Oracle Grid Infrastructure (GI) and RAC home binaries after upgrading or patching the OS.
