CRS-2883: Resource ‘ora.cluster_interconnect.haip’ failed during Clusterware stack start

When starting CRS on the second node of a 12.1.0.2 RAC cluster, the startup failed with a CRS-2883 error.

[root@racnode2]# /u01/app/12.1.0/grid/bin/crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode2'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode2'
CRS-2883: Resource 'ora.cluster_interconnect.haip' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.

Check the ohasd_orarootagent_root.trc trace file:

...
..
.
2019-07-29 19:41:59.985295 : USRTHRD:3605423872: {0:9:3} HAIP: to set HAIP
2019-07-29 19:42:00.036464 : USRTHRD:3605423872: {0:9:3} HAIP: number of inf from clsinet -- 1
2019-07-29 19:42:00.037488 : CSSCLNT:3605423872: clssgsgrppridata: buffer too small - bufsize(4) < datasize(8)
2019-07-29 19:42:00.037795 : USRTHRD:3605423872: {0:9:3} CssGroup::getPrivateGroupData clssgsgrppridata() error, rc = 13
2019-07-29 19:42:00.037868 : USRTHRD:3605423872: {0:9:3} [NetHAMain] thread hit exception CssGroup::getPrivateGroupData clssgsgrppridata() error
2019-07-29 19:42:00.037881 : USRTHRD:3605423872: {0:9:3} [NetHAMain] thread stopping
...
..
.

CAUSE

Patch 29698592 (Grid Infrastructure Patch Set Update 12.1.0.2.190716) had been applied to the first node, but not yet to the second node.
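
Such a mismatch can be confirmed by comparing the locally applied patches on each node, for example with OPatch (a quick check only; GI_HOME stands for the Grid home path):

[grid@racnode1]$ $GI_HOME/OPatch/opatch lspatches
[grid@racnode2]$ $GI_HOME/OPatch/opatch lspatches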

RESOLUTION

All nodes must be patched with the same GI patches. Since CRS cannot be started on the second node, opatchauto cannot be used there; instead, all GI patches must be applied manually on the second node.

On the second node:

1) Kill all remaining cluster processes manually (a sketch is shown below).
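
For example (a sketch only; the process list below is illustrative and varies by configuration, so verify each PID before killing it):

[root@racnode2]# ps -ef | egrep 'ohasd|oraagent|orarootagent|gpnpd|gipcd|mdnsd|evmd|cssd|ctssd|crsd|osysmond|ologgerd' | grep -v grep
[root@racnode2]# kill -9 <pid>     # repeat for each leftover clusterware process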

2) Make GI_HOME readable and writable for the GI owner "grid":

# chmod -R 775 $GI_HOME
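
In these steps GI_HOME refers to the Grid home used above, e.g. (path as on this cluster; adjust to your environment):

# export GI_HOME=/u01/app/12.1.0/grid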

3) Manually apply the GI patches:

--OCW PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29509318

--ACFS PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29423125

--DBWLM PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/26983807

--DB PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29494060

4) Starting CRS still fails, now with CRS-6706:

[root@racnode2]# $GI_HOME/bin/crsctl start crs -wait
CRS-6706: Oracle Clusterware Release patch level ('3536172590') does not match Software patch level ('0'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.

Still on the second node, correct the patch level:

For 12.2, execute "<GI_HOME>/crs/install/rootcrs.pl -prepatch" followed by "<GI_HOME>/crs/install/rootcrs.pl -postpatch" as the root user, and the patch level should be corrected.
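
As a sketch of that 12.2 sequence on the problematic node (<GI_HOME> is the Grid home path):

[root@racnode2]# <GI_HOME>/crs/install/rootcrs.pl -prepatch
[root@racnode2]# <GI_HOME>/crs/install/rootcrs.pl -postpatch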

For 12.1:

[root@racnode2 ]# $GI_HOME/crs/install/rootcrs.sh -patch
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2019/07/29 22:26:41 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2019/07/29 22:28:40 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2019/07/29 22:28:56 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'

CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode2'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode2'
CRS-2676: Start of 'ora.ctssd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode2' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'racnode2'
CRS-2681: Clean of 'ora.asm' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode2'
CRS-2676: Start of 'ora.asm' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'racnode2'
CRS-2676: Start of 'ora.storage' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'racnode2'
CRS-2676: Start of 'ora.crf' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode2'
CRS-2676: Start of 'ora.crsd' on 'racnode2' succeeded
CRS-6017: Processing resource auto-start for servers: racnode2
CRS-2672: Attempting to start 'ora.net1.network' on 'racnode2'
CRS-2676: Start of 'ora.net1.network' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.ons' on 'racnode2'
CRS-2673: Attempting to stop 'ora.racnode2.vip' on 'racnode1'
CRS-2677: Stop of 'ora.racnode2.vip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.racnode2.vip' on 'racnode2'
CRS-2676: Start of 'ora.ons' on 'racnode2' succeeded
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'racnode1'
CRS-2676: Start of 'ora.racnode2.vip' on 'racnode2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'racnode1'
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'racnode2'
CRS-2677: Stop of 'ora.scan1.vip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'racnode2'
CRS-2676: Start of 'ora.scan1.vip' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'racnode2'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'racnode2' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.ractest.db' on 'racnode2'
CRS-2676: Start of 'ora.ractest.db' on 'racnode2' succeeded
CRS-6016: Resource auto-start has completed for server racnode2
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. 
The cluster upgrade state is [NORMAL]. 
The cluster active patch level is [3536172590].
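
The patch levels reported above can also be checked directly at any time, for example (a quick verification; output will vary):

[root@racnode2]# $GI_HOME/bin/crsctl query crs activeversion -f
[root@racnode2]# $GI_HOME/bin/crsctl query crs softwarepatch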

Finally, use opatchauto to apply the remaining non-GI patches; the GI patches already applied manually are detected and skipped by opatchauto.

[root@racnode2]# $GI_HOME/OPatch/opatchauto apply /tmp/29698592

SUMMARY

  • All nodes should have the same patches applied, using the latest OPatch (patch 6880880); see the sketch after this list.
  • opatchauto requires that CRS can be started and stopped. If CRS cannot be started, apply the GI patches manually first, then use "rootcrs.sh" or "rootcrs.pl" to correct the patch level on the problematic node. After that, CRS can be started on all nodes.
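
A sketch of the OPatch update mentioned in the first bullet (the zip file name and location are illustrative; use the OPatch zip downloaded for your platform and release, and update every node):

[grid@racnode2]$ cd $GI_HOME
[grid@racnode2]$ unzip -o /tmp/p6880880_121010_Linux-x86-64.zip
[grid@racnode2]$ $GI_HOME/OPatch/opatch version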

CRS-6706: Oracle Clusterware Release patch level ('nnnnnn') does not match Software patch level ('mmmmmm')

"opatchauto" failed in the middle; rerunning it returns a CRS-6706 error.

# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply /tmp/12.1.0.2/27010872 -oh /u01/app/12.1.0.2/grid
...
..
.
Using configuration parameter file: /u01/app/12.1.0.2/grid/OPatch/auto/dbtmp/bootstrap_racnode1/patchwork/crs/install/crsconfig_params
CRS-6706: Oracle Clusterware Release patch level ('173535486') does not match Software patch level ('2039526626'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.
2018/01/23 16:29:02 CLSRSC-117: Failed to start Oracle Clusterware stack


After fixing the cause of failure Run opatchauto resume

OPATCHAUTO-68061: The orchestration engine failed.
OPATCHAUTO-68061: The orchestration engine failed with return code 1
OPATCHAUTO-68061: Check the log for more details.
OPatchAuto failed.

OPatchauto session completed at Tue Jan 23 16:29:04 2018
Time taken to complete the session 1 minute, 39 seconds

opatchauto failed with error code 42

On racnode1:

$/u01/app/12.1.0.2/grid/bin/kfod op=patches
---------------
List of Patches
===============
19941482
19941477
19694308
19245012
26925218   <---- Does not exist on node2

$/u01/app/12.1.0.2/grid/bin/kfod op=patchlvl
2039526626

On racnode2:

$/u01/app/12.1.0.2/grid/bin/kfod op=patches
---------------
List of Patches
===============
19941482
19941477
19694308
19245012

$/u01/app/12.1.0.2/grid/bin/kfod op=patchlvl
2039526626

We can see that patch 26925218 has been applied on racnode1 only. The solution is either:

1) Roll back this patch (26925218), then run "opatchauto" again to finish the patching successfully (see the sketch after these options).

OR

2) Manually apply all the remaining patches to the GI home on the second node.
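
A sketch of option 1, run on racnode1 where the extra patch exists (assuming the clusterware stack on that node has been stopped and the Grid home is writable by its owner):

[grid@racnode1]$ /u01/app/12.1.0.2/grid/OPatch/opatch rollback -id 26925218 -local -oh /u01/app/12.1.0.2/grid
[root@racnode1]# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply /tmp/12.1.0.2/27010872 -oh /u01/app/12.1.0.2/grid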

In 12c, all GI homes must have identical patches for the Clusterware to start, except while a rolling patch is in progress.
After the same patches were applied on all nodes, GI started fine.

----------

Another situation you might encounter is that all nodes have the same patches installed, but the computed patch levels differ:

For example, on racnode1:

$ /u01/app/12.1.0.2/grid/bin/kfod op=patches
---------------
List of Patches
===============
11111111
22222222
33333333

$ /u01/app/12.1.0.2/grid/bin/kfod op=patchlvl
-------------------
Current Patch level
===================
8888888888

On racnode2:

$ /u01/app/12.1.0.2/grid/bin/kfod op=patches
---------------
List of Patches
===============
11111111
22222222
33333333

$ /u01/app/12.1.0.2/grid/bin/kfod op=patchlvl
-------------------
Current Patch level
===================
9999999999

And "opatch lsinventory" likewise reports different patch levels across the nodes:

Patch level status of Cluster nodes :

Patching Level Nodes
-------------- -----
8888888888 node1                    ====>> different patch level
9999999999 node2

For 12.1.0.1/2:

Execute "/u01/app/12.1.0.2/grid/crs/install/rootcrs.sh -patch" as the root user on the problematic node, and the patch level should be corrected.

For 12.2:

Execute "<GI_HOME>/crs/install/rootcrs.pl -prepatch" followed by "<GI_HOME>/crs/install/rootcrs.pl -postpatch" as the root user on the problematic node, and the patch level should be corrected.

REFERENCES:

CRS-6706: Oracle Clusterware Release patch level ('nnn') does not match Software patch level ('mmm') (Doc ID 1639285.1)