CRS-2883: Resource ‘ora.cluster_interconnect.haip’ failed during Clusterware stack start

When starting CRS on the second node of a 12.1.0.2 RAC cluster, it failed with the “CRS-2883” error.

[root@racnode2]# /u01/app/12.1.0/grid/bin/crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode2'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode2'
CRS-2883: Resource 'ora.cluster_interconnect.haip' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.

Check the ohasd_orarootagent_root.trc trace file:

...
..
.
2019-07-29 19:41:59.985295 : USRTHRD:3605423872: {0:9:3} HAIP: to set HAIP
2019-07-29 19:42:00.036464 : USRTHRD:3605423872: {0:9:3} HAIP: number of inf from clsinet -- 1
2019-07-29 19:42:00.037488 : CSSCLNT:3605423872: clssgsgrppridata: buffer too small - bufsize(4) < datasize(8)
2019-07-29 19:42:00.037795 : USRTHRD:3605423872: {0:9:3} CssGroup::getPrivateGroupData clssgsgrppridata() error, rc = 13
2019-07-29 19:42:00.037868 : USRTHRD:3605423872: {0:9:3} [NetHAMain] thread hit exception CssGroup::getPrivateGroupData clssgsgrppridata() error
2019-07-29 19:42:00.037881 : USRTHRD:3605423872: {0:9:3} [NetHAMain] thread stopping
...
..
.
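
The trace file location varies by environment; a quick way to locate it (the search path below is an assumption for this setup):

[grid@racnode2]$ find /u01/app -name ohasd_orarootagent_root.trc 2>/dev/null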

CAUSE

Patch 29698592 (Grid Infrastructure Patch Set Update 12.1.0.2.190716) was applied to the first node, but not to the second node.
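
The mismatch can be confirmed by comparing the patch inventories of the two Grid homes, for example (a quick check; $GI_HOME is the Grid home used throughout this note):

[grid@racnode1]$ $GI_HOME/OPatch/opatch lspatches -oh $GI_HOME
[grid@racnode2]$ $GI_HOME/OPatch/opatch lspatches -oh $GI_HOME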

RESOLUTION

All nodes must have the same GI patches applied. Since CRS cannot be started on the second node, opatchauto cannot be used there, so all GI patches must be applied manually on the second node.
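
A quick way to confirm the stack state on the node is a standard crsctl status check:

[root@racnode2]# $GI_HOME/bin/crsctl check crs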

On the second node:

1) Kill all the cluster processes manually, for example:
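
A sketch, assuming the usual 12.1 clusterware daemon names; the process list is illustrative and <pid> is a placeholder:

[root@racnode2]# ps -ef | egrep 'ohasd|oraagent|orarootagent|gpnpd|gipcd|mdnsd|evmd|ocssd|cssdagent|cssdmonitor|octssd|osysmond|ologgerd' | grep -v grep
[root@racnode2]# kill -9 <pid>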

2) Make the GI_HOME writable by the GI owner “grid”:

# chmod -R 775 $GI_HOME

3) Manually apply the GI patches.

--OCW PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29509318

--ACFS PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29423125

--DBWLM PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/26983807

--DB PSU
[grid@racnode2]$ $GI_HOME/OPatch/opatch apply -oh $GI_HOME -local /tmp/29698592/29494060
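
Optionally, verify the local inventory after the patches are applied:

[grid@racnode2]$ $GI_HOME/OPatch/opatch lspatches -oh $GI_HOME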

4) Starting CRS still fails:

[root@racnode2]# $GI_HOME/bin/crsctl start crs -wait
CRS-6706: Oracle Clusterware Release patch level ('3536172590') does not match Software patch level ('0'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.
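
The two levels referenced by CRS-6706 can be compared directly with crsctl (the same queries also appear in the postpatch log later in this note):

[root@racnode2]# $GI_HOME/bin/crsctl query crs softwarepatch
[root@racnode2]# $GI_HOME/bin/crsctl query crs releasepatch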

Still on node 2, correct the patch level.

For 12.2, execute “<GI_HOME>/crs/install/rootcrs.pl -prepatch” and “<GI_HOME>/crs/install/rootcrs.pl -postpatch” as the root user; this corrects the patch level.
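
For example (the commands quoted above, run as root; <GI_HOME> stands for the actual Grid home path):

[root@racnode2]# <GI_HOME>/crs/install/rootcrs.pl -prepatch
[root@racnode2]# <GI_HOME>/crs/install/rootcrs.pl -postpatch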

For 12.1:

[root@racnode2]# $GI_HOME/crs/install/rootcrs.sh -patch
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2019/07/29 22:26:41 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2019/07/29 22:28:40 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

2019/07/29 22:28:56 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'

CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'racnode2'
CRS-2672: Attempting to start 'ora.evmd' on 'racnode2'
CRS-2676: Start of 'ora.mdnsd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.evmd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'racnode2'
CRS-2676: Start of 'ora.gpnpd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'racnode2'
CRS-2676: Start of 'ora.gipcd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'racnode2'
CRS-2676: Start of 'ora.cssdmonitor' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'racnode2'
CRS-2672: Attempting to start 'ora.diskmon' on 'racnode2'
CRS-2676: Start of 'ora.diskmon' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cssd' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode2'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode2'
CRS-2676: Start of 'ora.ctssd' on 'racnode2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode2' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'racnode2'
CRS-2681: Clean of 'ora.asm' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode2'
CRS-2676: Start of 'ora.asm' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'racnode2'
CRS-2676: Start of 'ora.storage' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'racnode2'
CRS-2676: Start of 'ora.crf' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode2'
CRS-2676: Start of 'ora.crsd' on 'racnode2' succeeded
CRS-6017: Processing resource auto-start for servers: racnode2
CRS-2672: Attempting to start 'ora.net1.network' on 'racnode2'
CRS-2676: Start of 'ora.net1.network' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.ons' on 'racnode2'
CRS-2673: Attempting to stop 'ora.racnode2.vip' on 'racnode1'
CRS-2677: Stop of 'ora.racnode2.vip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.racnode2.vip' on 'racnode2'
CRS-2676: Start of 'ora.ons' on 'racnode2' succeeded
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'racnode1'
CRS-2676: Start of 'ora.racnode2.vip' on 'racnode2' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'racnode1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'racnode1'
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'racnode2'
CRS-2677: Stop of 'ora.scan1.vip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'racnode2'
CRS-2676: Start of 'ora.scan1.vip' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'racnode2'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'racnode2' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'racnode2' succeeded
CRS-2672: Attempting to start 'ora.ractest.db' on 'racnode2'
CRS-2676: Start of 'ora.ractest.db' on 'racnode2' succeeded
CRS-6016: Resource auto-start has completed for server racnode2
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. 
The cluster upgrade state is [NORMAL]. 
The cluster active patch level is [3536172590].

Finally, use opatchauto to apply the remaining (non-GI) patches; opatchauto will skip the GI patches that have already been applied.

[root@racnode2]# $GI_HOME/OPatch/opatchauto apply /tmp/29698592

SUMMARY

  • All nodes should have the same patches applied, using the latest OPatch (patch 6880880); the installed OPatch version can be checked as shown below.
  • opatchauto requires that CRS can be started and stopped. If CRS cannot be started, apply the GI patches manually first, then use “rootcrs.sh” (12.1) or “rootcrs.pl” (12.2) to correct the patch level on the problem node. After that, CRS can be started on all nodes.
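
To check the installed OPatch version under the Grid home:

[grid@racnode2]$ $GI_HOME/OPatch/opatch version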

Is MGMTDB Patched While Applying GI PSU/RU/RUR?

MGMTDB is patched at the same time as the GI home whenever GI is upgraded or patched (PSU, RU/RUR, etc.).

Yes. While applying a PSU onto GI, the GIMR database “-MGMTDB” is patched as well, which can be confirmed from the logs.
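
The GIMR configuration and status can be checked with srvctl (the same commands appear in the postpatch log below):

[grid@racnode2]$ /u01/app/12.1.0.2/grid/bin/srvctl config mgmtdb
[grid@racnode2]$ /u01/app/12.1.0.2/grid/bin/srvctl status mgmtdb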

a) The log from “opatchauto apply” shows that MGMTDB is patched while the GI PSU is applied:

System initialization log file is /u01/app/12.1.0.2/grid/cfgtoollogs/opatchautodb/systemconfig2016-09-19_12-46-41PM.log.

Session log file is /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/opatchauto2016-09-19_12-46-55PM.log
The id for this session is 2G1F
[init:init] Executing OPatchAutoBinaryAction action on home /u01/app/12.1.0.2/grid

Executing OPatch prereq operations to verify patch applicability on CRS Home........

[init:init] OPatchAutoBinaryAction action completed on home /u01/app/12.1.0.2/grid successfully
[init:init] Executing GIRACPrereqAction action on home /u01/app/12.1.0.2/grid

Executing prereq operations before applying on CRS Home........

[init:init] GIRACPrereqAction action completed on home /u01/app/12.1.0.2/grid successfully
[shutdown:shutdown] Executing GIShutDownAction action on home /u01/app/12.1.0.2/grid

Performing prepatch operations on CRS Home........

Prepatch operation log file location: /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/crspatch_racnode2_2016-09-19_12-47-24AM.log

[shutdown:shutdown] GIShutDownAction action completed on home /u01/app/12.1.0.2/grid successfully
[offline:binary-patching] Executing OPatchAutoBinaryAction action on home /u01/app/12.1.0.2/grid

Start applying binary patches on CRS Home........

[offline:binary-patching] OPatchAutoBinaryAction action completed on home /u01/app/12.1.0.2/grid successfully
[startup:startup] Executing GIStartupAction action on home /u01/app/12.1.0.2/grid

Performing postpatch operations on CRS Home........

Postpatch operation log file location: /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/crspatch_racnode2_2016-09-19_12-50-36AM.log

[startup:startup] GIStartupAction action completed on home /u01/app/12.1.0.2/grid successfully
[finalize:finalize] Executing OracleHomeLSInventoryGrepAction action on home /u01/app/12.1.0.2/grid

Verifying patches applied on CRS Home.

[finalize:finalize] OracleHomeLSInventoryGrepAction action completed on home /u01/app/12.1.0.2/grid successfully
OPatchAuto successful.

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:racnode2
CRS Home:/u01/app/12.1.0.2/grid
Summary:

==Following patches were SKIPPED:

Patch: /u01/app/software/23615308/23177536
Reason: This patch is not applicable to this specified target type - "cluster"


==Following patches were SUCCESSFULLY applied:

Patch: /u01/app/software/23615308/21436941
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-09-19_12-47-55PM_1.log

Patch: /u01/app/software/23615308/23054246
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-09-19_12-47-55PM_1.log

Patch: /u01/app/software/23615308/23054327
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-09-19_12-47-55PM_1.log

Patch: /u01/app/software/23615308/23054341
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2016-09-19_12-47-55PM_1.log

b) Check the postpatch operation log file /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/crspatch_racnode2_2016-09-19_12-50-36AM.log:

$ tail -120 /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/crspatch_racnode2_2016-09-19_12-50-36AM.log
...

2016-09-19 12:52:30: Executing cmd: /u01/app/12.1.0.2/grid/bin/srvctl config mgmtdb
2016-09-19 12:52:31: Command output:
> Database unique name: _mgmtdb
> Database name:
> Oracle home: <CRS home>
> Oracle user: grid
> Spfile: +OCR_VOTE/_MGMTDB/PARAMETERFILE/spfile.268.922884353
> Password file:
> Domain:
> Start options: open
> Stop options: immediate
> Database role: PRIMARY
> Management policy: AUTOMATIC
> Type: Management
> PDB name: racnode-cluster
> PDB service: racnode-cluster
> Cluster name: racnode-cluster
> Database instance: -MGMTDB
>End Command output
2016-09-19 12:52:31: isMgmtdbConfigured: 1
2016-09-19 12:52:31: setting ORAASM_UPGRADE to 1
2016-09-19 12:52:31: Executing cmd: /u01/app/12.1.0.2/grid/bin/crsctl query crs softwarepatch racnode2
2016-09-19 12:52:31: Command output:
> Oracle Clusterware patch level on node racnode2 is [3696455212].
>End Command output
2016-09-19 12:52:31: Oracle Clusterware patch level on node 'racnode2' is [3696455212]
2016-09-19 12:52:31: setting ORAASM_UPGRADE to 1
2016-09-19 12:52:31: Executing cmd: /u01/app/12.1.0.2/grid/bin/crsctl query crs softwarepatch racnode1
2016-09-19 12:52:31: Command output:
> Oracle Clusterware patch level on node racnode1 is [3696455212].
>End Command output
2016-09-19 12:52:31: Oracle Clusterware patch level on node 'racnode1' is [3696455212]
2016-09-19 12:52:31: The local node has the same software patch level [3696455212] as remote node 'racnode1'
2016-09-19 12:52:31: Executing cmd: /u01/app/12.1.0.2/grid/bin/crsctl query crs releasepatch
2016-09-19 12:52:31: Command output:
> Oracle Clusterware release patch level is [3696455212] and the complete list of patches [19769480 20299023 20831110 21359755 21436941 21948354 22291127 23054246 23054327 23054341 ] have been applied on the local node.
>End Command output
2016-09-19 12:52:31: Oracle Clusterware release patch level is [3696455212]
2016-09-19 12:52:31: setting ORAASM_UPGRADE to 1
2016-09-19 12:52:31: Executing cmd: /u01/app/12.1.0.2/grid/bin/crsctl query crs activeversion -f
2016-09-19 12:52:31: Command output:
> Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [3696455212].
>End Command output
2016-09-19 12:52:31: Oracle Clusterware active patch level is [3696455212]
2016-09-19 12:52:31: The Clusterware active patch level [3696455212] has been updated to [3696455212].
2016-09-19 12:52:31: Postpatch: isLastNode is 1
2016-09-19 12:52:31: Last node: enable Mgmt DB globally
2016-09-19 12:52:31: Invoking "/u01/app/12.1.0.2/grid/bin/srvctl enable mgmtdb"
2016-09-19 12:52:31: trace file=/u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/srvmcfg1.log
2016-09-19 12:52:31: Running as user grid: /u01/app/12.1.0.2/grid/bin/srvctl enable mgmtdb
2016-09-19 12:52:31: Invoking "/u01/app/12.1.0.2/grid/bin/srvctl enable mgmtdb" as user "grid"
2016-09-19 12:52:31: Executing /bin/su grid -c "/u01/app/12.1.0.2/grid/bin/srvctl enable mgmtdb"
2016-09-19 12:52:31: Executing cmd: /bin/su grid -c "/u01/app/12.1.0.2/grid/bin/srvctl enable mgmtdb"
2016-09-19 12:52:32: Modifying 10.2 resources
2016-09-19 12:52:32: Invoking "/u01/app/12.1.0.2/grid/bin/srvctl upgrade model -pretb"
2016-09-19 12:52:32: trace file=/u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/srvmcfg2.log
2016-09-19 12:52:32: Executing /u01/app/12.1.0.2/grid/bin/srvctl upgrade model -pretb
2016-09-19 12:52:32: Executing cmd: /u01/app/12.1.0.2/grid/bin/srvctl upgrade model -pretb
2016-09-19 12:52:32: 'srvctl upgrade model -pretb' ... success
2016-09-19 12:52:32: Executing cmd: /u01/app/12.1.0.2/grid/bin/srvctl status mgmtdb -S 1
2016-09-19 12:52:32: Command output:
> #@=result[0]: dbunique_name={_mgmtdb} inst_name={-MGMTDB} node_name={racnode2} up={true} state_details={Open} internal_state={STABLE}
>End Command output
2016-09-19 12:52:32: Mgmtdb is running on node: racnode2; local node: racnode2
2016-09-19 12:52:32: Mgmtdb is running on the local node
2016-09-19 12:52:32: Starting to patch Mgmt DB ...
2016-09-19 12:52:32: Invoking "/u01/app/12.1.0.2/grid/sqlpatch/sqlpatch -db -MGMTDB"
2016-09-19 12:52:32: Running as user grid: /u01/app/12.1.0.2/grid/sqlpatch/sqlpatch -db -MGMTDB
2016-09-19 12:52:32: Invoking "/u01/app/12.1.0.2/grid/sqlpatch/sqlpatch -db -MGMTDB" as user "grid"
2016-09-19 12:52:32: Executing /bin/su grid -c "/u01/app/12.1.0.2/grid/sqlpatch/sqlpatch -db -MGMTDB"
2016-09-19 12:52:32: Executing cmd: /bin/su grid -c "/u01/app/12.1.0.2/grid/sqlpatch/sqlpatch -db -MGMTDB"
2016-09-19 12:53:39: Command output:
> SQL Patching tool version 12.1.0.2.0 on Mon Sep 19 12:52:32 2016
> Copyright (c) 2016, Oracle. All rights reserved.
>
> Connecting to database...OK
> Note: Datapatch will only apply or rollback SQL fixes for PDBs
> that are in an open state, no patches will be applied to closed PDBs.
> Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
> (Doc ID 1585822.1)
> Determining current state...done
> Adding patches to installation queue and performing prereq checks...done
> Installation queue:
> For the following PDBs: CDB$ROOT PDB$SEED RACNODE-CLUSTER
> Nothing to roll back
> The following patches will be applied:
> 23054246 (Database Patch Set Update : 12.1.0.2.160719 (23054246))
>
> Installing patches...
> Patch installation complete. Total patches installed: 3
>
> Validating logfiles...done
> SQL Patching tool complete on Mon Sep 19 12:53:39 2016
>End Command output
2016-09-19 12:53:39: Successfully patched Mgmt DB
2016-09-19 12:53:39: Invoking "/u01/app/12.1.0.2/grid/bin/cluutil -ckpt -oraclebase /u01/app/grid -writeckpt -name ROOTCRS_POSTPATCH -state SUCCESS"
2016-09-19 12:53:39: trace file=/u01/app/grid/crsdata/racnode2/crsconfig/cluutil7.log
2016-09-19 12:53:39: Running as user grid: /u01/app/12.1.0.2/grid/bin/cluutil -ckpt -oraclebase /u01/app/grid -writeckpt -name ROOTCRS_POSTPATCH -state SUCCESS
2016-09-19 12:53:39: s_run_as_user2: Running /bin/su grid -c ' echo CLSRSC_START; /u01/app/12.1.0.2/grid/bin/cluutil -ckpt -oraclebase /u01/app/grid -writeckpt -name ROOTCRS_POSTPATCH -state SUCCESS '
2016-09-19 12:53:39: Removing file /tmp/filei2kicY
2016-09-19 12:53:39: Successfully removed file: /tmp/filei2kicY
2016-09-19 12:53:39: pipe exit code: 0
2016-09-19 12:53:39: /bin/su successfully executed

2016-09-19 12:53:39: Succeeded in writing the checkpoint:'ROOTCRS_POSTPATCH' with status:SUCCESS
[+ASM2] grid@racnode2:~$

c) Querying dba_registry_sqlpatch in the container database shows that MGMTDB has been patched as well.
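
To run the queries below, connect to the MGMTDB instance as the grid user (a sketch; the instance name -MGMTDB and the Grid home are taken from the output above):

[grid@racnode2]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[grid@racnode2]$ export ORACLE_SID=-MGMTDB
[grid@racnode2]$ $ORACLE_HOME/bin/sqlplus / as sysdba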

-- check CDB$ROOT

SQL> set pagesize 120
SQL> set linesize 180
SQL> select * from dba_registry_sqlpatch;

PATCH_ID PATCH_UID VERSION FLAGS ACTION STATUS ACTION_TIME
---------- ---------- -------------------- ---------- --------------- --------------- ---------------------------------------------------------------------------
DESCRIPTION BUNDLE_SERIES BUNDLE_ID
---------------------------------------------------------------------------------------------------- ------------------------------ ----------
BUNDLE_DATA
--------------------------------------------------------------------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

23054246 20213895 12.1.0.2 NB APPLY SUCCESS 19-SEP-16 12.53.38.034518 PM
Database Patch Set Update : 12.1.0.2.160719 (23054246) PSU 160719
<bundledata version="12.1.0.2.1" series="Patch Set Update">
 <bundle id="1" des
/u01/app/grid/cfgtoollogs/sqlpatch/23054246/20213895/23054246_apply__MGMTDB_CDBROOT_2016Sep19_12_53_18.log



-- check PDB RACNODE-CLUSTER

SQL> alter session set container="RACNODE-CLUSTER";

SQL> select * from dba_registry_sqlpatch;

PATCH_ID PATCH_UID VERSION FLAGS ACTION STATUS ACTION_TIME
---------- ---------- -------------------- ---------- --------------- --------------- ---------------------------------------------------------------------------
DESCRIPTION BUNDLE_SERIES BUNDLE_ID
---------------------------------------------------------------------------------------------------- ------------------------------ ----------
BUNDLE_DATA
--------------------------------------------------------------------------------
LOGFILE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
23054246 20213895 12.1.0.2 NB APPLY SUCCESS 19-SEP-16 12.53.38.457190 PM
Database Patch Set Update : 12.1.0.2.160719 (23054246) PSU 160719
<bundledata version="12.1.0.2.1" series="Patch Set Update">
 <bundle id="1" des
/u01/app/grid/cfgtoollogs/sqlpatch/23054246/20213895/23054246_apply__MGMTDB_RACNODE-CLUSTER_2016Sep19_12_53_32.log