EXACC: Cluster Services Failed to Start after VM Reboot

After the RAC cluster VMs were rebooted, the cluster services did not come up, and the following messages appeared in alert.log:

2021-07-12 18:48:20.741 [CLSECHO(27566)]ACFS-9327: Verifying ADVM/ACFS devices.
2021-07-12 18:48:20.791 [CLSECHO(27631)]ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
2021-07-12 18:48:20.855 [CLSECHO(27710)]ACFS-9156: Detecting control device '/dev/ofsctl'.
2021-07-12 18:48:21.469 [CLSECHO(28208)]ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
2021-07-12 18:48:21.530 [CLSECHO(28254)]ACFS-9322: completed
2021-07-12 18:48:23.302 [OSYSMOND(30089)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 30089
2021-07-12 18:48:23.274 [CSSDMONITOR(30080)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 30080
2021-07-12 18:48:23.698 [CSSDAGENT(30498)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 30498
2021-07-12 18:48:25.138 [OCSSD(30929)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 30929
2021-07-12 18:48:26.213 [OCSSD(30929)]CRS-1713: CSSD daemon is started in hub mode
2021-07-12 18:48:27.400 [OCSSD(30929)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/racnode1/crs/trace/ocssd.trc
2021-07-12 18:48:27.416 [OCSSD(30929)]CRS-1652: Starting clean up of CRSD resources.
2021-07-12 18:48:27.421 [OCSSD(30929)]CRS-1653: The clean up of the CRSD resources failed.
2021-07-12T18:48:29.448153+10:00
Errors in file /u01/app/grid/diag/crs/racnode1/crs/trace/ocssd.trc  (incident=41):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/grid/diag/crs/racnode1/crs/incident/incdir_41/ocssd_i41.trc

2021-07-12 18:48:29.436 [OCSSD(30929)]CRS-8503: Oracle Clusterware process OCSSD with operating system process ID 30929 experienced fatal signal or exception code 6.
2021-07-12 18:48:30.755 [CSSDMONITOR(38238)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 38238
2021-07-12 18:48:31.229 [CSSDAGENT(38659)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 38659

ocssd.trc:

2021-07-12 18:48:26.337 :    CSSD:1474272576: [     INFO] clssscUpdateInitState: Set state to 0x008c1e46, based on prior state of 0x008c1e06 and requested change of 0x00000040
2021-07-12 18:48:26.337 :    CSSD:1474272576: [     INFO] clssscGetParameterProfile: buffer passed for parameter ASM discovery (3) is too short, required 26, passed 20
2021-07-12 18:48:26.337 :    CSSD:1474272576: [     INFO] clssnmReadDiscoveryProfile: voting file discovery string(o/*/DATAC1_*,o/*/RECOC1_*)
2021-07-12 18:48:26.337 :    CSSD:1474272576: [     INFO] clssnkInit: NK generic layer initializing.
2021-07-12 18:48:26.337 :    CSSD:1474272576: [     INFO] clssnkipmiInit: start
2021-07-12 18:48:26.346 :    CSSD:1474272576: [     INFO] clssnkipmiInit: binary path none
2021-07-12 18:48:26.346 :    CSSD:1474272576: [     INFO] clssnkipmiInit: ipmi mech 0
2021-07-12 18:48:26.347 :    CSSD:4023375616: [     INFO] clssscthrdmain: Starting thread clssnmvDDiscThread
2021-07-12 18:48:26.390 :   SKGFD:4023375616: ERROR: -8(OS Error -1 (open,sskgxplp,Invalid protocol requested (2) or protocol not loaded.,Error 0)
)
2021-07-12 18:48:26.390 :   SKGFD:4023375616: ERROR: -10(OSS Operation oss_initialize failed with error 4 [Network initialization failed]
)
2021-07-12 18:48:26.390 :   SKGFD:4023375616: kgfkInit: Error lib:3 ret:-10

2021-07-12 18:48:26.399 :    CSSD:4023375616: [    ERROR] clsssnmvDDiscThread: Unable to create clsf context
2021-07-12 18:48:26.399 :    CSSD:4023375616: [     INFO] clssnmCheckForVfFailure: no voting file found
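
When triaging a failure like this, it can help to pull only the failure-related lines out of alert.log and ocssd.trc. The following is a minimal sketch, not an Oracle-provided tool: the function name cssd_fatal_lines is hypothetical, and the patterns are simply the error strings that appear in the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Infrastructure): print only the
# lines that point at the CSSD startup failure.
# The patterns are taken from the log excerpts above; adjust as needed.
cssd_fatal_lines() {
    # $1 = path to alert.log or ocssd.trc
    grep -E 'CRS-1656|CRS-8503|SKGFD.*ERROR|Unable to create clsf context|no voting file found' "$1"
}

# Example call (path taken from the alert.log excerpt above):
# cssd_fatal_lines /u01/app/grid/diag/crs/racnode1/crs/trace/ocssd.trc
```

Run against the ocssd.trc excerpt above, this would keep the SKGFD errors, the "Unable to create clsf context" error, and the "no voting file found" message, and drop the INFO noise around them.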

REASON and SOLUTION

How to run an ILOM Snapshot on a Sun/Oracle X86 System from the service processor CLI interface

An ILOM snapshot is usually the first thing Oracle Support asks for.

This post demonstrates how to run an ILOM snapshot on a cell server (xx.xx.xx.xx) on which one or more flash disks have failed. Oracle Support requires this snapshot to diagnose the issue.

1) Log in to the ILOM CLI interface.

# ssh xx.xx.xx.xx-ilom
Password:

Oracle(R) Integrated Lights Out Manager
Version 3.1.2.20.c r86871
Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.

->

2) At the ‘->’ prompt, type the following command:

-> set /SP/diag/snapshot dataset=normal
Set 'dataset' to 'normal'

3) Set dump_uri to the SFTP location that will receive the snapshot:

The password here contains special characters, so it must be enclosed in double quotes.

The IP (10.10.10.17) can be either a cell server or a database server, as long as the user and password work on it.

-> set /SP/diag/snapshot dump_uri=sftp://testuser:"Password#"@10.10.10.17/tmp
Set 'dump_uri' to 'sftp://testuser:Password#@10.10.10.17/tmp'
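
If the URI is assembled in a script before being pasted at the ILOM prompt, the quoting can be kept in one place. This is only a sketch run on a regular shell, not on the ILOM itself: the function name build_dump_uri and its argument order are hypothetical, and it simply reproduces the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: build the dump_uri value with the password wrapped
# in double quotes, matching the format used in step 3.
build_dump_uri() {
    # $1 = user, $2 = password, $3 = host or IP, $4 = absolute target directory
    printf 'sftp://%s:"%s"@%s%s\n' "$1" "$2" "$3" "$4"
}

# Single quotes keep the local shell from interpreting the '#'.
build_dump_uri testuser 'Password#' 10.10.10.17 /tmp
```

The printed value is what goes after "dump_uri=" in the set command.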

4) cd to the snapshot directory and view the status. It shows “Running” initially.

-> cd /SP/diag/snapshot
/SP/diag/snapshot

-> show

/SP/diag/snapshot
 Targets:

Properties:
 dataset = normal
 dump_uri = (Cannot show property)
 encrypt_output = false
 result = Running

Commands:
 cd
 set
 show

5) Wait for the snapshot process to complete. It may take several minutes.
Continue to check until the status shows ‘Snapshot Complete’.
Do not use, access, view, copy or move the snapshot file until it has completed.

-> show

/SP/diag/snapshot
 Targets:

Properties:
 dataset = normal
 dump_uri = (Cannot show property)
 encrypt_output = false
 result = Collecting data into 
 sftp://testuser:*****@10.10.10.17/tmp/xx.xx.xx.xx-ilom_1152FMM0C1
 _2016-04-17T10-59-15.zip
 Snapshot Complete.
 Done.

Commands:
 cd
 set
 show
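
If the "show" output is captured to a file or a pipe, the completion check can be scripted. The sketch below only classifies text that looks like the two outputs shown above. The function name snapshot_state is hypothetical, and how the output is captured from the ILOM is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by "show" in
# /SP/diag/snapshot. Reads the output on stdin and prints one of:
# complete, running, unknown.
snapshot_state() {
    out=$(cat)
    case "$out" in
        *"Snapshot Complete"*) echo complete ;;  # final state shown in step 5
        *"result = Running"*)  echo running ;;   # initial state shown in step 4
        *)                     echo unknown ;;   # anything else: check manually
    esac
}

# Example with a line copied from the step 4 output:
printf ' result = Running\n' | snapshot_state
```

Any output that matches neither pattern is reported as unknown rather than guessed at, since intermediate states are not shown in this post.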

6) Exit the CLI interface and find your snapshot in the directory you specified.

-> exit
Connection to xx.xx.xx.xx-ilom closed.

$ ssh testuser@10.10.10.17
testuser@10.10.10.17's password: 

$ ls -ltr /tmp/xx.xx.xx.xx-ilom_1152FMM0C1_2016-04-17T10-59-15.zip

-rw-r--r-- 1 testuser dba 1129640 Apr 17 04:03 xx.xx.xx.xx-ilom_1152FMM0C1_2016-04-17T10-59-15.zip

Reference:

How to run an ILOM Snapshot on a Sun/Oracle X86 System (Doc ID 1448069.1)