ORA-00603 ORA-27504 ORA-27300 ORA-27301 ORA-27302 ORA-603 in ASM alert log file

The following error messages appear in the ASM alert log file:

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_m000_39281.trc (incident=198012):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
opidrv aborting process M000 ospid (39281) as a result of ORA-603
Process m000 died, see its trace file
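
Status 105 in the ORA-27300 line is the Linux errno ENOBUFS, which matches the "No buffer space available" text of ORA-27301: the UDP sendmsg() call failed because the kernel could not allocate socket buffer memory. The mapping can be confirmed with a quick lookup (an illustrative one-liner, assuming Python 3 is installed; not part of the original trace):

$ python3 -c 'import errno, os; print(errno.errorcode[105], "-", os.strerror(105))'
ENOBUFS - No buffer space available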


SKGXP:[7f620a621930.0]{0}: SKGXPVFYNET: Socket self-test could not verify successful transmission of 32768 bytes (mtype 61).
SKGXP:[7f620a621930.1]{0}: The network is required to support UDP protocol sends of this size. Socket is bound to
SKGXP:[7f620a621930.2]{0}: phase 'send', 0 tries, 100 loops, 13590 ms (last)
struct ksxpp * ksxppg_ [0x7f620a68a770, 0x7f6204ff1350) = 0x7f6204ff1348


1) Shrink the database instance SGA to give memory back to the OS.
Once the OS has more available memory, the issue goes away.
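
For example, the SGA cap can be lowered and applied with an instance restart (a minimal sketch assuming an spfile is in use; the 8G value is purely illustrative, size it for your own server):

$ sqlplus / as sysdba
SQL> alter system set sga_max_size=8G scope=spfile;
SQL> alter system set sga_target=8G scope=spfile;
SQL> shutdown immediate
SQL> startup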

[Chart: physical memory utilisation (%)]

2) Oracle documents this issue in Doc ID 2041723.1; the suggested fix is to lower the MTU of the loopback interface and to increase the value of the kernel parameter vm.min_free_kbytes.

a) Lower the loopback MTU to 16436 by adding the following line to /etc/sysconfig/network-scripts/ifcfg-lo:

MTU=16436

Then restart the network service.

# systemctl restart network.service
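
To verify the change took effect (a quick check, not part of the original note):

# ip link show lo

The output should now report mtu 16436 for the lo interface.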

b) Increase the value of vm.min_free_kbytes to 0.4% of the total physical memory of the server. (The value below, 2097152 kB, is 2 GB, which corresponds to roughly 512 GB of RAM.) This can be done by adding the following to /etc/sysctl.conf:

  vm.min_free_kbytes = 2097152

and then run the command below to make it take effect:

# sysctl -p
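
To compute the 0.4% value for a given server (an illustrative one-liner; MemTotal in /proc/meminfo is reported in kB):

# awk '/MemTotal/ {printf "vm.min_free_kbytes = %d\n", $2*0.004}' /proc/meminfo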

ORA-27140 ORA-27300 ORA-27301 ORA-27302 ORA-27303 When Starting Up or Shutting Down a RAC Instance

ORACLE_HOME is shared by RAC instances A and B. Both instances are shut down and ORACLE_HOME is patched; just after the patching, the $ORACLE_HOME/bin/oracle binary looks like this:

$ id oracle
uid=122(oracle) gid=202(oinstall) groups=202(oinstall),201(dba)

$ id grid
uid=518(grid) gid=202(oinstall) groups=202(oinstall),201(dba)

$ ls -ltr /dev/oracleasm/disks/
total 0
brw-rw---- 1 grid dba 253, 16 Jul 30 16:38 OCR_VOTE01
brw-rw---- 1 grid dba 253, 50 Jul 30 16:38 ASM_FRA01
brw-rw---- 1 grid dba 253, 57 Jul 30 16:38 ASM_DISK01

$ ls -ltr /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle oinstall 228655023 May 17 11:11 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

Manually start instance A with sqlplus instead of 'srvctl start instance', and check that the group of the oracle binary (oinstall) is still the same:

$ sqlplus / as sysdba

Connected to an idle instance.

SQL> startup
ORACLE instance started.
Database mounted.
Database opened.
SQL> exit
$ ls -ltr /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle oinstall 228655023 May 17 11:11 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

Then start instance B through clusterware with 'srvctl start', and notice that clusterware changes the group of the oracle binary from oinstall to dba:

$ srvctl start database -d TESTB
$ ls -ltr /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle
-rwsr-s--x 1 oracle dba 228655023 May 17 11:11 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

Instance A then gets the errors below in its trace files and alert log:


Process m000 died, see its trace file
Wed Jul 25 16:37:42 2018
Process m000 died, see its trace file
Process q000 died, see its trace file
Wed Jul 25 16:41:54 2018
Process W000 died, see its trace file
Process W000 died, see its trace file

Trace file:

$ cat TESTA1_m000_7813.trc
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1

*** 2018-07-25 16:35:55.662
Died during process startup with error 27140 (seq=39770)
OPIRIP: Uncaught error 27140. Error stack:
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 202 (oinstall), 
current egid = 201 (dba)

The crsd_oraagent_oracle trace file shows that clusterware changed the group of the RAC $ORACLE_HOME/bin/oracle binary by running "setasmgidwrap" when instance B was started through clusterware with 'srvctl start':

crsd_oraagent_oracle_42.trc:2018-07-25 16:34:31.584046 :CLSDYNAM:2785666816: [ora.testb.db]{1:44997:10712} [start] command = '/u01/app/ oracle_binary_path=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle'
crsd_oraagent_oracle_42.trc:2018-07-25 16:34:31.584068 :CLSDYNAM:2785666816: [ora.testb.db]{1:44997:10712} [start] Utils:execCmd action = 1 flags = 6 ohome = /u01/app/ cmdname = setasmgidwrap.
crsd_oraagent_oracle_42.trc:2018-07-25 16:43:38.726230 :CLSDYNAM:2128570112: [ora.testb.db]{1:44997:10835} [start] command = '/u01/app/ oracle_binary_path=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle'
crsd_oraagent_oracle_42.trc:2018-07-25 16:43:38.726270 :CLSDYNAM:2128570112: [ora.testb.db]{1:44997:10835} [start] Utils:execCmd action = 1 flags = 6 ohome = /u01/app/ cmdname = setasmgidwrap.
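
If an instance is already running with a stale effective group, restarting it through clusterware lets its processes pick up the new group of the binary. Alternatively, the wrapper can be re-run manually as the grid user (a hedged sketch; setasmgidwrap ships in the Grid Infrastructure home and its o= argument names the target binary, per Oracle's support notes):

$ $GRID_HOME/bin/setasmgidwrap o=/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle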


Always shut down and start up RAC instances through clusterware with 'srvctl'. Otherwise you might hit ORA-27140 ORA-27300 ORA-27301 ORA-27302 ORA-27303 errors.
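
For example, with the database names used above:

$ srvctl stop database -d TESTA
$ srvctl start database -d TESTA

A single instance can be targeted with the -i flag, e.g. srvctl start instance -d TESTA -i TESTA1.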