GI Postpatch Command Fails Due To rpm Database Corruption At The OS Level

The following errors occur when running “$ORACLE_HOME/crs/install/rootcrs.sh -postpatch”:

2024-11-09 13:31:20: Executing cmd: /bin/rpm -q sles-release
2024-11-09 13:31:20: Command output:
>  error: rpmdb: BDB0113 Thread/process 128527/140400711669824 failed: BDB1507 Thread died in Berkeley DB library
>  error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
>  error: cannot open Packages index using db5 -  (-30973)
>  error: cannot open Packages database in /var/lib/rpm
>  error: rpmdb: BDB0113 Thread/process 128527/140400711669824 failed: BDB1507 Thread died in Berkeley DB library
>  error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
>  error: cannot open Packages database in /var/lib/rpm
>  package sles-release is not installed
>End Command output
2024-11-09 13:31:20: Check if the startup mechanism systemd is being used
2024-11-09 13:31:20: Executing cmd: /bin/rpm -qf /sbin/init
2024-11-09 13:31:20: Command output:
>  error: rpmdb: BDB0113 Thread/process 128527/140400711669824 failed: BDB1507 Thread died in Berkeley DB library
>  error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
>  error: cannot open Packages index using db5 -  (-30973)
>  error: cannot open Packages database in /var/lib/rpm
>  error: rpmdb: BDB0113 Thread/process 128527/140400711669824 failed: BDB1507 Thread died in Berkeley DB library
>  error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
>  error: cannot open Packages database in /var/lib/rpm
>  error: rpmdb: BDB0113 Thread/process 128527/140400711669824 failed: BDB1507 Thread died in Berkeley DB library
>  error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
>  error: cannot open Packages database in /var/lib/rpm
>  file /sbin/init is not owned by any package
>End Command output
2024-11-09 13:31:20: Executing cmd: /u01/app/19.0.0/grid/bin/clsecho -p has -f clsrsc -m 180 '/bin/rpm -qf /sbin/init' '1'
2024-11-09 13:31:20: Executing cmd: /u01/app/19.0.0/grid/bin/clsecho -p has -f clsrsc -m 180 '/bin/rpm -qf /sbin/init' '1'
2024-11-09 13:31:20: Command output:
>  CLSRSC-180: An error occurred while executing the command '/bin/rpm -qf /sbin/init'
>End Command output
2024-11-09 13:31:20: CLSRSC-180: An error occurred while executing the command '/bin/rpm -qf /sbin/init'
2024-11-09 13:31:20: ###### Begin DIE Stack Trace ######
2024-11-09 13:31:20:     Package         File                 Line Calling
2024-11-09 13:31:20:     --------------- -------------------- ---- ----------
2024-11-09 13:31:20:  1: main            rootcrs.pl            358 crsutils::dietrap
2024-11-09 13:31:20:  2: s_crsutils      s_crsutils.pm        2729 main::__ANON__
2024-11-09 13:31:20:  3: s_crsutils      s_crsutils.pm        2701 s_crsutils::s_checkLinuxInitMethod
2024-11-09 13:31:20:  4: s_crsutils      s_crsutils.pm        3712 s_crsutils::s_is_Linux_Systemd
2024-11-09 13:31:20:  5: oraClusterwareComp::oraohasd oraohasd.pm          1537 s_crsutils::s_copy_afdinit_init
2024-11-09 13:31:20:  6: crspatch        crspatch.pm           730 oraClusterwareComp::oraohasd::update_OHASD
2024-11-09 13:31:20:  7: crspatch        crspatch.pm          1912 crspatch::updateSystemFiles
2024-11-09 13:31:20:  8: crspatch        crspatch.pm          2249 crspatch::performPostPatch
2024-11-09 13:31:20:  9: crspatch        crspatch.pm           526 crspatch::crsPostPatch
2024-11-09 13:31:20: 10: main            rootcrs.pl            371 crspatch::new
2024-11-09 13:31:20: ####### End DIE Stack Trace #######

2024-11-09 13:31:20: ROOTCRS_POSTPATCH_UPDATE_OHASD_SERVICE checkpoint has failed

Solution

OS level RPM database is corrupted.

Recreate RPM database at OS level to resolve this issue:

1. As root OS user run the following:

# rm -f /var/lib/rpm/__*
# /bin/rpm –rebuilddb
# echo $?

2. Validate the rpm database, as the root user:

# /bin/rpm -qa | more

3. If last command listing rpms correctly, then re-run postpatch command or resume your opatchauto patching session:

As root OS user:

cd $GI_HOME/crs/install
For Cluster Grid Infrastructure Install:#./rootcrs.sh -postpatch

For Restart Grid Infrastructure Install:#./roothas.sh -postpatch

Reference

Postpatch Command Fails on Grid Infrastructure Home Due To rpm Database Corruption At The OS Level (Doc ID 2365433.1)

How to Add a Disk into ASM DiskGroup Safely

For a client, there are situations when adding a disk into ASM disk group failed, finally rebooting the server is required to resolve this issue.

According to Oracle support suggestion, we can create a test diskgroup by using this new disk, if everything is successful, then we can drop this test diskgroup, and add the new disk into target diskgroup.

It is a good practice specially for critical production environment.

Check the new disk is a “CANDIDATE”

$sqlplus / as sysasm 



SQL>select INST_ID,GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,
           MODE_STATUS,STATE,REDUNDANCY,OS_MB,OS_MB,TOTAL_MB,NAME,
           FAILGROUP,PATH 
from gv$asm_disk 
where PATH like '/dev/mapper/prod_data_303p1' 
order by 1,2,3;

Create a Test Diskgroup

Create a test diskgroup “TEMP_TEST”, and check disk status is “MEMBER”

SQL> CREATE DISKGROUP TEMP_TEST EXTERNAL REDUNDANCY disk '/dev/mapper/prod_data_303p1' NAME TEMP_TEST_0001;



Diskgroup created.

SQL>select INST_ID,GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,
           MODE_STATUS,STATE,REDUNDANCY,OS_MB,OS_MB,TOTAL_MB,NAME,
           FAILGROUP,PATH 
from gv$asm_disk 
where PATH like '/dev/mapper/prod_data_303p1' 
order by 1,2,3;

Mount the Test Diskgroup on Other Nodes

Mount the test diskgrop on all the rest ASM nodes to make sure all working fine.

$sqlplus / as sysasm

SQL>set pagesize 100
SQL>set linesize 300
SQL>select INST_ID,GROUP_NUMBER,NAME,STATE,TOTAL_MB,FREE_MB,
           COMPATIBILITY,DATABASE_COMPATIBILITY 
from  gv$asm_diskgroup 
where name='TEMP_TEST' 
order by 1,2;

SQL> alter diskgroup TEMP_TEST mount;



Diskgroup altered.

SQL>select INST_ID,GROUP_NUMBER,NAME,STATE,TOTAL_MB,FREE_MB,
           COMPATIBILITY,DATABASE_COMPATIBILITY 
from  gv$asm_diskgroup 
where name='TEMP_TEST' 
order by 1,2;

Dismount the Test Diskgroup on Other Nodes

Dismount the test diskgroup on all the other nodes except the first node for dropping in the next step.

SQL> alter diskgroup TEMP_TEST dismount;



Diskgroup altered.

SQL>select INST_ID,GROUP_NUMBER,NAME,STATE,TOTAL_MB,FREE_MB,
           COMPATIBILITY,DATABASE_COMPATIBILITY 
from  gv$asm_diskgroup 
where name='TEMP_TEST' 
order by 1,2;

Drop the Test Diskgroup

Drop the test diskgroup on first node where the test diskgroup is still mounted.

SQL> DROP DISKGROUP  TEMP_TEST ;



Diskgroup dropped.

And check the new disk header status is “FORMER”, which is available for adding into target diskgroup now.

SQL>select INST_ID,GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,
           MODE_STATUS,STATE,REDUNDANCY,OS_MB,OS_MB,TOTAL_MB,NAME,
           FAILGROUP,PATH 
from gv$asm_disk 
where PATH like '/dev/mapper/prod_data_303p1' 
order by 1,2,3;

Add Disk into Target Diskgroup

Now it is time to add the new disk into target diskgroup.

SQL> set time on

08:08:17 SQL> ALTER DISKGROUP PROD_DATA ADD DISK '/dev/mapper/prod_data_303p1' NAME PROD_DATA_0012;



Diskgroup altered.

Then check disk and diskgroup status

09:09:01 SQL>select INST_ID,GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,
                    HEADER_STATUS,MODE_STATUS,STATE,REDUNDANCY,
                    OS_MB,OS_MB,TOTAL_MB,NAME,FAILGROUP,PATH 
from gv$asm_disk 
where PATH like '%prod_data%' 
order by 1,2,3;

09:09:14 SQL>  select INST_ID,GROUP_NUMBER,NAME,STATE,TOTAL_MB,FREE_MB,
                      COMPATIBILITY,DATABASE_COMPATIBILITY 
from  gv$asm_diskgroup 
where name='PROD_DATA' ;

Monitor and Check Rebalance

00:10:35 SQL> select GROUP_NUMBER,OPERATION,STATE,POWER,ACTUAL,
                     SOFAR,EST_WORK,EST_RATE,EST_MINUTES 
              from   v$asm_operation;

How to Find DBID from OEM Repository

A client has drooped an Oracle database, and client wants to know DBID of the dropped database.

From OEM repository, we can query some tables or views to get DBID for a specific database. for example :

 SQL> select HOST_NAME,TARGET_NAME,TARGET_TYPE,TARGET_GUID,DBID,VERSION        from  MGMT$DB_CPU_USAGE where upper(TARGET_NAME)='DB_NAME';

RMAN command to list dbid of the database:

$ rman catalog rman@rmancatalog
RMAN>  list incarnation of database TESTDB;