“DGM-17016: failed to retrieve status for database” in Data Guard Broker

The following error occurred when running a Data Guard broker command on the primary database side:

DGMGRL> show database verbose 'STDBYDB'

Database - STDBYDB

 Role: PHYSICAL STANDBY
...
..
.
Database Status:
DGM-17016: failed to retrieve status for database "STDBYDB"
ORA-16664: unable to receive the result from a database

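The original post's resolution is not reproduced here. As a hedged, generic starting point: ORA-16664 usually means the broker on the primary could not get a reply from the standby, so the first checks are broker state and connectivity (the commands below use this post's identifiers and are not the article's specific fix):

```sql
-- Generic first checks, run on both primary and standby unless noted:

SQL> show parameter dg_broker_start              -- expect TRUE on both sides

$ tnsping STDBYDB                                -- broker connect identifier must resolve

DGMGRL> show database 'STDBYDB' 'StatusReport';  -- re-check once connectivity works
```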

ORA-16849 ORA-16816 After Uncompleted Data Guard Switchover

For some reason, a Data Guard (DG) switchover command did not complete, leaving the following warnings:

Database Error(s):
 ORA-16849: a previous switchover from this database was interrupted 
            and aborted
 ORA-16816: incorrect database role

After detailed checks, it is confirmed that the database roles have been changed successfully on both sides.

-- on old primary database
--
SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PHYSICAL STANDBY

-- on old standby database
--
SQL> select open_mode, database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PRIMARY

So the issue is that the final role change was not recognized by the DG broker. We have to rebuild the Data Guard broker configuration after the databases have been restarted:

DGMGRL> remove configuration;
DGMGRL> create configuration 'PRIMDB' AS PRIMARY DATABASE IS 
       'PRIMDB' CONNECT IDENTIFIER IS 'PRIMDB'; 
DGMGRL> add database 'STDBYDB' AS CONNECT IDENTIFIER IS 'STDBYDB' 
        maintained as physical;
DGMGRL> enable configuration;
DGMGRL> show configuration;
DGMGRL> show database verbose 'PRIMDB';
DGMGRL> show database verbose 'STDBYDB';

Create a 12c RAC Standby Database From Overnight RMAN Tape Backups

Create a RAC Standby Database From Overnight RMAN Tape Backups

Primary RAC servers: racnode1,racnode2
Standby RAC servers: stdbynode1,stdbynode2
Primary Database Name:  RACTEST
Primary Instance Names: RACTEST1/2
Standby Instance Names: STBTEST1/2

RAC Database version: 12.1.x.x

1) On primary database, check that the most recent RMAN backups are available.

$ rman target /  catalog rman/password@rman
RMAN> list backup of database;

2) On primary, create a standby controlfile.

SQL> alter database create standby controlfile as '/tmp/RACTEST_standby.ctl';
Database altered.

or

RMAN>  backup device type disk format '/tmp/RACTEST_standby.ctl' current controlfile for standby;

3) Copy the standby controlfile onto standby server stdbynode1:/tmp.

[RACTEST1] oracle@racnode1:/tmp$ scp RACTEST_standby.ctl stdbynode1:/tmp/

4) Copy the primary database password file onto the standby database server, as per “How to copy password file from primary to standby database in 12c onwards”.

5) On primary database, enable FORCE LOGGING, only if the database is not in force logging mode yet.

 SQL> ALTER DATABASE FORCE LOGGING;

6) On primary database, change or add the parameters below.

DB_UNIQUE_NAME=RACTEST
LOG_ARCHIVE_CONFIG='DG_CONFIG=(STBTEST,RACTEST)'
DB_FILE_NAME_CONVERT='STBTEST','RACTEST'
LOG_FILE_NAME_CONVERT='STBTEST','RACTEST'
STANDBY_FILE_MANAGEMENT=AUTO
FAL_SERVER=STBTEST
FAL_CLIENT=RACTEST
LOG_ARCHIVE_DEST_STATE_1=ENABLE
LOG_ARCHIVE_DEST_STATE_2=ENABLE
LOG_ARCHIVE_DEST_1='LOCATION=use_db_recovery_file_dest VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=RACTEST'
LOG_ARCHIVE_DEST_2='SERVICE=STBTEST VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STBTEST'
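As a sketch, the dynamic parameters above can be set with ALTER SYSTEM (values copied from the list; the scope/sid clauses assume an spfile-managed RAC database). The *_NAME_CONVERT parameters are static and need scope=spfile plus a restart:

```sql
SQL> alter system set log_archive_config='DG_CONFIG=(STBTEST,RACTEST)' scope=both sid='*';
SQL> alter system set standby_file_management=AUTO scope=both sid='*';
SQL> alter system set fal_server=STBTEST scope=both sid='*';
SQL> alter system set log_archive_dest_2='SERVICE=STBTEST VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STBTEST' scope=both sid='*';

-- static parameters: take effect only after a restart
SQL> alter system set db_file_name_convert='STBTEST','RACTEST' scope=spfile sid='*';
SQL> alter system set log_file_name_convert='STBTEST','RACTEST' scope=spfile sid='*';
```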

7) On standby database, change or add the parameters below.

DB_NAME=RACTEST
DB_UNIQUE_NAME=STBTEST
LOG_ARCHIVE_CONFIG='DG_CONFIG=(RACTEST,STBTEST)'
DB_FILE_NAME_CONVERT='RACTEST','STBTEST'
LOG_FILE_NAME_CONVERT='RACTEST','STBTEST'
STANDBY_FILE_MANAGEMENT=AUTO
FAL_SERVER=RACTEST
FAL_CLIENT=STBTEST
*.LOG_ARCHIVE_DEST_1='LOCATION=use_db_recovery_file_dest VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=STBTEST'
*.LOG_ARCHIVE_DEST_2='SERVICE=RACTEST LGWR SYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=RACTEST'

8) On standby database, set *.control_files=('+DATA1','+FRA') in the pfile, and start up the database in nomount mode.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area 1.5366E+10 bytes
Fixed Size 2165488 bytes
Variable Size 7650411792 bytes
Database Buffers 7650410496 bytes
Redo Buffers 63135744 bytes

9) On standby database, restore standby control files. Update parameter “control_files” accordingly.

[STBTEST1] oracle@stdbynode1:/tmp$ rman target /
RMAN> restore controlfile to '+FRA' from '/tmp/RACTEST_standby.ctl';
RMAN> restore controlfile to '+DATA1' from '/tmp/RACTEST_standby.ctl';

control_files='+DATA1/STBTEST/CONTROLFILE/current.343.925553399','+FRA/STBTEST/CONTROLFILE/current.642.925553371'

10) On standby database, shutdown and mount standby database.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.

SQL> startup nomount;
ORACLE instance started.
Total System Global Area 1.0262E+11 bytes
Fixed Size 2265224 bytes
Variable Size 1.2885E+10 bytes
Database Buffers 8.9657E+10 bytes
Redo Buffers 74416128 bytes

SQL> alter database mount standby database;
Database altered.

11) On standby database, restore datafiles from overnight RMAN backups.

[STBTEST1] oracle@stdbynode1:/u01/app/oracle/admin/STBTEST/scripts$ cat restore_STBTEST.rman

run {
allocate channel c1 device type sbt_tape parms='ENV=(NB_ORA_CLIENT=racnode1)';
allocate channel c2 device type sbt_tape parms='ENV=(NB_ORA_CLIENT=racnode1)';
allocate channel c3 device type sbt_tape parms='ENV=(NB_ORA_CLIENT=racnode1)';
allocate channel c4 device type sbt_tape parms='ENV=(NB_ORA_CLIENT=racnode1)';
restore database;
release channel c1;
release channel c2;
release channel c3;
release channel c4;
}

[STBTEST1] oracle@stdbynode1:/u01/app/oracle/admin/STBTEST/scripts$ nohup rman target / catalog rman/password@rman cmdfile=/u01/app/oracle/admin/STBTEST/scripts/restore_STBTEST.rman log=/u01/app/oracle/admin/STBTEST/scripts/restore_STBTEST.log &
[1] 19448

12) On standby database, monitor the progress by querying gv$session_longops.

SQL> select INST_ID,OPNAME,SOFAR*8/1024/ELAPSED_SECONDS,ELAPSED_SECONDS/60,TIME_REMAINING/60 from gv$session_longops where OPNAME like 'RMAN%' and TIME_REMAINING>6 order by 2;
 INST_ID OPNAME                      SOFAR*8/1024/ELAPSED_SECONDS ELAPSED_SECONDS/60 TIME_REMAINING/60
-------- --------------------------- ---------------------------- ------------------ -----------------
       1 RMAN: aggregate input                         425.803241         20.1666667            102.85
       1 RMAN: full datafile restore                   105.103955              21.55        103.016667
       1 RMAN: full datafile restore                   112.092109               21.9        94.7833333
       1 RMAN: full datafile restore                   113.487487         21.8833333        93.4333333
       1 RMAN: full datafile restore                   108.213722         21.8833333             99.35

13) Restore any gap archivelogs.

RMAN>restore archivelog from logseq=100 until logseq=200 thread=1;

RMAN>restore archivelog from logseq=80 until logseq=180 thread=2;
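The sequence ranges to restore can be worked out first with a generic check on the standby (not from the original post):

```sql
SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;

-- or compare the highest archived sequence per thread with the primary:
SQL> select thread#, max(sequence#) from v$archived_log group by thread# order by 1;
```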

14) Add standby redo logs (n + 1 groups per thread, where n is the number of online redo log groups) into both primary and standby databases.

SQL> alter database add standby logfile thread 1 group 11 size 1024m;

Database altered.
....
...
..
.
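The new standby redo logs can then be verified against the online redo logs; sizes should match (a generic check):

```sql
SQL> select group#, thread#, bytes/1024/1024 mb, status from v$standby_log;

SQL> select group#, thread#, bytes/1024/1024 mb from v$log;   -- for comparison
```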

15) On standby database, create an spfile if one does not exist yet.

SQL> create spfile='+DATA1' from pfile='/u01/app/oracle/product/12.1.0/dbhome_1/dbs/initSTBTEST1.ora';

File created.

$ echo spfile='+DATA1/STBTEST/PARAMETERFILE/spfile.1323.925567677' > initSTBTEST1.ora

$ scp initSTBTEST1.ora stdbynode2:/u01/app/oracle/product/12.1.0/dbhome_1/dbs/initSTBTEST2.ora

16) On standby database, restart the database and put it under managed recovery.

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area 1.0489E+10 bytes
Fixed Size 2162560 bytes
Variable Size 5704253568 bytes
Database Buffers 4764729344 bytes
Redo Buffers 17899520 bytes

SQL> alter database mount standby database;

Database altered.

SQL> alter database recover managed standby database disconnect from session;

Database altered.
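Once managed recovery is running, apply progress can be checked with generic queries such as:

```sql
SQL> select process, status, thread#, sequence#
     from v$managed_standby where process like 'MRP%';

SQL> select name, value from v$dataguard_stats
     where name in ('apply lag','transport lag');
```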

17) Register the standby database with Oracle Clusterware.

[STBTEST1] oracle@stdbynode1:/u01/app/oracle/product/12.1.0/dbhome_1/dbs$ srvctl add database -n RACTEST -o /u01/app/oracle/product/12.1.0/dbhome_1 -d STBTEST -r physical_standby
[STBTEST1] oracle@stdbynode1:/u01/app/oracle/product/12.1.0/dbhome_1/dbs$ srvctl modify database -d STBTEST -s mount
[STBTEST1] oracle@stdbynode1:/u01/app/oracle/product/12.1.0/dbhome_1/dbs$ srvctl add instance -d STBTEST -i STBTEST1 -n stdbynode1
[STBTEST1] oracle@stdbynode1:/u01/app/oracle/product/12.1.0/dbhome_1/dbs$ srvctl add instance -d STBTEST -i STBTEST2 -n stdbynode2

18) Configure archivelog deletion policy for both primary and standby databases.

a) On primary, log on to RMAN and change the archive log deletion policy by running:

RMAN>CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;

b) On standby, log on to RMAN and change the archive log deletion policy by running:

RMAN>CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;
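The effective policy can be confirmed on either side with:

```sql
RMAN> show archivelog deletion policy;
```

Setting NONE on the primary leaves deletion to the backup scripts, while APPLIED ON STANDBY protects archivelogs that the standby has not yet applied.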

19) From 12c onwards, there is no need to add a static registration into LISTENER.ora on the nodes of the primary and standby databases. The following is only required for releases below 12c, such as 11.2.0.4:

#
# DGMGRL>switchover to <standby> Fails with ORA-12514 (Doc ID 1582927.1)
# Each Standby needs a static registration for the switchover to work
#
-- for node racnode1

SID_LIST_LISTENER =
 (SID_LIST =
 (SID_DESC =
 (GLOBAL_DBNAME = RACTEST_DGMGRL)
 (ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
 (SID_NAME = RACTEST1)
 )
 )

-- for node racnode2

SID_LIST_LISTENER =
 (SID_LIST =
 (SID_DESC =
 (GLOBAL_DBNAME = RACTEST_DGMGRL)
 (ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
 (SID_NAME = RACTEST2)
 )
 )

-- for node stdbynode1

SID_LIST_LISTENER =
 (SID_LIST =
 (SID_DESC =
 (GLOBAL_DBNAME = STBTEST_DGMGRL)
 (ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
 (SID_NAME = STBTEST1)
 )
 )

-- for node stdbynode2

SID_LIST_LISTENER =
 (SID_LIST =
 (SID_DESC =
 (GLOBAL_DBNAME = STBTEST_DGMGRL)
 (ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
 (SID_NAME = STBTEST2)
 )
 )


DGMGRL> show instance 'STBTEST1' StaticConnectIdentifier
 StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STBTEST_DGMGRL)(INSTANCE_NAME=STBTEST1)(SERVER=DEDICATED)))'
DGMGRL> show instance 'STBTEST2' StaticConnectIdentifier
 StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.12)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STBTEST_DGMGRL)(INSTANCE_NAME=STBTEST2)(SERVER=DEDICATED)))'
DGMGRL> show instance 'RACTEST1' StaticConnectIdentifier
 StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.20)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=RACTEST_DGMGRL)(INSTANCE_NAME=RACTEST1)(SERVER=DEDICATED)))'
DGMGRL> show instance 'RACTEST2' StaticConnectIdentifier
 StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.22)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=RACTEST_DGMGRL)(INSTANCE_NAME=RACTEST2)(SERVER=DEDICATED)))'

root.sh Failed with “CLSRSC-670: No msg for has:clsrsc-670” When Adding a Node

While adding a node by running ‘root.sh’ in 12.1.0.2, the following errors occurred:

# /u01/app/12.1.0.2/grid/root.sh
Check /u01/app/12.1.0.2/grid/install/root_racnode2_2018-01-26_23-56-
01.log for the output of root script

#cat /u01/app/12.1.0.2/grid/install/root_racnode2_2018-01-26_23-56-01.log
Performing root user operation.

The following environment variables are set as:
 ORACLE_OWNER= grid
 ORACLE_HOME= /u01/app/12.1.0.2/grid
 Copying dbhome to /usr/local/bin ...
 Copying oraenv to /usr/local/bin ...
 Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.1.0.2/grid/crs/
install/crsconfig_params
2018/01/26 23:56:02 CLSRSC-4001: Installing Oracle Trace File 
Analyzer (TFA) Collector.

2018/01/26 23:56:02 CLSRSC-4002: Successfully installed Oracle Trace 
File Analyzer (TFA) Collector.

2018/01/26 23:56:02 CLSRSC-670: No msg for has:clsrsc-670
Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 3800.

The command '/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2
/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2
/grid/crs/install/rootcrs.pl ' execution failed
#

Check rootcrs log under GI_HOME/cfgtoollogs/crsconfig:

$ cd /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig
$ tail -30 rootcrs_racnode2_2018-01-26_11-56-02PM.log

> Installed Build Version: 122120 Build Date: 201708280807
> TFA-00022: TFA is already running latest version. No need to patch.
>End Command output
2018-01-26 23:56:02: Executing cmd: /u01/app/12.1.0.2/grid/bin/clsecho -p has -f clsrsc -m 4002 
2018-01-26 23:56:02: Executing cmd: /u01/app/12.1.0.2/grid/bin/clsecho -p has -f clsrsc -m 4002 
2018-01-26 23:56:02: Command output: > CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector. >End Command output 
2018-01-26 23:56:02: CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector. 
2018-01-26 23:56:02: The install script root.sh was run on an upgraded node. 
2018-01-26 23:56:02: Executing cmd: /u01/app/12.1.0.2/grid/bin/clsecho -p has -f clsrsc -m 670 
2018-01-26 23:56:02: Executing cmd: /u01/app/12.1.0.2/grid/bin/clsecho -p has -f clsrsc -m 670 
2018-01-26 23:56:02: Command output: > CLSRSC-670: No msg for has:clsrsc-670 >End Command output 
2018-01-26 23:56:02: CLSRSC-670: No msg for has:clsrsc-670 
2018-01-26 23:56:02: ###### Begin DIE Stack Trace ###### 
2018-01-26 23:56:02: Package File Line Calling 
2018-01-26 23:56:02: --------------- -------------------- ---- ---------- 
2018-01-26 23:56:02: 1: main rootcrs.pl 267 crsutils::dietrap 
2018-01-26 23:56:02: 2: crsinstall crsinstall.pm 3800 main::__ANON__ 
2018-01-26 23:56:02: 3: crsinstall crsinstall.pm 380 crsinstall::preInstallChecks 
2018-01-26 23:56:02: 4: crsinstall crsinstall.pm 318 crsinstall::CRSInstall 
2018-01-26 23:56:02: 5: main rootcrs.pl 410 crsinstall::new 
2018-01-26 23:56:02: ####### End DIE Stack Trace ####### 
2018-01-26 23:56:02: checkpoint has failed

Raised an SR with Oracle Support, who advised this is a bug: “Bug 26200970 : LNX64-12202-UD:FAILED TO ADD NODE AFTER UPGRADING FROM 11204 TO 12202”.

Bug 26200970 is fixed in 18.1.

WORKAROUND

Take a backup of the crsconfig_params file inside GI_HOME/crs/install, then change the value to ASM_UPGRADE=false and re-run root.sh, which then completes successfully.
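The workaround amounts to an edit like the following. This is a sketch against a scratch copy; on a real system the file is $GI_HOME/crs/install/crsconfig_params (here /u01/app/12.1.0.2/grid/crs/install/crsconfig_params), and the sample file contents are a stand-in:

```shell
# Simulate the workaround edit on a scratch copy of crsconfig_params.
workdir=$(mktemp -d)
cd "$workdir"

# stand-in contents for the demo; the real file has many more settings
printf 'ASM_UPGRADE=true\nORACLE_HOME=/u01/app/12.1.0.2/grid\n' > crsconfig_params

cp crsconfig_params crsconfig_params.bak          # take a backup first
sed -i 's/^ASM_UPGRADE=.*/ASM_UPGRADE=false/' crsconfig_params
grep '^ASM_UPGRADE=' crsconfig_params             # now shows ASM_UPGRADE=false
# ...then re-run root.sh as root
```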

How to Apply Grid Infrastructure Patches Before Upgrading or Installing Cluster

While upgrading GI from 12.1.0.2 to 12.2.0.1, PSUs or one-off patches can be applied to the 12.2.0.1 image before setting up the 12.2.0.1 GI, in order to minimize downtime and reduce the impact on the PROD environment.

Here is an example of applying the GI JAN 2018 Release Update (RU) 12.2.0.1.180116.

1) As the owner of the GI home, unzip the gold image under the new 12.2.0.1 GI home.

$ cd /u01/app/12.2.0.1/grid
$ unzip <SW-LOCATION>/linuxx64_12201_grid_home.zip -d ./

2) As the owner of the GI home, unzip the GI JAN 2018 RU patch.

$ cd /tmp
$ unzip <SW-LOCATION>/p27100009_122010_GI_RU_180116_Linux-x86-64.zip

3) Apply the patch using the “-applyPSU” option, and confirm the patching succeeds.

$cd /u01/app/12.2.0.1/grid
$ ./gridSetup.sh -silent -applyPSU /tmp/27100009/
Preparing the home to patch...
Applying the patch /tmp/27100009/...
Successfully applied the patch.
The log can be found at: /u01/app/oraInventory/logs/
GridSetupActions2018-02-01_03-05-11PM/installerPatchActions_2018-02-
01_03-05-11PM.log

Check the log to confirm the patches are applied successfully.

$cd /u01/app/12.2.0.1/grid/cfgtoollogs/opatch
$ grep -i "successfully applied" *
opatch2018-02-01_15-05-36PM_1.log:[Feb 1, 2018 3:05:41 PM] [INFO] Patch 27144050 successfully applied.
opatch2018-02-01_15-05-41PM_1.log:[Feb 1, 2018 3:06:33 PM] [INFO] Patch 27128906 successfully applied.
opatch2018-02-01_15-06-34PM_1.log:[Feb 1, 2018 3:07:20 PM] [INFO] Patch 27335416 successfully applied.
opatch2018-02-01_15-07-20PM_1.log:[Feb 1, 2018 3:07:49 PM] [INFO] Patch 27105253 successfully applied.
opatch2018-02-01_15-07-49PM_1.log:[Feb 1, 2018 3:07:54 PM] [INFO] Patch 26839277 successfully applied.

Or apply the patches one by one using the “-applyOneOffs” option:

$cd /u01/app/12.2.0.1/grid
$./gridSetup.sh -silent -applyOneOffs /tmp/27100009/27144050
$./gridSetup.sh -silent -applyOneOffs /tmp/27100009/27128906
$./gridSetup.sh -silent -applyOneOffs /tmp/27100009/27335416 
$./gridSetup.sh -silent -applyOneOffs /tmp/27100009/27105253 
$./gridSetup.sh -silent -applyOneOffs /tmp/27100009/26839277

4) After the GI upgrade, we can confirm the patches were applied and are listed:

$ opatch lspatches
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27128906;ACFS Release Update : 12.2.0.1.0 (27128906)
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)

OPatch succeeded.

Reference:

How to Apply a Grid Infrastructure Patch Before root script (root.sh or rootupgrade.sh) is Executed? (Doc ID 1410202.1)