ORA-16857: standby disconnected from redo source for longer than specified threshold

One single 11.2.0.4 instance Oracle database alert log shows below information;

 RFS[1]: No standby redo logfiles available for thread 1

Data Guard shows ORA-16857 error:

DGMGRL> show database "TESTSTY";

Database - TESTSTY

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Rate: 39.97 MByte/s
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Warning(s):
 ORA-16857: standby disconnected from redo source for longer than 
specified threshold

Database Status:
WARNING

Checked both primary and standby database, the  standby online redo logs have been created. But the size of the standby online redo logs are different from the database redo logs for both primary and secondary database.

— on standby :

SQL> select GROUP#,THREAD# ,BYTES/1024/1024 from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 4         1           50
 5         1           50
 6         1           50
 7         1           50

SQL> select GROUP#,THREAD#,BYTES/1024/1024 from v$log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 1         1          100
 3         1          100
 2         1          100

— On Primary

SQL> select GROUP#,THREAD# ,BYTES/1024/1024 from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 4         1          50
 5         1          50
 6         1          50
 7         1          50

SQL> select GROUP#,THREAD#,BYTES/1024/1024 from v$log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 1         1          100
 2         1          100
 3         1          100

Drop all standby online redo logs on both primary and standby databases, and recreate them again with same size as redo logfiles.

-- for standby db which is under recovery, recovery needs to be stopped first
SQL>alter database recover managed standby database cancel;

SQL>alter database add standby logfile thread 1 group 4 size 100m;

It will create standby online redo logs for one under +FRA, and another one under from “db_create_file_dest” parameter, if “db_create_online_log_dest_x” are not defined.

Finally restart the recovery process, then everything is fine.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE 
THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE;

Database altered.

alert.log:

RFS[1]: Selected log 4 for thread 1 sequence 41436 dbid 1345227970 branch 816878594
Mon Feb 19 14:38:25 2018
..
.
Recovery of Online Redo Log: Thread 1 Group 4 Seq 41436 Reading mem 0
..
.

DGMGRL of DataGuard:

DGMGRL> show database 'TESTSTY';

Database - TESTSTY

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 0 seconds (computed 0 seconds ago)
 Apply Lag: 0 seconds (computed 0 seconds ago)
 Apply Rate: 208.00 KByte/s
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Status:
SUCCESS

Warning: ORA-16857: standby disconnected from redo source for longer than specified threshold

One single 11.2.0.4 instance Oracle database alert log shows following information;

 RFS[1]: No standby redo logfiles available for thread 1

Data Guard shows ORA-16857 error:

DGMGRL> show database "TESTSTY";

Database - TESTSTY

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Rate: 39.97 MByte/s
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Warning(s):
 ORA-16857: standby disconnected from redo source for longer than 
specified threshold

Database Status:
WARNING

Checked both primary and standby database, the  standby online redo logs have been created. It is strange to see some of the thread id is different between primary and secondary database.

— on standby :

SQL> select GROUP#,THREAD#,BYTES/1024/1024 ,ARCHIVED,STATUS
       from v$standby_log; 2

GROUP#     THREAD#    BYTES/1024/1024 ARC STATUS
---------- ---------- --------------- --- ----------
 4         1          50              NO UNASSIGNED
 5         0          50              YES UNASSIGNED
 6         0          50              YES UNASSIGNED
 7         0          50              YES UNASSIGNED

— On Primary

SQL> select GROUP#,THREAD#,BYTES/1024/1024 ,ARCHIVED,STATUS 
       from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024 ARC STATUS
---------- ---------- --------------- --- ----------
 4         0          50              YES UNASSIGNED
 5         0          50              YES UNASSIGNED
 6         0          50              YES UNASSIGNED
 7         0          50              YES UNASSIGNED

Drop all standby online redo logs on both primary and standby databases, and recreate then again by specify “thread 1” explicitly.

-- for standby db which is under recovery, recovery needs to be stopped first
SQL>alter database recover managed standby database cancel;

SQL>alter database add standby logfile thread 1 group 4 size 50m;

It will create standby online redo logs for one under +FRA, and another one under from “db_create_file_dest” parameter, if “db_create_online_log_dest_x” are not defined.

Finally restart the recovery process, everything is fine.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE 
THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE;

Database altered.

No standby redo logfiles available for T-1

It looks like a bug.

One single instance 12c Oracle database alert log shows the following information:

 RFS[1]: No standby redo logfiles available for T-1

While DataGuard shows ORA-16857 error:

DGMGRL> show database verbose "TESTSTY";

Database - TESTSTY

 Role: PHYSICAL STANDBY
 Intended State: APPLY-OFF
 Transport Lag: 7 minutes 9 seconds (computed 50 seconds ago)
 Apply Lag: 7 minutes 9 seconds (computed 50 seconds ago)
 Average Apply Rate: (unknown)
 Active Apply Rate: (unknown)
 Maximum Apply Rate: (unknown)
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Warning(s):
 ORA-16857: standby disconnected from redo source for longer than specified threshold

DGMGRL> validate database "TESTSTY"
...
..
.
Warning: standby redo logs not configured for thread 1 on TESTSTY

Checked both primary and standby database, the standby redo logs have been created. It is strange to see the thread id is different between primary and secondary database.

— on standby :

SQL>select GROUP#,THREAD#,BYTES/1024/1024/1024,ARCHIVED,STATUS,CON_ID 
      from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024/1024 ARC STATUS     CON_ID
---------- ---------- -------------------- --- ---------- ----------
 4         0          1                    YES UNASSIGNED 0
 5         0          1                    YES UNASSIGNED 0
 6         0          1                    YES UNASSIGNED 0
 7         0          1                    YES UNASSIGNED 0

— On Primary

SQL>select GROUP#,THREAD#,BYTES/1024/1024/1024,ARCHIVED,STATUS,CON_ID from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024/1024 ARC STATUS     CON_ID
---------- ---------- -------------------- --- ---------- ----------
 4         1          1                    YES UNASSIGNED 0
 5         1          1                    YES UNASSIGNED 0
 6         1          1                    YES UNASSIGNED 0
 7         1          1                    YES UNASSIGNED 0

Drop all standby online redo logs on both primary and standby databases, and recreate then again by specify “thread 1” explicitly.

-- for standby db which is under recovery, recovery needs to be stopped first
SQL>alter database recover managed standby database cancel;

SQL>alter database add standby logfile thread 1 group 4 size 1024m;

It will create standby online redo logs for one under +FRA, and another one under from “db_create_file_dest” parameter, if “db_create_online_log_dest_x” are not defined.

Finally restart the recovery process, everything is fine.

SQL>alter database recover managed standby database disconnect nodelay;

Database altered.

Only see this thread issue for single instance database in 12c, which is a new feature or bug ?

ORA-16826: apply service state is inconsistent with the DelayMins property

For DataGuard Real Time Apply”, USING “CURRENT LOGFILE” option for “ALTER DATABASE RECOVER MANAGED STANDBY DATABASE” matches property “DelayMins = ‘0’”.
Non Real Time Apply : alter database recover managed standby database disconnect from session
REAL TIME APPLY : ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION
Achieve real time apply by using online standby redo logs

OEM alert for DataGuard status is “ORA-16826: apply service state is inconsistent with the DelayMins property”.

Oracle official explanation is as below :

$oerr ora 16826
16826, 0000, "apply service state is inconsistent with the DelayMins property"
// *Cause: This warning was caused by one of the following reasons:
// - The apply service was started without specifying the real-time
// apply option or without the NODELAY option when the DelayMins
// property was set to zero.
// - The apply service was started with the real-time apply option or
// with the NODELAY option when the DelayMins property was set to
// a value greater than zero.
// *Action: Reenable the standby database to allow the broker to restart
// the apply service with the apply options that are consistent
// with the specified value of the DelayMins property.

It seems the “DelayMins” property is not matching the option used in “alter database recover managed standby database …..”. Let’s have a detail check.

DGMGRL> show configuration;
..
Protection Mode: MaxPerformance
 Databases:
 PRIMYDB - Primary database
 STDBYDB - Physical standby database
 Warning: ORA-16826: apply service state is inconsistent with the DelayMins property

Fast-Start Failover: DISABLED
Configuration Status:
WARNING

Primary database side is good, the warning is from standby database side.

DGMGRL> show database verbose "STDBYDB";

Database - STDBYDB

 Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 0 seconds (computed 0 seconds ago)
 Apply Lag: 13 minutes 1 second (computed 0 seconds ago)
 Apply Rate: 61.33 MByte/s
 Real Time Query: OFF
 Instance(s):
 STDBYDB1
 STDBYDB2 (apply instance)
 
Database Warning(s):
 ORA-16826: apply service state is inconsistent with the DelayMins property
DGMGRL> show database "STDBYDB" "DelayMins"
 DelayMins = '0'
DGMGRL>

On standby database applying instance STDBYDB2. From alert.log, we can see the standby database “not using Real Time Apply”.

...
..
alter database recover managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (STDBYDB2)
Thu Jan 12 16:00:06 2017
MRP0 started with pid=55, OS id=66078
MRP0: Background Managed Standby Recovery process started (STDBYDB2)
 started logmerger process
Thu Jan 12 16:00:11 2017
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 32 slaves
Thu Jan 12 16:00:15 2017
Block change tracking file is current.
Starting background process CTWR
Thu Jan 12 16:00:15 2017
CTWR started with pid=89, OS id=66416
Block change tracking service is active.
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log +FRA/stdbydb/archivelog/2017_01_12/thread_1_seq_328339.32376.933091003
Media Recovery Log +FRA/stdbydb/archivelog/2017_01_12/thread_2_seq_555739.2592.933091137
Thu Jan 12 16:00:17 2017
Completed: alter database recover managed standby database disconnect from session
...
..

Move on to any  primay database instance and run below query, we can see the same — the standby database “not using Real Time Apply”

SQL> select DEST_NAME,STATUS,DATABASE_MODE,RECOVERY_MODE 
       from V$ARCHIVE_DEST_STATUS 
       where status!='INACTIVE';

DEST_NAME           STATUS      DATABASE_MODE   RECOVERY_MODE 
------------------- ----------- --------------  --------------
LOG_ARCHIVE_DEST_1  VALID       OPEN IDLE
LOG_ARCHIVE_DEST_2  VALID       MOUNTED-STANDBY MANAGED

On standby database, stop the Redo Apply and then start the redo apply in real time mode.

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

Database altered.

Check standby database alert.log again, we can see standby database in “Real Time Apply” by using standby online logs:

...
..
Thu Jan 12 23:01:56 2017
Managed Standby Recovery Canceled (STDBYDB2)
Completed: alter database recover managed standby database cancel
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION
Attempt to start background Managed Standby Recovery process (STDBYDB2)
Thu Jan 12 23:02:00 2017
MRP0 started with pid=54, OS id=106637
MRP0: Background Managed Standby Recovery process started (STDBYDB2)
 started logmerger process
Thu Jan 12 23:02:05 2017
Managed Standby Recovery starting Real Time Apply
...
..
Thu Jan 12 23:02:09 2017
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION
Thu Jan 12 23:02:39 2017
Media Recovery Log +FRA/stdbydb/archivelog/2017_01_12/thread_4_seq_273573.33789.933116469
Media Recovery Waiting for thread 1 sequence 328381 (in transit)
Recovery of Online Redo Log: Thread 1 Group 101 Seq 328381 Reading mem 0
 Mem# 0: +DATA2/stdbydb/onlinelog/group_101.1037.903651087
....
...

On any primary database instance, run below sql, we can see the same — standby database in “Real Time Apply” by using standby online logs:

SQL> select DEST_NAME,STATUS,DATABASE_MODE,RECOVERY_MODE from V$ARCHIVE_DEST_STATUS where status!='INACTIVE';

DEST_NAME           STATUS      DATABASE_MODE   RECOVERY_MODE 
------------------- ----------- --------------  --------------
LOG_ARCHIVE_DEST_1  VALID       OPEN IDLE
LOG_ARCHIVE_DEST_2  VALID       MOUNTED-STANDBY REAL TIME APPLY

Finally check DataGuard configurations:

DGMGRL> show configuration;

...

Protection Mode: MaxPerformance
 Databases:
 PRIMYDB - Primary database
 STDBYDB - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database verbose "STDBYDB";

Database - STDBYDB

 Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 0 seconds (computed 0 seconds ago)
 Apply Lag: 0 seconds (computed 0 seconds ago)
 Apply Rate: 8.93 MByte/s
 Real Time Query: OFF
 Instance(s):
 STDBYDB1
 STDBYDB2 (apply instance)
 
Database Warning(s):
SUCCESS

How to Relocate Standby Database ASM Datafiles

The alternative for relocating ASM datafiles by RMAN commands :
CATALOG START WITH “+DATA2/../”
SWITCH DATABASE TO COPY.

TASK

This task is to relocate all the datafiles of diskgroup DATA1 to another diskgroup DATA2 for a standby database.

SOULUTION

1)Check the diskgroup space availability.

SQL> set pagesize 120
SQL> set linesize 180
SQL> select NAME,TOTAL_MB/1024,FREE_MB/1024,USABLE_FILE_MB/1024 
       from v$asm_diskgroup;

NAME T    TOTAL_MB/1024 FREE_MB/1024 USABLE_FILE_MB/1024
--------- ------------- ------------ -------------------
DATA1     1024.00977      15.6894531   15.6894531
OCR_VOTE    19.9931641    15.4257813   15.4257813
FRA        500.069336    210.331055   210.331055
DATA2     4000.55469    1816.02344   1816.02344

2)Leave managed standby database recovery still going. Create a RMAN script by using below query:

SQL> select 'backup as copy datafile '||file#||' format ''+DATA2'';' 
       from v$datafile 
      where name like '+DATA1%' order by 1;

'BACKUPASCOPYDATAFILE'||FILE#||'FORMAT''+DATA2'';'
---------------------------------------------------
backup as copy datafile 124 format '+DATA2';
backup as copy datafile 125 format '+DATA2';
backup as copy datafile 126 format '+DATA2';
backup as copy datafile 127 format '+DATA2';
.....
....
...
..
.

The final RMAN script content:

$cat move_DATA1_to_DATA2.rman

run {
allocate channel stdbych1 type disk;
allocate channel stdbych2 type disk;
allocate channel stdbych3 type disk;
allocate channel stdbych4 type disk;
backup as copy datafile 124 format '+DATA2';
backup as copy datafile 125 format '+DATA2';
backup as copy datafile 126 format '+DATA2';
backup as copy datafile 127 format '+DATA2';
....
...
..
.
}

3) Run the RMAN script to make a copy of all datafiles on diskgroup DATA1 onto diskgroup DATA2.

$nohup rman target / cmdfile=/u01/app/oracle/admin/scripts/move_DATA1_to_DATA2.rman  log=/u01/app/oracle/admin/scripts/move_DATA1_To_DATA2.log &

4) Stop the managed standby database recovery.

SQL> alter database recover managed standby database cancel;

Database altered.

5) Switch the datafiles.

RMAN>switch datafile 124 to copy;
RMAN>switch datafile 125 to copy;
RMAN>switch datafile 126 to copy;
RMAN>switch datafile 127 to copy;
....
...
..
.

6) Start the recovery.

SQL> alter database recover managed standby database using current logfile disconnect from session;

Database altered.

5) Archive log switch, check applied from alert.log, etc.

SQL>alter system archive log current ;
System altered.

6) Unmount diskgroup DATA1 on all the nodes, except the current node.

SQL> alter diskgroup data1 dismount;
Diskgroup altered.

7) Drop the diskgroup DATA1.

SQL> drop diskgroup data1;
drop diskgroup data1
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15053: diskgroup "DATA1" contains existing files

SQL> drop diskgroup data1 including contents;
Diskgroup dropped.

8) Scan disks and confirm the ASM Disks of DATA1 have been removed from ASM disks.

$su - 

# oracleasm listdisks
ASM_DISK01
ASM_DISK02
ASM_DISK03
ASM_DISK04
ASM_DISK05
ASM_DISK06
ASM_DISK07
ASM_DISK08
ASM_FRA01
ASM_SSD1  <----- DATA1 diskgroup
OCR_VOTE01

# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "ASM_SSD1"
Scanning system for ASM disks...

# oracleasm listdisks
ASM_DISK01
ASM_DISK02
ASM_DISK03
ASM_DISK04
ASM_DISK05
ASM_DISK06
ASM_DISK07
ASM_DISK08
ASM_FRA01
OCR_VOTE01