ORA-17628 RMAN-03009 duplicate target database for standby from active database

While running RMAN "duplicate target database for standby from active database nofilenamecheck" against an 11g database, the following errors were raised:

...
..
.
set newname for datafile 3 to "/u02/oradata/TESTDBSTY/undotbs01.dbf";
...
..
.
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=/u02/oradata/TESTDB/undotbs01.dbf
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 01/15/2019 15:53:17
ORA-17628: Oracle error 19505 returned by remote Oracle server
continuing other job steps, job failed will not be re-run
...
..
.


SQL*Net message from client

A client asked why the following SQL (1) takes 14.96 seconds:

SQL>select * from dba_tables;
Elapsed: 00:00:14.96

While this SQL (2) takes only 0.59 seconds:

SQL>select count(*) from dba_tables;
Elapsed: 00:00:00.59

Let’s compare the tkprof summaries from SQL tracing for these two SQLs:
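To generate such tkprof summaries, SQL trace (event 10046, level 8 to include wait events) can be enabled in the session before running each statement — a sketch; the trace file identifier and output file name here are arbitrary examples:

```
SQL> alter session set tracefile_identifier = 'dba_tables_test';
SQL> alter session set events '10046 trace name context forever, level 8';
SQL> select * from dba_tables;
SQL> alter session set events '10046 trace name context off';

$ tkprof <trace_file>.trc dba_tables_test.txt sys=no
```

The resulting report breaks elapsed time into parse, execute, and fetch phases, plus wait events such as "SQL*Net message from client".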


Log file switch (checkpoint incomplete)

“Log file switch (checkpoint incomplete)” wait events are seen in v$session_wait and in AWR reports. This event is normally caused by:
1) Redo log files that are too small.
2) Too few redo log groups.
3) Disk I/O that is not fast enough.

Normally this issue can be addressed through adding more redo groups and/or increasing redo log file size.

Oracle recommends no more than five log switches per hour.
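To check the switch rate against that guideline, the entries in v$log_history can be grouped by hour (a sketch; the logfile path and size in the fix are hypothetical examples for this environment):

```
SQL> select to_char(first_time, 'YYYY-MM-DD HH24') hour,
            count(*) switches
     from   v$log_history
     group  by to_char(first_time, 'YYYY-MM-DD HH24')
     order  by 1;
```

If the rate is too high, larger log groups can be added and the small ones dropped once inactive, e.g.:

```
SQL> alter database add logfile group 5
     ('/u02/oradata/TESTDB/redo05.log') size 1g;
```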

UNDO Tablespace Is Full

Database alert log has many errors like this :

ORA-30036: unable to extend segment by 8 in undo tablespace 'UNDOTBS2'

1) Check UNDO tablespace extents.

SQL> select status, count(*) nb, sum(blocks) blocks 
     from dba_undo_extents 
     where TABLESPACE_NAME='UNDOTBS2' 
     group by status;

STATUS     NB         BLOCKS
---------- ---------- ----------
UNEXPIRED  383        3064
EXPIRED    365        2920
ACTIVE     8034       8382352

2) No user session is consuming significant undo space; each SYS session holds only 1 undo block.

SQL> select s.sid,s.username,t.used_ublk
from v$session s, v$transaction t
where s.saddr = t.ses_addr
order by t.used_ublk desc;

SID        USERNAME    USED_UBLK
---------- ----------  ----------
 2601      SYS          1
 2835      SYS          1
 2838      SYS          1
 3072      SYS          1
 3075      SYS          1
 3311      SYS          1
 3544      SYS          1
...
..
.

3) Show which sessions have generated the most undo so far. Here SID=1 is the MMON background process.

SQL> select s.sid, t.name, s.value
     from v$sesstat s, v$statname t
     where s.statistic#=t.statistic#
     and t.name='undo change vector size'
     order by s.value desc;

SID    NAME                      VALUE
------ ------------------------- ----------
1      undo change vector size   2256552752
3308   undo change vector size   1360455432
6609   undo change vector size    157619916
2604   undo change vector size     10521640
712    undo change vector size       322840
1656   undo change vector size       302768
1892   undo change vector size       302436
1184   undo change vector size       250904
..
..
.

4) Since no application user session is consuming significant UNDO space, the undo must be consumed by database background processes, whose transactions are not recorded in the v$transaction view.
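To confirm that the top undo generators from step 3 are background processes, the same SIDs can be joined to v$session.program, which names the process (a sketch):

```
SQL> select s.sid, s.program, st.value
     from   v$session s, v$sesstat st, v$statname n
     where  s.sid = st.sid
     and    st.statistic# = n.statistic#
     and    n.name = 'undo change vector size'
     order  by st.value desc;
```

Background sessions show a program such as "ORACLE@host (MMON)", while application sessions show the client executable name.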

After bouncing the instance, the used undo space was released.

SQL> select status, count(*) nb, sum(blocks) blocks
     from dba_undo_extents
     where TABLESPACE_NAME='UNDOTBS2'
     group by status;

STATUS        NB         BLOCKS
------------ ---------- ----------
UNEXPIRED    8462       8385392
EXPIRED      368        2944

After another 15 minutes:

SQL> select status, count(*) nb, sum(blocks) blocks
     from dba_undo_extents
     where TABLESPACE_NAME='UNDOTBS2'
     group by status;

STATUS     NB         BLOCKS
---------- ---------- ----------
UNEXPIRED  405        20520
EXPIRED    461        7280

Same SQL with Significantly Different Response Times on RAC Nodes

SITUATION

The same SQL query takes 2.5 times longer on node 1 than on node 2.

Both the Unix administrator and the DBA have checked and confirmed that the workload is the same on both nodes. All client connections to the RAC database are load balanced through a RAC service.

INVESTIGATIONS

1) Open a new session, then run the same SQL on node 1. There are no GC (global cache) wait events on node 1 or node 2:

-- Check current session SID:

SQL> SELECT SID, SERIAL#
     FROM   V$SESSION
     WHERE  SID = Sys_Context('USERENV', 'SID'); 

SID        SERIAL#
---------- ----------
306         24423

-- Check session stats:

SQL> SELECT a.username, c.name, b.value
     FROM   v$session a,
            v$sesstat b,
            v$statname c
     WHERE  a.sid = b.sid
     AND    a.sid = 306
     AND    b.statistic# = c.statistic#
     ORDER  BY name;

USERNAME NAME                                    VALUE
 ------- --------------------------------------  ----------
SYS      CPU used by this session                2419
SYS      CPU used when call started              2418
..
.

SYS     gc CPU used by this session                 0
SYS     gc blocks compressed                        0
SYS     gc blocks corrupt                           0
SYS     gc blocks lost                              0
SYS     gc claim blocks lost                        0
SYS     gc cr block build time                      0
SYS     gc cr block flush time                      0
SYS     gc cr block receive time                    0
SYS     gc cr block send time                       0
SYS     gc cr blocks received                       0
SYS     gc cr blocks served                         0
SYS     gc current block flush time                 0
SYS     gc current block pin time                   0
SYS     gc current block receive time               0
SYS     gc current block send time                  0
SYS     gc current blocks received                  0
SYS     gc current blocks served                    0
SYS     gc force cr disk read                       0
SYS     gc kbytes saved                             0
SYS     gc kbytes sent                              0
SYS     gc local grants                             0
SYS     gc read wait failures                       0
SYS     gc read wait time                           0
SYS     gc read wait timeouts                       0
SYS     gc read waits                               0
SYS     gc reader bypass grants                     0
SYS     gc remote grants                            0

-- node 2

USERNAME NAME                                  VALUE
------- -------------------------------------- ----------
SYS     CPU used by this session               1006
SYS     CPU used when call started             1007
..
.

SYS    gc CPU used by this session                0
SYS    gc blocks compressed                       0
SYS    gc blocks corrupt                          0
SYS    gc blocks lost                             0
SYS    gc claim blocks lost                       0
SYS    gc cr block build time                     0
SYS    gc cr block flush time                     0
SYS    gc cr block receive time                   0
SYS    gc cr block send time                      0
SYS    gc cr blocks received                      0
SYS    gc cr blocks served                        0
SYS    gc current block flush time                0
SYS    gc current block pin time                  0
SYS    gc current block receive time              0
SYS    gc current block send time                 0
SYS    gc current blocks received                 0
SYS    gc current blocks served                   0
SYS    gc force cr disk read                      0
SYS    gc kbytes saved                            0
SYS    gc kbytes sent                             0
SYS    gc local grants                            0
SYS    gc read wait failures                      0
SYS    gc read wait time                          0
SYS    gc read wait timeouts                      0
SYS    gc read waits                              0
SYS    gc reader bypass grants                    0
SYS    gc remote grants                           0

2) SET AUTOTRACE ON STATISTICS (or TRACEONLY STATISTICS).

“consistent gets” means reads from the buffer cache.

SQL> SET AUTOTRACE ON STATISTICS
SQL> @sqlscript.sql

no rows selected

Statistics
----------------------------------------------------------
0       recursive calls
0       db block gets
3038815 consistent gets
0       physical reads
0       redo size
5076    bytes sent via SQL*Net to client
4264    bytes received via SQL*Net from client
1       SQL*Net roundtrips to/from client
6       sorts (memory)
0       sorts (disk)
0       rows processed

3) tkprof shows the Fetch time on each node.

-- node 1

call    count  cpu      elapsed  disk  query   current    rows
------- ------ -------- -------- ----- ------- ---------- ------
Parse   2      0.10     0.10     0     0       0          0
Execute 3      0.00     0.00     0     0       0          0
Fetch   1     24.29     24.55    0     3038773 0          0
------- ----- -------- -------- ---- ------- -------- ----------
total   6     24.40     24.65    0    3038773  0          0

-- node 2

call    count  cpu      elapsed  disk  query   current    rows
------- ------ -------- -------- ----- ------- ---------- -------
Parse   2      0.04     0.04     0     0       0           0
Execute 3      0.00     0.00     0     0       0           0
Fetch   1      9.86     9.91     0    3038773  0           0
------- ----- -------- -------- ---- ------- -------- ----------
total   6      9.91     9.95     0    3038773  0           0

4) Run a shell loop to confirm node 1's CPU is not performing as well as node 2's:

-- node 1 (136 seconds)

$ date; i=0; while (( i < 10000000 )); do (( i ++ )); done; date

Thu Sep 27 14:51:58 AEST 2018
Thu Sep 27 14:54:14 AEST 2018

-- node 2 (52 seconds)

$ date; i=0; while (( i < 10000000 )); do (( i ++ )); done; date

Thu Sep 27 14:52:10 AEST 2018
Thu Sep 27 14:53:02 AEST 2018

5) After shutting down both GI and the RAC database, the issue still existed.
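Step 5 can be performed with the standard Grid Infrastructure commands — a sketch; the database name TESTDB is a hypothetical stand-in for this environment:

```
$ srvctl stop database -d TESTDB
# crsctl stop crs          # as root, on each node
```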

6) After rebooting the server, everything went back to normal. Both node 1 and node 2 perform the same.

CAUSES

The root cause is therefore most likely an OS kernel issue: the slowdown was reproducible with a pure shell loop outside Oracle, persisted after GI and the database were shut down, and disappeared only after the server was rebooted.