Fix Oracle Database Corrupt Blocks by Using RMAN

The following errors appear in alert log:

Thu Sep 23 20:06:34 2021
Hex dump of (file 23, block 774506) in trace file /u01/app/oracle/diag/rdbms/testdb/TETSDB/trace/TETSDB_ora_7558.trc
Corrupt block relative dba: 0x000bd16a (file 23, block 774506)
Bad check value found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x000bd16a
 last change scn: 0x071e.81e8b798 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xb7980602
 check value in block header: 0xe202
 computed block checksum: 0x82fe
Reread of blocknum=774506, file=/u05/oradata/TETSDB/cdr_01.dbf. found same corrupt data
Reread of blocknum=774506, file=/u05/oradata/TETSDB/cdr_01.dbf. found same corrupt data
Reread of blocknum=774506, file=/u05/oradata/TETSDB/cdr_01.dbf. found same corrupt data
Reread of blocknum=774506, file=/u05/oradata/TETSDB/ctl_01.dbf. found same corrupt data
Reread of blocknum=774506, file=/u05/oradata/TETSDB/cdr_01.dbf. found same corrupt data

SOLUTION

The corrupted block can be fixed online without offline datafile or tablespace required.

double confirm the data corrution

Run dbverify utility

$dbv blocksize=8192 file=/u05/oradata/TETSDB/cdr_01.dbf
...
..
.
Corrupt block relative dba: 0x000bd16a (file 23, block 774506)
...
..
.
Total Pages Marked Corrupt : 1

Run RMAN utility

RMAN> connect target /
RMAN> run {
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
backup validate check logical database;
close channel c1;
close channel c2;
close channel c3;
close channel c4
}

OR

Just check specified file and blocks:

RMAN> validate check logical datafile 23 block 774505 to 774507;


Check block corruption from database view

SQL> select * from  v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        23     774506          1                  0 CHECKSUM

Check the impacted object

  SQL> select 
              relative_fno, owner, segment_name, segment_type
         from
              dba_extents
        where
              file_id = 23
         and
              774506 between block_id and block_id + blocks - 1;
  
RELATIVE_FNO OWNER       SEGMENT_NAME       SEGMENT_TYPE
------------ ----------- ------------------ ------------------
        1024  TEST       CDR_IDX1        INDEX

fix the corruption

If it is an object like index, just rebuilt it even online rebuilding the index.

SQL> alter index TEST.CDR_IDX1 rebuild online parallel 8;

Index altered.

SQL>  alter index TEST.CDR_IDX1 parallel 1;

Index altered.
$ rman target / catalog rman/passwd@rman
connected to target database: TESTDB (DBID=1234567890)
connected to recovery catalog database

RMAN> run {
allocate channel ch1 TYPE 'SBT_TAPE';
SEND 'NSR_ENV=(NSR_SERVER=BKP_SERVER,NSR_CLIENT=DB_SERVER)';
BLOCKRECOVER DATAFILE 23 block 774506;
}
...
..
.
archived log for thread 1 with sequence 51992 is already on disk as file /fra/TESTDB/archivelog/2021_09_23/o1_mf_1_51992_jnrmdprf_.arc
archived log for thread 1 with sequence 51993 is already on disk as file /fra/TESTDB/archivelog/2021_09_23/o1_mf_1_51993_jnrpkppq_.arc
media recovery complete, elapsed time: 00:05:08
Finished recover at 23-SEP-21
released channel: ch1

RMAN>

Check the progressing:

SQL> select 
           username,sid, opname, target, sofar, totalwork, units, 
           to_char(start_time,'YYYYMMDD-HH24:MI:SS') StartTime, 
           time_remaining, message  
      from 
           V$SESSION_LONGOPS 
      where 
           time_remaining>60;

Finally check and confirm the corrupted block(s) have been fixed.

SQL> select * from  v$database_block_corruption;

no rows selected                                            

For some database versions, the record in v$database_block_corruption still exists , even there is no corrupted blocks.

To clear v$database_block_corruption orphan records:

RMAN> validate check logical datafile 23 block 774505to 774507;

Starting validate at 04-OCT-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=413 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00023 name=/u05/oradata/TESTDB/CDR_01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
23   OK     0              0            3               8013155974853
  File Name: /u05/oradata/TESTDB/CDR_01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              3
  Other      0              0

Finished validate at 04-OCT-21

The check v$database_block_corruption is cleared successfully.

SQL> select * from  v$database_block_corruption;

no rows selected

ORA-00600 internal error code arguments [ktbdchk1: bad dscn] after DG switchover

A bug caused error ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn]

ISSUE

After a successful switchover, there are a couple of ORA-00600 errors in alert.log .

Sat Jan 31 22:32:29 2015
Errors in file /u01/app/oracle/diag/rdbms/testdb/TETSDB4/trace/TESTDB4_ora_119810.trc (incident=112769):
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/testdb/tetsDB4/incident/incdir_112769/TESTDB4_ora_119810_i112769.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

Open incident 112796 trace file, we see the current SQL is just a normal INSERT DML :

*** 2015-01-31 22:32:29.749
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=40c9ma8n8k17u) -----
insert into TEST_TABLE (EVENTSEQNO,
JOBSEQNO, EVENTTYPE, EVENTDESCRIPTION,
JOBSTATUS, EVENTDT, BUSINESSREFID, LOGROLE, EVENTLOGDT, TARGETCOMPNAME, TARGETOPNAME, TARGETENDPOINTREF, EVENTINFORMATION, MSGSIZE, RECORDCOUNT) values ( test_Seq.nextval,:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 )

Run DBVERIFY and generates logs as below:

Page 15793939 failed with check code 6056
itl[45] has higher commit scn(0x0001.ff6a9a92) than block scn (0x0001.061dd22a)
Page 15794110 failed with check code 6056
itl[9] has higher commit scn(0x0001.ff71d940) than block scn (0x0001.0656d6b2)
Page 15794200 failed with check code 6056
itl[20] has higher commit scn(0x0001.ff7137bc) than block scn (0x0001.0651314d)
Page 15795600 failed with check code 6056
itl[46] has higher commit scn(0x0001.ff757e17) than block scn (0x0001.066372e8)
Page 15795925 failed with check code 6056

DBVERIFY - Verification complete

Total Pages Examined : 20185088
Total Pages Processed (Data) : 8205902
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 4629967
Total Pages Failing (Index): 2912
Total Pages Processed (Lob) : 5585318
Total Pages Failing (Lob) : 0
Total Pages Processed (Other): 125612
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1638289
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 0 (0.0)

Threr is no physical or logical blocks corruption recorded in alert.log or in view v$database_block_corruption or in table INVALID_ROWS after using dbverify and analyze table VALIDATE STRUCTURE cascade and RMAN validate check.

Use sql to identify the affected indexes, then rebuild those indexes and run DBVERIFY, the “Total Pages Failing (Index)” number drops a little bit but not disappearing.¬†Here Page 15795925 is db block#.

SQL>SELECT tablespace_name, segment_type, owner, segment_name
FROM dba_extents
WHERE file_id =8 and 15795925 between block_id AND block_id + blocks-1;
SQL> ! cat /u01/app/oracle/product/11.2.0/dbhome_3/rdbms/admin/utlvalid.sql
rem
Rem Copyright (c) 1990, 1995, 1996, 1998 by Oracle Corporation
Rem NAME
REM UTLVALID.SQL
Rem FUNCTION
Rem Creates the default table for storing the output of the
Rem analyze validate command on a partitioned table
Rem NOTES
Rem MODIFIED
Rem syeung 06/17/98 - add subpartition_name
Rem mmonajje 05/21/96 - Replace timestamp col name with analyze_timestamp
Rem sbasu 05/07/96 - Remove echo setting
Rem ssamu 01/09/96 - new file utlvalid.sql
Rem

create table INVALID_ROWS (
owner_name varchar2(30),
table_name varchar2(30),
partition_name varchar2(30),
subpartition_name varchar2(30),
head_rowid rowid,
analyze_timestamp date
);
SQL> @/u01/app/oracle/product/11.2.0/dbhome_3/rdbms/admin/utlvalid.sql

Table created.
SQL> ANALYZE TABLE esb_tracker.esb_event VALIDATE STRUCTURE CASCADE online;

Table analyzed.

SQL> select * from INVALID_ROWS;

no rows selected
RMAN> validate check logical datafile 8;

Starting validate at 04-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=771 instance=TESTDB4 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00008 name=+DG1/testdb/datafile/testuser_tb.583.845563563
channel ORA_DISK_1: validation complete, elapsed time: 00:09:45
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8 OK 0 731827 25034752 4440865831
File Name: +DG1/testdb/datafile/testuser_tb.583.845563563
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 2477746
Index 0 1231472
Other 0 20593707

Finished validate at 04-FEB-15
SQL>Select * from v$database_block_corruption ;

no rows selected

Use the following query to confirm after rebuilding indexes, those impacted pages are returned to FREELIST. 

For example :

SQL>Select * 
      from dba_free_space 
     where file_id=8 and 15768657 between block_id AND block_id + blocks-1;

TABLESPACE_NAME  FILE_ID BLOCK_ID BYTES    BLOCKS RELATIVE_FNO
---------------- ------- -------- -------- ------ -----------
TESTUSER_TB     8        15761536 67108864  8192   1024

SOLUTION

To set hidden parameter “_ktb_debug_flags”=8 to make this bug fixing effective , though Oracle declared it has been fixed since 11.2.0.2 according to 1498717.1.

SQL>alter system set "_ktb_debug_flags"=8 scope=both sid='*';

Reference:

ORA-1555 / ORA-600 [ktbdchk1: bad dscn] ORA-600 [2663] in Physical Standby after switch-over (Doc ID 1498717.1)