Oracle – Page 123 – Make DBA Life Easy

How To Format Empty Failing (Index) Pages Marked by DBVERIFY

Format empty failing pages so that DBVERIFY reports clean results.

ISSUE

Following on from another post, “ORA-00600 internal error code arguments [ktbdchk1: bad dscn] after DG switchover”, there is another, smaller datafile (file number 8) with 22 index pages that DBVERIFY identifies as failing:

Total Pages Failing (Index): 22

Here we are going to demonstrate how to format those failing index pages.

SOLUTION

Identify the index segments and rebuild them online accordingly.

Take the failing page number from the DBVERIFY log below and run the query to get the owner and name of the index segment to rebuild:

itl[20] has higher commit scn(0x0001.ff7137bc) than block scn (0x0001.0651314d)
Page 15795600 failed with check code 6056

SQL> SELECT tablespace_name, segment_type, owner, segment_name
FROM dba_extents
WHERE file_id = 8 AND 15795600 BETWEEN block_id AND block_id + blocks - 1;
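When many pages fail, collecting the page numbers by hand is tedious. The following is a minimal Python sketch, not part of the original procedure, that pulls the failing page numbers out of DBVERIFY log text and prints one dba_extents lookup per page. The function names failing_pages and extent_query are hypothetical helpers, and file number 8 is simply the file used in this example.

```python
import re

# Matches lines such as "Page 15795600 failed with check code 6056".
PAGE_FAILED = re.compile(r"^Page (\d+) failed with check code (\d+)", re.MULTILINE)


def failing_pages(log_text):
    """Return the sorted, de-duplicated page (block) numbers reported as failing."""
    return sorted({int(m.group(1)) for m in PAGE_FAILED.finditer(log_text)})


def extent_query(file_id, block):
    """Build the dba_extents lookup used in this post for a single block."""
    return (
        "SELECT tablespace_name, segment_type, owner, segment_name "
        "FROM dba_extents "
        f"WHERE file_id = {file_id} "
        f"AND {block} BETWEEN block_id AND block_id + blocks - 1;"
    )


# Sample text copied from the DBVERIFY log shown above.
sample_log = """\
itl[20] has higher commit scn(0x0001.ff7137bc) than block scn (0x0001.0651314d)
Page 15795600 failed with check code 6056
"""

for page in failing_pages(sample_log):
    print(extent_query(8, page))
```

The generated statements can be pasted into SQL*Plus; the sketch only builds text and does not connect to the database.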

After all the impacted indexes have been rebuilt and DBVERIFY and the SQL query above have been rerun, all the failing pages (blocks) turn out to have been returned to the free list. This can also be verified by querying the dba_free_space view.

SQL> select name, bytes/1024/1024/1024 from v$datafile where file#=8;

NAME                                          BYTES/1024/1024/1024
--------------------------------------------  --------------------
+DG1/testdb/datafile/users_tbl.583.845563563  188

Create a table to consume all the free space. A large PCTFREE leaves each block almost empty, so fewer rows are needed to fill the available free space and the inserts finish sooner.

SQL> create table EMPTY( n number ) tablespace users_tbl pctfree 99;

Table created.

Get the minimum and maximum free extent sizes, in blocks:

SQL> select TABLESPACE_NAME, FILE_ID, BLOCKS, count(*)
from dba_free_space
where TABLESPACE_NAME = 'USERS_TBL'
group by TABLESPACE_NAME, FILE_ID, BLOCKS
order by FILE_ID, BLOCKS;

TABLESPACE_NAME BLOCKS COUNT(*)
-------------- ------- ----------
USERS_TBL         8       3
USERS_TBL        16       3
USERS_TBL        32       2
USERS_TBL        40       1
USERS_TBL        48       2
.........
..
USERS_TBL     75904       1

21 rows selected.

Calculate how many extents of the minimum free size (8 blocks) fit into the total available free space:

SQL> select sum(BLOCKS)/8  
       from dba_free_space
      where TABLESPACE_NAME='USERS_TBL';

SUM(BLOCKS)/8
-----------
66944
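As a quick sanity check on that number, here is a small Python sketch of the arithmetic. It assumes an 8 KB database block size, which is what makes the 8-block minimum free extent equal to the 64K extent size used in the allocation step; the variable names are illustrative only.

```python
BLOCK_SIZE_BYTES = 8 * 1024   # assumption: 8 KB database block size
MIN_EXTENT_BLOCKS = 8         # smallest free extent reported by dba_free_space
EXTENT_COUNT = 66944          # result of sum(BLOCKS)/8 above

# Size of one minimum extent and the total free space it adds up to.
extent_bytes = MIN_EXTENT_BLOCKS * BLOCK_SIZE_BYTES
total_free_blocks = EXTENT_COUNT * MIN_EXTENT_BLOCKS
total_free_mb = EXTENT_COUNT * extent_bytes / (1024 * 1024)

print(f"extent size     : {extent_bytes // 1024}K")
print(f"free blocks     : {total_free_blocks}")
print(f"free space (MB) : {total_free_mb:.0f}")
```

Under that assumption the free space is a little over 4 GB, so allocating it 64K at a time takes 66944 iterations.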

Turn datafile autoextend off.

SQL> alter database datafile '+DG1/testdb/datafile/USERS_TBL.583.845563563' autoextend off;

Database altered.

Allocate all available free space to table EMPTY in extents of the smallest free-space size (8 blocks, i.e. 64K):

SQL> BEGIN
       FOR i IN 1..66944 LOOP
         EXECUTE IMMEDIATE 'alter table EMPTY allocate extent (size 64K)';
       END LOOP;
     END;
     /

Insert rows into table EMPTY until the insert fails with an unable-to-allocate-space ORA- error.

SQL> BEGIN
       FOR i IN 1..100000000 LOOP
         FOR j IN 1..10000 LOOP
           INSERT INTO EMPTY VALUES (i + j);
         END LOOP;
         COMMIT;
       END LOOP;
     END;
     /

Now drop the table to release the space.

SQL> drop table EMPTY;

Table dropped.

Turn datafile autoextend on.

SQL> alter database datafile '+DG1/testdb/datafile/users_tbl.583.845563563' autoextend on;

Database altered.

Run DBVERIFY to confirm that all pages are now fine, with zero pages failing.

DBVERIFY - Verification starting : FILE = +DG1/testdb/datafile/users_tbl.583.845563563

DBVERIFY - Verification complete
Total Pages Examined : 26869760
Total Pages Processed (Data) : 4391230
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 682945
Total Pages Failing (Index): 0
Total Pages Processed (Lob) : 20668189
Total Pages Failing (Lob) : 0
Total Pages Processed (Other): 49949
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1077447
Total Pages Marked Corrupt : 0
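Checking a summary like the one above can be automated. The sketch below is a hedged Python example, not an Oracle utility: it parses the “Total Pages ...” lines of DBVERIFY output and reports whether every failing or corrupt counter is zero. The names parse_summary and is_clean are hypothetical.

```python
import re

# Matches summary lines such as "Total Pages Failing (Index): 0".
SUMMARY_LINE = re.compile(r"^Total Pages (.+?)[ \t]*:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_summary(text):
    """Map each 'Total Pages ...' label to its integer count."""
    return {label.strip(): int(count) for label, count in SUMMARY_LINE.findall(text)}


def is_clean(summary):
    """True when no 'Failing ...' or 'Marked Corrupt' counter is above zero."""
    return all(
        count == 0
        for label, count in summary.items()
        if label.startswith("Failing") or label == "Marked Corrupt"
    )


# Sample text copied from the DBVERIFY output above (abbreviated).
sample_output = """\
Total Pages Examined : 26869760
Total Pages Processed (Data) : 4391230
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 682945
Total Pages Failing (Index): 0
Total Pages Marked Corrupt : 0
"""

summary = parse_summary(sample_output)
print("clean" if is_clean(summary) else "failing pages remain")
```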

ORA-00600 internal error code arguments [ktbdchk1: bad dscn] after DG switchover

A bug causes the error ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn].

ISSUE

After a successful switchover, a couple of ORA-00600 errors appear in alert.log.

Sat Jan 31 22:32:29 2015
Errors in file /u01/app/oracle/diag/rdbms/testdb/TESTDB4/trace/TESTDB4_ora_119810.trc (incident=112769):
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/testdb/TESTDB4/incident/incdir_112769/TESTDB4_ora_119810_i112769.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

Opening the trace file for incident 112769, we see that the current SQL is just a normal INSERT statement:

*** 2015-01-31 22:32:29.749
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=40c9ma8n8k17u) -----
insert into TEST_TABLE (EVENTSEQNO,
JOBSEQNO, EVENTTYPE, EVENTDESCRIPTION,
JOBSTATUS, EVENTDT, BUSINESSREFID, LOGROLE, EVENTLOGDT, TARGETCOMPNAME, TARGETOPNAME, TARGETENDPOINTREF, EVENTINFORMATION, MSGSIZE, RECORDCOUNT) values ( test_Seq.nextval,:1 ,:2 ,:3 ,:4 ,:5 ,:6 ,:7 ,:8 ,:9 ,:10 ,:11 ,:12 ,:13 ,:14 )

Running DBVERIFY generates the log below:

Page 15793939 failed with check code 6056
itl[45] has higher commit scn(0x0001.ff6a9a92) than block scn (0x0001.061dd22a)
Page 15794110 failed with check code 6056
itl[9] has higher commit scn(0x0001.ff71d940) than block scn (0x0001.0656d6b2)
Page 15794200 failed with check code 6056
itl[20] has higher commit scn(0x0001.ff7137bc) than block scn (0x0001.0651314d)
Page 15795600 failed with check code 6056
itl[46] has higher commit scn(0x0001.ff757e17) than block scn (0x0001.066372e8)
Page 15795925 failed with check code 6056

DBVERIFY - Verification complete

Total Pages Examined : 20185088
Total Pages Processed (Data) : 8205902
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 4629967
Total Pages Failing (Index): 2912
Total Pages Processed (Lob) : 5585318
Total Pages Failing (Lob) : 0
Total Pages Processed (Other): 125612
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1638289
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 0 (0.0)
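To see how far apart the two SCNs in each itl line are, the hex values can be converted to plain integers. The Python sketch below assumes the usual wrap.base notation, where the part before the dot is the SCN wrap and the part after it is the 32-bit base, so the value is wrap * 2^32 + base. The helper names scn_value and scn_gaps are hypothetical.

```python
import re

# Matches "itl[20] has higher commit scn(0x0001.ff7137bc) than block scn (0x0001.0651314d)".
ITL_LINE = re.compile(
    r"itl\[(\d+)\] has higher commit scn\((0x[0-9a-fA-F]+)\.([0-9a-fA-F]+)\) "
    r"than block scn \((0x[0-9a-fA-F]+)\.([0-9a-fA-F]+)\)"
)


def scn_value(wrap_hex, base_hex):
    """Combine the SCN wrap and the 32-bit base into one integer (assumed layout)."""
    return (int(wrap_hex, 16) << 32) + int(base_hex, 16)


def scn_gaps(log_text):
    """Return (itl_slot, commit_scn, block_scn) for every itl line found."""
    rows = []
    for slot, c_wrap, c_base, b_wrap, b_base in ITL_LINE.findall(log_text):
        rows.append((int(slot), scn_value(c_wrap, c_base), scn_value(b_wrap, b_base)))
    return rows


# Two lines copied from the DBVERIFY log above.
sample_log = """\
itl[45] has higher commit scn(0x0001.ff6a9a92) than block scn (0x0001.061dd22a)
itl[20] has higher commit scn(0x0001.ff7137bc) than block scn (0x0001.0651314d)
"""

for slot, commit_scn, block_scn in scn_gaps(sample_log):
    print(f"itl[{slot}]: commit scn {commit_scn} exceeds block scn {block_scn} "
          f"by {commit_scn - block_scn}")
```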

There is no physical or logical block corruption recorded in alert.log, in the view v$database_block_corruption, or in the table INVALID_ROWS after running DBVERIFY, ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE, and RMAN VALIDATE CHECK LOGICAL.

Use SQL to identify the affected indexes, then rebuild them and run DBVERIFY again. The “Total Pages Failing (Index)” count drops slightly but does not reach zero. In the query below, the page number 15795925 from the DBVERIFY log is the database block number.

SQL> SELECT tablespace_name, segment_type, owner, segment_name
FROM dba_extents
WHERE file_id = 8 AND 15795925 BETWEEN block_id AND block_id + blocks - 1;

SQL> ! cat /u01/app/oracle/product/11.2.0/dbhome_3/rdbms/admin/utlvalid.sql
rem
Rem Copyright (c) 1990, 1995, 1996, 1998 by Oracle Corporation
Rem NAME
REM UTLVALID.SQL
Rem FUNCTION
Rem Creates the default table for storing the output of the
Rem analyze validate command on a partitioned table
Rem NOTES
Rem MODIFIED
Rem syeung 06/17/98 - add subpartition_name
Rem mmonajje 05/21/96 - Replace timestamp col name with analyze_timestamp
Rem sbasu 05/07/96 - Remove echo setting
Rem ssamu 01/09/96 - new file utlvalid.sql
Rem

create table INVALID_ROWS (
owner_name varchar2(30),
table_name varchar2(30),
partition_name varchar2(30),
subpartition_name varchar2(30),
head_rowid rowid,
analyze_timestamp date
);

SQL> @/u01/app/oracle/product/11.2.0/dbhome_3/rdbms/admin/utlvalid.sql

Table created.

SQL> ANALYZE TABLE esb_tracker.esb_event VALIDATE STRUCTURE CASCADE online;

Table analyzed.

SQL> select * from INVALID_ROWS;

no rows selected

RMAN> validate check logical datafile 8;

Starting validate at 04-FEB-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=771 instance=TESTDB4 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00008 name=+DG1/testdb/datafile/testuser_tb.583.845563563
channel ORA_DISK_1: validation complete, elapsed time: 00:09:45
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    OK     0              731827       25034752        4440865831
  File Name: +DG1/testdb/datafile/testuser_tb.583.845563563
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              2477746
  Index      0              1231472
  Other      0              20593707

Finished validate at 04-FEB-15

SQL> select * from v$database_block_corruption;

no rows selected

Use the following query to confirm that, after the indexes are rebuilt, the impacted pages have been returned to the free list.

For example:

SQL> select *
       from dba_free_space
      where file_id = 8 and 15768657 between block_id and block_id + blocks - 1;

TABLESPACE_NAME  FILE_ID  BLOCK_ID  BYTES     BLOCKS  RELATIVE_FNO
---------------  -------  --------  --------  ------  ------------
TESTUSER_TB      8        15761536  67108864  8192    1024
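The row returned above can be checked by hand with the same range test the query uses. This short Python sketch, with a hypothetical helper named extent_contains, confirms that block 15768657 lies inside the free extent and derives the block size from the BYTES and BLOCKS columns.

```python
def extent_contains(block_id, blocks, target):
    """Same test as the SQL: target BETWEEN block_id AND block_id + blocks - 1."""
    return block_id <= target <= block_id + blocks - 1


# Values copied from the dba_free_space row above.
free_extent = {"block_id": 15761536, "bytes": 67108864, "blocks": 8192}
target_block = 15768657

last_block = free_extent["block_id"] + free_extent["blocks"] - 1
inside = extent_contains(free_extent["block_id"], free_extent["blocks"], target_block)
block_size = free_extent["bytes"] // free_extent["blocks"]

print(f"free extent covers blocks {free_extent['block_id']}..{last_block}")
print(f"block {target_block} inside free extent: {inside}")
print(f"block size: {block_size} bytes")
```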

SOLUTION

Set the hidden parameter “_ktb_debug_flags” to 8 to make the bug fix effective, even though, according to Doc ID 1498717.1, Oracle states that the bug has been fixed since 11.2.0.2.

SQL> alter system set "_ktb_debug_flags"=8 scope=both sid='*';

Reference:

ORA-1555 / ORA-600 [ktbdchk1: bad dscn] ORA-600 [2663] in Physical Standby after switch-over (Doc ID 1498717.1)

ACFS Filesystem Resource Offline and Volume Device is Missing in /dev/asm/

Oracle ADVM volumes in mounted disk groups must be enabled for the ACFS file system to be available.

ISSUE

After Oracle Clusterware is restarted, the ACFS file system resource is offline and the corresponding volume device is missing from /dev/asm/ on node3:

ora.SATA.ORADBA.advm
 ONLINE  ONLINE  racnode1 Volume device /dev/asm/oradba-241 is online,STABLE
 ONLINE  ONLINE  racnode2 Volume device /dev/asm/oradba-241 is online,STABLE
 OFFLINE OFFLINE racnode3 Volume device /dev/asm/oradba-241 is offline,STABLE
 ONLINE  ONLINE  racnode4 Volume device /dev/asm/oradba-241 is online,STABLE

ASMCMD> volinfo -G SATA -a
Diskgroup Name: SATA
 Volume Name: ORADBA
 Volume Device: /dev/asm/oradba-241
 State: DISABLED
 Size (MB): 307200
 Resize Unit (MB): 32
 Redundancy: UNPROT
 Stripe Columns: 4
 Stripe Width (K): 128
 Usage: ACFS
 Mountpath: /oradba

The ACFS volume “ORADBA” was DISABLED.
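With several disk groups and volumes, spotting a DISABLED volume in volinfo output is easy to miss. The following Python sketch is an illustration only: it parses captured “volinfo” text and prints the matching volenable command for each volume that is not ENABLED. The function name parse_volinfo is hypothetical, and the sample is abbreviated from the output above.

```python
def parse_volinfo(text):
    """Return one dict per volume found in 'volinfo' text output."""
    volumes = []
    diskgroup = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # skip lines without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Diskgroup Name":
            diskgroup = value
        elif key == "Volume Name":
            volumes.append({"Diskgroup Name": diskgroup, "Volume Name": value})
        elif volumes:
            volumes[-1][key] = value
    return volumes


# Abbreviated copy of the volinfo output shown above.
sample = """\
Diskgroup Name: SATA
 Volume Name: ORADBA
 Volume Device: /dev/asm/oradba-241
 State: DISABLED
 Usage: ACFS
 Mountpath: /oradba
"""

for vol in parse_volinfo(sample):
    if vol.get("State") != "ENABLED":
        print(f"volenable -G {vol['Diskgroup Name']} {vol['Volume Name']}")
```

The script only prints the command; running it in ASMCMD remains a manual step.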

SOLUTION

ASMCMD> volenable -G SATA ORADBA
ASMCMD> volinfo -G SATA -a
Diskgroup Name: SATA
 Volume Name: ORADBA
 Volume Device: /dev/asm/oradba-241
 State: ENABLED
 Size (MB): 307200
 Resize Unit (MB): 32
 Redundancy: UNPROT
 Stripe Columns: 4
 Stripe Width (K): 128
 Usage: ACFS
 Mountpath: /oradba

ASMCMD> volstat -G SATA
DISKGROUP NUMBER / NAME: 5 / SATA
---------------------------------------
 VOLUME_NAME
 READS BYTES_READ READ_TIME READ_ERRS
 WRITES BYTES_WRITTEN WRITE_TIME WRITE_ERRS
 -------------------------------------------------------------
 ORADBA
 203 114688 251 0
 11 12800 25 0

# mount -t acfs /dev/asm/oradba-241 /oradba

# df -h /oradba

Filesystem          Size Used Avail Use% Mounted on
/dev/asm/oradba-241 300G 203G 98G    68% /oradba

For reference, the Oracle ADVM volume management commands in ASMCMD are:

Command     Description
----------  ------------------------------------------------------------------
volcreate   Creates an Oracle ADVM volume in a disk group.
voldelete   Deletes an Oracle ADVM volume.
voldisable  Disables Oracle ADVM volumes in mounted disk groups.
volenable   Enables Oracle ADVM volumes in mounted disk groups.
volinfo     Displays information about Oracle ADVM volumes.
volresize   Resizes an Oracle ADVM volume.
volset      Sets attributes of an Oracle ADVM volume in mounted disk groups.
volstat     Reports volume I/O statistics.

CRS-2674 Clusterware Network Resource ora.net1.network Could not Start

Network configurations should be consistent between OS and Clusterware.

ISSUE

After the netmask of the public interface bond0 was changed from 255.255.255.0 to 255.255.254.0, the clusterware network resource ora.net1.network can’t be started. As a result, the VIPs, SCANs, local listeners, SCAN listeners and services are all down.

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/bin$ ./srvctl start nodeapps -n racnode1

PRCR-1013 : Failed to start resource ora.net1.network
PRCR-1064 : Failed to start resource ora.net1.network on node racnode1
CRS-2674: Start of 'ora.net1.network' on 'racnode1' failed

<GRID_HOME>/log/racnode1/crsd/crsd.log shows:

2015-01-29 17:25:02.552: [ CRSPE][1170397536]{1:2737:1255} CRS-2674: Start of 'ora.net1.network' on 'racnode1' failed
2015-01-29 17:25:02.552: [ CRSRPT][1172498784]{1:2737:1255} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.net1.network
2015-01-29 17:25:02.552: [UiServer][1172498784]{1:2737:1255} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2674: Start of 'ora.net1.network' on 'racnode1' failed]
MSGTYPE:
TextMessage[1]
OBJID:
TextMessage[ora.LISTENER_SCAN1.lsnr]
WAIT:
TextMessage[0]
]

The srvctl configuration queries still show the old netmask:

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/log/racnode1$ srvctl config nodeapps
Network exists: 1/10.10.12.0/255.255.255.0/bond0, type static
VIP exists: /racnode1-vip/10.10.12.130/10.10.12.0/255.255.255.0/bond0, hosting node racnode1

GSD exists
ONS exists: Local port 6100, remote port 6200, EM port 2016

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/bin$ srvctl config nodeapps -a

Network exists: 1/10.10.12.0/255.255.255.0/bond0, type static
VIP exists: /racnode1-vip/10.10.12.130/10.10.12.0/255.255.255.0/bond0, hosting node racnode1

[+ASM1] grid@racnode1:~$ srvctl config network
 Network exists: 1/10.10.12.0/255.255.255.0/bond0, type static

From ocrdump, the netmask is also “255.255.255.0”:

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/srvm$ ocrdump ~/myfile
 ... 
... 
... 
[DATABASE.NODEAPPS.racnode1.VIP.NETMASK] ORATEXT : 255.255.255.0

[+ASM1] grid@racnode1:~$ crsctl stat res ora.net1.network -f
NAME=ora.net1.network
TYPE=ora.network.type
....
....
USR_ORA_IF=bond0
USR_ORA_NETMASK=255.255.255.0
USR_ORA_SUBNET=10.10.12.0
VERSION=11.2.0.2.0

From ifconfig, the netmask has been changed to “255.255.254.0”.

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/log/racnode1$ /sbin/ifconfig -a
bond0 Link encap:Ethernet HWaddr 00:21:5E:97:22:D8
inet addr:10.10.12.120 Bcast:10.10.13.255 Mask:255.255.254.0

[+ASM1] grid@racnode1:~$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.10.12.120
GATEWAY=10.10.12.1
NETMASK=255.255.254.0
USERCTL=no
TYPE=Ethernet
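The mismatch can be made explicit with Python’s standard ipaddress module. This is a minimal sketch using the addresses from this example; the helper srvctl_subnet_arg is hypothetical and only formats the subnet/netmask/interface string in the shape that srvctl config network prints.

```python
import ipaddress


def srvctl_subnet_arg(ip, netmask, interface):
    """Format 'subnet/netmask/interface' from the OS address and netmask."""
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    return f"{network.network_address}/{network.netmask}/{interface}"


# OS side: values from ifconfig and ifcfg-bond0.
os_network = ipaddress.ip_interface("10.10.12.120/255.255.254.0").network
# Clusterware side: USR_ORA_SUBNET and USR_ORA_NETMASK from crsctl.
ocr_network = ipaddress.ip_network("10.10.12.0/255.255.255.0")

print(f"OS network  : {os_network} (broadcast {os_network.broadcast_address})")
print(f"OCR network : {ocr_network} (broadcast {ocr_network.broadcast_address})")
print(f"consistent  : {os_network == ocr_network}")
print(f"expected    : {srvctl_subnet_arg('10.10.12.120', '255.255.254.0', 'bond0')}")
```

The broadcast address computed for the OS network matches the Bcast value that ifconfig reports, while the network recorded in the OCR is a different, smaller subnet.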

SOLUTION

In this situation, we need to change the NETMASK entry in the OCR to match the OS setting, using the srvctl modify network command below. The command takes several minutes, so be patient.
Once the network resource is modified, the node VIP netmask is changed automatically. After this, the clusterware network resource ora.net1.network, the VIPs, SCANs, local listeners, SCAN listeners and services all come up.

[root@racnode1 bin]# ./srvctl modify network -k 1 -S 10.10.12.0/255.255.254.0/bond0

The above command can also be used to modify the subnet if it was set incorrectly, for example:

# srvctl modify network -k 1 -S 10.10.12.0/255.255.254.0/bond0

If CRS is earlier than 11.2.0.2, also run this command as the root user:

# srvctl modify nodeapps -n racnode1 -A racnode1-vip/255.255.254.0/bond0

Use crsctl stat res -t to check that all resources are up.

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/bin$ srvctl config nodeapps -a
Network exists: 1/10.10.12.0/255.255.254.0/bond0, type static
VIP exists: /racnode1-vip/10.10.12.130/10.10.12.0/255.255.254.0/bond0, hosting node racnode1

[+ASM1] grid@racnode1:/u01/app/11.2.0.4/grid/bin$ srvctl config network
Network exists: 1/10.10.12.0/255.255.254.0/bond0, type static
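The “Network exists” line can also be checked by a script after the change. The Python sketch below is an illustration with a hypothetical parse_network helper: it splits the line into its fields and compares the netmask with the value expected from the OS configuration.

```python
import re

# Matches "Network exists: 1/10.10.12.0/255.255.254.0/bond0, type static".
NETWORK_LINE = re.compile(
    r"Network exists: (\d+)/([\d.]+)/([\d.]+)/([\w.:]+), type (\w+)"
)


def parse_network(line):
    """Split a 'Network exists' line into its fields, or return None."""
    match = NETWORK_LINE.search(line)
    if match is None:
        return None
    number, subnet, netmask, interface, net_type = match.groups()
    return {
        "number": int(number),
        "subnet": subnet,
        "netmask": netmask,
        "interface": interface,
        "type": net_type,
    }


# Line copied from the srvctl config network output above.
config_line = "Network exists: 1/10.10.12.0/255.255.254.0/bond0, type static"
expected_netmask = "255.255.254.0"  # NETMASK from ifcfg-bond0

network = parse_network(config_line)
print(network)
print("netmask matches OS:", network["netmask"] == expected_netmask)
```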

Please note:

If the OCR entry is correct, the network or system administrator only needs to modify NETMASK for the bond0 interface in “/etc/sysconfig/network-scripts/ifcfg-bond0” to match the OCR entries.

Reference :

How to Modify Public Network Information including VIP in Oracle Clusterware (Doc ID 276434.1)
CRS-2674 Clusterware Network Resource ora.net1.network Could not Start (Doc ID 1270186.1)

OID/OVD Metrics Unavailable in Enterprise Manager 12c Cloud Control

To monitor targets, the monitoring credentials must be set up correctly.

ISSUE

OID/OVD metrics are unavailable in OEM 12c Cloud Control, although the metrics of other targets are displayed correctly. The OID/OVD instance raises an incident with the message:

The following exception has occurred:
Can’t resolve a non-optional query descriptor property [password] (password)

SOLUTION

The monitoring credentials are either not configured or invalid. Set them in EM Cloud Control via Setup > Security > Monitoring Credentials:

1) Select “Oracle Internet Directory”

2) Click “Manage Monitoring Credential”

3) Select “LDAP Credentials” for the respective OID

4) Click “Set Credential”

5) Enter the password/confirm password.