
How to run an ILOM Snapshot on a Sun/Oracle X86 System from the service processor CLI interface

An ILOM snapshot is normally the first thing Oracle Support asks for.

This post demonstrates how to run an ILOM snapshot on a cell server ( xx.xx.xx.xx ) on which one or more flash disks have failed. Oracle Support requires the snapshot to diagnose the issue.
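
If this is an Exadata storage cell (an assumption here; adapt to your hardware), the failed flash disks can usually be confirmed from the cell itself with CellCLI before collecting the snapshot, for example:

# cellcli -e list physicaldisk
# cellcli -e list alerthistory

Look for flash disks whose status is not normal; the ILOM snapshot then gives Oracle Support the corresponding hardware-level view.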

1) Log in to the ILOM CLI interface.

# ssh xx.xx.xx.xx-ilom
Password:

Oracle(R) Integrated Lights Out Manager
Version 3.1.2.20.c r86871
Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.

->

2) At the ‘->’ prompt, set the snapshot dataset:

-> set /SP/diag/snapshot dataset=normal
Set 'dataset' to 'normal'
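
Optionally, confirm the property before starting the transfer (standard ILOM 'show <target> <property>' syntax; shown as an illustration rather than captured output):

-> show /SP/diag/snapshot dataset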

3) Set the destination for the snapshot dump.

The password here contains special characters, so double quotes around the password are needed.

The IP address ( 10.10.10.17 ) can be either a cell server or a database server IP, as long as the user/password works on it.

-> set /SP/diag/snapshot dump_uri=sftp://testuser:"Password#"@10.10.10.17/tmp
Set 'dump_uri' to 'sftp://testuser:Password#@10.10.10.17/tmp'

4) cd to the snapshot directory and view the status. It shows “Running” initially.

-> cd /SP/diag/snapshot
/SP/diag/snapshot

-> show

/SP/diag/snapshot
 Targets:

Properties:
 dataset = normal
 dump_uri = (Cannot show property)
 encrypt_output = false
 result = Running

Commands:
 cd
 set
 show

5) Wait for the snapshot process to complete. It may take several minutes.
Continue to check until the status shows ‘Snapshot Complete’.
Do not access, view, copy or move the snapshot file until the collection has completed.

-> show

/SP/diag/snapshot
 Targets:

Properties:
 dataset = normal
 dump_uri = (Cannot show property)
 encrypt_output = false
 result = Collecting data into 
 sftp://testuser:*****@10.10.10.17/tmp/xx.xx.xx.xx-ilom_1152FMM0C1
 _2016-04-17T10-59-15.zip
 Snapshot Complete.
 Done.

Commands:
 cd
 set
 show

6) Exit the CLI interface and find your snapshot in the directory you specified.

-> exit
Connection to xx.xx.xx.xx-ilom closed.

$ ssh testuser@10.10.10.17
testuser@10.10.10.17's password: 

$ ls -ltr /tmp/xx.xx.xx.xx-ilom_1152FMM0C1_2016-04-17T10-59-15.zip

-rw-r--r-- 1 testuser dba 1129640 Apr 17 04:03 xx.xx.xx.xx-ilom_1152FMM0C1_2016-04-17T10-59-15.zip
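
If you want to sanity-check the archive before uploading it to Oracle Support, you can list its contents (a generic example, not captured output):

$ unzip -l /tmp/xx.xx.xx.xx-ilom_1152FMM0C1_2016-04-17T10-59-15.zip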

Reference :

How to run an ILOM Snapshot on a Sun/Oracle X86 System (Doc ID 1448069.1)

How to Move OCR, Voting Disk File, ASM SPFILE to a New Diskgroup ( 12.1.0.2 )

It is good practice to have separate dedicated disk groups for databases, for OCR/VOTE, and for the Cluster Health Monitor ( CHM ) repository.
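
As a rough, hedged outline only (assuming a hypothetical new disk group +OCR_VOTE_NEW and an existing +OCR_VOTE, with ocrconfig/crsctl run as root from the Grid home and asmcmd run as the grid user), the relocation typically involves commands of this shape:

# ocrconfig -add +OCR_VOTE_NEW
# ocrconfig -delete +OCR_VOTE
# crsctl replace votedisk +OCR_VOTE_NEW

$ asmcmd spget
$ asmcmd spmove <current SPFILE path reported by spget> +OCR_VOTE_NEW/spfileASM.ora

Verify afterwards with ocrcheck and crsctl query css votedisk before relying on the new disk group.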

Active Session History (ASH) Performed an Emergency Flush

Set a proper ASH size to avoid emergency flushes.

The alert log shows the following message now and then:

"Active Session History (ASH) performed an emergency flush. This
 may mean that ASH is undersized. If emergency flushes are a 
recurring issue, you may consider increasing ASH size by setting 
the value of _ASH_SIZE to a sufficiently large value. Currently, 
ASH size is 134217728 bytes. Both ASH size and the total number 
of emergency flushes since instance startup can be monitored by 
running the following query:  

select total_size, awr_flush_emergency_count from v$ash_info;"

Query the current ASH size :

SQL> select total_size from v$ash_info;

TOTAL_SIZE
----------
 134217728

The current size is 134217728 bytes ( roughly 134 MB ) according to both the query and the alert.log messages. We can add about 50% by running the SQL below. The new size is appropriate if no more such messages appear in alert.log; otherwise, keep increasing it, up to a maximum of 254M.

SQL> alter system set "_ash_size"=200M;
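
After the change, the same view quoted in the alert-log message can be used to confirm the new size and to watch whether the emergency-flush counter keeps growing:

SQL> select total_size, awr_flush_emergency_count from v$ash_info;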

How to Move 12c Grid Infrastructure Management Repository ( GIMR ) to Another Diskgroup

Grid Infrastructure Management Repository ( GIMR ) should be relocated to a disk group other than default disk group OCR/VOTE.

First, create a disk group GIMR for storing the new GIMR data. At the moment, the GIMR is stored in the disk group OCR_VOTE. Please refer to the section “Create New Disks and Create New Diskgroup FRA” for the steps to create a disk group.
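
For illustration only, a minimal disk group creation could look like the following from the ASM instance, assuming a hypothetical candidate disk /dev/oracleasm/disks/GIMR01 and external redundancy (adjust redundancy and disk paths to your environment):

SQL> create diskgroup GIMR external redundancy
     disk '/dev/oracleasm/disks/GIMR01'
     attribute 'compatible.asm'='12.1.0.2', 'compatible.rdbms'='12.1.0.2';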

1. Stop and disable ora.crf resource on every node

[root@racnode1 bin]# pwd
/u01/app/12.1.0/grid/bin

[root@racnode1 bin]# ./crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'racnode1'
CRS-2677: Stop of 'ora.crf' on 'racnode1' succeeded

[root@racnode1 bin]# ./crsctl modify res ora.crf -attr ENABLED=0 -init

Repeat the stop and disable commands on every other node ( e.g. racnode2 ).
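
To confirm the resource is stopped on a node, check its state (generic check; output varies):

[root@racnode1 bin]# ./crsctl stat res ora.crf -init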

Do not stop the ora.mgmtlsnr or ora.mgmtdb resources, otherwise Step 2 will fail with the following error:

Oracle Grid Management database is running on node “racnode1”. Run dbca on node “racnode1” to delete the database.

2.  Use DBCA to delete the management database

As the grid user, find the node on which the MGMTDB is running:

[grid@racnode1 ~]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node racnode1

[grid@racnode1 ~]$ /u01/app/12.1.0/grid/bin/dbca -silent -deleteDatabase -sourceDB -MGMTDB
Connecting to database
4% complete
9% complete
14% complete
19% complete
23% complete
28% complete
47% complete
Updating network configuration files
48% complete
52% complete
Deleting instance and datafiles
76% complete
100% complete
Look at the log file "/u01/app/grid/cfgtoollogs/dbca/_mgmtdb.log" for further details.

[grid@racnode1 ~]$ cat "/u01/app/grid/cfgtoollogs/dbca/_mgmtdb.log"
The Database Configuration Assistant will delete the Oracle instance and datafiles for your database. All information in the database will be destroyed. Do you want to proceed?
Connecting to database
DBCA_PROGRESS : 4%
DBCA_PROGRESS : 9%
DBCA_PROGRESS : 14%
DBCA_PROGRESS : 19%
DBCA_PROGRESS : 23%
DBCA_PROGRESS : 28%
DBCA_PROGRESS : 47%
Updating network configuration files
DBCA_PROGRESS : 48%
DBCA_PROGRESS : 52%
Deleting instance and datafiles
DBCA_PROGRESS : 76%
DBCA_PROGRESS : 100%
Database deletion completed.
[grid@racnode1 ~]$
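
At this point srvctl should no longer find a management database resource; re-running the status command is a quick sanity check (the exact error text varies by version):

[grid@racnode1 ~]$ srvctl status mgmtdb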

3. Recreate the 12.1.0.2 MGMTDB 

a. As the grid user, on any node, execute the following DBCA command with the desired <DG Name>:

[grid@racnode1 ~]$ /u01/app/12.1.0/grid/bin/dbca -silent -createDatabase -sid -MGMTDB -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc -gdbName _mgmtdb -storageType ASM -diskGroupName +GIMR -datafileJarLocation /u01/app/12.1.0/grid/assistants/dbca/templates -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck
Registering database with Oracle Grid Infrastructure
5% complete
Copying database files
7% complete
9% complete
16% complete
23% complete
30% complete
37% complete
41% complete
Creating and starting Oracle instance
43% complete
48% complete
49% complete
50% complete
55% complete
60% complete
61% complete
64% complete
Completing Database Creation
68% complete
79% complete
89% complete
100% complete
Look at the log file "/u01/app/grid/cfgtoollogs/dbca/_mgmtdb/_mgmtdb0.log" for further details.
[grid@racnode1 ~]$
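
To confirm the recreated MGMTDB is registered against the new disk group, check its configuration with srvctl and review the spfile / disk group lines in the output:

[grid@racnode1 ~]$ srvctl config mgmtdb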

b. Create a PDB within the MGMTDB by using DBCA.

As the grid user, on any node, execute the following DBCA command:

NOTE: The CLUSTER_NAME needs to have any hyphens (“-“) replaced with underscores (“_”)
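
If you are unsure of the exact cluster name, it can be queried first ( olsnodes -c prints the cluster name ):

[grid@racnode1 ~]$ olsnodes -c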

[grid@racnode1 ~]$ /u01/app/12.1.0/grid/bin/dbca -silent -createPluggableDatabase -sourceDB -MGMTDB -pdbName RACNODE_CLUSTER -createPDBFrom RMANBACKUP -PDBBackUpfile /u01/app/12.1.0/grid/assistants/dbca/templates/mgmtseed_pdb.dfb -PDBMetadataFile /u01/app/12.1.0/grid/assistants/dbca/templates/mgmtseed_pdb.xml -createAsClone true
Creating Pluggable Database
4% complete
12% complete
21% complete
38% complete
55% complete
85% complete
Completing Pluggable Database Creation
100% complete
Look at the log file "/u01/app/grid/cfgtoollogs/dbca/_mgmtdb/RACNODE_CLUSTER/_mgmtdb.log" for further details.
[grid@racnode1 ~]$

4. Secure the Management Database credentials

[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node racnode1
[grid@racnode1 ~]$

[grid@racnode1 ~]$ /u01/app/12.1.0/grid/bin/mgmtca    ( no output is expected )
[grid@racnode1 ~]$

5. Enable and start ora.crf resource on every node

[root@racnode1 ~]# /u01/app/12.1.0/grid/bin/crsctl modify res ora.crf -attr ENABLED=1 -init

[root@racnode1 ~]# /u01/app/12.1.0/grid/bin/crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'racnode1'
CRS-2676: Start of 'ora.crf' on 'racnode1' succeeded
[root@racnode1 ~]#

[root@racnode2 bin]# /u01/app/12.1.0/grid/bin/crsctl modify res ora.crf -attr ENABLED=1 -init

[root@racnode2 bin]# /u01/app/12.1.0/grid/bin/crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'racnode2'
CRS-2676: Start of 'ora.crf' on 'racnode2' succeeded

[root@racnode1 bin]# ./crsctl stat res -t -init
--------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------
ora.asm
 1 ONLINE ONLINE racnode1 Started,STABLE
ora.cluster_interconnect.haip
 1 ONLINE ONLINE racnode1 STABLE
ora.crf
 1 ONLINE ONLINE racnode1 STABLE
ora.crsd
 1 ONLINE ONLINE racnode1 STABLE
ora.cssd
 1 ONLINE ONLINE racnode1 STABLE
ora.cssdmonitor
 1 ONLINE ONLINE racnode1 STABLE
ora.ctssd
 1 ONLINE ONLINE racnode1 ACTIVE:0,STABLE
ora.diskmon
 1 OFFLINE OFFLINE STABLE
ora.evmd
 1 ONLINE ONLINE racnode1 STABLE
ora.gipcd
 1 ONLINE ONLINE racnode1 STABLE
ora.gpnpd
 1 ONLINE ONLINE racnode1 STABLE
ora.mdnsd
 1 ONLINE ONLINE racnode1 STABLE
ora.storage
 1 ONLINE ONLINE racnode1 STABLE
--------------------------------------------------------------------------------

6. Check management database and management listener

[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node racnode1

[grid@racnode1 ~]$ ps -eaf | grep tns
root 15 2 0 20:05 ? 00:00:00 [netns]
grid 1168 892 0 22:37 pts/0 00:00:00 grep --color=auto tns
grid 3974 1 0 20:06 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
grid 4064 1 0 20:06 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid 4092 1 0 20:06 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit
grid 4103 1 0 20:06 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit

[grid@racnode1 ~]$ lsnrctl status MGMTLSNR
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 14-MAR-2016 22:38:03

Copyright (c) 1991, 2014, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=MGMTLSNR)))
STATUS of the LISTENER
------------------------
Alias MGMTLSNR
Version TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date 14-MAR-2016 20:06:33
Uptime 0 days 2 hr. 31 min. 29 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/12.1.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/racnode1/mgmtlsnr/alert/log.xml
Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MGMTLSNR)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.16.100.61)(PORT=1521)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=169.254.146.121)(PORT=1521)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
 Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
 Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "racnode_cluster" has 1 instance(s).
 Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
The command completed successfully
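
As an optional final check, you can ask CHM where its repository now lives ( oclumon is covered in more detail in the next section; the path reported will differ per environment ):

[grid@racnode1 ~]$ oclumon manage -get reppath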

How To Manage the Cluster Health Monitor ( CHM ) Repository

The Cluster Health Monitor ( CHM ) repository size should be reviewed periodically so that it meets business retention needs without crowding the disk group that hosts OCR/VOTE.

Where is Cluster Health Monitor (CHM) Repository ?

In 11.2, the CHM repository is stored in a Berkeley DB database. The default location of the CHM repository is $GI_HOME/crf/db.

In 12.1, the CHM repository is hosted in the Grid Infrastructure Management Repository (GIMR), which by default is stored in the ASM disk group that holds the OCR and voting disks.

What is the recommended CHM data retention ?

Oracle Support recommends sizing the CHM repository for 72 hours ( 259,200 seconds, i.e. three days ) of data retention, e.g. one weekend's worth of data.

What is the minimum size of the CHM repository ?

For 11.2 GI, one day of data retention for each node requires around 867 MB, so the size of the CHM repository needed to retain 72 hours of data would be as follows:

~72 hours of CHM data retention = NumberOfNodes * 3 days * 867 MB

So for a two-node cluster:

~72 hours of CHM data retention = 2 ( nodes ) * 3 ( days ) * 867 MB ( per day per node ) = 5202 MB

For 12.1, one day of data retention for each node requires around 750 MB, so the size of the CHM repository needed to retain 72 hours of data would be as follows:

~72 hours of CHM data retention = NumberOfNodes * 3 days * 750 MB

So for a two-node cluster:

~72 hours of CHM data retention = 2 ( nodes ) * 3 ( days ) * 750 MB ( per day per node ) = 4500 MB

How to see the current CHM repository retention in seconds ?

[grid@racnode1 ~]$ /u01/app/12.1.0/grid/bin/oclumon manage -get repsize

CHM Repository Size = 272580 seconds

How to resize the CHM Repository retention ?

For 11.2 GI:

To determine the current location of the CHM repository:

$ oclumon manage -get reppath

To move and resize the CHM repository for 3 days of retention on a two-node cluster, where <path> is the directory path for the new location of the CHM repository:

$ oclumon manage -repos reploc <path> -maxspace 5202

For 12.1:

To resize the CHM repository for 3 days of retention with a single command, e.g. for a two-node cluster:

$ oclumon manage -repos changerepossize 4500
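
After resizing, re-run the repsize query shown earlier to confirm the new retention ( the value is reported in seconds ):

$ oclumon manage -get repsize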

How to verify that the change in repository size meets the desired retention ?

In 12.1.0.1

$ oclumon manage -repos changeretentiontime 260000

This command does not make any changes. It is more of a “what-if”: if I wanted to change the retention time to this value, how much space would be required?

In 12.1.0.2 the syntax changed and should be used as follows:

[grid@racnode1 ~]$ oclumon manage -repos checkretentiontime 260000

The Cluster Health Monitor repository can support the desired retention for 2 hosts