How To Manage the Cluster Health Monitor ( CHM ) Repository

Cluster Health Monitor ( CHM ) Repository size should be reviewed periodically to meet business needs and OCR/VOTE disk availability.

Where is Cluster Health Monitor (CHM) Repository ?

In 11.2, the CHM repository is stored in a Berkley Database . The default location of the CHM repository is $GI_HOME/crf/db.

In 12.1, the CHM repository is hosted in the Grid Infrastructure Management Repository (GIMR). The default location for GIMR is stored in the ASM diskgroup which stores the OCR and voting disk .

What is the recommended CHM data retention ?

Oracle Support recommends that the CHM repository be sized according to 72 hours ( 259,200 seconds )(three days) of data retention (e.g.., one weekend worth).

What is the minimum size of  CHM repository ?

For 11.2 GI, one day of data retention for each node requires  867 MB around. So the size of the CHM repository needed to retain 72 hours of data would be as follows:

~72 hours of CHM data retention = NumberOfNodes * 3Days * 867 MB

So for a 2 nodes cluster :

~72 hours of CHM data retention = 2 ( nodes ) * 3 ( days ) * 867 ( per day per node )(5202 MB)

For 12.1, one day of data retention for each node requires 750 MB around, so the size of the CHM repository needed to retain 72 hours of data would be as follows:

~72 hours of CHM data retention = NumberOfNodes * 3Days * 750 MB

So for a 2 node cluster

~72 hours of CHM data retention = 2 ( nodes ) * 3( days ) * 750 ( per day per node ) (4500 MB)

How to see the current CHM repository retention in seconds ?

[grid@racnode1 ~]$ /u01/app/12.1.0/grid/bin/oclumon manage -get repsize

CHM Repository Size = 272580 seconds

How to resize the CHM Repository retention ?

For 11.2 GI:

To determine the current location of the CHM repository:

$oclumon manage -get reppath
 To move and resize the CHM repository for 3 days retention for a 2 nodes cluster:

$ oclumon manage -repos reploc path* -maxspace 5202


* where path = directory path for new location of the CHM repository

For 12.1:

To resize the CHM Repository with one command to result in 3 days retention, eg., for a 2  nodes cluster:

$ oclumon manage -repos changerepossize 4500

How to verify the change in repository size has met the desired retention ?

In 12.1.0.1

$ oclumon manage -repos changeretentiontime 260000

This command does not make any changes. It is more like a “what-if”, ie., what if I wanted to change the retention time, how much space would be required ?

In 12.1.0.2 the syntax was changed and should be used as follows :

[grid@racnode1 ~]$ oclumon manage -repos checkretentiontime 260000

The Cluster Health Monitor repository can support the desired retention for 2 hosts

Grid Infrastructure Management Repository (GIMR)

Since -MGMT DB uses OCR/Voting disk by default, It is strongly recommended to check MGMT database tablespace usage. In some GI versions, we see MGMT database used all OCR/Voting disk.

What is Management Repository?

Grid Infrastructure Management Repository ( GIMR ) is a single instance database managed by GI. It will be up and running on one node in the cluster. If the hosting node is down, the database will be automatically failed over to other node.

What’s the purpose of Management Database?

GIMR will be the central repository to store Cluster Health Monitor (aka CHM/OS, ora.crf) and other data in 12c.

Where does Management Database store it’s datafiles?

In 12R1, by default, Management database uses the same shared storage as OCR/Voting File.

Can Management Database  be turned on/off  when you want ?

In 12.1.0.1, GIMR is optional, if Management Database is not selected to be configured during installation/upgrade OUI, all features (Cluster Health Monitor (CHM/OS) etc) that depend on it will be disabled.

Note: there’s no supported procedure to enable Management Database once the GI stack is configured

This changed in 12.1.0.2, it’s mandatory to have GIMR and must not be turned off

What are the resources associated with Management Database?

The following resources from “crsctl stat res -t” are for Management Database:

ora.mgmtdb
1 ONLINE ONLINE racnode1 Open,STABLE
ora.MGMTLSNR
1 ONLINE ONLINE racnode1 169.254.146.121 172.16.100.61,STABLE

On OS level, the database “-MGMTDB” and listener MGMTLSNR are for Management Database:

[grid@racnode1 ~]$ ps -ef| grep pmon_-MGMTDB
grid 4210 1 0 12:07 ? 00:00:00 mdb_pmon_-MGMTDB

[grid@racnode1 ~]$ ps -ef| grep MGMTLSNR
grid 4015 1 0 12:07 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit

How to start/stop Management Database ?

If Management Database is  down for some reason, the following srvctl command can be used to start it:

Usage: srvctl start mgmtdb [-startoption <start_option>] [-node <node_name>]
Usage: srvctl start mgmtlsnr [-node <node_name>]
[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node racnode1

[grid@racnode1 ~]$ srvctl status MGMTLSNR
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): racnode1

[grid@racnode1 ~]$ srvctl stop MGMTDB
[grid@racnode1 ~]$ srvctl stop MGMTLSNR

[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Database is not running.
[grid@racnode1 ~]$ srvctl status MGMTLSNR
Listener MGMTLSNR is enabled
Listener MGMTLSNR is not running

[grid@racnode1 ~]$ srvctl start MGMTDB
[grid@racnode1 ~]$ srvctl start MGMTLSNR
PRCC-1014 : MGMTLSNR was already running
PRCR-1004 : Resource ora.MGMTLSNR is already running
PRCR-1079 : Failed to start resource ora.MGMTLSNR
CRS-5702: Resource 'ora.MGMTLSNR' is already running on 'racnode1'

[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node racnode1

[grid@racnode1 ~]$ srvctl status MGMTLSNR
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): racnode1

How to access  to Management Database trace file etc?

Since the database name starts with “-” sign, “./” needs to be specified to avoid error: 

[grid@racnode1 grid]$ cd $ORACLE_BASE
[grid@racnode1 grid]$ cd diag
[grid@racnode1 rdbms]$ cd _mgmtdb
[grid@racnode1 _mgmtdb]$ ls -ltr
total 4
drwxr-x--- 16 grid oinstall 4096 Feb 16 21:59 -MGMTDB
-rw-r----- 1 grid oinstall 0 Feb 16 21:59 i_1.mif

[grid@racnode1 _mgmtdb]$ cd -MGMTDB
-bash: cd: -M: invalid option
cd: usage: cd [-L|[-P [-e]]] [dir]

[grid@racnode1 _mgmtdb]$ cd ./-MGMTDB
[grid@racnode1 trace]$ view -MGMTDB_mmon_26447.trc
VIM - Vi IMproved 7.4 (2013 Aug 10, compiled May 4 2014 20:16:04)
Unknown option argument: "-MGMTDB_mmon_26447.trc"
More info with: "vim -h"

[grid@racnode1 trace]$ view ./-MGMTDB_mmon_26447.trc

Is there any need to manually backup or tune Management Database?

As of now, there’s no such need.

How much (shared) disk space should be allocated for the Management Database?

For Oracle Cluster Registry (OCR) with external redundancy and the Grid Infrastructure Management Repository

Minimum:  At least 5.2 GB for the OCR volume that contains the Grid Infrastructure Management Repository (4.5 GB + 300 MB voting files + 400 MB OCR), plus 500 MB for each node for clusters greater than four nodes. For example, a six-node cluster allocation should be 6.2 GB.

Reference:  http://docs.oracle.com/database/121/CWLIN/storage.htm#CHDDCAHD