ORA-12516 from “oclumon manage -repos ” command

SYMPTOM

While running oclumon command in 12.2 GI to check CHM retention, and get ORA-12516 error:

$ oclumon manage -repos checkretentiontime 259200
Failed change retention. Error returned ORA-12516: TNS:listener could not find available handler with matching protocol stack

INVESTIGATION

Check MGMTLSNR is enabled and running

$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): racnode1

Services are all registered on MGMTLSNR

Both private and private HAIP are registered in listener.

$ lsnrctl status MGMTLSNR
...
..
.
Listening Endpoints Summary…
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MGMTLSNR)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.1.1.11)(PORT=1526)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=169.254.105.132)(PORT=1526)))
..
.
Service "_mgmtdb" has 1 instance(s).
Instance "-MGMTDB", status READY, has 1 handler(s) for this service…
Service "gimr_dscrep_10" has 1 instance(s).
Instance "-MGMTDB", status READY, has 1 handler(s) for this service…
The command completed successfully

MGMTDB is DISABLED

$ srvctl status mgmtdb
Database is disabled
Instance -MGMTDB is running on node racnode1

strace log

Log shows private IP access issue.

66684 [00007f94efff1687] getsockname(50, {sa_family=AF_INET, sin_port=htons(48322), sin_addr=inet_addr("169.254.105.132")}, [16]) = 0
66684 [00007f94efff1657] getpeername(50, {sa_family=AF_INET, sin_port=htons(61021), sin_addr=inet_addr("169.254.105.132")}, [16]) = 0
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

LOCAL_LISTENER

local_listener shows without private IP.

SQL> show parameter local_listener

NAME                 TYPE     VALUE
-------------------  ------  -------------------------------
local_listener       string   (ADDRESS=(PROTOCOL=TCP)(HOST=
                                     10.1.1.11)(PORT=1526))

CAUSES

local_listener includes private IP only, but private HAIP is missing.

SOLUTION

ENABLE MGMTDB

$ srvctl enable mgmtdb

$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node racnode1

Add private HAIP into local_listener

SQL> alter system set local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.1.11)(PORT=1526))','(ADDRESS=(PROTOCOL=TCP)(HOST=169.254.105.132)(PORT=1526))' scope=both;

System altered.

SQL> show parameter local_listener

NAME          TYPE    VALUE
------------- ------- ------------------------------
local_listener string (ADDRESS=(PROTOCOL=TCP)(HOST=10.1.1.11)
                      (PORT=1526)), (ADDRESS=(PROTOCOL=TCP)
                      (HOST=169.254.105.132)(PORT=1526))

Run oclumon command again successfully.

$ oclumon manage -repos checkretentiontime 259200
The Cluster Health Monitor repository can support the desired retention for 2 hosts

Grid Infrastructure Management Repository (GIMR)

Since -MGMT DB uses OCR/Voting disk by default, It is strongly recommended to check MGMT database tablespace usage. In some GI versions, we see MGMT database used all OCR/Voting disk.

What is Management Repository?

Grid Infrastructure Management Repository ( GIMR ) is a single instance database managed by GI. It will be up and running on one node in the cluster. If the hosting node is down, the database will be automatically failed over to other node.

What’s the purpose of Management Database?

GIMR will be the central repository to store Cluster Health Monitor (aka CHM/OS, ora.crf) and other data in 12c.

Where does Management Database store it’s datafiles?

In 12R1, by default, Management database uses the same shared storage as OCR/Voting File.

Can Management Database  be turned on/off  when you want ?

In 12.1.0.1, GIMR is optional, if Management Database is not selected to be configured during installation/upgrade OUI, all features (Cluster Health Monitor (CHM/OS) etc) that depend on it will be disabled.

Note: there’s no supported procedure to enable Management Database once the GI stack is configured

This changed in 12.1.0.2, it’s mandatory to have GIMR and must not be turned off

What are the resources associated with Management Database?

The following resources from “crsctl stat res -t” are for Management Database:

ora.mgmtdb
1 ONLINE ONLINE racnode1 Open,STABLE
ora.MGMTLSNR
1 ONLINE ONLINE racnode1 169.254.146.121 172.16.100.61,STABLE

On OS level, the database “-MGMTDB” and listener MGMTLSNR are for Management Database:

[grid@racnode1 ~]$ ps -ef| grep pmon_-MGMTDB
grid 4210 1 0 12:07 ? 00:00:00 mdb_pmon_-MGMTDB

[grid@racnode1 ~]$ ps -ef| grep MGMTLSNR
grid 4015 1 0 12:07 ? 00:00:00 /u01/app/12.1.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit

How to start/stop Management Database ?

If Management Database is  down for some reason, the following srvctl command can be used to start it:

Usage: srvctl start mgmtdb [-startoption <start_option>] [-node <node_name>]
Usage: srvctl start mgmtlsnr [-node <node_name>]
[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node racnode1

[grid@racnode1 ~]$ srvctl status MGMTLSNR
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): racnode1

[grid@racnode1 ~]$ srvctl stop MGMTDB
[grid@racnode1 ~]$ srvctl stop MGMTLSNR

[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Database is not running.
[grid@racnode1 ~]$ srvctl status MGMTLSNR
Listener MGMTLSNR is enabled
Listener MGMTLSNR is not running

[grid@racnode1 ~]$ srvctl start MGMTDB
[grid@racnode1 ~]$ srvctl start MGMTLSNR
PRCC-1014 : MGMTLSNR was already running
PRCR-1004 : Resource ora.MGMTLSNR is already running
PRCR-1079 : Failed to start resource ora.MGMTLSNR
CRS-5702: Resource 'ora.MGMTLSNR' is already running on 'racnode1'

[grid@racnode1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node racnode1

[grid@racnode1 ~]$ srvctl status MGMTLSNR
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): racnode1

How to access  to Management Database trace file etc?

Since the database name starts with “-” sign, “./” needs to be specified to avoid error: 

[grid@racnode1 grid]$ cd $ORACLE_BASE
[grid@racnode1 grid]$ cd diag
[grid@racnode1 rdbms]$ cd _mgmtdb
[grid@racnode1 _mgmtdb]$ ls -ltr
total 4
drwxr-x--- 16 grid oinstall 4096 Feb 16 21:59 -MGMTDB
-rw-r----- 1 grid oinstall 0 Feb 16 21:59 i_1.mif

[grid@racnode1 _mgmtdb]$ cd -MGMTDB
-bash: cd: -M: invalid option
cd: usage: cd [-L|[-P [-e]]] [dir]

[grid@racnode1 _mgmtdb]$ cd ./-MGMTDB
[grid@racnode1 trace]$ view -MGMTDB_mmon_26447.trc
VIM - Vi IMproved 7.4 (2013 Aug 10, compiled May 4 2014 20:16:04)
Unknown option argument: "-MGMTDB_mmon_26447.trc"
More info with: "vim -h"

[grid@racnode1 trace]$ view ./-MGMTDB_mmon_26447.trc

Is there any need to manually backup or tune Management Database?

As of now, there’s no such need.

How much (shared) disk space should be allocated for the Management Database?

For Oracle Cluster Registry (OCR) with external redundancy and the Grid Infrastructure Management Repository

Minimum:  At least 5.2 GB for the OCR volume that contains the Grid Infrastructure Management Repository (4.5 GB + 300 MB voting files + 400 MB OCR), plus 500 MB for each node for clusters greater than four nodes. For example, a six-node cluster allocation should be 6.2 GB.

Reference:  http://docs.oracle.com/database/121/CWLIN/storage.htm#CHDDCAHD