“Fatal NI Connect Error 12516″ and ”TNS-12564: TNS:connection refused” In Clusterware Alert.log when ologgerd Connects to GIMR

ISSUES

There are repeating following errors in 19c CRS alert.log by ologgered trying to connect to GIMR.

Fatal NI connect error 12516, connecting to:
(DESCRIPTION=(CONNECT_TIMEOUT=60)(ADDRESS=(PROTOCOL=tcp)(HOST=racnode-cluster-scan.virtuallab)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=GIMR_DSCREP_10)(CID=(PROGRAM=ologgerd)(HOST=racnode1.virtuallab)(USER=root)))(SECURITY=(ENCRYPTION_CLIENT=requested)))
VERSION INFORMATION:
TNS for Linux: Version 19.0.0.0.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production

Version 19.8.0.0.0
Time: 08-AUG-2020 18:15:01
Tracing not turned on.
Tns error struct:
ns main err code: 12564

TNS-12564: TNS:connection refused
ns secondary err code: 12560
nt main err code: 524

TNS-00524: Current operation is still in progress
nt secondary err code: 115
nt OS err code: 0

INVESTIGATION

Check mgmtlsnr or SCAN listeners, and found -MGMTDB PDB service ” gimr_dscrep_10″ is not registered properly.

[grid@racnode1 trace]$ lsnrctl services listener_scan2
...
..
.
Service "gimr_dscrep_10" has 1 instance(s).
Instance "-MGMTDB", status READY, has 1 handler(s) for this service…
Handler(s):
"DEDICATED" established:0 refused:0 state:blocked
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=172.16.100.62)(PORT=1526))

Subscribe to get access

Read more of this content when you subscribe today.

ORA-12516 from “oclumon manage -repos ” command

SYMPTOM

While running oclumon command in 12.2 GI to check CHM retention, and get ORA-12516 error:

$ oclumon manage -repos checkretentiontime 259200
Failed change retention. Error returned ORA-12516: TNS:listener could not find available handler with matching protocol stack

INVESTIGATION

Check MGMTLSNR is enabled and running

$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): racnode1

Services are all registered on MGMTLSNR

Both private and private HAIP are registered in listener.

$ lsnrctl status MGMTLSNR
...
..
.
Listening Endpoints Summary…
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MGMTLSNR)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.1.1.11)(PORT=1526)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=169.254.105.132)(PORT=1526)))
..
.
Service "_mgmtdb" has 1 instance(s).
Instance "-MGMTDB", status READY, has 1 handler(s) for this service…
Service "gimr_dscrep_10" has 1 instance(s).
Instance "-MGMTDB", status READY, has 1 handler(s) for this service…
The command completed successfully

MGMTDB is DISABLED

$ srvctl status mgmtdb
Database is disabled
Instance -MGMTDB is running on node racnode1

strace log

Log shows private IP access issue.

66684 [00007f94efff1687] getsockname(50, {sa_family=AF_INET, sin_port=htons(48322), sin_addr=inet_addr("169.254.105.132")}, [16]) = 0
66684 [00007f94efff1657] getpeername(50, {sa_family=AF_INET, sin_port=htons(61021), sin_addr=inet_addr("169.254.105.132")}, [16]) = 0
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
66684 [00007f94f2603aeb] recvfrom(50, 0x1a63888, 10240, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

LOCAL_LISTENER

local_listener shows without private IP.

SQL> show parameter local_listener

NAME                 TYPE     VALUE
-------------------  ------  -------------------------------
local_listener       string   (ADDRESS=(PROTOCOL=TCP)(HOST=
                                     10.1.1.11)(PORT=1526))

CAUSES

local_listener includes private IP only, but private HAIP is missing.

SOLUTION

ENABLE MGMTDB

$ srvctl enable mgmtdb

$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node racnode1

Add private HAIP into local_listener

SQL> alter system set local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.1.11)(PORT=1526))','(ADDRESS=(PROTOCOL=TCP)(HOST=169.254.105.132)(PORT=1526))' scope=both;

System altered.

SQL> show parameter local_listener

NAME          TYPE    VALUE
------------- ------- ------------------------------
local_listener string (ADDRESS=(PROTOCOL=TCP)(HOST=10.1.1.11)
                      (PORT=1526)), (ADDRESS=(PROTOCOL=TCP)
                      (HOST=169.254.105.132)(PORT=1526))

Run oclumon command again successfully.

$ oclumon manage -repos checkretentiontime 259200
The Cluster Health Monitor repository can support the desired retention for 2 hosts