emctl status agent : Status agent Failure:unable to connect to http server at [handshake has no peer]

The OEM agent is unhealthy and returns the following errors:

$ emctl status agent
Oracle Enterprise Manager Cloud Control 13c Release 4
Copyright (c) 1996, 2020 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Status agent Failure:unable to connect to http server at https://racnode1:3872/emd/lifecycle/main/. [handshake has no peer]

$ emctl stop agent
Oracle Enterprise Manager Cloud Control 13c Release 4
Copyright (c) 1996, 2020 Oracle Corporation.  All rights reserved.

$ emctl pingOMS
Oracle Enterprise Manager Cloud Control 13c Release 4
Copyright (c) 1996, 2020 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD pingOMS error: unable to connect to http server at https://racnode1:3872/emd/main/. [handshake has no peer]

Error stack observed in the <Agent_Inst>/sysman/log/emagent.nohup file:

----- 2024-04-13 10:03:14,913::8313::Checking status of EMAgent : 10965 -----
----- 2024-04-13 10:03:14,913::8313::Hang detected for EMAgent : 10965 -----
----- 2024-04-13 10:03:14,913::8313::Debugging component EMAgent -----
----- 2024-04-13 10:03:14,913::generate first thread dump file for diagnosis -----
----- 2024-04-13 10:03:27,394::generate second thread dump file for diagnosis -----
----- 2024-04-13 10:03:27,592::generate Threads.10965lsof.1 for diagnosis -----
----- Attempting to kill EMAgent : 10965 -----
----- 2024-04-13 10:03:32,660::8313::EMAgent exited at 2024-04-13 10:03:32,660 with signal 9 -----
----- 2024-04-13 10:03:32,660::8313::EMAgent either hung or in abnormal state. -----
----- 2024-04-13 10:03:32,660::8313::EMAgent will be restarted/thrashed. -----
----- 2024-04-13 10:03:32,660::8313::writeAbnormalExitTimestampToAgntStmp: exitCause=ABNORMAL : restartRequired=1 -----
----- 2024-04-13 10:03:32,660::8313::Restarting EMAgent. -----
----- 2024-04-13 10:03:32,875::8313::Auto tuning the agent at time 2024-04-13 10:03:32,875 -----
----- 2024-04-13 10:03:40,237::8313::Finished auto tuning the agent at time 2024-04-13 10:03:40,237 -----
----- 2024-04-13 10:03:40,238::8313::Launching the JVM with following options: -Xmx128M -XX:MaxMetaspaceSize=224M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -XX:-UseLargePages -XX:+UseLinuxPosixThreadCPUClocks -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops -DHTTPClient.dontSeekTerminatingChunk=true -----
----- 2024-04-13 10:03:40,239::8313::Agent Launched with PID 81582 at time 2024-04-13 10:03:40,239 -----
----- 2024-04-13 10:03:40,239::81582::Execing EMAgent process is taking longer than expected 120 secs -----
----- 2024-04-13 10:03:40,239::81582::Time elapsed between Launch of Watchdog process and execing EMAgent is 727958 secs -----

SOLUTION

1. Back up the <Agent_Inst>/sysman/config/emd.properties file.

2. Increase the agent's maximum JVM heap by changing the agentJavaDefines property from:

agentJavaDefines=-Xmx128M -XX:MaxMetaspaceSize=224M

to:

agentJavaDefines=-Xmx512M -XX:MaxMetaspaceSize=224M

3. Save the file.

4. Restart the agent.
$ emctl stop agent
$ emctl start agent
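
Steps 1 to 3 can also be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function name set_agent_heap and the .bak backup suffix are assumptions made for illustration, and it assumes the file contains an agentJavaDefines= line that already carries an -Xmx setting.

```shell
#!/bin/sh
# Hypothetical helper (not part of emctl): back up emd.properties and
# change the -Xmx value on the agentJavaDefines line.
# Usage: set_agent_heap <path-to-emd.properties> <new-heap, e.g. 512M>
set_agent_heap() {
    props=$1
    heap=$2
    [ -f "$props" ] || { echo "no such file: $props" >&2; return 1; }
    # Step 1: keep a copy of the original file next to it.
    cp -p "$props" "$props.bak" || return 1
    # Step 2: rewrite only the -Xmx token on the agentJavaDefines line;
    # every other line and option is copied through unchanged.
    sed "/^agentJavaDefines=/s/-Xmx[0-9][0-9]*[A-Za-z]*/-Xmx$heap/" \
        "$props.bak" > "$props"
}
```

For example, set_agent_heap <Agent_Inst>/sysman/config/emd.properties 512M, followed by the agent restart in step 4. If the result is not as expected, the .bak copy restores the original settings.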

How to Debug runInstaller

$ runInstaller -debug -logLevel finest >inst1.out 2>inst2.out

How to Debug “cluvfy” or “runcluvfy.sh”

Enable tracing before running “cluvfy” or “runcluvfy.sh”, then check the trace files written to the CV_TRACELOC directory:

$ rm -rf /tmp/cvutrace
$ mkdir /tmp/cvutrace
$ export CV_TRACELOC=/tmp/cvutrace
$ export SRVM_TRACE=true
$ export SRVM_TRACE_LEVEL=1

$ cluvfy stage -pre dbinst -allnodes -r 12.2 -d /u01/app/oracle/product/12.2.0/dbhome_1

$ ls -ltr  /tmp/cvutrace
total 1960
-rw-r--r-- 1 grid oinstall       0 Sep  8 21:46 cvutrace.log.0.lck
-rw-r--r-- 1 grid oinstall       0 Sep  8 21:47 cvuhelper.log.0.lck
-rw-r--r-- 1 grid oinstall    1586 Sep  8 21:47 cvuhelper.log.0
-rw-r--r-- 1 grid oinstall 2000962 Sep  8 21:47 cvutrace.log.0
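
The trace setup can be wrapped in a small function so that each run starts from an empty trace directory. This is only a sketch: the function name cvu_trace_on is an assumption, while the environment variables and the default directory are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: recreate the trace directory and export the
# CVU tracing variables shown above for the current shell session.
# Usage: cvu_trace_on [trace-dir]   (defaults to /tmp/cvutrace)
cvu_trace_on() {
    dir=${1:-/tmp/cvutrace}
    # Start from an empty directory so old traces do not mix with new ones.
    rm -rf "$dir" && mkdir -p "$dir" || return 1
    export CV_TRACELOC="$dir"
    export SRVM_TRACE=true
    export SRVM_TRACE_LEVEL=1
}
```

Define the function in the shell that will run cluvfy, call cvu_trace_on, run the cluvfy command, and then list the files under $CV_TRACELOC.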

Oracle GoldenGate Missing Archive Log Files

The OGG Extract process abends because of missing archive logs, as shown in the GoldenGate error log:

2023-01-21T01:19:23.197+1100  ERROR   OGG-01028  Oracle GoldenGate Capture for Oracle, CAP01.prm:  Could not find archived log for sequence 99 thread 2 under default destinations SQL <SELECT  name   FROM v$archived_log   WHERE sequence# = :1 AND         thread# = :2 AND         resetlogs_id = :3 AND         archived = 'YES' AND         deleted = 'NO'         AND standby_dest = 'NO'         order by name DESC>, error retrieving redo file name for sequence 99, archived = 1, use_alternate = 0.

If the missing archive logs are still available in a tape backup, restore them with RMAN, then restart the Extract process.
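
The sequence and thread numbers needed for the restore can be read from the OGG-01028 message. The sketch below extracts them and prints an RMAN RESTORE ARCHIVELOG command to review and run manually. The function name ogg_missing_log is hypothetical, and the message format is assumed to match the error shown above.

```shell
#!/bin/sh
# Hypothetical helper: read OGG-01028 messages on stdin and print, for
# each one, the RMAN command that restores the archived log it reports
# as missing. Lines that do not match the message format are ignored.
ogg_missing_log() {
    # Keep only the "sequence N thread M" pair from the message text.
    sed -n 's/.*Could not find archived log for sequence \([0-9][0-9]*\) thread \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r seq thread; do
        echo "RESTORE ARCHIVELOG SEQUENCE $seq THREAD $thread;"
    done
}
```

For example, the output of grep OGG-01028 on the GoldenGate error log (typically ggserr.log) can be piped into ogg_missing_log. The function only prints the command; it does not connect to RMAN or check that a backup of the log exists.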

Otherwise, the missing archive log has to be skipped, which means the changes recorded in it will not be captured. In this case, tell OGG to skip the log file by repositioning Extract at the next sequence (100, since sequence 99 is the missing one):

GGSCI> ALTER EXTRACT CAP01, EXTSEQNO 100

Alternatively, Extract can be repositioned to a date and time, which skips the transactions before that point:

GGSCI> ALTER EXTRACT CAP01, BEGIN 2024-02-24 15:00

ERROR OGG-01172 Discard file (/home/oracle/ggs/dirrpt/REP01.dsc) exceeded max bytes (3000000)

The Replicat process abends with the following error in its process report:

ERROR   OGG-01172  Discard file (/home/oracle/ggs/dirrpt/REP01.dsc) exceeded max bytes (3000000).

SOLUTION

Increase the maximum discard file size in the Replicat parameter file from 3 MB (the limit reported in the error) to 100 MB:

discardfile /home/oracle/ggs/dirrpt/REP01.dsc, purge, megabytes 100
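
To see how close a discard file is to its limit before Replicat abends, the file size can be compared with the configured maximum. The following is a minimal sketch; the function name discard_pct is hypothetical, and the byte limit is passed in by hand rather than read from the parameter file.

```shell
#!/bin/sh
# Hypothetical helper: print how full a discard file is, as a whole
# percentage of a byte limit (for example 3000000 from the error above).
# Usage: discard_pct <discard-file> <max-bytes>
discard_pct() {
    file=$1
    max=$2
    [ -f "$file" ] && [ "$max" -gt 0 ] || return 1
    # wc -c prints the size in bytes; $(( )) strips any padding.
    size=$(( $(wc -c < "$file") ))
    # Integer arithmetic, so the result is rounded down.
    echo $(( size * 100 / max ))
}
```

For example, discard_pct /home/oracle/ggs/dirrpt/REP01.dsc 3000000 prints the percentage of the 3000000-byte limit that is currently used.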