The agent is overloaded [current requests: 128]

The cause was a Java-layer deadlock in the agent (gcagent.log reports “Dead Lock detected!!”). Bouncing the agent cleared it, and everything worked normally afterwards.

SITUATION

The following alert was received from racnode1: “The agent is overloaded [current requests: 128]”

From: oracle 
Sent: Friday, 4 August 2017 7:07 PM
Cc: 
Subject: EM Event: Warning: racnode1 - Agent Unreachable (REASON = The agent is overloaded [current requests: 128]). Host is reachable.

...
..
.
Categories=Availability 
Message=Agent Unreachable (REASON = The agent is overloaded [current requests: 128]). Host is reachable. 
Severity=Warning 
Event reported time=Aug 4, 2017 7:06:27 PM AEST
...
..
.

INVESTIGATING

1) Check agent status

Agent is running
Agent upload is not working
Agent reload is not working
OMS heartbeat is not working

$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.5.0
OMS Version : 13.2.0.0.0
Protocol Version : 12.1.0.1.0
..
.
Last Reload : 2017-08-04 11:28:59
Last successful upload : 2017-08-04 14:51:03  <--- 5 hours ago
Last attempted upload : 2017-08-04 14:51:03
..
.
Last attempted heartbeat to OMS : 2017-08-04 14:50:23
Last successful heartbeat to OMS : 2017-08-04 14:50:23
Next scheduled heartbeat to OMS : 2017-08-04 14:51:23

2) Upload agent
$ emctl upload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error:The agent is overloaded [current requests: 128]
3) Reload agent
$ emctl reload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD reload error:The agent is overloaded [current requests: 128]
4) The “emagent_perl.trc” file has not been updated since the agent was restarted
5) Check “gcagent.log”

Java-layer deadlock: “Dead Lock detected!!”

2017-08-04 19:28:59,071 [43:GCThread-13] ERROR -
Dead Lock detected!!
Participating threads:Thread Info Dump:
=================
"HTTP Listener-3592 - /emd/main/ (~Task-free~ OMS.pbs@16398@omsnode=>[150183756670001])" tid=3592 WAITING
 > Accumulated wait time (msec): 1372208 (1 times)

"HTTP Listener-2141 - /emd/main/ (~Task-free~ OMS.pbs@13103@omsnode=>[150182243190001])" tid=2141 BLOCKED
 > Accumulated wait time (msec): 11036289 (76 times)
 > Accumulated blocked time (msec): 16506994 (4 times)

"oracle.dfw.impl.incident.DiagnosticsDataExtractorImpl - Incident Dump Executor (created: Fri Aug 04 14:51:06 EST 2017)" tid=3088 BLOCKED
 > Accumulated blocked time (msec): 16672145 (7 times)

"HTTP Listener-1022 - /emd/main/ (~Task-free~ OMS.pbs@16398@omsnode=>[150181021899001])" tid=1022 WAITING
 > Accumulated wait time (msec): 28746227 (37 times)
 > Accumulated blocked time (msec): 133 (12 times)

"HTTP Listener-1078 - /emd/main/ (DispatchRequests OMS.console@16398@omsnode=>[150181015881006])" tid=1078 WAITING
 > Accumulated wait time (msec): 28719225 (44 times)

=================
Thread Info Dump:
=================
"HTTP Listener-3592 - /emd/main/ (~Task-free~ OMS.pbs@16398@omsnode=>[150183756670001])" tid=3592 WAITING
 sun.misc.Unsafe.park(Native Method)
 - waiting on <0x149717ec> (a java.util.concurrent.locks.ReentrantLock$NonfairSync), which is owned by "HTTP Listener-2141 - /emd/main/ (~Task-free~ OMS.pbs@13103@omsnode=>[150182243190001])" (tid=2141)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
 java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
...
..
.
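
To check quickly whether another agent has hit the same condition, searching its gcagent.log for the marker string is usually enough. The following is a minimal sketch: the function name `find_deadlocks` is made up for illustration, and the log path is passed in as an argument because its location depends on your agent instance home.

```shell
# Report deadlock markers in an agent log.
# $1: path to gcagent.log (pass the file from your own agent instance home)
# Exit status: 0 if a marker was found, 1 if none, 2 if the file is unreadable.
find_deadlocks() {
    log=$1
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # -n adds line numbers; -B1 also prints the timestamped ERROR line
    # that precedes each marker.
    grep -n -B1 'Dead Lock detected!!' "$log"
}
```

Called as `find_deadlocks /path/to/gcagent.log`, it prints each marker together with the line before it, so the time of the deadlock can be compared with the time of the last successful upload.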

SOLUTION

1) Stop agent
$ emctl stop agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Stopping agent ...
 stopped.
2) Start agent
$ emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Starting agent ............................................ started.
3) Upload now completes successfully
$ emctl upload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully
4) Reload now completes successfully
$ emctl reload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD reload completed successfully
5) Check that the agent status is healthy
$ emctl status agent
...
..
Last attempted heartbeat to  OMS : 2017-08-04 19:53:31
Last successful heartbeat to OMS : 2017-08-04 19:53:31
Next scheduled heartbeat to  OMS : 2017-08-04 19:54:32

---------------------------------------------------------------
Agent is Running and Ready
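
The final check can be scripted so that it is easy to repeat after every bounce. This is only a sketch: `agent_ready` is a hypothetical helper name, and it relies on nothing more than the lines shown in the status output above.

```shell
# Read `emctl status agent` output on stdin and check that the agent is healthy.
# Exit status: 0 if the "Running and Ready" line is present, 1 otherwise.
agent_ready() {
    out=$(cat)
    # Show the upload and heartbeat timestamps for a quick visual check.
    printf '%s\n' "$out" | grep -E '^Last successful (upload|heartbeat)' || true
    printf '%s\n' "$out" | grep -q '^Agent is Running and Ready'
}
```

Typical use would be `emctl status agent | agent_ready && echo OK`.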

OEM Agent Throws “Internal error detected: java.lang.IllegalStateException:oracle.sysman.gcagent.target.interaction.execution.ConfigStateMgr:827”

The agent hits an error when it runs the host_storage collection, which is scheduled every 24 hours.

Some of the OEM agents raise the following internal error as an OEM alert:

From: oracle 
Sent: Tuesday, 14 February 2017 6:08 PM
To: 
Subject: EM Event: Critical:racnode1:3872 - Internal error detected: java.lang.IllegalStateException:oracle.sysman.gcagent.target.interaction.execution.ConfigStateMgr:827.
...
..
.

Error messages from agent log file:

2017-02-14 18:07:32,288 [265:GC.Executor.6 (host:racnode1:host_storage) (host:racnode1:host_storage:storage_reporting_data)] ERROR - null
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: <Line 1, Column 9207>: XML-20221: (Fatal Error) Invalid char in text.]

Workaround:

1) Remove the <agent_inst>/sysman/emd/state/configstate/host/racnode1/storage_*.xml files.

2) Restart the agent.

This restarts the collection of the storage data. Because this data is collected every 24 hours, wait at least 24 hours before checking the report.
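
For step 1 of the workaround, it can be safer to move the files aside than to delete them outright. The sketch below assumes you pass in the configstate directory from your own agent instance; the function name and the backup directory are hypothetical.

```shell
# Move the cached storage config-state files aside instead of deleting them.
# $1: the configstate host directory, e.g.
#     <agent_inst>/sysman/emd/state/configstate/host/racnode1
# $2: backup directory (hypothetical; any writable location works)
set_aside_storage_state() {
    src=$1
    backup=$2
    mkdir -p "$backup" || return 1
    moved=0
    for f in "$src"/storage_*.xml; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        mv "$f" "$backup"/ || return 1
        moved=$((moved + 1))
    done
    echo "moved $moved file(s) to $backup"
}
```

After the files have been moved, restart the agent as in step 2.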

You can also run the collection manually:

$ emctl status agent scheduler | grep host | grep -i storage
2017-02-16 18:25:52.117 : host:racnode1:host_storage
2017-02-16 18:34:37.318 : host:racnode1:HostStorageSupport

$ emctl control agent runCollection racnode1:host host_storage
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully


$ emctl control agent runCollection racnode1:host HostStorageSupport
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully

OEM Agent Throws Error “Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620”

Adding a few hidden parameters to the emd.properties file and then bouncing the agent may stop these internal errors.

A couple of OEM agents reported the following internal error:

From: oracle 
Sent: Monday, 13 February 2017 10:41 AM
To: 
Subject: EM Event: Critical:ractest:3872 - Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

Host=ractest
Target type=Agent 
Target name=ractest:3872 
Categories=Diagnostics 
Message=Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620. 
Severity=Critical 
...
..
.
Update Details:
Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

Extract from gcagent.log:

2017-02-13 10:10:00,329 [256:7761FFD4] WARN - IntervalSchedule: Skip schedule [rac_database:RACTEST:observer_11g] - skipping due to execution time exceeding interval
2017-02-13 10:10:00,329 [257:B85D2018:GC.Executor.6 (oracle_database:RACTEST:Response)] WARN - IntervalSchedule: Skip schedule [oracle_database:RACTEST:Response] - skipping due to execution time exceeding interval
2017-02-13 10:10:00,329 [258:5A4306F7:GC.Executor.7 (rac_database:RACTEST:observer_11g)] WARN - Action result processing failure rac_database.RACTEST::observer_11g
java.lang.InterruptedException
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1302)
...
..
.
oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at oracle.sysman.gcagent.task.TaskFutureImpl.run1(TaskFutureImpl.java:380)
...
..
. 

 at java.lang.Thread.run(Thread.java:662)
2017-02-13 10:10:00,331 [257:GC.Executor.6] ERROR - Critical error:
oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)

WORKAROUND

Add the parameters below to “emd.properties”, then bounce the agent.

_zombieSuspensions=true
_canceledThreadWait=900
_zombieThreadPercentThreshold=0
_zombieCreateIncident=false
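
If the same change has to be made on several agents, the edit can be scripted. This is a minimal sketch rather than an Oracle-supplied procedure: `add_zombie_params` is a hypothetical name, the path to emd.properties is passed in by the caller, and a key that is already present is deliberately left untouched, so an existing value that differs from the one above is not overwritten.

```shell
# Append the zombie-handling parameters to emd.properties if they are missing.
# $1: path to emd.properties (pass the file from your own agent instance home)
add_zombie_params() {
    props=$1
    [ -w "$props" ] || { echo "cannot write $props" >&2; return 2; }
    # Keep a copy of the original file next to it.
    cp -p "$props" "$props.bak" || return 1
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$props")" ]; then
        echo >> "$props"
    fi
    for kv in \
        _zombieSuspensions=true \
        _canceledThreadWait=900 \
        _zombieThreadPercentThreshold=0 \
        _zombieCreateIncident=false
    do
        key=${kv%%=*}
        # Leave any existing setting for this key untouched.
        grep -q "^$key=" "$props" || printf '%s\n' "$kv" >> "$props"
    done
}
```

Running it twice is harmless, because the second run finds every key already in place. The agent still has to be bounced afterwards for the parameters to take effect.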

Reference:

EM12c: Incident constantly raised for Oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie (Doc ID 2116834.1)

Plugin versions on agent does not support target type oracle_pdb

This appears to be a bug.

An attempt to add a CDB database with PDBs to 12c OEM failed with the following error.

Failed RACTEST : Plugin versions on agent https://oemnode1:3872/emd/main/  does not support target type oracle_pdb

In fact, another CDB database on the same cluster was added to OEM months ago, so the error message is misleading.

After the following actions were taken, the CDB database was added to OEM successfully.

  1. Shut down and started up the CDB database with all PDBs open.
  2. Bounced the OEM agent.
  3. Manually added the cluster DB, cluster DB instances, and pluggable databases.

OEM Cluster Database and Database System Targets are in Pending Status

In Oracle Enterprise Manager (OEM) 12c and 13c, a couple of newly added cluster database targets and database system targets are in “PENDING” status, even though all the cluster database instances are showing up.
