ORA-01013: user requested cancel of current operation

Following OEM alerts are received from databases of 11R2 to 12R2:

...
..
.
Target name=RACTEST
Incident creation time=28/11/2017 9:48:17 PM AEDT 
Last updated time=28/11/2017 9:48:17 PM AEDT 
Message= ORA-01013: user requested cancel of current operation 
Severity=Critical 
Incident ID=294770 
...
..
.

CAUSES

SQL ID 4d43by1zzjfna runs slowly until failed with ‘ORA-01013’ error.

SOLUTION

As per “low Performance Oracle Enterprise Manager SQL Metrics SQL statement (SQL ID: 4d43by1zzjfna) Automatically Run (Doc ID 2266608.1)”, the fololowing actions are taken :

(a) Stopping EM agent monitoring the RAC instance(s).

$AGENT_HOME/bin/emctl stop agent

(b) Test the current execution time ( 36 s ),  and record the execution plan via SQL PLUS:

SQL> set timing on
SQL> set autotrace on
SQL> WITH last_run AS
 (
 SELECT all_runs.OWNER ,
 all_runs.JOB_NAME,
 all_runs.STATUS
 FROM DBA_SCHEDULER_JOB_RUN_DETAILS all_runs,
 (
 SELECT OWNER ,
 JOB_NAME,
 MAX(ACTUAL_START_DATE) AS START_DATE
 FROM DBA_SCHEDULER_JOB_RUN_DETAILS
 GROUP BY OWNER,
 JOB_NAME
 )
 latest_runs
 WHERE all_runs.OWNER = latest_runs.OWNER
 AND all_runs.JOB_NAME = latest_runs.JOB_NAME
 AND all_runs.ACTUAL_START_DATE=latest_runs.START_DATE
...
..
.

SUM(BROKEN) SUM(FAILED)
----------- -----------
 0          0

Elapsed: 00:00:36.49

Execution Plan
----------------------------------------------------------
Plan hash value: 204035645
...
..
.
Statistics
----------------------------------------------------------
 338   recursive calls
 0     db block gets
 58842 consistent gets
 0     physical reads
 0     redo size
 603   bytes sent via SQL*Net to client
 524   bytes received via SQL*Net from client
 2     SQL*Net roundtrips to/from client
 8671  sorts (memory)
 0     sorts (disk)
 1 rows processed

c)Purge any job history that is no longer needed to reduce the size of job related tables:

SQL> exec DBMS_SCHEDULER.PURGE_LOG(7,which_log=>'JOB_LOG');

PL/SQL procedure successfully completed.
Elapsed: 00:00:04.95

SQL> exec DBMS_SCHEDULER.PURGE_LOG(7,which_log=>'WINDOW_LOG');

PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03

SQL> exec DBMS_SCHEDULER.PURGE_LOG(log_history => 7, job_name => 'job1, sys.class2');

PL/SQL procedure successfully completed.
Elapsed: 00:00:00.04

SQL> exec DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE('log_history','7');

PL/SQL procedure successfully completed.
Elapsed: 00:00:00.02

d) Regather statistics to ensure that the CBO has the right understanding of cardinality/cost for execution plans involving JOB related tables:

SQL> EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;

PL/SQL procedure successfully completed.
Elapsed: 00:01:26.37

SQL> EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;

PL/SQL procedure successfully completed.
Elapsed: 00:01:35.82

SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('SYS');

PL/SQL procedure successfully completed.
Elapsed: 00:10:53.24

e) rerun the same sql_id=’4d43by1zzjfna’. We can see there is a new execution plan. The running time is shorted from 36 seconds from 1 second.

SQL> WITH last_run AS
 2 (
 3 SELECT all_runs.OWNER ,
 4 all_runs.JOB_NAME,
 5 all_runs.STATUS
 6 FROM DBA_SCHEDULER_JOB_RUN_DETAILS all_runs,
 7 (
 8 SELECT OWNER ,
 9 JOB_NAME,
 10 MAX(ACTUAL_START_DATE) AS START_DATE
 11 FROM DBA_SCHEDULER_JOB_RUN_DETAILS
 12 GROUP BY OWNER,
 13 JOB_NAME
 14 )
 15 latest_runs
 16 WHERE all_runs.OWNER = latest_runs.OWNER
 17 AND all_runs.JOB_NAME = latest_runs.JOB_NAME
 18 AND all_runs.ACTUAL_START_DATE=latest_runs.START_DATE
 19 )
 20 SELECT SUM(broken),
 21 SUM(failed)
 22 FROM
 23 (
 24 SELECT DECODE(broken, 'N', 0, 1) broken,
 25 DECODE(NVL(failures,0), 0, 0, 1) failed
 26 FROM dba_jobs
 27 UNION ALL
 28 SELECT DECODE(STATE,'BROKEN',1,0) broken,
 29 DECODE(STATUS ,'FAILED',DECODE(STATE,'BROKEN',0,'DISABLED',0,1),0) failed
 30 FROM
 31 (
 32 SELECT all_jobs.OWNER ,
 33 all_jobs.JOB_NAME,
 34 last_run.STATUS ,
 35 all_jobs.STATE
 36 FROM last_run,
 37 DBA_SCHEDULER_JOBS all_jobs
 38 WHERE last_run.OWNER =all_jobs.OWNER
 39 AND last_run.JOB_NAME=all_jobs.JOB_NAME
 40 )
 41 )
 42 /

SUM(BROKEN) SUM(FAILED)
----------- -----------
 0           0

Elapsed: 00:00:01.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2148836375
...
..
.
Statistics
----------------------------------------------------------
 50   recursive calls
 0    db block gets
 3585 consistent gets
 0    physical reads
 0    redo size
 603  bytes sent via SQL*Net to client
 524  bytes received via SQL*Net from client
 2    SQL*Net roundtrips to/from client
 51   sorts (memory)
 0    sorts (disk)
 1 rows processed

(f) Re-enable EM Agents monitoring the RAC instance(s):

$AGENT_HOME/bin/emctl start agent

The agent is overloaded [current requests: 128]

Java layer deadlock —“Dead Lock detected!!”, bounce the agent, and then everything is working fine.

SITUATION

The following alerts are received from racnode1 -“The agent is overloaded [current requests: 128]”

From: oracle 
Sent: Friday, 4 August 2017 7:07 PM
Cc: 
Subject: EM Event: Warning: racnode1 - Agent Unreachable (REASON = The agent is overloaded [current requests: 128]). Host is reachable.

...
..
.
Categories=Availability 
Message=Agent Unreachable (REASON = The agent is overloaded [current requests: 128]). Host is reachable. 
Severity=Warning 
Event reported time=Aug 4, 2017 7:06:27 PM AEST
...
..
.

INVESTIGATING

1)   Check agent status

Agent is running
Agent upload is not working
Agent reload is not working
OMS heartbeat is not working

$ emctl status agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 12.1.0.5.0
OMS Version : 13.2.0.0.0
Protocol Version : 12.1.0.1.0
..
.
Last Reload : 2017-08-04 11:28:59
Last successful upload : 2017-08-04 14:51:03  <--- 5 hours ago
Last attempted upload : 2017-08-04 14:51:03
..
.
Last attempted heartbeat to OMS : 2017-08-04 14:50:23
Last successful heartbeat to OMS : 2017-08-04 14:50:23
Next scheduled heartbeat to OMS : 2017-08-04 14:51:23

2) Upload agent
$ emctl upload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error:The agent is overloaded [current requests: 128]
3) Reload agent
$ emctl reload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD reload error:The agent is overloaded [current requests: 128]
4) “emagent_perl.trc” file has no information updated since agent restarted
5) Check “gcagent.log”

Java layer deadlock —“Dead Lock detected!!”

2017-08-04 19:28:59,071 [43:GCThread-13] ERROR -
Dead Lock detected!!
Participating threads:Thread Info Dump:
=================
"HTTP Listener-3592 - /emd/main/ (~Task-free~ OMS.pbs@16398@omsnode=>[150183756670001])" tid=3592 WAITING
 > Accumulated wait time (msec): 1372208 (1 times)

"HTTP Listener-2141 - /emd/main/ (~Task-free~ OMS.pbs@13103@omsnode=>[150182243190001])" tid=2141 BLOCKED
 > Accumulated wait time (msec): 11036289 (76 times)
 > Accumulated blocked time (msec): 16506994 (4 times)

"oracle.dfw.impl.incident.DiagnosticsDataExtractorImpl - Incident Dump Executor (created: Fri Aug 04 14:51:06 EST 2017)" tid=3088 BLOCKED
 > Accumulated blocked time (msec): 16672145 (7 times)

"HTTP Listener-1022 - /emd/main/ (~Task-free~ OMS.pbs@16398@omsnode=>[150181021899001])" tid=1022 WAITING
 > Accumulated wait time (msec): 28746227 (37 times)
 > Accumulated blocked time (msec): 133 (12 times)

"HTTP Listener-1078 - /emd/main/ (DispatchRequests OMS.console@16398@omsnode=>[150181015881006])" tid=1078 WAITING
 > Accumulated wait time (msec): 28719225 (44 times)

=================
Thread Info Dump:
=================
"HTTP Listener-3592 - /emd/main/ (~Task-free~ OMS.pbs@16398@omsnode=>[150183756670001])" tid=3592 WAITING
 sun.misc.Unsafe.park(Native Method)
 - waiting on <0x149717ec> (a java.util.concurrent.locks.ReentrantLock$NonfairSync), which is owned by "HTTP Listener-2141 - /emd/main/ (~Task-free~ OMS.pbs@13103@omsnode=>[150182243190001])" (tid=2141)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
 java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
...
..
.

SOLUTION

1) Stop agent
$ emctl stop agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Stopping agent ...
 stopped.
2) Start agent
$ emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Starting agent ............................................ started.
3) Upload agent successfully
$ emctl upload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully
4) Reload agent successfully
$ emctl reload agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD reload completed successfully
5)Check agent status successfully
$ emctl status agent
...
..
Last attempted heartbeat to  OMS : 2017-08-04 19:53:31
Last successful heartbeat to OMS : 2017-08-04 19:53:31
Next scheduled heartbeat to  OMS : 2017-08-04 19:54:32

---------------------------------------------------------------
Agent is Running and Ready

“SELECT MEDIA FROM V$BACKUP_PIECE_DETAILS” Takes Long Time and Uses Huge Temp Space

It is a good practice for DBA to periodically collect stats of fixed objects, data dictionary, SYS and SYSTEM schemas,

The following query from OEM agent keeps failing, after upgrading Enterprise Manager Cloud Control and a monitoring agent to 13.2, this query is run in the 11.2.0.4 target database:

SELECT MEDIA
 FROM V$BACKUP_PIECE_DETAILS
 WHERE SESSION_KEY=:B3 AND SESSION_RECID=:B2 AND SESSION_STAMP=:B1 
   AND DEVICE_TYPE = 'SBT_TAPE' AND ROWNUM = 1

Also it uses huge temp space :

SQL> select USERNAME,TABLESPACE,BLOCKS*8/1024/1024 
      from V$TEMPSEG_USAGE 
  order by BLOCKS desc ;

USERNAME   TABLESPACE                       BLOCKS*8/1024/1024
---------- ------------------------------- ------------------
DBSNMP     TEMP                             30.649414

It is the exact same issue as per (Doc ID 2201982.1). so the following actions are taken :

SQL> exec dbms_stats.gather_fixed_objects_stats;

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_schema_stats ('SYSTEM');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_schema_stats ('SYS');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_dictionary_stats;

PL/SQL procedure successfully completed.

Finally flush the cursor out of memory for next running :

SQL> select address, hash_value,PLAN_HASH_VALUE, executions, 
            loads, version_count, invalidations, parse_calls 
      from v$sqlarea 
     where sql_id='dx4nqvbtu06bx';

ADDRESS          HASH_VALUE PLAN_HASH_VALUE EXECUTIONS LOADS    
---------------- ---------- --------------- ---------- -----
VERSION_COUNT INVALIDATIONS PARSE_CALLS
------------- ------------- -----------
00000001418FA048 4087355773 1896464546      213        5          
1             1             213

SQL> exec dbms_shared_pool.purge('00000001418FA048,4087355773','C');

PL/SQL procedure successfully completed.

SQL> select address, hash_value,PLAN_HASH_VALUE, executions, loads, 
            version_count, invalidations, parse_calls 
       from v$sqlarea 
      where sql_id='dx4nqvbtu06bx';

no rows selected

Have a check of the new execution time. We can see the new execution plan is created, and the execution time is only 0.25 second.

SQL> select address, hash_value,PLAN_HASH_VALUE, executions,
            ELAPSED_TIME,ELAPSED_TIME/executions/1000000 TimePerExecution
      from v$sqlarea 
     where sql_id='dx4nqvbtu06bx';

ADDRESS          HASH_VALUE  PLAN_HASH_VALUE EXECUTIONS ELAPSED_TIME 
---------------- ----------- --------------- ---------- ------------ 
TIMEPEREXECUTION
----------------
00000001418FA048   4087355773   75529090       2          504858       .252429

OEM Agent Throws “Internal error detected: java.lang.IllegalStateException:oracle.sysman.gcagent.target.interaction.execution.ConfigStateMgr:827”

There are some issues for agent to run host_storage collection every 24 hours.

Some of the OEM agents throw the internal errors in OEM alerts:

From: oracle 
Sent: Tuesday, 14 February 2017 6:08 PM
To: 
Subject: EM Event: Critical:racnode1:3872 - Internal error detected: java.lang.IllegalStateException:oracle.sysman.gcagent.target.interaction.execution.ConfigStateMgr:827.
...
..
.

Error messages from agent log file:

2017-02-14 18:07:32,288 [265:GC.Executor.6 (host:racmode1:host_storage) (host:racnode1:host_storage:storage_reporting_data)] ERROR - null
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: <Line 1, Column 9207>: XML-20221: (Fatal Error) Invalid char in text.]

Workaround:

1) Remove <agent_inst>/sysman/emd/state/configstate/host/racnode1/storage_*.xml files.

2) Restart the agent.

This will restart the collection of the storage data. As these data are collected every 24 hours, please wait for at least 24 hours to check the report.

You can also run the collection manually:

$ emctl status agent scheduler | grep host | grep -i storage
2017-02-16 18:25:52.117 : host:racnode1:host_storage
2017-02-16 18:34:37.318 : host:racnode1:HostStorageSupport

$ emctl control agent runCollection racnode1:host host_storage
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully


$ emctl control agent runCollection racnode1:host HostStorageSupport
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD runCollection completed successfully

OEM Agent Throws Error “Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620”

Apply some hidden parameters into emd.properties file, then bounce the agent. It might fix some internal issues.

In OEM 13c, a couple of OEM agents reported some internal errors  like :

Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620

Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:621

From: oracle 
Sent: Monday, 13 February 2017 10:41 AM
To: 
Subject: EM Event: Critical:ractest:3872 - Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

Host=ractest
Target type=Agent 
Target name=ractest:3872 
Categories=Diagnostics 
Message=Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620. 
Severity=Critical 
...
..
.
Update Details:
Internal error detected: oracle.sysman.gcagent.task.TaskZombieException:oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask:620.

Extracted from gcagent.log :

2017-02-13 10:10:00,329 [256:7761FFD4] WARN - IntervalSchedule: Skip schedule [rac_database:RACTEST:observer_11g] - skipping due to execution time exceeding interval
2017-02-13 10:10:00,329 [257:B85D2018:GC.Executor.6 (oracle_database:RACTEST:Response)] WARN - IntervalSchedule: Skip schedule [oracle_database:RACTEST:Response] - skipping due to execution time exceeding interval
2017-02-13 10:10:00,329 [258:5A4306F7:GC.Executor.7 (rac_database:RACTEST:observer_11g)] WARN - Action result processing failure rac_database.RACTEST::observer_11g
java.lang.InterruptedException 
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1302)
...
..
.
oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at oracle.sysman.gcagent.task.TaskFutureImpl.run1(TaskFutureImpl.java:380)
...
..
. 

 at java.lang.Thread.run(Thread.java:662)
2017-02-13 10:10:00,331 [257:GC.Executor.6] ERROR - Critical error:
oracle.sysman.gcagent.task.TaskZombieException: task declared as a zombie
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.accountedCall(TaskFutureImpl.java:620)
 at oracle.sysman.gcagent.task.TaskFutureImpl$WrappedTask.call(TaskFutureImpl.java:643)

WORKAROUND

Subscribe to get access

Read more of this content when you subscribe today.