Schedule Jobs Not Running after Instance Restarted

Oracle database scheduler jobs have been working fine. Since the database was upgraded, every time when database was bounced, then scheduler jobs stopped running.

INVESTIGATION and SOLUTION

Subscribe to get access

Read more of this content when you subscribe today.

Advertisement

ORA-02800: Requests timed out from Oracle Scheduler Job

Get errors from running an Oracle scheduler job:

SQL> exec dbms_scheduler.run_job(job_name=>'BATCH_JOB', 
                                 use_current_session=>false);
BEGIN dbms_scheduler.run_job(job_name=>'BATCH_JOB', 
                                 use_current_session=>false); END;

*
ERROR at line 1:
ORA-02800: Requests timed out
ORA-06512: at "SYS.DBMS_ISCHED", line 196
ORA-06512: at "SYS.DBMS_SCHEDULER", line 486
ORA-06512: at line 1

WORKAROUND

1)check parameter to make sure not zero.

SQL> show parameter job_queue_processes

NAME                                 TYPE     VALUE
-------------------------------- ------------ ------------------
job_queue_processes                integer    100

2) Bounce the database due to the bugs.

3) Rerun the job successfully.

SQL> exec dbms_scheduler.run_job(job_name=>'BATCH_JOB', 
use_current_session=>false);

PL/SQL procedure successfully completed.

Too many AQ$_PLSQL_NTFN_XXXXX jobs In Oracle Database

SYMPTOM

There are more than one million  AQ$_PLSQL_NTFN_xxxxxxxx scheduler jobs in oracle database.  

sql hangs for querying dba_scheduler_jobs.

Please note there are a couple of bugs on this issue, the steps are not necessarily working for every database versions.

SQL>select count(*) from dba_scheduler_jobs;

WORKAROUND

1)set job_queue_processes =0

SQL>alter system set job_queue_processes=0 scope=memory;

2)Stop and Drop all those AQ$_PLSQL_NTFN scheduler jobs:

— add rownum<100000 into below sqls, if the sql hangs.

SQL>select 'EXEC DBMS_SCHEDULER.STOP_JOB ('''|| job_name ||''', force => TRUE);' 
    from dba_scheduler_jobs 
    where job_name like 'AQ$_PLSQL_NTFN_%'
    and state='RUNNING';


SQL> select ' exec dbms_scheduler.drop_job('''||job_name ||''');' 
     from dba_scheduler_jobs 
     where job_name like 'AQ$_PLSQL_NTFN_%';

3)

SQL>select count(*) from sys.aq$AQ_SRVNTFN_TABLE_1 
    where queue='AQ$_AQ_SRVNTFN_TABLE_1_E';

DECLARE 
po dbms_aqadm.aq$_purge_options_t; 
BEGIN 
po.block := TRUE; 
DBMS_AQADM.PURGE_QUEUE_TABLE( 
queue_table => 'SYS.AQ_SRVNTFN_TABLE_1', 
purge_condition => 'qtview.queue = ''AQ$_AQ_SRVNTFN_TABLE_1_E''', 
purge_options => po); 
END; 
/

select count(*) from sys.aq$AQ_SRVNTFN_TABLE_1 
where queue='AQ$_AQ_SRVNTFN_TABLE_1_E';

4) Collect stats.

SQL> exec dbms_stats.gather_fixed_objects_stats;

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_schema_stats ('SYSTEM');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_schema_stats ('SYS');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_dictionary_stats;

PL/SQL procedure successfully completed.

5) Rebuild indexes of DBA_SCHEDULER_JOBS

6) Bounce the instance.

SQL>shutdown immediate ;
SQL>startup;
SQL>alter system set job_queue_processes=1000 scope=memory;

ORA-04021: timeout occurred while waiting to lock object

Application job failed with trace file contents:

*** 2018-06-16 11:03:17.044
*** SESSION ID:(2437.25769) 2018-06-16 11:03:17.044
*** CLIENT ID:() 2018-06-16 11:03:17.044
*** SERVICE NAME:(BATCH) 2018-06-16 11:03:17.044
*** MODULE NAME:(DBMS_SCHEDULER) 2018-06-16 11:03:17.044
*** ACTION NAME:(IMMEDIATE16_06_2018_10_10_01) 2018-06-16 11:03:17.044

ORA-12012: error on auto execute of job "BATCH"."IMMEDIATE16_06_2018_10_10_01"
ORA-04021: timeout occurred while waiting to lock object
ORA-06512: at "BATCH.AP_DATA_PROCESS", line 652
ORA-06512: at line 1

Check the session details from ASH history:

set pagesize 100
set linesize 250
col INSTANCE_NUMBER format 99
col SESSION_ID format 99999
col BLOCKING_INST_ID format 99
col event format a20
col WAIT_CLASS format a15
col PROGRAM format a20
col MODULE format a16
col p3 format 99999999999999999999
select sql_id,to_char(SAMPLE_TIME,'YYYYMMDD-HH24:MI:SS'),
       INSTANCE_NUMBER,SESSION_ID,PROGRAM,MODULE,EVENT,
       CURRENT_OBJ#,WAIT_CLASS,BLOCKING_INST_ID,
       BLOCKING_SESSION, p3
from dba_hist_active_sess_history
where INSTANCE_NUMBER= 3 
and SESSION_ID = 2437 
and sample_time between to_date('16-06-18 11:00:00','DD-MM-YY HH24:MI:SS')
and to_date('16-06-18 11:04:00','DD-MM-YY HH24:MI:SS')
and event ='library cache lock'
order by sample_id;

SQL_ID TO_CHAR(SAMPLE_TI INSTANCE_NUMBER SESSION_ID PROGRAM MODULE EVENT CURRENT_OBJ# WAIT_CLASS BLOCKING_INST_ID BLOCKING_SESSION P3
------------- ----------------- --------------- ---------- -------------------- ---------------- -------------------- ------------ --------------- ---------------- ---------------- ---------------------
...
..
6cp7dp2xgxr6m 20180616-11:02:52 3 2437 oracle@racnode3 (J001) DBMS_SCHEDULER library cache lock 27307391 Concurrency 2 5205 76355447552016387
6cp7dp2xgxr6m 20180616-11:03:02 3 2437 oracle@racnode3 (J001) DBMS_SCHEDULER library cache lock 27307391 Concurrency 2 5205 76355447552016387
6cp7dp2xgxr6m 20180616-11:03:12 3 2437 oracle@rcanode3 (J001) DBMS_SCHEDULER library cache lock 27307391 Concurrency 2 5205 76355447552016387

20 rows selected.

The OBJ# matches the DDL command:

ALTER INDEX BATCH.CDR_IND REBUILD SUBPARTITION CDR_IND_SUB_213;
SQL>select OWNER,OBJECT_NAME,SUBOBJECT_NAME,
           OBJECT_TYPE,STATUS,LAST_DDL_TIME 
     from  dba_objects 
    where  OBJECT_ID=27307391;

OWNER OBJECT_NAME SUBOBJECT_NAME  OBJECT_TYPE        STATUS LAST_DDL_
----- ----------- --------------  ------------------ ------ --------
BATCH CDR_IND     CDR_IND_SUB_213 INDEX SUBPARTITION VALID  20180616-12:39:22

The P3 shows the waiting object details, we can see the session failed because of waiting for object CDR,  which is holding by blocking session 5205 on node 2.

SQL> select lpad('Library cache P3 value: ',50,'.')||76355447552016387 from dual;

..........................Library cache P3 value: 76355447552016387
SQL> select lpad('Library cache P3 value HEX: ',50,'.')||to_char(76355447552016387,'xxxxxxxxxxxxxxxxxxxxx') from dual;

.....................Library cache P3 value HEX: 10f44e000010003
SQL> select lpad('Object ID: ',50,'.')||to_number(substr('10f44e000010003',1,length('10f44e000010003')-8 ),'xxxxxxxxxxxxxx') from dual;

...........Object ID: 17777888
SQL> select lpad('Namespace: ',50,'.')||to_number(substr('10f44e000010003',-8,4),'xxxxxxxxxxxxxx') from dual;

.......................................Namespace: 1
SQL> select lpad('RequestMode: ',50,'.')||decode(to_number(substr('10f44e000010003',-4),'xxxxxxxxxxxxxx'),3,'exclusive mode',2,'share mode',1,'null mode',0,'no lock/pin held','unknown') from dual;

.....................................RequestMode: exclusive mode
SQL> SELECT indx,kglstdsc FROM x$kglst WHERE kglsttyp='NAMESPACE' and indx=1;

1 TABLE/PROCEDURE
SQL>select OWNER,OBJECT_NAME,SUBOBJECT_NAME,OBJECT_TYPE,NAMESPACE 
from dba_objects where object_id=17777888;

OWNER     OBJECT_NAME   SUBOBJECT_NAME      OBJECT_TYPE     NAMESPACE
---------------- -----------------------  ---------------------------------  ------------------------  -----------------
BATCH        CDR                                                               TABLE                   1

Check blocking session details on racnode2, and found it is running DBMS_STATS against waiting object CDR from racnode3 .

b6usrg82hwsa3 20180616-11:02:56 2 5205 oracle@racnode2 (J032) DBMS_SCHEDULER row cache lock 18 Concurrency 0 2999959 GLOBAL
b6usrg82hwsa3 20180616-11:03:06 2 5205 oracle@racnode2 (J032) DBMS_SCHEDULER row cache lock 18 Concurrency 0 0       GLOBAL
b6usrg82hwsa3 20180616-11:03:16 2 5205 oracle@racnode2 (J032) DBMS_SCHEDULER row cache lock 18 Concurrency 0 1953712 GLOBAL
4m477zx25xrgq 20180616-11:03:26 2 5205 oracle@racnode2 (J032) DBMS_SCHEDULER                27307236       321 0     NOT IN WAIT
4m477zx25xrgq 20180616-11:03:36 2 5205 oracle@racnode2 (J032) DBMS_SCHEDULER                27307242       1582 0    NOT IN WAIT
4m477zx25xrgq 20180616-11:03:46 2 5205 oracle@racnode2 (J032) DBMS_SCHEDULER                27307219       1537 0    NOT IN WAIT


SQL> select sql_text from v$sql where sql_id='b6usrg82hwsa3';

SQL_TEXT
-------------------------------------------------------------------
call dbms_stats.gather_database_stats_job_proc ( )

SQL>se long 10000
SQL> select SQL_FULLTEXT from v$sql where sql_id='4m477zx25xrgq';
SQL_FULLTEXT
--------------------------------------------------------------------
/* SQL Analyze(1) */ select /*+ full(t) no_parallel(t) no_parallel_index(t)
dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monito
....
from "BATCH"."CDR" t
 where TBL$OR$IDX$PART$NUM("BATCH"."CDR",0,3,0,"ROWID") = :objn /*
 NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL
,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV
,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,ACL,NIL,NIL*/

The blocking session ID 5205 on node 2 is running a weekend maintenance job –“Optimizer Statistics Gathering”. it is analysing partition table  CDR.

There are a lot of discussions about “library cache lock” from DBMS_STATS gather of stats for a subpartition.

20180204002507

19790972 – “library cache lock” waits due to DBMS_STATS gather of stats for a sub partition

Workaround

Under the approval of Oracle Support,  you can try to change below parameter value from 15 mins to longer time.

SQL>alter system set "_kgl_time_to_wait_for_locks'=60 scope=spfile;

SQL>shutdown immediate;

SQL>startup

ORMB Batch Scheduling Job Hangs At F1-FLUSH with “PL/SQL Lock Timer” Wait Event

Overnight ORMB Batch Scheduling Job has been running for more than twenty hours, user wants to know why ?

1) Check which job at what step is still running and hanging.

SQL> select OWNER,JOB_NAME,JOB_SUBNAME,SESSION_ID,ELAPSED_TIME,CPU_USED 
     from dba_scheduler_running_jobs;

OWNER   JOB_NAME   JOB_SUBNAME  SESSION_ID ELAPSED_TIME  CPU_USED
------- ---------- ------------ ---------- ----------- ---------------
SCHADM  DAILY_JOBS F1_FLUSH_JOB 1006       +000 21:47:00.08 +00000:00:22.15

2) Check session status.

SQL> select USERNAME,STATUS,EVENT,sql_id,LAST_CALL_ET 
      from gv$session where sid=1006;

USERNAME  STATUS EVENT               SQL_ID           LAST_CALL_ET
--------- ------ ------------------  ---------------- -----------
SCHADM    ACTIVE PL/SQL lock timer   fnfa105xssacx    78949

3) Check which sql is the running with ‘PL/SQL lock timer’.

SQL> set long 20000
SQL> select SQL_FULLTEXT from v$sql where sql_id='fnfa105xssacx';

SQL_FULLTEXT
---------------------------------------------------------------------
DECLARE job BINARY_INTEGER := :job; next_date TIMESTAMP WITH TIME ZONE := :myda
te; broken BOOLEAN := FALSE; job_name VARCHAR2(30) := :job_name; job_subname
VARCHAR2(30) := :job_subname; job_owner VARCHAR2(30) := :job_owner; job_start
TIMESTAMP WITH TIME ZONE := :job_start; job_scheduled_start TIMESTAMP WITH TIME
 ZONE := :job_scheduled_start; window_start TIMESTAMP WITH TIME ZONE := :window
_start; window_end TIMESTAMP WITH TIME ZONE := :window_end; chain_id VARCHAR2(
14) := :chainid; credential_owner varchar2(30) := :credown; credential_name
varchar2(30) := :crednam; destination_owner varchar2(30) := :destown; destinat
ion_name varchar2(30) := :destnam; job_dest_id varchar2(14) := :jdestid; log_i
d number := :log_id; BEGIN DECLARE
 BATCH_CODE CHAR(8);
 USER_ID CHAR(8);
 THREAD_COUNT NUMBER;
 BUSINESS_DATE DATE;
 MAX_TIMEOUT_MINS NUMBER;
 RERUN_NUMBER NUMBER;
 THREAD_POOL VARCHAR2(254);
 MAIL_ID VARCHAR2(254);
 MIN_RUN_STATUS CHAR(2);
 RAISE_ERROR BOOLEAN;
 THREAD_NOTIFICATIONS BOOLEAN;
 SOFT_PARM_MAP CISADM.OUAF_BATCH.PARM_MAP_T;
 SOFT_PARM_LIST CISADM.OUAF_BATCH.PARM_LIST_T;
BEGIN
 BATCH_CODE := 'F1-FLUSH';
 USER_ID := 'JAMES';
 --THREAD_COUNT := 0;
 --BUSINESS_DATE := to_date('2016-01-31','yyyy-MM-dd');
 MAX_TIMEOUT_MINS := 0;
 RERUN_NUMBER := 0;
 THREAD_POOL := 'ORMB_BATCH';
 --MIN_RUN_STATUS := NULL;
 --RAISE_ERROR := NULL;
 -- THREAD_NOTIFICATIONS := true;
 --SOFT_PARM_MAP := NULL;
 --SOFT_PARM_LIST := CISADM.OUAF_BATCH.PARM_LIST_T('DIST-THD-POOL','');
 OUAF_BATCH.SUBMIT_JOB(
 BATCH_CODE => BATCH_CODE,
 USER_ID => USER_ID,
 THREAD_COUNT => THREAD_COUNT,
 BUSINESS_DATE => BUSINESS_DATE,
 MAX_TIMEOUT_MINS => MAX_TIMEOUT_MINS,
 RERUN_NUMBER => RERUN_NUMBER,
 THREAD_POOL => THREAD_POOL,
 MIN_RUN_STATUS => MIN_RUN_STATUS,
 RAISE_ERROR => RAISE_ERROR,
 THREAD_NOTIFICATIONS => THREAD_NOTIFICATIONS,
 SOFT_PARM_MAP => SOFT_PARM_MAP,
 SOFT_PARM_LIST => SOFT_PARM_LIST
 );
END; :mydate := next_date; IF broken THEN :b := 1; ELSE :b := 0; END IF; END;

4) ‘PL/SQL Lock Timer’ event ( Doc ID 1476088.1 )

"PL/SQL Lock Timer"

This event is called through the DBMSLOCK.SLEEP procedure or USERLOCK.SLEEP procedure. This event is most likely to come from user written stored procedures. The ‘idle’ event ‘PL/SQL lock timer’ is worth watching because it points us in the direction of application response, throughput, and possible coding issues. The PL/SQL lock timer event is, as the command that issues it states, a sleep mandated by the application code. The application is idle and doing nothing. This means that if the application sleeps for a combined interval of 5 minutes, the response to the user or complete run of the application will take at least 5 minutes longer.
While this event does not require resources and thus is not an Oracle resource contention issue, it is our job to provide the quickest response to users and push applications through our system to increase the user’s perceived performance. Most often, sleeps are put into an application for serialization of transactional events or spinning on queues until something happens. When the event occurs or a queue is populated, then the application continues its work. We should ask ourselves why an application is sleeping for any amount of time and provide alternatives if necessary to reduce this idle event. You will gain quicker responses if you trigger an action instead of the action waiting for an event.

This event is called through the DBMSLOCK.SLEEP procedure or 
USERLOCK.SLEEP procedure. This event is most likely to come from 
user written stored procedures.

5) Sent the details to vendor why F1-Flush has “PL/SQL lock timer’  wait event forever.