SITUATION
The same SQL query takes about 2.5 times longer to run on node 1 than on node 2.
Both the Unix administrator and the DBA have checked and confirmed that the workload is the same on both nodes. All client connections to the RAC database are load balanced through a RAC service.
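One way to cross-check the per-node difference is from gv$sql, which records elapsed and CPU time per instance for the same cursor; a minimal sketch, where &sql_id is a substitution variable for the SQL_ID of the slow statement:
-- compare average elapsed and CPU time per execution on each instance
SQL> SELECT inst_id,
            executions,
            ROUND(elapsed_time / NULLIF(executions, 0) / 1e6, 2) AS avg_elapsed_sec,
            ROUND(cpu_time     / NULLIF(executions, 0) / 1e6, 2) AS avg_cpu_sec
     FROM gv$sql
     WHERE sql_id = '&sql_id'
     ORDER BY inst_id;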
INVESTIGATIONS
1) Open a new session, then run the same SQL on node 1. There are no GC (global cache) events on node 1 or node 2 (a single cross-instance version of this check is sketched after the two listings below):
-- check the current session SID:
SQL> SELECT SID, SERIAL#
FROM V$SESSION
WHERE SID = Sys_Context('USERENV', 'SID');
SID SERIAL#
---------- ----------
306 24423
-- check session stats:
SQL> SELECT a.username, c.name, b.value
FROM
v$session a,
v$sesstat b,
v$statname c
WHERE a.sid = b.sid
and a.sid=306
AND b.statistic# = c.statistic#
order by name;
USERNAME NAME VALUE
------- -------------------------------------- ----------
SYS CPU used by this session 2419
SYS CPU used when call started 2418
..
.
SYS gc CPU used by this session 0
SYS gc blocks compressed 0
SYS gc blocks corrupt 0
SYS gc blocks lost 0
SYS gc claim blocks lost 0
SYS gc cr block build time 0
SYS gc cr block flush time 0
SYS gc cr block receive time 0
SYS gc cr block send time 0
SYS gc cr blocks received 0
SYS gc cr blocks served 0
SYS gc current block flush time 0
SYS gc current block pin time 0
SYS gc current block receive time 0
SYS gc current block send time 0
SYS gc current blocks received 0
SYS gc current blocks served 0
SYS gc force cr disk read 0
SYS gc kbytes saved 0
SYS gc kbytes sent 0
SYS gc local grants 0
SYS gc read wait failures 0
SYS gc read wait time 0
SYS gc read wait timeouts 0
SYS gc read waits 0
SYS gc reader bypass grants 0
SYS gc remote grants 0
-- node 2
USERNAME NAME VALUE
------- -------------------------------------- ----------
SYS CPU used by this session 1006
SYS CPU used when call started 1007
..
.
SYS gc CPU used by this session 0
SYS gc blocks compressed 0
SYS gc blocks corrupt 0
SYS gc blocks lost 0
SYS gc claim blocks lost 0
SYS gc cr block build time 0
SYS gc cr block flush time 0
SYS gc cr block receive time 0
SYS gc cr block send time 0
SYS gc cr blocks received 0
SYS gc cr blocks served 0
SYS gc current block flush time 0
SYS gc current block pin time 0
SYS gc current block receive time 0
SYS gc current block send time 0
SYS gc current blocks received 0
SYS gc current blocks served 0
SYS gc force cr disk read 0
SYS gc kbytes saved 0
SYS gc kbytes sent 0
SYS gc local grants 0
SYS gc read wait failures 0
SYS gc read wait time 0
SYS gc read wait timeouts 0
SYS gc read waits 0
SYS gc reader bypass grants 0
SYS gc remote grants 0
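The same comparison can also be done in one query against gv$sesstat, which covers both instances at once; a minimal sketch, where SID 306 is the node 1 session above and 412 is a placeholder for the SID of the node 2 test session:
-- compare gc-related statistics for the two test sessions across instances
SQL> SELECT s.inst_id, n.name, s.value
     FROM gv$sesstat s, gv$statname n
     WHERE s.inst_id = n.inst_id
     AND s.statistic# = n.statistic#
     AND n.name LIKE 'gc%'
     AND ( (s.inst_id = 1 AND s.sid = 306)
        OR (s.inst_id = 2 AND s.sid = 412) )
     ORDER BY n.name, s.inst_id;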
2) SET AUTOTRACE TRACEONLY STATISTICS (or SET AUTOTRACE ON STATISTICS, as used below).
Here "consistent gets" means blocks read from the buffer cache, i.e. logical reads. (A per-session cross-check using v$mystat is sketched after the statistics below.)
SQL> SET AUTOTRACE ON STATISTICS
SQL> @sqlscript.sql
no rows selected
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3038815 consistent gets
0 physical reads
0 redo size
5076 bytes sent via SQL*Net to client
4264 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed
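The same figures can also be read for the current session without looking up its SID, via the standard v$mystat view; a minimal sketch:
-- logical vs. physical reads for the current session
SQL> SELECT n.name, m.value
     FROM v$mystat m, v$statname n
     WHERE m.statistic# = n.statistic#
     AND n.name IN ('consistent gets', 'db block gets', 'physical reads');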
3) tkprof shows that almost all of the time is spent in the Fetch phase, with identical logical reads on both nodes. (One common way to generate such a trace is sketched after the two reports below.)
-- node 1
call count cpu elapsed disk query current rows
------- ------ -------- -------- ----- ------- ---------- ------
Parse 2 0.10 0.10 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 1 24.29 24.55 0 3038773 0 0
------- ----- -------- -------- ---- ------- -------- ----------
total 6 24.40 24.65 0 3038773 0 0
-- node 2
call count cpu elapsed disk query current rows
------- ------ -------- -------- ----- ------- ---------- -------
Parse 2 0.04 0.04 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 1 9.86 9.91 0 3038773 0 0
------- ----- -------- -------- ---- ------- -------- ----------
total 6 9.91 9.95 0 3038773 0 0
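For reference, these reports come from SQL trace files; a minimal sketch of one common way to produce them, assuming access to DBMS_SESSION and the diagnostic trace directory:
-- enable extended SQL trace for the current session, run the SQL, then stop tracing
SQL> ALTER SESSION SET tracefile_identifier = 'node1_test';
SQL> EXEC DBMS_SESSION.SESSION_TRACE_ENABLE(waits => TRUE, binds => FALSE);
SQL> @sqlscript.sql
SQL> EXEC DBMS_SESSION.SESSION_TRACE_DISABLE;
-- locate the trace file via V$DIAG_INFO ('Default Trace File'), then format it at the OS prompt:
-- $ tkprof <trace_file>.trc node1_test.txt sys=no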
4) Run a simple shell loop to confirm that node 1's CPU does not perform as well as node 2's:
-- node 1 (136 seconds)
$ date; i=0; while (( i < 10000000 )); do (( i ++ )); done; date
Thu Sep 27 14:51:58 AEST 2018
Thu Sep 27 14:54:14 AEST 2018
-- node 2 (52 seconds)
$ date; i=0; while (( i < 10000000 )); do (( i ++ )); done; date
Thu Sep 27 14:52:10 AEST 2018
Thu Sep 27 14:53:02 AEST 2018
5) Shut down both GI and the RAC database; the issue still exists.
6) After the server is rebooted, everything goes back to normal. Node 1 and node 2 perform the same.
CAUSES
So it should be an OS kernel issue: the slowdown is independent of Oracle (it affects even a plain shell loop and persists with GI and the database shut down) and is cleared only by a server reboot.