SITUATION
The same SQL query takes 2.5 times longer to run on node 1 than on node 2.
Both the Unix administrator and the DBA have checked and confirmed that the workload is the same on both nodes. All client connections to the RAC database are load balanced through a RAC service.
INVESTIGATIONS
1) Open a new session, then run the same SQL on node 1. There are no GC (global cache) events on either node 1 or node 2:
-- Check the current session SID:
SQL> SELECT SID, SERIAL# FROM V$SESSION WHERE SID = Sys_Context('USERENV', 'SID');

       SID    SERIAL#
---------- ----------
       306      24423
-- Check session stats:
SQL> SELECT a.username, c.name, b.value
       FROM v$session a, v$sesstat b, v$statname c
      WHERE a.sid = b.sid AND a.sid = 306
        AND b.statistic# = c.statistic#
      ORDER BY name;

USERNAME   NAME                                          VALUE
---------- ---------------------------------------- ----------
SYS        CPU used by this session                       2419
SYS        CPU used when call started                     2418
...
SYS        gc CPU used by this session                       0
SYS        gc blocks compressed                              0
SYS        gc blocks corrupt                                 0
SYS        gc blocks lost                                    0
SYS        gc claim blocks lost                              0
SYS        gc cr block build time                            0
SYS        gc cr block flush time                            0
SYS        gc cr block receive time                          0
SYS        gc cr block send time                             0
SYS        gc cr blocks received                             0
SYS        gc cr blocks served                               0
SYS        gc current block flush time                       0
SYS        gc current block pin time                         0
SYS        gc current block receive time                     0
SYS        gc current block send time                        0
SYS        gc current blocks received                        0
SYS        gc current blocks served                          0
SYS        gc force cr disk read                             0
SYS        gc kbytes saved                                   0
SYS        gc kbytes sent                                    0
SYS        gc local grants                                   0
SYS        gc read wait failures                             0
SYS        gc read wait time                                 0
SYS        gc read wait timeouts                             0
SYS        gc read waits                                     0
SYS        gc reader bypass grants                           0
SYS        gc remote grants                                  0
-- node 2
USERNAME   NAME                                          VALUE
---------- ---------------------------------------- ----------
SYS        CPU used by this session                       1006
SYS        CPU used when call started                     1007
...
SYS        gc CPU used by this session                       0
SYS        gc blocks compressed                              0
SYS        gc blocks corrupt                                 0
SYS        gc blocks lost                                    0
SYS        gc claim blocks lost                              0
SYS        gc cr block build time                            0
SYS        gc cr block flush time                            0
SYS        gc cr block receive time                          0
SYS        gc cr block send time                             0
SYS        gc cr blocks received                             0
SYS        gc cr blocks served                               0
SYS        gc current block flush time                       0
SYS        gc current block pin time                         0
SYS        gc current block receive time                     0
SYS        gc current block send time                        0
SYS        gc current blocks received                        0
SYS        gc current blocks served                          0
SYS        gc force cr disk read                             0
SYS        gc kbytes saved                                   0
SYS        gc kbytes sent                                    0
SYS        gc local grants                                   0
SYS        gc read wait failures                             0
SYS        gc read wait time                                 0
SYS        gc read wait timeouts                             0
SYS        gc read waits                                     0
SYS        gc reader bypass grants                           0
SYS        gc remote grants                                  0
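The same comparison can also be run from a single node with the GV$ views. A minimal sketch, assuming SID 306 on instance 1 (found above) and a hypothetical SID 415 on instance 2; substitute the real SIDs returned by the SYS_CONTEXT query on each node:

-- Minimal sketch; SID 415 on instance 2 is a hypothetical placeholder.
SELECT s.inst_id, n.name, s.value
  FROM gv$sesstat s, gv$statname n
 WHERE s.inst_id = n.inst_id
   AND s.statistic# = n.statistic#
   AND (n.name LIKE 'gc%' OR n.name LIKE 'CPU used%')
   AND ((s.inst_id = 1 AND s.sid = 306) OR (s.inst_id = 2 AND s.sid = 415))
 ORDER BY n.name, s.inst_id;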
2) Run the SQL with SET AUTOTRACE ON STATISTICS (or SET AUTOTRACE TRACEONLY STATISTICS).
"consistent gets" are reads from the buffer cache (logical reads); with zero physical reads, the query does no disk I/O and its time is spent almost entirely on CPU.
SQL> SET AUTOTRACE ON STATISTICS
SQL> @sqlscript.sql

no rows selected

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    3038815  consistent gets
          0  physical reads
          0  redo size
       5076  bytes sent via SQL*Net to client
       4264  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
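Both nodes report roughly the same 3 million consistent gets, so the execution plans are presumably identical; this can be double-checked from GV$SQL by comparing the plan hash value and buffer gets per instance. A minimal sketch, where the SQL_ID is a hypothetical placeholder to be looked up first by SQL text:

-- Minimal sketch; 'a1b2c3d4e5f6g' is a hypothetical SQL_ID.
SELECT inst_id, sql_id, plan_hash_value, executions,
       ROUND(cpu_time / 1e6, 2)     AS cpu_sec,
       ROUND(elapsed_time / 1e6, 2) AS elapsed_sec,
       buffer_gets
  FROM gv$sql
 WHERE sql_id = 'a1b2c3d4e5f6g'
 ORDER BY inst_id;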
3) tkprof output shows the difference is in the Fetch phase:
-- node 1
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.10       0.10          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        1     24.29      24.55          0    3038773          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6     24.40      24.65          0    3038773          0           0
-- node 2
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.04       0.04          0          0          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        1      9.86       9.91          0    3038773          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      9.91       9.95          0    3038773          0           0
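For reference, the trace files behind these tkprof reports can be produced with DBMS_MONITOR. A minimal sketch for one node (the report file name passed to tkprof is a hypothetical example), to be repeated on each node:

-- Minimal sketch: enable extended SQL trace for the current session, run the
-- query, then locate the trace file to feed to tkprof.
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => FALSE);
@sqlscript.sql
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;

SELECT value FROM v$diag_info WHERE name = 'Default Trace File';

-- Then at the OS prompt:
--   tkprof <trace_file> node1_report.txt sys=no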
4) Run a simple shell loop to confirm that node 1's CPU is not performing as well as node 2's:
-- node 1 ( 136 seconds )
$ date; i=0; while (( i < 10000000 )); do (( i++ )); done; date
Thu Sep 27 14:51:58 AEST 2018
Thu Sep 27 14:54:14 AEST 2018
-- node 2 ( 52 seconds )
$ date; i=0; while (( i < 10000000 )); do (( i++ )); done; date
Thu Sep 27 14:52:10 AEST 2018
Thu Sep 27 14:53:02 AEST 2018
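The 136-second vs 52-second result (about 2.6x) lines up with the roughly 2.5x difference in the SQL elapsed times, pointing at raw CPU speed rather than anything database-specific. As a cross-check, a similar loop can be timed inside the database; a rough sketch using DBMS_UTILITY.GET_CPU_TIME:

-- Rough sketch of an in-database CPU loop, comparable to the shell test above.
SET SERVEROUTPUT ON
DECLARE
  t0 NUMBER      := DBMS_UTILITY.GET_CPU_TIME;  -- CPU time in hundredths of a second
  i  PLS_INTEGER := 0;
BEGIN
  WHILE i < 10000000 LOOP
    i := i + 1;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('CPU centiseconds: ' || (DBMS_UTILITY.GET_CPU_TIME - t0));
END;
/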
5) Shut down both GI and the RAC database; the issue still exists, so the database stack and clusterware can be ruled out.
6) After the server is rebooted, everything goes back to normal and node 1 and node 2 perform the same.
CAUSES
So it should be an OS kernel issue on node 1: the slowdown shows up in a plain shell loop, persists with GI and the database shut down, and disappears after a reboot.