Server Side Transparent Application Failover for Oracle RAC

TAF setting on service overrides any TAF settings on the client side configurations.

Concept & Overview 

The TAF setting on a service overrides any TAF settings in the client connection definition that might be configured in TNS connect descriptors. If the failover type is set on the server side, then the failover method will default to BASIC.

Oracle Call Interface applications with TAF enabled should use FAN high availability events for fast connection failover ( FCF ).

TAF is a feature of the OCI driver, and TAF cannot be used with thin driver

There are two types of TAF available, SESSION and SELECT.

SESSION: Session Failover re-creates the connections and sessions to the surviving instance.

SELECT: In addition to recreating the session, Select Failover also replays the queries that were in progress.

There are two methods in which TAF establishes the failover connection, BASIC and PRECONNECT.

BASIC: The second connection is re-established only after the first connection to the instance failed.

PRECONNECT: Two connections are established when the client logs in to the database. A login to database will create two connection at the same time. For this to work, clusterware actually starts two services. One main service and another shadow service.

The tests are held under the environment as below,  both 11gR2 database and client are used to test the behaviors of server side TAF by using SCAN.

                     GI: 12.1.0.2.0
4 nodes RAC: 11.2.0.4
Oracle Client: 11.2.0.4

The service name is RAC_TEST running on preferred nodes node1 and node4, while available nodes are node2 and node3.

$ srvctl status service -s RAC_TEST -d TESTDB

Service RAC_TEST is running on instance(s) TESTDB1,TESTDB4

Steps for Testing Server Side TAF

1) Create a service if not yet, here we modify it because the service has already existed.

$ srvctl modify service -s RAC_TEST -d TESTDB -P BASIC -e SELECT -m BASIC -w 120 -z 5

2) Start the service if not yet.

$ srvctl start service -s RAC_TEST -d TESTDB

3) Check service is running

$srvctl status service -s RAC_TEST -d TESTDB
Service RAC_TEST is running on instance(s) TESTDB1,TESTDB4

4) Check service from database query.

SQL>col name format a15
col failover_method format a11 heading 'METHOD'
col failover_type format a10 heading 'TYPE'
col failover_retries format 9999999 heading 'RETRIES'
col goal format a10
col clb_goal format a8
col AQ_HA_NOTIFICATIONS format a5 heading 'AQNOT'

SQL> select name, failover_method, failover_type,
failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services
where name='RAC_TEST';

NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
-------- ------- ------ -------- ----- -------- -----
RAC_TEST BASIC SELECT 5 NONE LONG NO

5) Create a service name into TNS entry.

SERVER_TAF =
(DESCRIPTION = (LOAD_BALANCE = YES )
(ADDRESS = (PROTOCOL = TCP)(HOST =CLU-SCAN)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = RAC_TEST)
)
)

6) Make a connection to database by using above TNS entry.

SQL> connect testuser@server_taf
Enter password:
Connected.

SQL> select host_name,instance_name from v$instance;

HOST_NAME INSTANCE_NAME
----------- ----------------
racnode4 TESTDB4
SQL> select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
FAILOVER_METHOD,FAILED_OVER,status  
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME TYPE METHOD FAI STATUS
-------- ---- ------- --------- ------ ------- ----- --------
4 380 4569 TESTUSER SELECT BASIC NO INACTIVE

7) Shutdown the instance or the service on the node the connection has made to, then we can see session failed over successfully.

SQL> select host_name,instance_name from v$instance;

HOST_NAME INSTANCE_NAME
--------- ----------------
racnode1  TESTDB1

SQL>select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
FAILOVER_METHOD,FAILED_OVER,status
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME TYPE METHOD FAI STATUS
--------- ---- ------- -------- ------ ------- --- --------
1 250 971 TESTUSER SELECT BASIC YES INACTIVE

Client Side Transparent Application Failover for Oracle RAC

TAF – Transparent Application Failover is a very useful feature in mission critical environment.

TAF – – Transparent Application Failover is a feature of the OCI driver, and TAF cannot be used with thin driver. The client side TAF can be configured through client connection definitions in TNS connect descriptors.

The tests are held under the following environment,  both 11gR2 database and client are used to test the behaviors of client side TAF by using SCAN.

                    GI: 12.1.0.2.0
4 nodes RAC: 11.2.0.4
Oracle Client: 11.2.0.4

The service name is RAC_TEST running on node1 and node4, while available nodes are node2 and node3.

$ srvctl status service -s RAC_TEST -d TESTDB
Service RAC_TEST is running on instance(s) TESTDB1,TESTDB4

There are two TYPES of TAF available, SESSION and SELECT.

SESSION: Session Failover re-creates the connections and sessions to the surviving instance.

SELECT: In addition to recreating the session, Select Failover also replays the queries that were in progress.

There are two METHODS in which TAF establishes the failover connection, BASIC and PRECONNECT.

BASIC: The second connection is re-established only after the first connection to the instance or service failed.

PRECONNECT: Two connections are established when the client logs in to the database. A login to database will create two connection at the same time. For this to work, clusterware actually starts two services. One main service and another shadow service.

Client Side TAF Configuration — BASIC

For non-TAF database connection, by query gv$session to get client side TAF connection details, which demonstrates the server side TAF is not configured. That is what we expected in this case.

SQL>select INST_ID,USERNAME,FAILOVER_TYPE,FAILOVER_METHOD,FAILED_OVER
from gv$session
where username='TESTUSER';

INST_ID USERNAME FAILOVER_TYPE FAILOVER_M FAILED_OVER ------- --------- -------------- ----------- -------------
4 TESTUSER NONE NONE NO

Use the following TNS entry to test client side TAF with BASIC method.

CLIENT_TAF =
( DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST =CLU-SCAN)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = RAC_TEST)
(FAILOVER_MODE= (TYPE=select)(METHOD=basic)
(RETRIES=20)(DELAY=15))
)
)

1) Make a database connection by using TNS entry CLIENT_TAF, and run a long query :

SQL>select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
           FAILOVER_METHOD,FAILED_OVER 
      from gv$session 
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE FAILOVER_M FAI -------- ----- -------- --------- ------------- ----------- --- 4 70 11171 TESTUSER SELECT BASIC NO

2) Manually kill the session ( only works when session is ACTIVE, otherwise it is not working for killing from testing ).

SQL> alter system kill session '70,11171,@4';

System altered.

 Query the session status, we can see a new failed over session with different ‘SID, serial#’ created, and the query is still going until completed.

 SQL> select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
             FAILOVER_METHOD,FAILED_OVER 
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE FAILOVER_M FAILED_OVER -------- ---- ------- -------- ------------- --------- ------------ 4 366 14271 TESTUSER SELECT BASIC YES

3) Instead of killing session in step 2, we manually stop the service on node 4 to simulate instance crash. (  it works for both ACTIVE and INACTIVE sessions ).

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE  FAILOVER_M FAI
------- --- ------- -------- -------------- ---------- ---
4 70 11185 TETSUSER SELECT BASIC NO

Now stop the service on node4, expect the session will failover onto survival node1 .

$ srvctl stop service -s RAC_TEST -i TESTDB4 -d TESTDB -f
SQL> select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
FAILOVER_METHOD,FAILED_OVER
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE FAILOVER_M FAI
------- --- -------- --------- -------------- ---------- ---
1 367 8663 TETSUSER SELECT BASIC YES

restart the service on node4 for more tests.

$ srvctl start service -s RAC_TEST -i TESTDB4 -d TESTDB

Client Side TAF Configuration — PRECONNECT

Two tnsname entries are prepared with following contents:

PRIMARY=
( DESCRIPTION=(ADDRESS=
(PROTOCOL=tcp)(HOST=clu-scan)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=RAC_TEST)
(FAILOVER_MODE=(BACKUP=SECONDARY)(TYPE=select)
(METHOD=preconnect))
)
)

SECONDARY=(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=clu-scan)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=RAC_TEST)
(FAILOVER_MODE=
(BACKUP=PRIMARY)
(TYPE=select)
(METHOD=preconnect))
)
)

1) Make a database connection by using TNS entry above, and run a long query :

SQL> select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
FAILOVER_METHOD,FAILED_OVER,status
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE FAILOVER_M FAI STATUS
------- ---- ------ ---------- ------------ ----------- --- -------- 1 42 8485 TESTUSER NONE NONE NO ACTIVE 4 366 14375 TESTUSER SELECT PRECONNECT NO INACTIVE

2) Manually stop service RAC_TEST on node 4.

$ srvctl stop service -s RAC_TEST -i TESTDB4 -d TESTDB -f

3) The preconnected session takes over the session and continue the query until completion.

SQL> select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
FAILOVER_METHOD,FAILED_OVER,status
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE FAILOVER_M FAI STATUS ------- --- ------- ---------- ------------ ------------- --- -------- 1 42 8485 TESTUSER SELECT PRECONNECT YES ACTIVE

Restart service on node4 for next test.

$ srvctl start service -s RAC_TEST -i -i TESTDB4 -d TESTDB

4) if you kill this PRECONNECT session again , TAF still works again as preconnect.

SQL> alter system kill session '42,8485,@1';

System altered.
SQL> select INST_ID,sid,serial#,USERNAME,FAILOVER_TYPE,
FAILOVER_METHOD,FAILED_OVER,status
from gv$session
where username='TESTUSER';

INST_ID SID SERIAL# USERNAME FAILOVER_TYPE FAILOVER_M FAI STATUS
------- --- ------- --------- ------------ ----------- --- ------
4 265 1563 TESTUSER SELECT PRECONNECT YES ACTIVE

TAF – Transparent Application Failover is a very useful feature in mission critical environment, and it should be used whenever it is available and possible.