CRS-4535: Cannot communicate with Cluster Ready Services

The clusterware of a two-node 12.2.0.1 GI cluster is not healthy and reports the errors below, although the databases and services are all still available for applications to connect to.

$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

$ crsctl check cluster -all
**************************************************************
racnode1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
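On a larger cluster it can help to reduce that output to just the affected node names. The following is a minimal sketch, assuming the output format shown above (a line with the node name followed by ":" and then the status lines); the function name crs_down_nodes is made up for illustration.

```shell
# List the nodes for which "crsctl check cluster -all" reports CRS-4535.
# Reads the command output on stdin. A line of the form "name:" (no spaces)
# is remembered as the current node; a following CRS-4535 line prints it.
crs_down_nodes() {
  awk '
    /^[^ ]+:$/   { node = substr($0, 1, length($0) - 1); next }
    /^CRS-4535:/ { print node }
  '
}

# Usage on a cluster node with the Grid Infrastructure environment set:
#   crsctl check cluster -all | crs_down_nodes
```

For the output above this would print racnode1 and racnode2, one per line.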

The crsd process is still running on both nodes:

-- on racnode1
$ ps -eaf | grep -i crsd
root 5482 1 0 Mar01 ? 00:00:00 /u01/app/12.2.0.1/grid/bin/crsd.bin reboot

-- on racnode2
$ ps -eaf | grep -i crsd
root 66210 1 0 23:39 ? 00:00:02 /u01/app/12.2.0.1/grid/bin/crsd.bin reboot

How to Download Oracle Patches From Oracle Support By Using WGET or CURL

1) Set up global variables:

export MOS_USER=testuser@domain.com
export MOS_PASSWORD="Password"

export PROXYUSER="testuser"
export PROXYPASSWD="Password123"
export USERAGENT="Mozilla/5.0"
export use_proxy=on
export http_proxy="http://proxy.domain.com:80/"
export https_proxy="https://proxy.domain.com:80/"
export COOK=$HOME/$.cookie

2) Authenticate over HTTP/HTTPS with the following command, which saves the session cookie:

$ wget --proxy-user=${PROXYUSER} --proxy-password=${PROXYPASSWD} \
--http-user=${MOS_USER} --http-password=${MOS_PASSWORD} \
--save-cookies=$COOK --keep-session-cookies --no-check-certificate \
"https://updates.oracle.com/Orion/Services/download" \
--no-verbose
2018-02-26 14:35:17 URL:https://updates.oracle.com/Orion/Services/download [118] -> "download.1" [1]

3) Get all supported platforms and language codes:

a) Output the query result into a temp file:

$ wget --proxy-user=${PROXYUSER} --proxy-password=${PROXYPASSWD} \
--no-check-certificate --load-cookies=$COOK \
"https://updates.oracle.com/Orion/SavedSearches/switch_to_simple" \
-O $HOME/output.tmp -q

$ ls -l $HOME/output.tmp
-rw-r----- 1 testuser users 4528477 Feb 26 14:41 /home/testuser/output.tmp

b) Extract the platform and language codes. Here we are only interested in the platform “226P<--->Linux x86-64” with the default English language:

$ grep -A200 "<select name=plat_lang" /home/testuser/output.tmp |
 grep "^<option" | awk -F "[\">]" '{print $2 "<--->" $4}' |
 grep -v "<--->selected"

537P<--->Acme Packet 1100
529P<--->Acme Packet 3820
...
..
.
541P<--->Linux ARM 64-bit
214P<--->Linux Itanium
525P<--->Linux SPARC
46P<--->Linux x86
226P<--->Linux x86-64
912P<--->Microsoft Windows (32-bit)
208P<--->Microsoft Windows Itanium (64-bit)
539P<--->Microsoft Windows Phone
233P<--->Microsoft Windows x64 (64-bit)
...
..
.
117L<--->Traditional Chinese (ZHT)
116L<--->Turkish (TR)
37L<--->UK English (GB)
39L<--->Ukrainian (UK)
43L<--->Vietnamese (VN)
999L<--->Worldwide Spanish (ESW)
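To pick a single code out of that list without reading it by eye, an exact-match lookup is enough. This is a sketch only, assuming the "code<--->name" lines produced by the pipeline above; plat_lang_code is a hypothetical helper name.

```shell
# Print the code for an exact platform or language name.
# Input (stdin): lines of the form "code<--->name".
# An exact comparison is used so that "Linux x86-64" does not also
# match "Linux x86".
plat_lang_code() {
  awk -F'<--->' -v name="$1" '$2 == name { print $1 }'
}

# Usage, appended to the pipeline from step 3 b):
#   ... | grep -v "<--->selected" | plat_lang_code "Linux x86-64"
```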

4) Get the download URLs of patch 6880880 for the Linux x86-64 platform:

$ wget --proxy-user=${PROXYUSER} --proxy-password=${PROXYPASSWD} \
--no-check-certificate --load-cookies=$COOK \
"https://updates.oracle.com/Orion/SimpleSearch/process_form?search_type=patch&patch_number=6880880&plat_lang=226P" \
-O $HOME/output1.tmp -q

$ ls -ltr $HOME/output1.tmp
-rw-r----- 1 testuser users 4544310 Feb 26 15:44 /home/testuser/output1.tmp

$ grep "Download/process_form" output1.tmp | sed 's/ //g' | sed "s/.*href=\"//g" | sed "s/\".*//g"
https://updates.oracle.com/Orion/Download/process_form/p6880880_139000_Generic.zip?aru=21939900&file_id=98828928&patch_file=p6880880_139000_Generic.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_112000_Linux-x86-64.zip?aru=21895918&file_id=64217272&patch_file=p6880880_112000_Linux-x86-64.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_121010_Linux-x86-64.zip?aru=21886824&file_id=65461237&patch_file=p6880880_121010_Linux-x86-64.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_122010_Linux-x86-64.zip?aru=21885985&file_id=96948775&patch_file=p6880880_122010_Linux-x86-64.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_132000_Generic.zip?aru=17856597&file_id=72275045&patch_file=p6880880_132000_Generic.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_111000_Linux-x86-64.zip?aru=19416466&file_id=26541776&patch_file=p6880880_111000_Linux-x86-64.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_131000_Generic.zip?aru=16531511&file_id=62900088&patch_file=p6880880_131000_Generic.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_101000_Linux-x86-64.zip?aru=13915384&file_id=42098007&patch_file=p6880880_101000_Linux-x86-64.zip&
https://updates.oracle.com/Orion/Download/process_form/p6880880_102000_Linux-x86-64.zip?aru=13116068&file_id=34545782&patch_file=p6880880_102000_Linux-x86-64.zip&
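The list contains one URL per release, so the one for the target release still has to be chosen, and the local file name has to be derived from it. A small sketch of both steps follows, assuming URLs of the shape listed above; the function names pick_patch_url and patch_file_name are invented for this example.

```shell
# From a list of download URLs on stdin, print the first one whose file
# name contains the given release tag (e.g. 122010) and platform string
# (e.g. Linux-x86-64). Fixed-string matching avoids regex surprises.
pick_patch_url() {
  grep -F "_${1}_" | grep -F "${2}.zip?" | head -n 1
}

# Derive the local file name from a download URL: the last path
# component, without the query string.
patch_file_name() {
  name=${1%%\?*}                 # drop everything from the first "?"
  printf '%s\n' "${name##*/}"    # keep what follows the last "/"
}

# Usage with the URL list produced in step 4:
#   url=$(grep "Download/process_form" output1.tmp | sed 's/ //g' |
#         sed "s/.*href=\"//g" | sed "s/\".*//g" |
#         pick_patch_url 122010 Linux-x86-64)
#   patch_file_name "$url"
```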

5) Download the patch using a URL from step 4):

curl:

$ curl -b $COOK -c $COOK --insecure --output p6880880_122010_Linux-x86-64.zip \
-L "https://updates.oracle.com/Orion/Download/process_form/p6880880_122010_Linux-x86-64.zip?aru=21885985&file_id=96948775&patch_file=p6880880_122010_Linux-x86-64.zip&"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90.8M  100 90.8M    0     0  5008k      0  0:00:18  0:00:18 --:--:-- 20.7M

wget:

$ wget --load-cookies=$COOK --save-cookies=$COOK --keep-session-cookies \
--no-check-certificate -O p6880880_122010_Linux-x86-64.zip \
"https://updates.oracle.com/Orion/Download/process_form/p6880880_122010_Linux-x86-64.zip?aru=21885985&file_id=96948775&patch_file=p6880880_122010_Linux-x86-64.zip&"
...
..
.
Proxy request sent, awaiting response... 200 OK
Length: 95262503 (91M) [application/zip]
Saving to: `p6880880_122010_Linux-x86-64.zip'

100%[=================================================================================================>] 95,262,503 21.9M/s in 16s

2018-02-26 17:57:20 (5.65 MB/s) - `p6880880_122010_Linux-x86-64.zip' saved [95262503/95262503]

6) Validate the downloaded zip file against the published patch details:

OPatch patch of version 12.2.0.1.12 for Oracle software releases 12.1.0.x (installer) and 12.2.0.x (JAN 2018) (Patch)

p6880880_122010_Linux-x86-64.zip 90.8 MB (95262503 bytes)

MD5 08D733176A76D99547CDC5ABF7DEF192

SHA-1 4B4EE360C1AF6515CC18F9C36B3AD06EF64B5E0D

SHA-256 5BD98A31C8E134DFF1DE833FFA0834D62C606036A1626AF6ED529854D215707F

a) “unzip -t”

$ unzip -t p6880880_122010_Linux-x86-64.zip
Archive: p6880880_122010_Linux-x86-64.zip
 testing: OPatch/ OK
 testing: OPatch/operr.bat OK
 testing: OPatch/opatch_env.sh OK
...
..
.
No errors detected in compressed data of p6880880_122010_Linux-x86-64.zip.

b) MD5 “md5sum”

$ md5sum p6880880_122010_Linux-x86-64.zip
08d733176a76d99547cdc5abf7def192 p6880880_122010_Linux-x86-64.zip

c) SHA-1 “sha1sum”

$ sha1sum p6880880_122010_Linux-x86-64.zip
4b4ee360c1af6515cc18f9c36b3ad06ef64b5e0d p6880880_122010_Linux-x86-64.zip

d) SHA-256 “sha256sum”

$ sha256sum p6880880_122010_Linux-x86-64.zip
5bd98a31c8e134dff1de833ffa0834d62c606036a1626af6ed529854d215707f p6880880_122010_Linux-x86-64.zip
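The published digests are printed in upper case while md5sum, sha1sum and sha256sum print lower case, so comparing by eye is error-prone. A possible way to script the SHA-256 comparison is sketched below; verify_sha256 is an illustrative name, and the sketch assumes GNU coreutils sha256sum is available.

```shell
# Compare a file's SHA-256 digest with an expected value, ignoring the
# case of the expected hex string.
# Returns 0 when the digests match, non-zero otherwise.
verify_sha256() {
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha256sum "$1" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}

# Usage:
#   verify_sha256 p6880880_122010_Linux-x86-64.zip \
#     5BD98A31C8E134DFF1DE833FFA0834D62C606036A1626AF6ED529854D215707F \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```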

CLSRSC-540: The root script failed to get a unique name of the OCR backup file on the ASM diskgroup

SYMPTOM

rootupgrade.sh failed while upgrading GI from 12.1.0.2 to 12.2.0.1.

# /u01/app/12.2.0.1/grid/rootupgrade.sh
..
.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
 /u01/app/grid/crsdata/racnode1/crsconfig/rootcrs_racnode1_2018-02-05_12-10-50AM.log
2018/02/05 12:11:10 CLSRSC-595: Executing upgrade step 1 of 19: 'UpgradeTFA'.
2018/02/05 12:11:10 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2018/02/05 12:12:44 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.
2018/02/05 12:12:44 CLSRSC-595: Executing upgrade step 2 of 19: 'ValidateEnv'.
2018/02/05 12:13:00 CLSRSC-595: Executing upgrade step 3 of 19: 'GenSiteGUIDs'.
2018/02/05 12:13:03 CLSRSC-595: Executing upgrade step 4 of 19: 'GetOldConfig'.
2018/02/05 12:13:03 CLSRSC-464: Starting retrieval of the cluster configuration data
2018/02/05 12:13:16 CLSRSC-515: Starting OCR manual backup.
2018/02/05 12:13:38 CLSRSC-540: The root script failed to get a unique name of the OCR backup file on the ASM diskgroup using CLSTEST_backup12.1.0.2.0.ocr.
Died at /u01/app/12.2.0.1/grid/crs/install/oraocr.pm line 1353.
The command '/u01/app/12.2.0.1/grid/perl/bin/perl -I/u01/app/12.2.0.1/grid/perl/lib -I/u01/app/12.2.0.1/grid/crs/install /u01/app/12.2.0.1/grid/crs/install/rootcrs.pl -upgrade' execution failed

The log file ‘rootcrs_racnode1_2018-02-05_12-10-50AM.log’ under ‘/u01/app/grid/crsdata/racnode1/crsconfig’ shows:

2018-02-05 12:13:37: ORACLE_SID = +ASM1
2018-02-05 12:13:37: ORACLE_HOME = /u01/app/12.1.0.2/grid
2018-02-05 12:13:37: leftVersion=12.1.0.2.0; rightVersion=12.1.0.0.0
2018-02-05 12:13:37: [12.1.0.2.0] is higher than [12.1.0.0.0]
2018-02-05 12:13:37: Running as user grid: /u01/app/12.1.0.2/grid/bin/asmcmd --nocp find --type "OCRBACKUP" "*" "*CLSTEST_backup12.1.0.2.0.ocr*"
2018-02-05 12:13:37: s_run_as_user2: Running /bin/su grid -c ' echo CLSRSC_START; /u01/app/12.1.0.2/grid/bin/asmcmd --nocp find --type "OCRBACKUP" "*" "*CLSTEST_backup12.1.0.2.0.ocr*" '
2018-02-05 12:13:38: Removing file /tmp/rpd0KsVwL2
2018-02-05 12:13:38: Successfully removed file: /tmp/rpd0KsVwL2
2018-02-05 12:13:38: pipe exit code: 0
2018-02-05 12:13:38: /bin/su successfully executed

2018-02-05 12:13:38: Return code: 0 
Output: +OCR_VOTE1/CLSTEST/OCRBACKUP/clstest_backup12.1.0.2.0.ocr.276.967291997

2018-02-05 12:13:38: Reset the environment variable 'ORACLE_HOME' as: /u01/app/12.2.0.1/grid
2018-02-05 12:13:38: Failed to get the OMF name of the OCR backup file on DG.
2018-02-05 12:13:38: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 540 "CLSTEST_backup12.1.0.2.0.ocr"
2018-02-05 12:13:38: Command output:
> CLSRSC-540: The root script failed to get a unique name of the OCR backup file on the ASM diskgroup using CLSTEST_backup12.1.0.2.0.ocr. 
>End Command output
2018-02-05 12:13:38: CLSRSC-540: The root script failed to get a unique name of the OCR backup file on the ASM diskgroup using CLSTEST_backup12.1.0.2.0.ocr.
2018-02-05 12:13:38: ###### Begin DIE Stack Trace ######
2018-02-05 12:13:38: Package File Line Calling 
2018-02-05 12:13:38: --------------- -------------------- ---- ----------
2018-02-05 12:13:38: 1: main rootcrs.pl 287 crsutils::dietrap
2018-02-05 12:13:38: 2: oraClusterwareComp::oraocr oraocr.pm 1353 main::__ANON__
2018-02-05 12:13:38: 3: crsupgrade crsupgrade.pm 1370 oraClusterwareComp::oraocr::backupOcr4Restore
2018-02-05 12:13:38: 4: crsupgrade crsupgrade.pm 817 crsupgrade::get_oldconfig_info
2018-02-05 12:13:38: 5: crsupgrade crsupgrade.pm 485 crsupgrade::CRSUpgrade
2018-02-05 12:13:38: 6: main rootcrs.pl 296 crsupgrade::new
2018-02-05 12:13:38: ####### End DIE Stack Trace #######

CAUSE

This is a bug. It is addressed by patch 27440094: 12.1.0.2 TO 12.2.0.1 :UPGRADE IS FAILING WITH CLSRSC-540.

WORKAROUND

There are two workarounds for this issue:

a) Apply patch 27440094: 12.1.0.2 TO 12.2.0.1 :UPGRADE IS FAILING WITH CLSRSC-540.

or

b) Reboot all the cluster nodes, and rerun “rootupgrade.sh” on all RAC nodes.

After either a) or b), complete the configuration manually as per “How to Complete Grid Infrastructure Configuration Assistant(Plug-in) if OUI is not Available (Doc ID 1360798.1)”.

ORA-16857: standby disconnected from redo source for longer than specified threshold

The alert log of a single-instance 11.2.0.4 Oracle database shows the following message:

 RFS[1]: No standby redo logfiles available for thread 1

Data Guard shows ORA-16857 error:

DGMGRL> show database "TESTSTY";

Database - TESTSTY

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Rate: 39.97 MByte/s
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Warning(s):
 ORA-16857: standby disconnected from redo source for longer than specified threshold

Database Status:
WARNING

Checking both the primary and standby databases shows that the standby redo logs have been created, but on both databases their size (50 MB) differs from the size of the online redo logs (100 MB).

-- on standby:

SQL> select GROUP#,THREAD# ,BYTES/1024/1024 from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 4         1           50
 5         1           50
 6         1           50
 7         1           50

SQL> select GROUP#,THREAD#,BYTES/1024/1024 from v$log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 1         1          100
 3         1          100
 2         1          100

-- on primary:

SQL> select GROUP#,THREAD# ,BYTES/1024/1024 from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 4         1          50
 5         1          50
 6         1          50
 7         1          50

SQL> select GROUP#,THREAD#,BYTES/1024/1024 from v$log;

GROUP#     THREAD#    BYTES/1024/1024
---------- ---------- ---------------
 1         1          100
 2         1          100
 3         1          100

Drop all standby redo logs on both the primary and standby databases, and recreate them with the same size as the online redo logs.

-- for standby db which is under recovery, recovery needs to be stopped first
SQL> alter database recover managed standby database cancel;

SQL> alter database add standby logfile thread 1 group 4 size 100m;

If the “db_create_online_log_dest_n” parameters are not defined, this creates two members for each standby redo log group: one under +FRA and another under the location given by the “db_create_file_dest” parameter.
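The add statement has to be repeated for every group, and each group has to be dropped first; the drop statement is not shown in the listing above. The sketch below only prints the statements so they can be reviewed and then pasted into SQL*Plus. It assumes the standard "alter database drop standby logfile group N" syntax; gen_srl_sql is a hypothetical helper name, and the group numbers are the ones from the queries above.

```shell
# Print the statements that drop and recreate standby redo log groups.
# Nothing is executed against the database; the output is for review.
# Arguments: thread number, size (e.g. 100m), then the group numbers.
gen_srl_sql() {
  thread=$1
  size=$2
  shift 2
  for g in "$@"; do
    printf 'alter database drop standby logfile group %s;\n' "$g"
    printf 'alter database add standby logfile thread %s group %s size %s;\n' \
      "$thread" "$g" "$size"
  done
}

# Usage for groups 4 to 7 on thread 1 with 100 MB logs:
#   gen_srl_sql 1 100m 4 5 6 7
```

On the standby, managed recovery still needs to be cancelled before running the generated statements, as noted above.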

Finally restart the recovery process, then everything is fine.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE 
THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE;

Database altered.

alert.log:

RFS[1]: Selected log 4 for thread 1 sequence 41436 dbid 1345227970 branch 816878594
Mon Feb 19 14:38:25 2018
..
.
Recovery of Online Redo Log: Thread 1 Group 4 Seq 41436 Reading mem 0
..
.

Data Guard broker (DGMGRL):

DGMGRL> show database 'TESTSTY';

Database - TESTSTY

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 0 seconds (computed 0 seconds ago)
 Apply Lag: 0 seconds (computed 0 seconds ago)
 Apply Rate: 208.00 KByte/s
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Status:
SUCCESS

Warning: ORA-16857: standby disconnected from redo source for longer than specified threshold

The alert log of a single-instance 11.2.0.4 Oracle database shows the following message:

 RFS[1]: No standby redo logfiles available for thread 1

Data Guard shows ORA-16857 error:

DGMGRL> show database "TESTSTY";

Database - TESTSTY

Role: PHYSICAL STANDBY
 Intended State: APPLY-ON
 Transport Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Lag: 10 minutes 32 seconds (computed 48 seconds ago)
 Apply Rate: 39.97 MByte/s
 Real Time Query: OFF
 Instance(s):
 TESTSTY

Database Warning(s):
 ORA-16857: standby disconnected from redo source for longer than specified threshold

Database Status:
WARNING

Checking both the primary and standby databases shows that the standby redo logs have been created. Strangely, the thread number (THREAD#) of the standby redo logs is not consistent: most groups show thread 0, and the standby differs from the primary.

-- on standby:

SQL> select GROUP#,THREAD#,BYTES/1024/1024 ,ARCHIVED,STATUS
       from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024 ARC STATUS
---------- ---------- --------------- --- ----------
 4         1          50              NO UNASSIGNED
 5         0          50              YES UNASSIGNED
 6         0          50              YES UNASSIGNED
 7         0          50              YES UNASSIGNED

-- on primary:

SQL> select GROUP#,THREAD#,BYTES/1024/1024 ,ARCHIVED,STATUS 
       from v$standby_log;

GROUP#     THREAD#    BYTES/1024/1024 ARC STATUS
---------- ---------- --------------- --- ----------
 4         0          50              YES UNASSIGNED
 5         0          50              YES UNASSIGNED
 6         0          50              YES UNASSIGNED
 7         0          50              YES UNASSIGNED
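With more groups than this, the rows that are not assigned to the expected thread can be filtered out of the query output. The following is a rough sketch, assuming the SQL*Plus output has been saved as text with GROUP# in the first column and THREAD# in the second; srl_wrong_thread is a made-up name.

```shell
# From "GROUP# THREAD# ..." rows on stdin, print the group numbers whose
# THREAD# differs from the expected thread given as the first argument.
# Header and separator lines are skipped because their first field is
# not a plain number.
srl_wrong_thread() {
  awk -v want="$1" '$1 ~ /^[0-9]+$/ && $2 != want { print $1 }'
}

# Usage with the query output saved to a file (file name is an example):
#   srl_wrong_thread 1 < standby_log.txt
```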

Drop all standby redo logs on both the primary and standby databases, and recreate them, specifying “thread 1” explicitly.

-- for standby db which is under recovery, recovery needs to be stopped first
SQL> alter database recover managed standby database cancel;

SQL> alter database add standby logfile thread 1 group 4 size 50m;

If the “db_create_online_log_dest_n” parameters are not defined, this creates two members for each standby redo log group: one under +FRA and another under the location given by the “db_create_file_dest” parameter.

Finally, restart the recovery process, and everything is fine.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE 
THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE;

Database altered.