Delphix Snapsync of an Oracle RAC dSource fails reporting ORA-01138 (KBA1778)
KBA
KBA#1778Applicable Delphix Versions
This article applies to the following versions of the Delphix Engine:
Major Release |
All Sub Releases |
5.2 |
5.2.2.0. 5.2.2.1, 5.2.3.0 |
5.1 |
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0 |
5.0 |
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4 |
Issue
During Delphix Snapsync operations against Oracle RAC dSources, RMAN is used to capture an SCN based incremental backup or a full backup at the first snapsync. In RAC environments RMAN requires all instances in the cluster be in the same startup state.
Oracle database can be in one of 4 states:
- Stopped: the instance is completely down.
- No Mount: the instance is started, processes are seen running and the shared memory segment is created.
- Mounted: the instance is started, the controlfile has been opened and the datafile locations identified.
- Open: The database is completely open and available to end users.
In order for RMAN to attach to an Oracle database and perform the backup associated with the dSource snapsync the database must be mounted or opened. In a RAC database all instances must in the same state, all must be mounted OR all must be open.
- For a non-standby dSource the database and all its instances will be in the open state.
- For standby database dSources running Active Data Guard all RAC instances would open read only with one instance performing managed recovery
- For regular standby database dSources (no active Data Guard) all RAC instances would be sitting in the mounted state.
Where RAC is running and the instances are not in the same startup phase or state and an RMAN backup is attempted RMAN will report the error ORA-01138.
Oracle describes this error as the following:
$ oerr ora 1138 01138, 00000, "database must either be open in this instance or not at all" // *Cause: The requested operation can not be done when the database is // mounted but not open in this instance, and another instance has // the database open. // *Action: Execute the operation in an open instance, open the database in // this instance, or close the database in the other instances.
The following SQL can be used to determine the state (STATUS) of each instance in the RAC database. The error appears when one instance's status does not match the others in the cluster.
SQL> select instance_number,instance_number,instance_name,status,database_status from gv$instance; INSTANCE_NUMBER INSTANCE_NUMBER INSTANCE_NAME STATUS DATABASE_STATUS --------------- --------------- ---------------- ------------ ----------------- 1 1 orcl121 MOUNTED ACTIVE 2 2 orcl122 OPEN ACTIVE
Troubleshooting
Reproducing the behaviour best demonstrates the problem, diagnostics for troubleshooting it and how the issue can be overcome.
A bit of background information
- A 2 node RAC dSource is linked to the Delphix Engine.
- The first Snapsync/snapshot is captured.
- A shutdown of instance 1 was performed and the instance is restarted to mount (either deliberately or by mistake).
- The database user in place within Delphix is the SYS user (as we need to be able to connect to mounted instances and only SYSDBA users can do this)
A Delphix snapsyc operation is initiated against the dSource database and fails reporting the following error:
Executing the following SQL against one of the RAC databases instances shows the problem where the status does no match.
SQL> select instance_number,instance_number,instance_name,status,database_status from gv$instance; INSTANCE_NUMBER INSTANCE_NUMBER INSTANCE_NAME STATUS DATABASE_STATUS --------------- --------------- ---------------- ------------ ----------------- 1 1 orcl121 MOUNTED ACTIVE 2 2 orcl122 OPEN ACTIVE
Resolution
To overcome the failure in snapsync one of the database instances must be restarted to a state (STATUS) that matches the other. In the example provided instance 2 is shutdown and restarted to mount (MOUNTED) to match instance 1. The direction taken to achieve the same state is dependent on the requirements of the site. It may be that this is a primary database where the intended state is OPEN or it could be a standby site where the intended state is MOUNTED.
In this case srvctl is used to stop instance 2 (orcl122) and restart it to mount.
$ srvctl stop instance -i orcl122 -d orcl12 -o immediate $ srvctl start instance -i orcl122 -d orcl12 -o mount
To ensure the next snapsync attempt is successful checking the state of each instance is performed.
$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.2.0 Production on Wed Sep 19 14:22:18 2018 Copyright (c) 1982, 2014, Oracle. All rights reserved. Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> select instance_number,instance_number,instance_name,status,database_status from gv$instance; INSTANCE_NUMBER INSTANCE_NUMBER INSTANCE_NAME STATUS DATABASE_STATUS --------------- --------------- ---------------- ------------ ----------------- 1 1 orcl121 MOUNTED ACTIVE 2 2 orcl122 MOUNTED ACTIVE
Now that the STATUS of both instances is aligned, in this case to MOUNTED, the dSource snapsync can be attempted once more and should succeed without reporting ORA-01138.