Virtual database (VDB) provisioning and VDB refreshes fail with the following errors:
Failed to recreate control file. Review the Oracle alert log for more details.
SQL*Plus: Release 184.108.40.206.0 Production on Thu Nov 17 15:04:32 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> ORACLE instance shut down.
SQL> ORACLE instance started.
Total System Global Area 7.4826E+10 bytes
Fixed Size 2261048 bytes
Variable Size 6442454984 bytes
Database Buffers 6.8183E+10 bytes
Redo Buffers 199049216 bytes
SQL> ERROR at line 12:
ORA-01967: invalid option for CREATE CONTROLFILE
Examining the VDB's alert log shows that the recovery that Delphix performs during a provision or refresh operation has not completed successfully.
During the recovery phase of the provision the instance can be seen terminating abnormally:
Thu Nov 17 15:04:06 2016
Media Recovery Log /var/opt/delphix/delphix_mount/vplb/source-archive/arch_1_480925_799509181.log
Media Recovery Log /var/opt/delphix/delphix_mount/vplb/source-archive/arch_2_483606_799509181.log
Media Recovery Log /var/opt/delphix/delphix_mount/vplb/source-archive/arch_2_483607_799509181.log
Thu Nov 17 15:04:27 2016
PMON (ospid: 7159): terminating the instance due to error 471
Thu Nov 17 15:04:27 2016
System state dump requested by (instance=1, osid=7159 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /prod/sys/oracle/software/base1120406/diag/rdbms/vplb/vplb/trace/vplb_diag_7169_20161117150427.trc
Dumping diagnostic data in directory=[cdmp_20161117150427], requested by (instance=1, osid=7159 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 7159
When recovery of the VDB completes successfully, messages similar to the following appear in the alert log.
The exact message will depend on the provision type and release of Oracle.
Recovery completed through change 9645137 time 11/15/2016 22:29:27
Media Recovery Complete (vplb)
Completed: alter database recover if needed start until change 9645155
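The comparison between a failed and a successful recovery can be sketched as a simple text check. This is a minimal illustration, not part of Delphix or Oracle tooling: the function name `classify_recovery` is hypothetical, and the marker strings are taken only from the alert log excerpts in this article, so other Oracle releases or provision types may log different text.

```python
def classify_recovery(alert_log_text):
    """Classify an alert log excerpt as 'terminated', 'completed', or 'unknown'.

    Assumption: the marker strings below are copied from the excerpts in this
    article; other Oracle releases or provision types may word them differently.
    """
    # An abnormal termination takes priority, because the instance may die
    # after some recovery messages have already been written.
    if "terminating the instance" in alert_log_text:
        return "terminated"
    if "Media Recovery Complete" in alert_log_text:
        return "completed"
    return "unknown"


if __name__ == "__main__":
    failed = "PMON (ospid: 7159): terminating the instance due to error 471"
    succeeded = "Media Recovery Complete (vplb)"
    print(classify_recovery(failed))
    print(classify_recovery(succeeded))
```

A result of "unknown" only means that neither marker was found, so the alert log still needs to be read by hand in that case.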
The provision or refresh attempt is actually failing as a result of the instance termination, so the question becomes: what is causing the termination?
PMON terminating the instance is typically associated with a critical background process being killed or dying for some reason.
Looking into the Linux OS messages log (/var/log/messages) shows that Linux is terminating Oracle through its out-of-memory killer functionality.
Nov 17 15:04:27 db20p03dx kernel:  32989 10202 59880 431 1 0 0 oracle
Nov 17 15:04:27 db20p03dx kernel:  0 10204 56208 11289 3 0 0 TaniumClient
Nov 17 15:04:27 db20p03dx kernel: Out of memory: Kill process 7179 (oracle) score 78 or sacrifice child
Nov 17 15:04:27 db20p03dx kernel: Killed process 7179, UID 32989, (oracle) total-vm:73763960kB, anon-rss:101920kB, file-rss:13149952kB
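Searching the messages log for these entries can be sketched in Python. This is an illustration only: the helper name `find_oom_kills` is hypothetical, and the regular expression is written against the "Killed process" line shown above. The wording of this kernel message differs between kernel versions, so the pattern may need adjusting on other systems.

```python
import re

# Pattern modelled on the "Killed process" line shown in this article.
# Assumption: other kernel versions may format this message differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+),(?: UID \d+,)? \((?P<name>[^)]+)\)"
    r" total-vm:(?P<total_vm_kb>\d+)kB,"
    r" anon-rss:(?P<anon_rss_kb>\d+)kB,"
    r" file-rss:(?P<file_rss_kb>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per OOM kill found in an iterable of log lines."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "total_vm_kb": int(match.group("total_vm_kb")),
                "anon_rss_kb": int(match.group("anon_rss_kb")),
                "file_rss_kb": int(match.group("file_rss_kb")),
            })
    return kills


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages", errors="replace") as log:
        for kill in find_oom_kills(log):
            print(kill)
```

Matching the process ID reported here against the process IDs in the VDB's alert log and trace files helps confirm that the killed process belonged to the instance that PMON terminated.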
The Linux kernel decides to kill off processes when memory resource issues start appearing in the OS, and how easily that situation arises depends on the value of the overcommit_memory setting. When overcommit_memory is enabled (set to 0 or 1), programs are allowed to allocate more memory than is really available within the OS.
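The current setting can be inspected on the target host. The sketch below reads it from /proc/sys/vm/overcommit_memory, the standard Linux location for this kernel parameter. The function names are hypothetical, and the mode descriptions are short paraphrases of the kernel documentation rather than text from this article.

```python
# Short paraphrases of the three documented modes of vm.overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the overcommit mode as an integer from the given file."""
    with open(path) as handle:
        return int(handle.read().strip())


def describe_overcommit(mode):
    """Return a human-readable description of an overcommit mode."""
    return OVERCOMMIT_MODES.get(mode, "unknown mode")


if __name__ == "__main__":
    mode = read_overcommit_mode()
    print(f"vm.overcommit_memory = {mode} ({describe_overcommit(mode)})")
```

The `path` parameter exists only so that the function can be pointed at a copy of the file, for example one collected from a host in a support bundle.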
When memory on the target node is exhausted to the point that the shortage threatens the stability of the system, the Out Of Memory (OOM) Killer can take action. The OOM Killer's role is to kill processes until enough memory is freed to allow the OS to operate normally.
As Oracle System Global Areas (SGAs) and Oracle processes are typically large users of memory, they can be targeted and terminated by the OOM Killer.
Allocate sufficient memory resources to the target node to ensure memory exhaustion does not occur during the provision process. The amount of memory available to a target node will need to be greater than the sum of the SGAs of all the Oracle VDBs that are expected to run concurrently on the host.
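The sizing rule above can be checked with simple arithmetic. The following sketch uses hypothetical function names and host memory figures chosen only for illustration. The SGA size is the "Total System Global Area" value from the SQL*Plus output earlier in this article. The check is a lower bound: it deliberately ignores PGA, other processes, and OS overhead, all of which need memory on top of the SGAs.

```python
GIB = 1024 ** 3


def sga_headroom_bytes(sga_sizes_bytes, host_memory_bytes):
    """Return host memory minus the combined size of all concurrent SGAs."""
    return host_memory_bytes - sum(sga_sizes_bytes)


def host_can_fit(sga_sizes_bytes, host_memory_bytes):
    """Return True if host memory exceeds the sum of the SGAs.

    Assumption: this is only the minimum condition stated in the article;
    a real host also needs memory for PGA, the OS, and other processes.
    """
    return sga_headroom_bytes(sga_sizes_bytes, host_memory_bytes) > 0


if __name__ == "__main__":
    vdb_sga = 7.4826e10  # Total System Global Area reported by SQL*Plus
    for host_gib in (64, 128):  # illustrative host sizes, not from the article
        fits = host_can_fit([vdb_sga], host_gib * GIB)
        print(f"{host_gib} GiB host, one VDB: fits = {fits}")
```

A negative headroom means the host cannot hold the planned VDBs at the same time, so either more memory needs to be allocated or fewer VDBs should run concurrently.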