VDB Provision or Refresh Stalls in Oracle 19.7 Environment (KBA6417)
Issue
A VDB provision or refresh to an Oracle target Environment running Oracle 19.7 may stall (the job never completes). The stalled job may also not be cancellable from the Delphix Engine interface.
Prerequisites
This issue is only known to occur in Oracle target Environments running Oracle 19.7.
Applicable Delphix Versions
Major Release    All Sub Releases
6.0              6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.3.0, 6.0.3.1
5.3              5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0
Troubleshooting
The Oracle alert log for the VDB shows a deadlock between two or more processes. In the excerpt below, PIDs 75 and 33 are each waiting for a row cache enqueue lock (the PIDs observed in practice may differ from this example):
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=75
System State dumped to trace file /u01/app/oracle/diag/rdbms/vdb1/VDB1/trace/VDB1_p006_22472.trc
2020-09-05T16:33:58.553032+00:00
TT03 (PID:22526): Sleep 160 seconds and then try to clear SRLs in 8 time(s)
2020-09-05T16:36:38.558018+00:00
TT03 (PID:22526): Sleep 320 seconds and then try to clear SRLs in 9 time(s)
2020-09-05T16:38:32.718236+00:00
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=33
System State dumped to trace file /u01/app/oracle/diag/rdbms/vdb1/VDB1/trace/VDB1_mmon_22309.trc
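When an alert log is long, the warning lines and the trace files they name can be pulled out programmatically. The following is a minimal Python sketch, not a Delphix or Oracle tool. The function name `find_row_cache_waits` is hypothetical, and the pattern assumes the warning is followed directly by the "System State dumped to trace file" text, as in the excerpt above.

```python
import re

# Matches the warning and the trace file named right after it.
# \s+ lets the two parts sit on one line or on consecutive lines.
ROW_CACHE_WAIT = re.compile(
    r"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=(\d+)\s+"
    r"System State dumped to trace file (\S+)"
)

def find_row_cache_waits(alert_log_text):
    """Return (pid, trace_file) tuples for each warning found in the text."""
    return [(int(pid), path) for pid, path in ROW_CACHE_WAIT.findall(alert_log_text)]

# Sample taken from the excerpt in this article.
sample_alert_log = """\
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=75
System State dumped to trace file /u01/app/oracle/diag/rdbms/vdb1/VDB1/trace/VDB1_p006_22472.trc
2020-09-05T16:33:58.553032+00:00
TT03 (PID:22526): Sleep 160 seconds and then try to clear SRLs in 8 time(s)
2020-09-05T16:38:32.718236+00:00
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=33
System State dumped to trace file /u01/app/oracle/diag/rdbms/vdb1/VDB1/trace/VDB1_mmon_22309.trc
"""

waits = find_row_cache_waits(sample_alert_log)
for pid, trace_file in waits:
    print(pid, trace_file)
```

In practice the text would be read from the VDB's alert log file rather than from an inline sample. The trace file paths returned are the ones to inspect next.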
The trace files named in the alert log contain stack traces that confirm the behavior discussed in this document. The sequence of frames to look for is:
kgxSharedExamine - kxsGetRuntimeLock - kkscsCheckCursor
% grep kgxSharedExamine VDB1_p006_22472.trc
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-__select()+19<-skgpwwait()+420<-kgxWait()+836<-kgxSharedExamine()+785<-kxsGetRuntimeLock()+246<-kkscsCheckCursor()+550<-kkscsSearchChildList()+1324<-kksfbc()+15645<-kkspsc0()+1566<-kksParseCursor()+114<-opiosq0()+2330<-opiodr()+1202<-rpidrus()+198<-skgmstack()+65<-rpidru()+132<-rpiswu2()+543<-rpidrv()+1266<-rpisplu_internal()+474<-ktuscu()+294<-kqrcmt()+978<-ktcCommitTxn_new()+5261<-ktcCommitTxn()+94<-kturfptrSlaveWork()+869<-kturfptrSlaveMain()+762<-kxfprdp_int()+1915<-ksbdispatch()+365<-opirip()+522<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
% grep kgxSharedExamine VDB1_mmon_22309.trc
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-__select()+19<-skgpwwait()+420<-kgxWait()+836<-kgxSharedExamine()+785<-kxsGetRuntimeLock()+246<-kkscsCheckCursor()+550<-kkscsSearchChildList()+1324<-kksfbc()+15645<-kkspsc0()+1566<-kksParseCursor()+114<-opiosq0()+2330<-opiodr()+1202<-rpidrus()+198<-skgmstack()+65<-rpidru()+132<-rpiswu2()+543<-rpidrv()+1266<-rpisplu_internal()+474<-ktuscu()+294<-kqrcmt()+978<-ktcCommitTxn_new()+5261<-ktcCommitTxn()+94<-kturfptrSlaveWork()+869<-kturfptrSlaveMain()+762<-kxfprdp_int()+1915<-ksbdispatch()+365<-opirip()+522<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
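A plain grep for kgxSharedExamine finds candidate lines but does not check that the other two frames follow it. The sketch below, in Python, tests a short stack line for the three frames appearing consecutively and in order. The helper names (`frame_names`, `has_signature`, `trace_matches`) are hypothetical. It also assumes the frames are adjacent and separated by `<-`, as in the traces shown in this article.

```python
import re

# The three frames named in this article, in the order they appear in the stack.
SIGNATURE = ("kgxSharedExamine", "kxsGetRuntimeLock", "kkscsCheckCursor")

def frame_names(stack_line):
    """Split a short stack on '<-' and drop the '()+offset' suffix of each frame."""
    return [re.sub(r"\(\).*$", "", frame.strip()) for frame in stack_line.split("<-")]

def has_signature(stack_line, signature=SIGNATURE):
    """True if the signature frames occur consecutively, in order, in the line."""
    names = frame_names(stack_line)
    n = len(signature)
    return any(tuple(names[i:i + n]) == signature for i in range(len(names) - n + 1))

def trace_matches(trace_text):
    """True if any line of the trace file text carries the signature."""
    return any(has_signature(line) for line in trace_text.splitlines())

# Shortened excerpt of the stack shown in this article.
sample_stack = (
    "skgpwwait()+420<-kgxWait()+836<-kgxSharedExamine()+785"
    "<-kxsGetRuntimeLock()+246<-kkscsCheckCursor()+550"
    "<-kkscsSearchChildList()+1324<-kksfbc()+15645"
)
print(has_signature(sample_stack))
```

To check a real trace, the file contents would be passed to `trace_matches`. A match in each trace file named by the alert log is consistent with the signature described here.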
Resolution
This issue is ultimately caused by Oracle bug 30159581 (which may be superseded by 31747989). A patch may be obtainable from Oracle Support. Alternatively, the following parameter, which disables parallel undo processing, can be added to the VDB configuration manually or via a VDB Configuration Template:
"_min_undosegs_for_parallel_fptr"=0
It is recommended to persist this configuration change using a VDB configuration template, though an existing VDB configuration can also be changed ad hoc. To alter an existing VDB configuration parameter via the CLI, edit its configParams:
DelphixEngine> /source; select <VDBNAME>
DelphixEngine '<VDBNAME>'> update
DelphixEngine '<VDBNAME>' update *> set configParams._min_undosegs_for_parallel_fptr=0
DelphixEngine '<VDBNAME>' update *> commit
The configParams setting can then be confirmed:
DelphixEngine '<VDBNAME>'> ls
Properties
    type: OracleVirtualSource
    name: <VDBNAME>
    allowAutoVDBRestartOnHostReboot: false
    archivelogMode: true
    config: <VDBNAME>
    configParams:
        _min_undosegs_for_parallel_fptr: 0
        _omf: 'ENABLED'
        audit_sys_operations: FALSE
        audit_trail: 'NONE'
        compatible: '19.0.0'
        db_domain: 'delphix.com'
        filesystemio_options: 'setall'
        log_archive_dest_1: 'location=/mnt/provision/<VDBNAME>/archive/ MANDATORY'
        log_archive_format: '%t_%s_%r.dbf'
        memory_max_target: 1073741824
        memory_target: 1073741824
        nls_language: 'AMERICAN'
        nls_territory: 'AMERICA'
        open_cursors: 300
        processes: 300
        remote_login_passwordfile: 'EXCLUSIVE'
    configTemplate: (unset)
    container: <VDBNAME>
If attempts to cancel the provision/refresh job for the VDB are unsuccessful (the job never cancels), the pmon process for the Delphix VDB can be killed on the target host, which should cause the job to fail on the Engine.
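Before killing pmon, it helps to confirm that the PID belongs to the stalled VDB and not to another instance on the same host. The Python sketch below only locates the process and does not kill anything. It assumes the usual Oracle background process naming, `ora_pmon_<ORACLE_SID>`, and a `ps -eo pid,args` listing. The function names are hypothetical, and the PIDs and the second instance in the sample are made up for illustration.

```python
import subprocess

def find_pmon_pids(ps_output, sid):
    """Return PIDs whose command is exactly ora_pmon_<sid> in `ps -eo pid,args` output."""
    target = f"ora_pmon_{sid}"
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # PID, then the full command string
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == target:
            pids.append(int(fields[0]))
    return pids

def running_pmon_pids(sid):
    """Query the local host with ps and return the matching pmon PIDs."""
    listing = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return find_pmon_pids(listing, sid)

# Illustrative listing: PIDs and the VDB10 instance are invented for this example.
sample_ps = """\
  PID COMMAND
22250 ora_pmon_VDB1
22309 ora_mmon_VDB1
30112 ora_pmon_VDB10
"""
print(find_pmon_pids(sample_ps, "VDB1"))  # prints [22250]
```

The exact string comparison is deliberate: a substring match on `ora_pmon_VDB1` would also match an instance named VDB10. Once the PID is confirmed on the target host, it can be terminated by the operating system user that owns the instance.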
Restarting management services in response to an issue without Delphix Support guidance is not generally recommended. However, if the previous VDB provision or refresh job cannot be canceled from the Delphix Engine administrative interface and the signature discussed here is matched, the management service can also be restarted to force the job to fail. Intervention on the target host, as described above, remains the recommended approach.
A reboot of the Delphix Engine is not required to recover from this condition.
Details on restarting the Engine can be found in our Documentation at:
https://docs.delphix.com/docs/configuration/starting-stopping-and-restarting-your-engine
Related Articles
The following articles may provide additional or related information:
- Delphix Docs 5.3.x - Customizing Oracle VDB Configuration Settings
- Delphix Docs 6.x - Customizing Oracle VDB Configuration Settings
- Delphix Docs - Starting, Stopping, and Restarting Your Engine