Delphix

VDB Provision or Refresh Stalls in Oracle 19.7 Environment (KBA6417)

 

KBA# 6417

 

Issue

A VDB provision or refresh to an Oracle target environment running 19.7 may stall (the job never completes). The stalled job may also not be cancellable from the Delphix Engine interface.

Prerequisites

This issue is only known to occur in Oracle target Environments running Oracle 19.7. 

Applicable Delphix Versions

Major Release    All Sub Releases
6.0              6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.3.0, 6.0.3.1
5.3              5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

Troubleshooting

The Oracle alert log for the VDB shows a deadlock between two or more processes. In the excerpt below, PIDs 75 and 33 are waiting for a row cache enqueue lock (the PIDs observed in practice may differ from this example):

>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=75
System State dumped to trace file /u01/app/oracle/diag/rdbms/vdb1/VDB1/trace/VDB1_p006_22472.trc
2020-09-05T16:33:58.553032+00:00
TT03 (PID:22526): Sleep 160 seconds and then try to clear SRLs in 8 time(s)
2020-09-05T16:36:38.558018+00:00
TT03 (PID:22526): Sleep 320 seconds and then try to clear SRLs in 9 time(s)
2020-09-05T16:38:32.718236+00:00
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=33
System State dumped to trace file /u01/app/oracle/diag/rdbms/vdb1/VDB1/trace/VDB1_mmon_22309.trc
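
The alert log check above can be scripted. The following is a minimal sketch, assuming the message and trace file lines appear exactly as in the excerpt; the function names are hypothetical and are not part of any Oracle or Delphix tooling.

```shell
# Sketch: summarize row cache enqueue lock waits in an Oracle alert log.
# Function names are illustrative only (assumption, not vendor tooling).

# Print each distinct pid reported in a "WAITED TOO LONG" message.
# $1: path to the alert log
row_cache_wait_pids() {
    sed -n 's/^>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=\([0-9][0-9]*\).*$/\1/p' "$1" |
        sort -n -u
}

# Print the trace file named on the line directly after each message.
# $1: path to the alert log
row_cache_wait_traces() {
    awk '
        # grab is 1 only when the previous line was the lock message
        grab && /^System State dumped to trace file / { print $NF }
        { grab = ($0 ~ /WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK/) }
    ' "$1"
}
```

For example, `row_cache_wait_traces /path/to/alert_VDB1.log` (the path here is a placeholder) lists the trace files to inspect in the next step.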

The trace files named in the alert log contain stack traces that confirm the behavior discussed in this document. Specifically, look for stacks containing the following sequence of calls:

kgxSharedExamine - kxsGetRuntimeLock - kkscsCheckCursor
% grep kgxSharedExamine VDB1_p006_22472.trc  
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-__select()+19<-skgpwwait()+420<-kgxWait()+836<-kgxSharedExamine()+785<-kxsGetRuntimeLock()+246<-kkscsCheckCursor()+550<-kkscsSearchChildList()+1324<-kksfbc()+15645<-kkspsc0()+1566<-kksParseCursor()+114<-opiosq0()+2330<-opiodr()+1202<-rpidrus()+198<-skgmstack()+65<-rpidru()+132<-rpiswu2()+543<-rpidrv()+1266<-rpisplu_internal()+474<-ktuscu()+294<-kqrcmt()+978<-ktcCommitTxn_new()+5261<-ktcCommitTxn()+94<-kturfptrSlaveWork()+869<-kturfptrSlaveMain()+762<-kxfprdp_int()+1915<-ksbdispatch()+365<-opirip()+522<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245

% grep kgxSharedExamine VDB1_mmon_22309.trc
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-__select()+19<-skgpwwait()+420<-kgxWait()+836<-kgxSharedExamine()+785<-kxsGetRuntimeLock()+246<-kkscsCheckCursor()+550<-kkscsSearchChildList()+1324<-kksfbc()+15645<-kkspsc0()+1566<-kksParseCursor()+114<-opiosq0()+2330<-opiodr()+1202<-rpidrus()+198<-skgmstack()+65<-rpidru()+132<-rpiswu2()+543<-rpidrv()+1266<-rpisplu_internal()+474<-ktuscu()+294<-kqrcmt()+978<-ktcCommitTxn_new()+5261<-ktcCommitTxn()+94<-kturfptrSlaveWork()+869<-kturfptrSlaveMain()+762<-kxfprdp_int()+1915<-ksbdispatch()+365<-opirip()+522<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
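
A plain `grep kgxSharedExamine` matches any stack that contains that one frame. As a stricter check, the sketch below tests whether a single stack line contains all three frames in the order shown above. It assumes the `callee()+offset<-caller()+offset` layout seen in the examples; the function name is hypothetical.

```shell
# Sketch: succeed (exit status 0) if any line of the trace file contains
# kgxSharedExamine <- kxsGetRuntimeLock <- kkscsCheckCursor as adjacent frames.
# $1: path to a trace file
has_stall_signature() {
    # [^<]* skips the "+offset" text between a frame name and the next "<-"
    grep -E -q 'kgxSharedExamine\(\)[^<]*<-kxsGetRuntimeLock\(\)[^<]*<-kkscsCheckCursor\(\)' "$1"
}
```

It can be used as `has_stall_signature VDB1_p006_22472.trc && echo "signature found"`.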

Resolution

This issue is ultimately caused by Oracle bug 30159581 (which may be superseded by 31747989). A patch may be obtainable from Oracle Support. Alternatively, the following parameter, which disables parallel undo processing, can be added to the VDB configuration, either manually or via a VDB Configuration Template:

"_min_undosegs_for_parallel_fptr"=0

It is recommended to persist this configuration change using a VDB configuration template, though an existing VDB configuration can also be changed ad hoc. To alter an existing VDB configuration parameter via the CLI, edit the configParams:

DelphixEngine> /source; select <VDBNAME>
DelphixEngine '<VDBNAME>'> update
DelphixEngine '<VDBNAME>' update *> set configParams._min_undosegs_for_parallel_fptr=0 
DelphixEngine '<VDBNAME>' update *> commit

The configParams setting can then be confirmed:

DelphixEngine '<VDBNAME>'> ls
Properties
    type: OracleVirtualSource
    name: <VDBNAME>
    allowAutoVDBRestartOnHostReboot: false
    archivelogMode: true
    config: <VDBNAME>
    configParams:
        _min_undosegs_for_parallel_fptr: 0 
        _omf: 'ENABLED'
        audit_sys_operations: FALSE
        audit_trail: 'NONE'
        compatible: '19.0.0'
        db_domain: 'delphix.com'
        filesystemio_options: 'setall'
        log_archive_dest_1: 'location=/mnt/provision/<VDBNAME>/archive/ MANDATORY'
        log_archive_format: '%t_%s_%r.dbf'
        memory_max_target: 1073741824
        memory_target: 1073741824
        nls_language: 'AMERICAN'
        nls_territory: 'AMERICA'
        open_cursors: 300
        processes: 300
        remote_login_passwordfile: 'EXCLUSIVE'
    configTemplate: (unset)
    container: <VDBNAME>

If attempts to cancel the provision/refresh job for the VDB are unsuccessful (the job never cancels), the pmon process for the Delphix VDB can be killed on the target host, which should cause the job to fail on the Engine.
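
To locate the pmon process on the target host, a sketch like the following may help. It assumes the usual Unix naming of Oracle background processes, `ora_pmon_<SID>`, and uses `VDB1` as the instance name only because it appears in the trace paths above. The function name is hypothetical, and the choice of kill signal is an assumption, not guidance from this article.

```shell
# Sketch: read "PID COMMAND" lines on stdin and print the PID of the
# pmon process for the instance named in $1.
# The exact match keeps SID "VDB1" from also matching "ora_pmon_VDB10".
pmon_pid_for_sid() {
    awk -v name="ora_pmon_$1" '$2 == name { print $1 }'
}

# Example usage on the target host (left commented out on purpose):
# pid=$(ps -e -o pid= -o args= | pmon_pid_for_sid VDB1)
# [ -n "$pid" ] && kill -9 "$pid"   # signal choice is an assumption
```

Printing the PID first and killing it as a separate step makes it easier to confirm that the process belongs to the stalled VDB and not to another instance on the same host.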

It is not generally recommended to restart management services in response to an issue without Delphix Support guidance. However, if the previous VDB provision or refresh job cannot be canceled from the Delphix Engine administrative interface and the signature discussed here is matched, the management service can also be restarted to force the job to fail. Intervention on the target host, as described above, remains the recommended approach.

A reboot of the Delphix Engine is not required to recover from this condition.

Details on restarting the Engine can be found in our Documentation at:

https://docs.delphix.com/docs/configuration/starting-stopping-and-restarting-your-engine