Delphix Engine May Hang During File Removal

Alert Type

Availability

Impact

The Delphix Engine may become unresponsive. I/O to Virtual Databases (VDBs) may experience severe performance degradation or hang. Oracle VDBs may crash.

Contributing Factors

The issue may occur in the following Delphix Releases:

  • All Delphix Engine 4.0 releases
  • All Delphix Engine 4.1 releases
  • Delphix Engine 4.2.1.0 and 4.2.1.1
  • Delphix Engine 4.2.2.0 and 4.2.2.1
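The release list above can be turned into a quick check. The following is a minimal sketch, not a Delphix tool: the function name `is_affected_release` is hypothetical, and it assumes engine versions are written as dot-separated strings such as "4.2.2.1". It only tests membership in the list above; it does not account for the deferred OS upgrade case described under Resolution.

```python
# Releases named in this bulletin as affected.
AFFECTED_SERIES = ("4.0", "4.1")  # every release in these series
AFFECTED_EXACT = {"4.2.1.0", "4.2.1.1", "4.2.2.0", "4.2.2.1"}


def is_affected_release(version: str) -> bool:
    """Return True if the engine version string appears in the affected list.

    Hypothetical helper: assumes a dot-separated version string.
    """
    version = version.strip()
    if version in AFFECTED_EXACT:
        return True
    # Compare only the first two components for the 4.0 and 4.1 series.
    series = ".".join(version.split(".")[:2])
    return series in AFFECTED_SERIES
```

For example, `is_affected_release("4.2.2.1")` reports an affected release, while a release outside the list, such as "4.2.3.0", does not match.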

The issue only occurs when deleting one or more large files on a Delphix filesystem from a Linux or UNIX target. This may happen under one or more of the following circumstances:

  • Using the Linux or UNIX "rm" command to delete individual files or groups of files on an Oracle VDB
  • Using the Linux or UNIX "rm" command to delete individual files or groups of files on an unstructured data (vFiles) filesystem on a Delphix Engine. 
  • Dropping large Oracle datafiles or temporary tablespaces on a VDB using Oracle's "ALTER TABLESPACE ... DROP DATAFILE", "DROP TABLESPACE ... INCLUDING CONTENTS AND DATAFILES", and similar SQL statements.

The problem only occurs when large files are deleted, typically files that are dozens of gigabytes in size or larger.

The problem does not occur when deleting entire VDBs, regardless of size, because in those circumstances entire filesystems are deleted. 

The problem does not occur when deleting files from VDBs or unstructured data from vFiles on Microsoft Windows environments. 

Symptoms

  • Existing sessions to an affected Delphix Engine may become unresponsive, including browser sessions to the Delphix Admin application, browser sessions to the Server Setup application, SSH CLI sessions, and WebAPI sessions.
  • Attempts to establish new sessions to the Delphix Engine may hang or fail
  • In rare cases, messages may appear on the Delphix Engine Console, including:

    WARNING: vmxnet3s0: ddi_dma_alloc_handle() failed

    or

    WARNING: vmxnet3s :0 : ddi_dma_mem_alloc() failed
  • VDBs may hang. New sessions to VDBs may hang or fail
  • Oracle VDBs may crash with one of the following errors:

    ORA-00494: enqueue [CF] held for too long ...
    ORA-00238: timeout waiting for control file enqueue held by ...
  • An affected Delphix engine that is forcibly rebooted may take much longer than normal to boot, possibly several hours or more. 
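When triaging a suspected occurrence, it can help to search log text for the two Oracle error codes listed above. The sketch below is illustrative only: the function name `find_enqueue_errors` is hypothetical, and the caller is assumed to supply the log lines (for example, from an Oracle alert log whose location depends on the database configuration). These errors can have other causes, so a match is a hint, not a diagnosis.

```python
import re

# The two Oracle error codes named in this bulletin.
ORA_PATTERN = re.compile(r"ORA-(?:00494|00238)\b")


def find_enqueue_errors(lines):
    """Return (line_number, text) pairs for lines that mention either error code."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ORA_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Because the function accepts any iterable of strings, an open file object can be passed directly, which avoids loading a large log into memory.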

 

Relief/Workaround

On Linux- or UNIX-based target environments, do not delete large files from NFS-mounted filesystems provisioned from a Delphix Engine:

  • When attempting to clean up space from an unused VDB, instead of removing files manually, delete the entire VDB using the Delphix User Interface. See the instructions for Deleting a VDB in the end-user documentation. 
  • Avoid the use of Oracle's "ALTER TABLESPACE ... DROP DATAFILE", "DROP TABLESPACE ... INCLUDING CONTENTS AND DATAFILES", and similar statements that may remove large files from a database.
  • Avoid the use of the Linux or UNIX "rm", "unlink", or "mv" commands on Oracle files or unstructured data provisioned from vFiles. 
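Before running any cleanup on a Delphix-provisioned mount, it may be useful to list the files large enough to be a concern. The following is a minimal sketch under stated assumptions: the function name `find_large_files` is hypothetical, and the 10 GiB default threshold is an illustrative value chosen here, since this bulletin only describes affected files as dozens of gigabytes in size. The function only reads file metadata and never deletes anything.

```python
import os
import stat

# Assumption: illustrative cut-off only; the bulletin gives no exact threshold.
DEFAULT_THRESHOLD_BYTES = 10 * 1024**3  # 10 GiB


def find_large_files(root, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Return (path, size) pairs for regular files under root at or above the threshold.

    Results are sorted largest first. Files that cannot be inspected are skipped.
    """
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(info.st_mode) and info.st_size >= threshold_bytes:
                large.append((path, info.st_size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

If the returned list is not empty, the safer course on an affected engine is to leave those files in place and follow the workarounds above rather than remove them manually.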

An affected Delphix Engine that is completely unresponsive may be forcibly rebooted. See "How to Generate an NMI" on the Delphix knowledge base.

Resolution

The issue is fully resolved in Delphix Engine OS Version 4.2.2015.05.19, included with Delphix Engine 4.2.3.0.

Fresh Installations and Full Upgrades to Delphix Engine 4.2.3.0 will run OS Version 4.2.2015.05.19.

Engines upgraded to Delphix Engine 4.2.3.0 with a deferred OS upgrade continue to run a prior version of the Delphix OS and remain susceptible to the problem described in this bulletin.

See the "Deferred OS Upgrade" section of "Upgrading to a New Version of the Delphix Engine" for information about how to determine the current OS version of a Delphix Engine.
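Once the OS version is known, it can be compared with the fixed version named above. This is a sketch only: the helper names `parse_os_version` and `has_fix` are hypothetical, and the code assumes that OS version strings consist of dot-separated integers in the same layout as "4.2.2015.05.19" and that a later version compares greater component by component.

```python
# OS version in which this bulletin states the issue is resolved.
FIXED_OS_VERSION = "4.2.2015.05.19"


def parse_os_version(version):
    """Split a dot-separated version string into a tuple of integers.

    Raises ValueError if any component is not an integer.
    """
    return tuple(int(part) for part in version.strip().split("."))


def has_fix(os_version):
    """Return True if os_version is at or after the fixed OS version.

    Assumption: versions order correctly when compared component by component.
    """
    return parse_os_version(os_version) >= parse_os_version(FIXED_OS_VERSION)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes with zero-padded or multi-digit components.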

Additional Information

The issue may result in protracted disruptions of service, although it occurs rarely. 

In some circumstances, users have deleted a large number of files, and the deletions remain queued on the target environment. Even after an affected Delphix Engine is rebooted and recovers, the issue may recur immediately as the target environment continues to process the queued deletions. This may result in an even more significant disruption of service.
