Delphix

TB011 Clock Drift May Lead to Server Hangs or Performance Issues

Alert Type

Availability/Performance

Impact

The Delphix Engine software periodically queries the system time from the host platform and uses it for internal timekeeping and scheduling. In rare circumstances, the host platform returns times that do not increase from one query to the next (a later reading is the same as, or earlier than, a previous one). When this happens, the internal timekeeping mechanism on the Delphix Engine may become confused, which can cause the engine to pause for anywhere from a few seconds to much longer. In the most severe cases, the Delphix Engine may appear to be persistently hung.
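As a rough illustration (a hypothetical Python sketch, not Delphix's implementation), the following shows why a clock that steps backward can stall scheduling: a timer that stores an absolute deadline ends up waiting longer by roughly the size of the backward step. The function names and the clock readings are made-up values chosen for illustration only.

```python
from typing import Iterable, List, Optional, Tuple


def backward_steps(readings: Iterable[float]) -> List[Tuple[int, float]]:
    """Return (index, size) for each reading that is earlier than the one before it."""
    steps: List[Tuple[int, float]] = []
    prev: Optional[float] = None
    for i, t in enumerate(readings):
        if prev is not None and t < prev:
            steps.append((i, prev - t))
        prev = t
    return steps


def remaining_wait(deadline: float, now: float) -> float:
    """Seconds a timer keyed to absolute clock values still waits before firing."""
    return max(0.0, deadline - now)


# Hypothetical scenario: a timer is armed at t=100.0 to fire 10 seconds later.
deadline = 100.0 + 10.0
# The clock readings seen afterwards, including a 60-second backward step.
readings = [100.0, 102.0, 42.0, 43.0]

print(backward_steps(readings))                # [(2, 60.0)]
print(remaining_wait(deadline, readings[-1]))  # 67.0
```

Instead of firing within about 8 seconds of the last normal reading, the timer now waits 67 seconds; the stall grows with the size of the backward step, which is consistent with pauses ranging from seconds to much longer.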

Contributing Factors

The issue can affect appliances running any version of Delphix Engine software.

The issue affects only appliances running as virtual machines on ESX 3.x or vSphere 4.x hosts.
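The applicability rule above could be encoded as a small check, for example in an inventory script. This is a hypothetical sketch: the function name and the assumption that the host version arrives as a dotted string such as "4.1.0" are illustrative, and only the 3.x and 4.x ranges come from this bulletin.

```python
import re


def is_affected_host(version: str) -> bool:
    """Return True if a host version string is ESX 3.x or vSphere 4.x.

    Hypothetical helper: expects a string starting with "major.", such as
    "3.5.0" or "4.1.0". Only majors 3 and 4 are treated as affected, per
    this bulletin; vSphere 5.0 and later are reported as not affected.
    """
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)) in (3, 4)


for v in ["3.5.0", "4.1.0", "5.0.0", "6.7.0"]:
    print(v, is_affected_host(v))
```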

Symptoms

The most common manifestation of the issue is a hang of the Delphix GUI and CLI. The hang may last from minutes to hours. During a hang, existing sessions may become unresponsive and/or some user interface components may not display properly. New sessions may also fail to be established. Running SnapSync or provision jobs may be interrupted, and new policy-initiated jobs may be affected.

Rarely, in addition to the GUI/CLI hang described above, I/O to virtual databases (VDBs) may also be affected. This can lead to extremely poor VDB performance or VDB crashes.

Relief/Workaround

If the problem manifests only as the GUI/CLI hang described earlier, Delphix Support can often recover the system without impacting VDBs. However, SnapSync, provision, or refresh jobs may have to be restarted. Contact Delphix Support for assistance.

If the rare form of the issue that impacts VDBs occurs, rebooting the Delphix Engine will restore the system to service. A graceful shutdown or restart can be attempted using the vSphere "Shut Down Guest" or "Restart Guest" option; however, these graceful operations may not succeed. If they fail, a VM "Power reset" or "Power off" can be attempted. Contact Delphix Support for assistance.
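The escalation order described above (graceful restart first, then a power reset if the guest does not come back) can be sketched as follows. This is a hypothetical Python outline, not a Delphix or VMware tool: `GuestVM` is an assumed interface that you would back with your own vSphere tooling (for example, pyVmomi's `RebootGuest()` and `ResetVM_Task()` methods), and the 600-second timeout is an arbitrary placeholder. The sketch assumes the engine is already unresponsive when it starts, as in the hang described above.

```python
import time
from typing import Callable, Protocol


class GuestVM(Protocol):
    """Hypothetical interface; map these methods onto your vSphere tooling."""

    def restart_guest(self) -> None: ...  # graceful restart via the guest OS
    def power_reset(self) -> None: ...  # hard reset of the virtual machine
    def is_responsive(self) -> bool: ...  # e.g. the engine answers a health probe


def wait_until(check: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll check() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return check()


def restart_with_escalation(vm: GuestVM, timeout: float = 600.0,
                            interval: float = 5.0) -> str:
    """Try a graceful guest restart first; fall back to a power reset."""
    vm.restart_guest()
    if wait_until(vm.is_responsive, timeout, interval):
        return "graceful"
    # The graceful restart did not bring the engine back within the timeout.
    vm.power_reset()
    if wait_until(vm.is_responsive, timeout, interval):
        return "reset"
    return "failed"
```

Keeping the VM operations behind an interface makes the decision logic easy to exercise with a stand-in object before pointing it at a real host.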

Resolution

The recommended resolution is to upgrade the host servers to vSphere 5.0 or later. The default timekeeping options in these releases of the virtualization platform prevent the problem from occurring.

For customers who are unable to upgrade their servers from vSphere 4.x, the following options can be added to the virtual machine's configuration file (.vmx):

monitor_control.disable_tsc_offsetting=TRUE
monitor_control.disable_rdtscopt_bt=TRUE
timeTracker.forceMonotonicTTAT=TRUE

Changes to the virtual machine configuration file must be made by the customer's VM administrator, and the Delphix Engine virtual machine must be powered off while the change is made.
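For administrators who script .vmx edits, the following is a minimal Python sketch, assuming the VM is already powered off and the .vmx file is reachable as a local path. The helper names are hypothetical. It writes each option in the `key = "value"` form commonly used in .vmx files, replaces any existing setting for the same key, and keeps a `.bak` copy of the original file.

```python
from pathlib import Path
from typing import Dict

# The three options listed in this bulletin for guests on vSphere 4.x hosts.
TIMEKEEPING_OPTIONS: Dict[str, str] = {
    "monitor_control.disable_tsc_offsetting": "TRUE",
    "monitor_control.disable_rdtscopt_bt": "TRUE",
    "timeTracker.forceMonotonicTTAT": "TRUE",
}


def apply_options(vmx_text: str, options: Dict[str, str]) -> str:
    """Return vmx_text with each option set exactly once, as key = "value"."""
    # Keys are compared case-insensitively here as a precaution.
    wanted = {key.lower() for key in options}
    kept = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key.lower() in wanted:
            continue  # drop the existing setting; it is re-added below
        kept.append(line)
    kept.extend(f'{key} = "{value}"' for key, value in options.items())
    return "\n".join(kept) + "\n"


def update_vmx_file(path: Path) -> None:
    """Back up the .vmx file, then rewrite it. Run only while the VM is off."""
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(apply_options(original, TIMEKEEPING_OPTIONS), encoding="utf-8")
```

Because existing entries for the same keys are removed before the options are appended, running the update twice leaves the file unchanged after the first run.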

Additional Information

Timekeeping in VMware Virtual Machines (external link)

Tips for editing a .vmx file (external link)