TB051 Delphix Engine May Crash or Restart Unexpectedly Under Heavy I/O Load
Alert Type
Availability
Impact
The Delphix Engine may become unresponsive and reboot. A Delphix Engine restart results in abnormal termination of all running activities at the appliance level, including, but not limited to, VDB Provision, Refresh, SnapSync, or Replication jobs. Some failed jobs may have to be manually restarted.
Virtual Databases (VDBs) active at the time of a Delphix Engine restart will hang and may crash.
The issue occurs rarely, but may recur on affected Delphix systems.
Contributing Factors
The issue can only occur when running one of the following Delphix Engine releases:
Major Release | All Sub Releases
5.3 | 5.3.0.0, 5.3.0.1, 5.3.0.2
The issue is more likely to occur during periods of heavy I/O workload. These could include, but are not limited to:
- Heavy VDB workloads
- SnapSync jobs
- Delphix Replication
Resolution
The issue is fully resolved in Delphix Engine 5.3.0.3 and later releases.
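As a quick way to apply the release information above, the following minimal sketch checks a version string against the affected releases listed in this bulletin. The function name `is_affected` and the assumption that the version is available as a dotted string such as `5.3.0.1` are illustrative only; how the version is obtained from a given Delphix Engine is not covered here.

```python
# Affected releases exactly as listed in this bulletin.
AFFECTED_RELEASES = {"5.3.0.0", "5.3.0.1", "5.3.0.2"}


def is_affected(version: str) -> bool:
    """Return True if the version string matches a release listed as affected.

    Hypothetical helper: assumes the version is supplied as a dotted
    string (for example "5.3.0.1"); surrounding whitespace is ignored.
    """
    return version.strip() in AFFECTED_RELEASES


if __name__ == "__main__":
    for v in ("5.3.0.0", "5.3.0.2", "5.3.0.3"):
        status = "affected" if is_affected(v) else "not listed as affected"
        print(f"{v}: {status}")
```

A set lookup is used rather than a range comparison so that only the releases explicitly named in the bulletin are reported as affected.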
Symptoms
- You may see "Delphix Engine Communication Error" when navigating with a browser to the Delphix Engine or when an existing Delphix Admin or Server Setup application is disrupted by the issue.
- A Delphix Engine alert and accompanying email may occur with the following text:
  "The server is starting up following an unexpected shutdown around <date>."
  or
  "The management service is starting up following an unexpected shutdown around <date>."
- Jobs or Actions running at the time of failure will terminate abnormally. A Delphix Engine alert and accompanying email will be issued with the following text:
  "<job_type> for "<object>" failed due to an error during execution: <job_type> for <object> failed due to server restart during execution."
  where <job_type> is the type of job running (for example, DB_REFRESH, DB_PROVISION, or DB_SYNC) and <object> is the name of the Delphix group and database for which the job was being processed.
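If alert emails or exported alert text are being reviewed in bulk, the alert texts quoted above can be matched with simple patterns. The sketch below is illustrative only: the function name `classify_alert`, the returned labels, and the sample object name `Group1/db1` are assumptions, and the patterns are derived solely from the alert wording quoted in this bulletin, not from any Delphix API.

```python
import re
from typing import Optional

# Matches both restart alerts quoted in this bulletin.
RESTART_RE = re.compile(
    r"The (server|management service) is starting up "
    r"following an unexpected shutdown around"
)

# Matches the job failure alert and captures the job type and object.
JOB_RE = re.compile(
    r'(?P<job_type>\w+) for "(?P<object>[^"]+)" failed due to an error '
    r"during execution: .*failed due to server restart during execution"
)


def classify_alert(text: str) -> Optional[str]:
    """Return a label for alert text matching this bulletin's symptoms, else None."""
    if RESTART_RE.search(text):
        return "unexpected_restart"
    match = JOB_RE.search(text)
    if match:
        return f"job_failed:{match.group('job_type')}"
    return None


if __name__ == "__main__":
    # Hypothetical sample alert; the object name is made up for illustration.
    sample = (
        'DB_REFRESH for "Group1/db1" failed due to an error during execution: '
        "DB_REFRESH for Group1/db1 failed due to server restart during execution."
    )
    print(classify_alert(sample))
```

A match indicates only that an alert has the wording described here; it does not by itself confirm that the restart was caused by this issue.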