Applicable Delphix Versions
Major Release: 6.0
There are two typical sets of symptoms when an Engine becomes unresponsive:
- Condition 1: The Delphix Engine NFS mounts are unresponsive, and all target host IO operations fail. The web interface does not load. SSH logins fail for every Delphix Admin and self-service user, and no login prompt is presented when an SSH connection is attempted.
- Condition 2: The Delphix Engine NFS mounts are responsive, and VDB operations are not disrupted. The web interface does not load. SSH logins fail for every Delphix Admin and self-service user: a login prompt is presented when an SSH connection is attempted, but no password prompt follows the entry of a username.
In both conditions, the hypervisor still indicates the virtual machine (VM) is running, and ping may return successfully. Memory and CPU utilization may be variable, or the VM may indicate no activity.
In condition 1, a non-maskable interrupt (NMI) may be sent from the hypervisor to cause the Delphix operating system (DxOS) to kernel panic and generate a crash dump. The resulting crash dump can be collected by Delphix Support for further analysis.
If there is no response to the NMI on the VM console, retry the procedure. The last resort is to reset the system or power it off and on, which does not generate a crash dump and reduces the potential for root cause analysis.
Note that this procedure may not succeed in all cases. A VM can become unresponsive for a variety of reasons related to the guest operating system or the hypervisor. The following procedure is a best-effort attempt to collect system state information at the time the VM became unresponsive.
In condition 2, a Delphix Support user may still be able to log in and offer recovery options. If possible, do not issue the NMI or reboot the Engine until Support is engaged for further direction.
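The decision described above can be summarized as a small triage helper. This is only an illustrative sketch: the `Symptoms` fields and the `recommend_action` function are hypothetical names introduced here, not part of any Delphix tooling, and the observations are assumed to be gathered manually by the operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptoms:
    """Operator observations; every field name here is hypothetical."""
    nfs_responsive: bool      # target host IO against the Engine NFS mounts works
    web_ui_loads: bool        # the Engine web interface loads
    ssh_login_prompt: bool    # a login prompt appears when SSH connects
    ssh_login_succeeds: bool  # a Delphix Admin or self-service user can log in


def recommend_action(s: Symptoms) -> str:
    """Map observed symptoms to the guidance given in this article."""
    if s.web_ui_loads or s.ssh_login_succeeds:
        # Neither condition applies if the Engine is still reachable.
        return "not-matching: Engine is accessible; this procedure does not apply"
    if not s.nfs_responsive and not s.ssh_login_prompt:
        return "condition-1: send an NMI to generate a crash dump"
    if s.nfs_responsive and s.ssh_login_prompt:
        return "condition-2: engage Delphix Support before any NMI or reboot"
    # Mixed symptoms are not covered by the article; escalate instead of guessing.
    return "unclear: engage Delphix Support"


if __name__ == "__main__":
    observed = Symptoms(nfs_responsive=True, web_ui_loads=False,
                        ssh_login_prompt=True, ssh_login_succeeds=False)
    print(recommend_action(observed))
```

Mixed symptom sets fall through to the "unclear" result on purpose, since the article only defines guidance for the two typical conditions.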
An administrative user with permission to access the OCI Console of the engine is required.
NMI / Diagnostic Interrupt Procedure
In the OCI Console, navigate to Compute > Instances, and locate the Delphix VM of concern.
- Click on the instance name to view details.
- Under the More Actions drop-down menu, click Send Diagnostic Interrupt.
- A confirmation dialog with disclaimers is presented. Review it and confirm the action.
Note: Although OCI offers Console history to display "OS-level error messages", there will be no output available here.
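For environments where the console steps need to be scripted, the same diagnostic interrupt can likely be sent through the OCI CLI. The sketch below assumes the OCI CLI is installed and configured for a user with the required permissions, and that its `compute instance action` command accepts the `SENDDIAGNOSTICINTERRUPT` action; verify both against the current OCI documentation before relying on it. The function names and the example OCID are hypothetical. By default the helper only returns the command it would run.

```python
import shlex
import subprocess


def build_diagnostic_interrupt_command(instance_ocid: str) -> list[str]:
    """Build the (assumed) OCI CLI invocation for a diagnostic interrupt."""
    # Instance OCIDs begin with "ocid1.instance."; reject anything else early.
    if not instance_ocid.startswith("ocid1.instance."):
        raise ValueError(f"not an instance OCID: {instance_ocid!r}")
    return [
        "oci", "compute", "instance", "action",
        "--instance-id", instance_ocid,
        "--action", "SENDDIAGNOSTICINTERRUPT",
    ]


def send_diagnostic_interrupt(instance_ocid: str, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    cmd = build_diagnostic_interrupt_command(instance_ocid)
    if not dry_run:
        # Raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Placeholder OCID for illustration only.
    print(send_diagnostic_interrupt("ocid1.instance.oc1.phx.exampleuniqueid"))
```

Keeping `dry_run=True` as the default is deliberate: the interrupt forces a kernel panic, so the command should be reviewed before it is executed.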
The Metrics panel may indicate zero CPU utilization when the VM is in an unresponsive state, and you may observe an increase in CPU utilization once the VM is back online following the panic and reboot.
Once the Engine is back online and accessible, a Support log bundle can be collected through the normal interface. The generated dump files, however, must be transferred by a Delphix Support engineer during a screen-sharing session for further root cause analysis.