The symptoms of a server hang are that the system is not reachable via NFS, the GUI, SSH, console logins, etc., while the hypervisor still indicates that the virtual machine (VM) is running. The server may still respond to ping, depending on the nature of the hang. When the system is unreachable in this way, a non-maskable interrupt (NMI) may be sent from the hypervisor to cause the Delphix operating system (DxOS) to kernel panic and generate a crash dump. The resulting crash dump can be collected by Delphix Support for further analysis.
If the system does not respond to the NMI, retry the procedure. The last resort is to reset the system or power it off and on, which does not generate a crash dump and therefore reduces the potential for root cause analysis.
It is important to note that this procedure will not be successful in all cases. VM hangs may occur for a variety of reasons related to the guest operating system or to the hypervisor. The following procedure is a best-effort attempt to collect system state information at the time of a VM hang.
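The network-facing symptoms described above can be checked with a short script before deciding to send an NMI. The following is a minimal sketch, not a Delphix tool: the port numbers (22 for SSH, 443 for the GUI, 2049 for NFS) are assumed defaults and the function names are hypothetical. It only tests TCP reachability. It cannot confirm that console logins fail or that the hypervisor reports the VM as running, so those must still be verified separately.

```python
import socket

# Assumed default ports for the services named in the article; adjust as needed.
SERVICE_PORTS = {"ssh": 22, "gui_https": 443, "nfs": 2049}


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_services(host: str, ports: dict = SERVICE_PORTS, timeout: float = 3.0) -> dict:
    """Probe each named service and return {service_name: reachable}."""
    return {name: is_port_open(host, port, timeout) for name, port in ports.items()}


def looks_hung(results: dict) -> bool:
    """True only when at least one service was probed and none responded."""
    return bool(results) and not any(results.values())
```

For example, `looks_hung(probe_services("engine.example.com"))` (a placeholder hostname) returns True only when every probed service is unreachable. A True result is consistent with a hang but does not prove one, since a network or firewall problem produces the same outcome.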
An administrative user with permissions to access the Delphix VM Console in the Azure Portal is required to perform these actions. This access is typically granted by the "VM Contributor" role.
Additional storage blob container permissions may be required to access the Boot Diagnostics interface.
Applicable Delphix Versions
This article applies to the following versions of the Delphix Engine:
All Sub Releases
1. After logging into the Azure Portal, navigate to "Virtual Machines", then click the VM name in the resulting tab.
2. Scroll to the bottom of the VM tools to locate the "Support + troubleshooting" heading. Click "Boot diagnostics", then click "Serial log" to access the VM serial log history. A "Download serial log" hyperlink should be available to download the current console content prior to the NMI operation. The serial log is also helpful for monitoring the NMI, the resulting panic, and any startup messages after the restart.
Missing permissions for the storage blob container may result in an error:
Error encountered while getting the screenshot or serial log file from the blob container in storage account <storage account name>. Please make sure you have permissions and fireall (sic) is not blocking access to the Storage account.
3. From "Support + troubleshooting", click "Serial console" to access the VM interactive console. Initially there will be a delay of several seconds while the console connects.
If the admin user is missing the required permissions, the Serial console connection may result in an error:
The serial console connection to the VM encountered an error: 'Forbidden (403) - You do not have the required permissions to use this VM serial console. Please ensure you have at least VM Contributor role permissions.
4. Once connected, the "Send Command" button will be available. Click this button, then click "Send Non-Maskable Interrupt (NMI)".
A final warning will be displayed, stating that the VM will be crashed and restarted for debugging purposes. Click the "Send NMI" button to initiate the process.
5. The serial console should subsequently show a VM panic caused by the received NMI:
panic[cpu0]/thread=ffffff003d005c40: NMI received

fffffffffbc18ed0 fffffffffbad559f ()
fffffffffbc18f00 unix:av_dispatch_nmivect+34 ()
fffffffffbc18f10 unix:nmiint+152 ()
ffffff003d005bd0 unix:mach_cpu_idle+6 ()
ffffff003d005c00 unix:cpu_idle+11a ()
ffffff003d005c20 unix:idle+a7 ()
ffffff003d005c30 unix:thread_start+8 ()

dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
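When working from a downloaded serial log rather than the live console, the same panic can be confirmed programmatically. The following is a minimal sketch that assumes the log contains text in the format shown above. The function name and the returned field names are hypothetical, and the patterns are derived only from this one sample, so other DxOS versions may print the message differently.

```python
import re
from typing import Optional

# Patterns based solely on the sample panic output shown in this article.
PANIC_RE = re.compile(
    r"panic\[cpu(?P<cpu>\d+)\]/thread=(?P<thread>[0-9a-f]+):\s*NMI received"
)
DUMP_RE = re.compile(
    r"dumping to (?P<device>[^\s,]+), offset (?P<offset>\d+), content: (?P<content>\w+)"
)


def find_nmi_panic(log_text: str) -> Optional[dict]:
    """Return details of the first NMI panic in log_text, or None if absent."""
    panic = PANIC_RE.search(log_text)
    if panic is None:
        return None
    info = {
        "cpu": int(panic.group("cpu")),
        "thread": panic.group("thread"),
        "dump_device": None,
        "dump_content": None,
    }
    # Look for the dump message only after the panic line.
    dump = DUMP_RE.search(log_text, panic.end())
    if dump is not None:
        info["dump_device"] = dump.group("device")
        info["dump_content"] = dump.group("content")
    return info
```

A result of None means that no NMI panic line was found in the text. A result whose "dump_device" is None means that the panic was logged but no dump message followed it in the captured log, which may indicate that the crash dump did not start or that the log was downloaded too early.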
Once this process has completed and the VM is accessible again, Delphix Support will log in to the Engine directly to relocate the crash dump file so that it can be collected in a Support log bundle or transferred manually from the VM via SCP or a similar method.