Generating a Non-Maskable Interrupt (NMI) in Hyper-V (KBA6342)
KBA# 6342
Applicable Delphix Versions
Major Release: 6.0
All Sub Releases: 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0
Issue
An unresponsive server is not reachable via NFS, the GUI, SSH, or console logins, yet Hyper-V Manager still shows the virtual machine (VM) as running, and CPU or memory resources may still be consumed. The server may respond to ping, depending on the nature of the issue.
There are two typical sets of symptoms when an Engine becomes unresponsive:
1. The Delphix Engine NFS mounts are not responsive, and all target host I/O operations fail. The web interface will not load. Attempts to log in via SSH as any Delphix Admin or self-service user are unsuccessful, and no login prompt is presented when an SSH connection is attempted.
2. The Delphix Engine NFS mounts are responsive, and VDB operations are not disrupted. The web interface will not load. Attempts to log in via SSH as any Delphix Admin or self-service user are unsuccessful: a login prompt is presented when an SSH connection is attempted, but no password prompt follows the entry of a username.
In both conditions, the hypervisor still indicates the virtual machine (VM) is running, and ping may return successfully. Memory and CPU utilization may be variable, or the VM may indicate no activity.
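The two symptom sets above can be roughly told apart from a Windows host before deciding to send an NMI. The following is a minimal sketch, not a Delphix-provided tool. The engine address is a placeholder, and the port numbers (22 for SSH, 443 for the web interface, 2049 for NFS) are assumptions that may not match a given deployment. A successful TCP connection only shows that the port accepts connections; it does not prove the service behind it is healthy, as the second symptom set illustrates.

```powershell
# Hypothetical engine address; replace with your Delphix Engine hostname or IP.
$engine = 'delphix-engine.example.com'

# Assumed ports: 22 (SSH), 443 (web interface over HTTPS), 2049 (NFS).
$ports = [ordered]@{ SSH = 22; Web = 443; NFS = 2049 }

# ICMP reachability; -Quiet returns a single Boolean.
$ping = Test-Connection -ComputerName $engine -Count 2 -Quiet
"Ping : $ping"

# TCP reachability for each service port.
foreach ($svc in $ports.Keys) {
    $r = Test-NetConnection -ComputerName $engine -Port $ports[$svc] -WarningAction SilentlyContinue
    '{0,-4} (tcp/{1}) : {2}' -f $svc, $ports[$svc], $r.TcpTestSucceeded
}
```

Recording this output before sending the NMI gives Delphix Support a snapshot of which services were reachable at the time of the issue.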
Should such a condition arise where the system is otherwise unreachable, a non-maskable interrupt (NMI) or diagnostic interrupt may be sent from the hypervisor to cause the Delphix operating system (DxOS) to kernel panic and generate a crash dump. The resulting crash dump can be collected by Delphix Support for further analysis.
If the system does not respond to the NMI, retry the procedure. The final recourse is to reset or power-cycle the system, which will not generate a crash dump and therefore reduces the potential for root cause analysis.
It is important to note that this procedure will not be successful in all cases. A VM may become unresponsive for a variety of reasons related to the guest operating system or to the hypervisor. The following procedure is a best-effort attempt to collect system state information at the time of the issue.
Prerequisites
An administrative user with PowerShell access is required to complete this process.
Generate NMI
An Administrator can issue an NMI to a VM in Hyper-V using the Debug-VM PowerShell cmdlet. It is recommended to monitor the VM console in a separate window during this process.
In the example below, the Delphix VM named "SeanN HyperV Test" is currently running on server DEVSUPPORT-HV01.
The cmdlet syntax to issue an NMI with Debug-VM is:
PS C:\> Debug-VM -Name "SeanN HyperV Test" -InjectNonMaskableInterrupt -Force
If executed successfully, the command prompt will return immediately.
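When the command is run from a management workstation rather than on the Hyper-V host itself, the host can be named explicitly. The sketch below reuses the VM and server names from the example above and assumes the Hyper-V PowerShell module is installed and the session has administrative rights on the host. The state check before the NMI is an added precaution and not part of the original procedure.

```powershell
# Names taken from the example in this article; substitute your own.
$vmName = 'SeanN HyperV Test'
$hvHost = 'DEVSUPPORT-HV01'

# Confirm the VM exists and Hyper-V reports it as running.
$vm = Get-VM -Name $vmName -ComputerName $hvHost -ErrorAction Stop
if ($vm.State -ne 'Running') {
    throw "VM '$vmName' is in state '$($vm.State)'; expected 'Running'."
}

# Inject the NMI. -Force suppresses the confirmation prompt.
Debug-VM -Name $vmName -ComputerName $hvHost -InjectNonMaskableInterrupt -Force
```

Stopping when the VM is not in the Running state avoids sending the interrupt to the wrong VM or to one that has already been powered off.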
Once the command is issued successfully, there will be no console updates while the memory dump is generated. It is critical that the VM not be restarted during this period. Although the VM may appear unresponsive, restarting it during this phase will cause the memory dump to fail, and diagnostic details will be unavailable for root cause analysis.
To confirm that the NMI was issued, review the Windows Event Log under Application and Services Logs > Microsoft > Windows > Hyper-V-VMMS > Admin. The event ID is 33500.
Once the crash dump is generated, the VM will automatically restart from the original boot device, and normal Delphix VM boot activity should be observed.
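The same check can be made from PowerShell instead of Event Viewer. This is a minimal sketch, assuming it is run in an elevated session on the Hyper-V host. The one-hour look-back window is an arbitrary choice.

```powershell
# Look-back window; adjust to cover the time the NMI was sent.
$since = (Get-Date).AddHours(-1)

# Query the Hyper-V-VMMS Admin log for event ID 33500.
# SilentlyContinue suppresses the error raised when no events match.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Microsoft-Windows-Hyper-V-VMMS-Admin'
    Id        = 33500
    StartTime = $since
} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message |
    Format-List
```

If nothing is returned, widen the time window before concluding that the NMI was not issued.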
Delphix Support engagement will be required to collect the diagnostic data from the Engine for further analysis.