Resolving VDB Refresh Mounting Timeout Issues Caused by Diskpart Hangs (KBA1704)
KBA
KBA# 1704
Applicable Delphix Versions
Major Release | All Sub Releases
5.2 | 5.2.2.0
5.1 | 5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0
5.0 | 5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
4.3 | 4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0
4.2 | 4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0, 4.2.5.0, 4.2.5.1
4.1 | 4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0
4.0 | 4.0.0.0, 4.0.0.1, 4.0.1.0, 4.0.2.0, 4.0.3.0, 4.0.4.0, 4.0.5.0, 4.0.6.0, 4.0.6.1
Issue
This issue occurs during iSCSI mount and unmount operations, and shows up mostly during VDB operations such as refreshes and rollbacks. In theory, it can also affect dSources when they are mounted or unmounted (disable/enable operations). In all cases, the Windows target host, whether standalone or part of a failover cluster, has Veritas disk management software installed, typically for an application unrelated to Delphix. In most situations, physical SQL Server databases run on the same host alongside the Delphix iSCSI drives that back the VDBs.
When running a job on a VDB, you'll receive this type of error:
event_id | 222363
job | JOB-54152
event_time | 2018-01-23 21:13:28.471
job_state | FAILED
percent_complete | 35
message_code | exception.windowshost.mount.timedout
message_params | ["CSPWD00B0005"]
message_details | Attempt to discover and mount LUNs from the Delphix Engine through the iSCSI initiator on target host "CSPWD00B0005" timed out.
message_action | Make sure the target host has not exhausted all of its memory. If SQL Server instances are running on the target host, limit the maximum memory for all SQL Server instances to leave sufficient memory (at least 2 GB) for other operations on the host. Check the Windows event log for iSCSI initiator timeout errors.
message_command_output |
event_type | ERROR
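If the event record is available as text with one "key | value" pair per line (an assumption about how it was exported), a small script can flag this specific failure. The following is a minimal sketch, not a Delphix tool: the function names parse_event_record and is_mount_timeout are hypothetical, and the sample text only repeats a few of the fields shown above.

```python
# Hypothetical helper names; only the field names and values come from the record above.
MOUNT_TIMEOUT_CODE = "exception.windowshost.mount.timedout"


def parse_event_record(text):
    """Turn 'key | value' lines into a dict, splitting on the first '|' only."""
    record = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines or anything that is not a key/value pair
        key, _, value = line.partition("|")
        record[key.strip()] = value.strip()
    return record


def is_mount_timeout(record):
    """True when the job failed with the iSCSI mount timeout message code."""
    return (
        record.get("job_state") == "FAILED"
        and record.get("message_code") == MOUNT_TIMEOUT_CODE
    )


sample = """\
event_id | 222363
job | JOB-54152
job_state | FAILED
message_code | exception.windowshost.mount.timedout
message_params | ["CSPWD00B0005"]
"""

record = parse_event_record(sample)
print(record["job"], is_mount_timeout(record))
```

A match only says that the mount timed out; it does not by itself show that diskpart is the cause.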
A timeout can indicate a legitimate infrastructure problem, but this article covers the case where the 20-minute timeout is caused by the diskpart Windows utility hanging. This utility is used during the mounting process to gather information about the disks used for the iSCSI mount of the VDB being operated on.
Troubleshooting
To check whether this is the diskpart "hang" scenario, open a session on the Windows host where the VDB database files are to be mounted. Open a command prompt or PowerShell and run the diskpart command. If you don't receive a prompt to start entering commands, diskpart is hung.
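The same check can be scripted by starting the command with a time limit. The sketch below is an illustration under stated assumptions: the function name command_hangs is hypothetical, the 30-second default is an arbitrary choice rather than a Delphix value, and it assumes Python is available on the host and that the session is elevated so diskpart can start.

```python
import subprocess


def command_hangs(cmd, stdin_text, timeout_s=30.0):
    """Return True if cmd has not exited within timeout_s seconds.

    stdin_text is piped to the process; for diskpart, "exit\n" makes a
    healthy instance quit straight away.
    """
    try:
        subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout before raising.
        return True
    return False
```

On the target host this would be called as command_hangs(["diskpart"], "exit\n"). A result of True corresponds to never receiving the diskpart prompt in the manual check.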
Open Task Manager and look for diskpart.exe in the process list on the Details tab.
Right-click the process and select Analyze wait chain. If you see other processes chained off it, as in the image below, diskpart is waiting on them and the mount is likely to time out.
Notice the executable vds.exe. This is the Windows Virtual Disk Service (VDS).
Find that process in Task Manager and check its wait chain as well:
If you see the VxVDS.exe binary in the chain, the issue is likely caused by the Veritas Super VDS Provider. It is essentially a wrapper around the VDS providers from both Microsoft and Veritas, which lets the Veritas provider handle its own disk management software as well as Microsoft disk tools such as diskpart and Disk Management. For reasons that are not specified, this provider can intermittently cause the hang and disrupt Delphix VDB operations that involve iSCSI mounting, which then surface as a timeout error.
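Task Manager's wait chain view is interactive, but a coarse first pass is to list which of the processes named above are running at all. The sketch below parses the CSV output of the Windows tasklist command (tasklist /FO CSV /NH). The function name running_suspects is hypothetical and the sample rows, including the PIDs, are made up for illustration. Finding these processes does not prove that they are chained together; it only indicates whether the wait chain is worth inspecting.

```python
import csv
import io

# Image names mentioned in this article, compared case-insensitively.
SUSPECT_IMAGES = ("diskpart.exe", "vds.exe", "vxvds.exe", "vxvdsdyn.exe")


def running_suspects(tasklist_csv):
    """Map each suspect image name (lowercased) to the list of PIDs found."""
    found = {}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        image = row[0].lower()
        if image in SUSPECT_IMAGES:
            found.setdefault(image, []).append(int(row[1]))
    return found


# Made-up sample in the column order tasklist uses:
# image name, PID, session name, session number, memory usage.
sample = '''"System Idle Process","0","Services","0","8 K"
"vds.exe","4312","Services","0","11,204 K"
"VxVDS.exe","5120","Services","0","9,876 K"
"diskpart.exe","6604","Console","1","7,032 K"
'''

print(running_suspects(sample))
```

On the host itself, the input could come from subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True).stdout.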
This problem can be easily resolved.
Resolution
The resolution is rather simple and is described in this link: VxVDS Information and steps to unregister/register.
In brief, a Windows administrator needs to log on to the affected target host (or, in the case of dSources, the staging host).
Run these commands, following steps under How to unregister VxVDS: "1. Without stop/restart of Storage Agent" (do not run any other commands listed in the document):
cd %VMPATH%
vxvdsdyn.exe /unregserver
vxvds.exe /unregserver
taskkill /f /im vxvds.exe
taskkill /f /im vxvdsdyn.exe
taskkill /f /im vds.exe
vxassist refresh
The final command, vxassist refresh, brings the VDS service back up without the Veritas VDS providers.
There is no need to restart the Windows host. These commands unregister the provider dynamically. Once they complete, the Veritas VxVDS provider no longer wraps diskpart when it is executed, and VDB operations can resume normally. The change to the VxVDS provider persists on the Windows host. The Veritas software itself is unaffected: the procedure simply stops the VxVDS Super VDS Provider from affecting non-Veritas Windows disk utilities such as diskpart and Disk Management.