Resolving Persistent "Error 21" Errors in SQL Server VDBs (KBA1047)
KBA
KBA#1047Issue
Following a disruption of network connectivity between the Delphix Engine and a Target SQL Server environment, you may receive the following error:
Msg 823, Level 24, State 2, Line 2 The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x00000000012000 in file 'C:\Program Files\Delphix\DelphixConnector\ENGINE-UUID-vdb-1\DATA\db\databasename.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
This SQL Server error will prevent normal use of the affected database(s), and may prevent operations on VDBs and dSources (including Disable and Enable operations) from being performed by the Delphix Engine.
The Delphix Engine will typically detect this issue during VDB health monitoring, and affected databases will appear in a Stopped state in the Datasets page.
Troubleshooting Error 21
This error is raised by SQL Server after a disruption of iSCSI traffic between the Delphix Engine and a SQL Server Environment, if the disruption lasts longer than a configured timeout period. Causes for this include, but are not limited to:
- Planned maintenance of network, storage or compute infrastructure supporting the Delphix Engine
- An unplanned outage of network, storage or compute infrastructure supporting the Delphix Engine
- An unplanned restart of the Delphix Engine
- A planned restart of the Delphix Engine where dSources and Virtual Databases (VDBs) were not disabled
- Resource contention at the hypervisor (VMware) level
- VMware snapshots, snapshot consolidation, or vMotion operations, resulting in delayed or stalled disk access
The "Error 21" errors will persist even after connectivity to the iSCSI disks has been re-established, preventing successful operation of the VDB.
To identify when the underlying issue may have started, look for the first instances of the strings "error 21" or "Msg 823, Level 24, State 2, Line 2" in your SQL Server ERRORLOG.
In some cases, only a subset of dSources or VDBs will be affected. Databases which are not accessed during a network disruption may remain online.
Resolution
Before proceeding with resolution, confirm that both the Delphix Engine and the SQL Server Environment can be reached and are responding normally.
The resolution steps will depend on the status of the database in the Management GUI:
Container Type | Status in Management GUI | Action to Resolve |
dSource | Active | Disable and Enable dSource If this fails due to KB4094893, restart the SQL Server instance. |
VDB | Stopped | Start VDB If this fails due to KB4094893, restart the SQL Server instance. |
VDB | Running | Disable and Enable VDB If this fails due to KB4094893, restart the SQL Server instance. |
Workaround for Microsoft KB4094893
Due to the issue described in Microsoft KB4094893, several versions of SQL Server are unable to clear the "Error 21" state when a Start, Disable or Enable VDB operation is performed:
- SQL Server 2014 (fixed in SP2 CU12)
- SQL Server 2016 (fixed in SP1 CU9 and SP2 CU1)
- SQL Server 2017 (fixed in CU6)
- Resolved in SQL Server 2019 and later
Please note that the fixes for this issue do not prevent Windows from raising an Error 21 IO error, but they allow stuck VDBs to be restarted (using the Resolution steps above) without restarting the SQL Server instance.
If Start or Enable operations for an affected dSource or VDB fail, it may be necessary to restart the SQL Server instance on the affected host. Please be aware that restarting the instance will temporarily disrupt access to all databases on the SQL Server instance, whether or not they are hosted by the Delphix Engine.
Preventing recurrence of the issue
To minimize disruptions, we recommend performing a Disable operation on dSources and VDBs before performing any disruptive network maintenance.
We also recommend increasing disk timeouts as part of our Requirements for Windows iSCSI Configuration.
The MaxRequestHoldTime
value in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance Number>\Parameters
has been shown to increase how long an iSCSI disk operation will be held open during a network disruption. Increasing this from 60 to 300 (as shown in the requirements document) will be helpful if frequent network maintenance events or infrastructure issues are triggering this issue.
Delphix provides PowerShell scripts to make it easier to check and set iSCSI and disk related registry parameters in KBA1251.
Finally, we encourage any customers affected by Microsoft KB4094893 to upgrade any Staging and Target SQL Server instances to a Cumulative Update or Service Pack where this issue is resolved.