TB006 I/O Errors May Occur If VMware Heap Is Exhausted
Alert Type
Data Loss
Impact
The Delphix Engine relies on stable back-end storage attached to the VMware ESX server and presented to Delphix as virtual logical units (LUNs). In rare cases, memory contention on the ESX server may result in recurring I/O errors, and these errors can result in filesystem corruption and/or data loss on the Delphix filesystem.
Contributing Factors
The problem can affect any release of Delphix Engine software.
Delphix Engine 3.1.5.0, 3.2.2.0, and later releases include changes that significantly reduce the impact of I/O failures; however, silent I/O failures from VMDK storage may still result in data corruption.
The problem is only known to occur when Delphix LUNs consist of VMDK disks stored on VMFS3 or VMFS5.
The problem can only occur when the total size of VMDK disks on an ESX server hosting a Delphix Engine is 4TB or larger. This total includes both the disks used by the Delphix Engine and the disks of other, unrelated guests on the same ESX server.
Symptoms
The Delphix Engine may become non-responsive, may reboot unexpectedly, or may fail to reboot successfully.
Delphix Engines running 3.2.0.0 or later releases will issue a storage fault with the following text:
Title: Critical storage device error
Details: There has been an error with one or more storage devices.
User Action: Contact Delphix Support for assistance.
Resolution
Increase the maximum heap value for the ESX server. Delphix also recommends setting the minimum heap value to the maximum heap value, when possible, to guarantee that sufficient heap space will be available for the selected maximum VMDK capacity. The VMware administrator can make these changes by configuring the VMFS3.MaxHeapSizeMB and VMFS3.MinHeapSizeMB variables, respectively:
- Log into the vCenter Server or ESX host using vSphere Client or VMware Infrastructure (VI) Client. When connecting to vCenter Server, select the ESX host from the inventory.
- Click the Configuration tab.
- Click Advanced Settings.
- Click VMFS3.
- Update the VMFS3.MaxHeapSizeMB field (see Table 1 below for sizes).
- Update the VMFS3.MinHeapSizeMB field (optional; see Table 1 below for applicable releases).
- Reboot the ESX host for the changes to take effect.
Table 1: Heap Values and Maximum Storage Sizes
Version | Minimum heap value | Default maximum heap value | Maximum heap value | Default open VMDK storage per host | Maximum open VMDK storage per host |
---|---|---|---|---|---|
ESXi/ESX 4.0 | N/A | 16MB | 128MB | 4TB | 32TB |
ESXi/ESX 4.1 | N/A | 80MB | 128MB | 8TB | 32TB |
ESXi 5.0 Build 914586 and earlier | N/A | 80MB | 256MB | 8TB | 25TB |
ESXi 5.0 Build 1024429 and later | 256MB | 640MB | 640MB | 60TB | 60TB |
ESXi 5.1 Build 914609 and earlier | N/A | 80MB | 256MB | 8TB | 25TB |
ESXi 5.1 Build 1065491 and later | 256MB | 640MB | 640MB | 60TB | 60TB |
For large configurations, where the total VMDK capacity would otherwise exceed 60TB, consider using virtual or physical-mapped RDM instead of VMDK disks.
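As a rough planning aid, the values in Table 1 can be encoded in a small lookup that suggests which action applies for a given total VMDK capacity on a host. The following Python sketch is illustrative only: the row keys (such as "5.0-late"), the `HeapLimits` type, and the `advise` function are hypothetical names introduced here, not part of any Delphix or VMware tool. It also assumes, conservatively, that a host sitting exactly at the default open-storage figure should already have its heap raised, since the problem can occur at 4TB or larger.

```python
from typing import NamedTuple, Optional


class HeapLimits(NamedTuple):
    """One row of Table 1 (heap sizes in MB, storage sizes in TB)."""
    min_heap_mb: Optional[int]   # None where Table 1 lists N/A
    default_max_heap_mb: int
    max_heap_mb: int
    default_open_tb: int
    max_open_tb: int


# Values transcribed from Table 1. The keys are hypothetical short labels.
TABLE_1 = {
    "4.0": HeapLimits(None, 16, 128, 4, 32),
    "4.1": HeapLimits(None, 80, 128, 8, 32),
    "5.0-early": HeapLimits(None, 80, 256, 8, 25),   # Build 914586 and earlier
    "5.0-late": HeapLimits(256, 640, 640, 60, 60),   # Build 1024429 and later
    "5.1-early": HeapLimits(None, 80, 256, 8, 25),   # Build 914609 and earlier
    "5.1-late": HeapLimits(256, 640, 640, 60, 60),   # Build 1065491 and later
}


def advise(version_key: str, total_vmdk_tb: float) -> str:
    """Suggest an action for the total VMDK capacity open on one ESX host."""
    limits = TABLE_1[version_key]
    # Assumption: treat a host exactly at the default figure as needing a change.
    if total_vmdk_tb < limits.default_open_tb:
        return "default heap settings are within Table 1 limits"
    if total_vmdk_tb <= limits.max_open_tb:
        return f"set VMFS3.MaxHeapSizeMB to {limits.max_heap_mb}"
    return "capacity exceeds the Table 1 maximum; consider RDM instead of VMDK"


if __name__ == "__main__":
    print(advise("4.1", 12))
    print(advise("5.1-late", 75))
```

Remember that the capacity passed in should be the total for all guests on the ESX host, not only the Delphix Engine's disks.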
Additional Information
See the VMware Knowledge Base article "ESXi/ESX host reports VMFS heap warnings when hosting virtual machines that collectively use 4 TB or 20 TB of virtual disk storage" (1004424) for further notes and details.