Delphix

TB006 I/O Errors May Occur If VMWare Heap Exhausted

 

 

Alert Type

Data Loss

Impact

The Delphix Engine relies on stable back-end storage attached to the VMware ESX server and presented to Delphix as virtual logical units (LUNs). In rare cases, memory contention on the ESX server may cause recurring I/O errors, which can lead to filesystem corruption and/or data loss on the Delphix filesystem.

Contributing Factors

The problem can affect any release of Delphix Engine software. 

Delphix Engine 3.1.5.0, 3.2.2.0, and later releases include changes that significantly reduce the impact of I/O failures; however, silent I/O failures from VMDK storage may still result in data corruption.

The problem is only known to occur when Delphix LUNs are composed of VMDK disks stored on VMFS3 or VMFS5.

The problem can only occur when the total size of VMDK disks on an ESX server hosting a Delphix Engine is 4TB or larger. This total includes both the disks used by the Delphix Engine and the disks of other, unrelated guests on the same ESX server.
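
The 4TB condition above applies to the whole host, not just to the Delphix Engine. As a minimal sketch of that check, the following sums VMDK sizes across every guest on one ESX server. The function names and the example inventory are hypothetical; in practice the sizes would come from the VMware administrator's own inventory.

```python
# Threshold taken from the bulletin: the problem can only occur when the
# total VMDK size on the ESX server is 4TB or larger.
VMDK_RISK_THRESHOLD_TB = 4.0


def total_vmdk_tb(vmdk_sizes_tb_by_guest):
    """Sum VMDK sizes (in TB) across every guest on a single ESX server."""
    return sum(sum(sizes) for sizes in vmdk_sizes_tb_by_guest.values())


def host_is_exposed(vmdk_sizes_tb_by_guest):
    """Return True when the host-wide VMDK total reaches the 4TB threshold."""
    return total_vmdk_tb(vmdk_sizes_tb_by_guest) >= VMDK_RISK_THRESHOLD_TB


# Hypothetical inventory: guest name -> list of VMDK sizes in TB.
example_host = {
    "delphix-engine": [1.0, 1.0, 0.5],
    "unrelated-guest": [2.0],
}

if __name__ == "__main__":
    print(total_vmdk_tb(example_host), host_is_exposed(example_host))
```

In this example neither guest reaches 4TB on its own, but the host-wide total does, which is the case the bulletin calls out.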

Symptoms

The Delphix Engine may become non-responsive, may reboot unexpectedly, or may fail to reboot successfully. 

Delphix Engines running 3.2.0.0 or later releases will issue a storage fault with the following text:

Title: Critical storage device error

Details: There has been an error with one or more storage devices.
User Action: Contact Delphix Support for assistance.

Resolution

Increase the maximum heap value for the ESX server. Delphix also recommends setting the minimum heap value equal to the maximum heap value, when possible, to guarantee that sufficient heap space is available for the selected maximum VMDK capacity. The VMware administrator can make these changes by configuring the VMFS3.MaxHeapSizeMB and VMFS3.MinHeapSizeMB advanced settings, respectively:

  1. Log into the vCenter Server or ESX host using vSphere Client or VMware Infrastructure (VI) Client. When connecting to vCenter Server, select the ESX host from the inventory.
  2. Click the Configuration tab.
  3. Click Advanced Settings.
  4. Click VMFS3.
  5. Update the VMFS3.MaxHeapSizeMB field (see Table 1 below for sizes).
  6. Optionally, update the VMFS3.MinHeapSizeMB field (see Table 1 below for applicable releases).
  7. Reboot the ESX host for the changes to take effect.
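
The steps above use the vSphere Client. For administrators who prefer the command line, the sketch below builds the equivalent esxcli invocations as argument lists without running them. It assumes an ESXi 5.x host where the esxcli "system settings advanced set" command is available and where the options appear under the /VMFS3/ path; this bulletin does not describe that route, so verify it against VMware's documentation for your build. The helper name is hypothetical, and a host reboot is still required afterwards.

```python
def heap_setting_commands(max_heap_mb, min_heap_mb=None):
    """Build esxcli argument lists for the VMFS3 heap settings.

    Assumption: the target is an ESXi 5.x host with esxcli. The commands are
    only constructed here, never executed.
    """
    base = ["esxcli", "system", "settings", "advanced", "set"]
    commands = [base + ["-o", "/VMFS3/MaxHeapSizeMB", "-i", str(max_heap_mb)]]
    if min_heap_mb is not None:
        # The minimum heap setting only exists on the builds listed in Table 1.
        commands.append(
            base + ["-o", "/VMFS3/MinHeapSizeMB", "-i", str(min_heap_mb)]
        )
    return commands


if __name__ == "__main__":
    # Set minimum equal to maximum, as the bulletin recommends when possible.
    for argv in heap_setting_commands(640, min_heap_mb=640):
        print(" ".join(argv))
```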

 

Table 1: Heap Values and Maximum Storage Sizes

Version                           | Minimum heap value | Default maximum heap value | Maximum heap value | Default open VMDK storage per host | Maximum open VMDK storage per host
ESXi/ESX 4.0                      | N/A                | 16MB                       | 128MB              | 4TB                                | 32TB
ESXi/ESX 4.1                      | N/A                | 80MB                       | 128MB              | 8TB                                | 32TB
ESXi 5.0 Build 914586 and earlier | N/A                | 80MB                       | 256MB              | 8TB                                | 25TB
ESXi 5.0 Build 1024429 and later  | 256MB              | 640MB                      | 640MB              | 60TB                               | 60TB
ESXi 5.1 Build 914609 and earlier | N/A                | 80MB                       | 256MB              | 8TB                                | 25TB
ESXi 5.1 Build 1065491 and later  | 256MB              | 640MB                      | 640MB              | 60TB                               | 60TB
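
As a rough illustration of how Table 1 might be read, the sketch below encodes its rows and reports whether a given total VMDK capacity fits under the default heap, fits only after raising the maximum heap, or exceeds what the version supports. The function and field names are hypothetical. Treating the per-host limits as inclusive, and treating total VMDK size as equivalent to open VMDK storage, are both simplifying assumptions.

```python
from collections import namedtuple

# Columns from Table 1 (heap values in MB, capacities in TB).
HeapLimits = namedtuple(
    "HeapLimits",
    "default_max_heap_mb max_heap_mb default_capacity_tb max_capacity_tb",
)

TABLE_1 = {
    "ESXi/ESX 4.0": HeapLimits(16, 128, 4, 32),
    "ESXi/ESX 4.1": HeapLimits(80, 128, 8, 32),
    "ESXi 5.0 Build 914586 and earlier": HeapLimits(80, 256, 8, 25),
    "ESXi 5.0 Build 1024429 and later": HeapLimits(640, 640, 60, 60),
    "ESXi 5.1 Build 914609 and earlier": HeapLimits(80, 256, 8, 25),
    "ESXi 5.1 Build 1065491 and later": HeapLimits(640, 640, 60, 60),
}


def heap_advice(version, total_vmdk_tb):
    """Classify a host's total VMDK capacity against its Table 1 row."""
    limits = TABLE_1[version]
    # Assumption: the table's limits are inclusive upper bounds.
    if total_vmdk_tb <= limits.default_capacity_tb:
        return "within default heap capacity"
    if total_vmdk_tb <= limits.max_capacity_tb:
        return "raise VMFS3.MaxHeapSizeMB to %d" % limits.max_heap_mb
    return "exceeds maximum open VMDK storage for this version"


if __name__ == "__main__":
    print(heap_advice("ESXi/ESX 4.1", 20))
```

For example, a 20TB host on ESXi/ESX 4.1 is above the 8TB default but below the 32TB maximum, so the sketch suggests raising the maximum heap to 128MB.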

For large configurations where the total VMDK capacity would otherwise exceed 60TB, consider using virtual or physical-mapped RDM instead of VMDK disks.

Additional Information

For further notes and details, see the VMware Knowledge Base article "ESXi/ESX host reports VMFS heap warnings when hosting virtual machines that collectively use 4 TB or 20 TB of virtual disk storage" (1004424).