TB105 Delphix Cloud Engines May Experience Data Loss and Hangs
Alert Type
Data Loss, Availability
Impact
Delphix Cloud Engines whose storage is provisioned on AWS S3 or Azure Blob Storage may, in rare cases, encounter an issue that leads to data corruption or loss. If the problem is triggered, data loss and protracted engine outages could occur.
If corruption occurs on an affected engine, the only available recovery mechanism is to restore the engine from backup.
Contributing Factors
This article applies to the following versions of Delphix Continuous Data:
Date | Release |
---|---|
Apr 13, 2023 | 10.0.0.0 |
Mar 20, 2023 | 9.0.0.1 |
Mar 13, 2023 | 9.0.0.0 |
Feb 13, 2023 | 8.0.0.0 |
The issue can only occur on Continuous Data Cloud Engines, i.e. engines whose persistent storage uses object storage, specifically Amazon S3 or Azure Blob Storage. Engines using only traditional block storage, such as Amazon EBS volumes, are not impacted.
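The two contributing factors above (an affected release and object storage for data) can be combined into a simple check. The following is a minimal sketch, not a Delphix tool: the function name `is_susceptible` and its parameters are hypothetical, and the release set is copied from the table in this bulletin.

```python
# Releases listed as affected in this bulletin (see the table above).
AFFECTED_RELEASES = {"8.0.0.0", "9.0.0.0", "9.0.0.1", "10.0.0.0"}


def is_susceptible(release: str, uses_object_storage: bool) -> bool:
    """Return True only when both contributing factors apply.

    `release` is the engine's Continuous Data release string, and
    `uses_object_storage` is True for Cloud Engines (Amazon S3 or
    Azure Blob Storage). Both names are illustrative assumptions.
    """
    return uses_object_storage and release.strip() in AFFECTED_RELEASES


if __name__ == "__main__":
    print(is_susceptible("9.0.0.1", True))   # affected release on a Cloud Engine
    print(is_susceptible("9.0.0.1", False))  # block storage only
```

The check uses exact membership rather than a version range, because the bulletin lists specific releases and does not state that releases outside the table are affected.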
The issue is thought to occur only rarely. Susceptible systems with a low volume of write I/O operations may be at increased risk of encountering the issue.
Restarting, rebooting, powering off, or upgrading a susceptible system can significantly increase the risk of encountering the issue.
Symptoms
If data corruption occurs, there may be no symptoms as long as the corrupted blocks are not subsequently accessed.
If corrupted blocks are accessed, an affected engine will become unresponsive:
- The Delphix Admin application will not respond
- Virtual Databases (VDBs) may become unresponsive and crash
Relief/Workaround
Avoid upgrades to any affected Delphix releases.
Do not perform any operations on susceptible engines that may trigger the problem, including engine reboots, restarts, or powering off.
Resolution
The issue is fully resolved in Continuous Data 10.0.0.1 and later releases.
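Whether a given release includes the fix can be checked by comparing release strings numerically. This is a minimal sketch that assumes release identifiers are dot-separated integers as shown in this bulletin; `parse_release` and `includes_fix` are hypothetical helper names.

```python
# First release that contains the fix, per the Resolution section.
FIXED_IN = (10, 0, 0, 1)


def parse_release(release: str) -> tuple:
    """Convert a release string such as "10.0.0.1" to a tuple of ints."""
    return tuple(int(part) for part in release.strip().split("."))


def includes_fix(release: str) -> bool:
    """Return True for 10.0.0.1 and any later release."""
    # Tuples compare element by element, so (10, 0, 0, 1) > (9, 0, 0, 1).
    return parse_release(release) >= FIXED_IN
```

Comparing tuples of integers avoids the string-comparison pitfall where "9.0.0.1" would sort after "10.0.0.1".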
Additional Information
To check if a Delphix Engine is configured as a Cloud Engine:
Either:
- Navigate to the Engine using a browser.
- Log in to the SETUP app using sysadmin credentials.
- Examine the center panel labeled Storage and note the Object Storage for Data parameter. If this parameter shows Enabled, then the Engine is configured as a Cloud Engine.
Or:
- Connect to the engine using ssh, putty, or equivalent utility, using credentials with the sysadmin role.
- Enter the command “storage objectStorage ls” at the command prompt. If the engine is configured as a Cloud Engine, the displayed properties will have non-null values, e.g.
    sample.acme.com> storage objectStorage ls
    Properties
        type: S3ObjectStore
        accessCredentials:
            type: S3ObjectStoreAccessInstanceProfile
        bucket: smpl-prod-dlpx-10000-qar-80192-27a4593a
        cacheDevices: xvdb
        configured: true
        endpoint: https://s3.us-west-2.amazonaws.com
        region: us-west-2
        size: 12TB

    Operations
    update
    testConnection
    cacheHitsReport
    clearCacheHits
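When many engines need to be checked, captured CLI output can be inspected with a small script. The sketch below rests on several assumptions that are not confirmed by this bulletin: that the CLI prints one `key: value` property per line, that an unset value appears as an empty string, `null`, or `(unset)`, and that a non-null `type` property is a sufficient indicator. The names `parse_properties` and `has_object_storage` are hypothetical.

```python
# Assumed spellings of a null value in the CLI output (not confirmed).
NULL_MARKERS = {"", "null", "(unset)"}


def parse_properties(output: str) -> dict:
    """Collect `key: value` lines from captured `storage objectStorage ls` output."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip the prompt line, section headers, and operation names.
        if not sep or " " in key:
            continue
        # Keep the first occurrence so a nested `type` (under
        # accessCredentials) does not overwrite the top-level one.
        props.setdefault(key, value.strip())
    return props


def has_object_storage(output: str) -> bool:
    """Return True when the top-level `type` property is non-null."""
    value = parse_properties(output).get("type", "")
    return value not in NULL_MARKERS
```

Splitting on the first colon only keeps values such as the endpoint URL intact.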
Related Documents
Continuous Data Installation and Setup Configurations (see Delphix Cloud Engines section)
Starting, Stopping, and Restarting Your Engine