TB114 Delphix Engine May Fail to Reboot Following Upgrade
Alert Type
Availability
Impact
An affected Delphix Engine (Continuous Data Engine or Continuous Compliance Engine) may fail to reboot following an upgrade. The upgrade process may corrupt the boot loader, and recovery requires an involved, manual process that includes deploying an additional Delphix Engine. This may result in a protracted outage of the affected system.
Contributing Factors
This article applies to the following versions of the Delphix Engine:
Date | Release |
---|---|
Dec 20, 2023 | 18.0.0.0 |
Nov 21, 2023 | 17.0.0.0 |
The issue can only occur as the result of a Delphix Engine upgrade to one of the affected versions enumerated above.
Delphix Engines deployed on the Microsoft Azure platform are thought to be more susceptible than other platforms.
The issue can only occur after performing an Apply Now type of upgrade in which a Delphix Engine reboot occurs immediately following the application of the upgrade. The issue will not occur for a Delay the Reboot type of upgrade.
Once a Delphix Engine has successfully rebooted following an upgrade, the issue cannot occur on a subsequent reboot. A Delphix Engine that has already been upgraded to an affected release and has rebooted successfully is therefore not at risk.
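The contributing factors above can be read as a simple decision rule. The following is a minimal sketch of that rule in Python, not a Delphix tool or API: the names `UpgradePlan`, `is_at_risk`, and `AFFECTED_RELEASES` are hypothetical and exist only for illustration. It does not model the platform, since Azure is described only as more susceptible, not as a requirement.

```python
from dataclasses import dataclass

# Affected releases as listed in this article.
AFFECTED_RELEASES = {"17.0.0.0", "18.0.0.0"}


@dataclass
class UpgradePlan:
    """Hypothetical description of a planned or completed upgrade."""
    target_version: str                    # release being upgraded to
    apply_now: bool                        # True: Apply Now, False: Delay the Reboot
    rebooted_since_upgrade: bool = False   # True once a post-upgrade reboot has succeeded


def is_at_risk(plan: UpgradePlan) -> bool:
    """Return True if the upgrade matches every contributing factor."""
    if plan.target_version not in AFFECTED_RELEASES:
        return False  # only upgrades to an affected release are exposed
    if plan.rebooted_since_upgrade:
        return False  # a successful post-upgrade reboot rules the issue out
    return plan.apply_now  # Delay the Reboot upgrades are not affected
```

For example, `is_at_risk(UpgradePlan("18.0.0.0", apply_now=True))` evaluates to `True`, while the same upgrade with `apply_now=False` evaluates to `False`.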
Symptoms
During a reboot following an upgrade, an affected Delphix Engine may fail to start. On the system console, the following message may be seen:
error: file 'bufio.mod' not found
Entering rescue mode...
grub rescue>
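If console output has been captured as text (for example, from a serial log), the symptom can be checked for mechanically. The following is a minimal sketch under that assumption. The function name `shows_grub_rescue_symptom` is hypothetical, and the check relies only on the two strings shown in the console message above.

```python
def shows_grub_rescue_symptom(console_text: str) -> bool:
    """Return True if captured console text contains the symptom described above.

    Matches on the missing 'bufio.mod' module and the GRUB rescue prompt,
    ignoring case and line layout.
    """
    text = console_text.lower()
    return "bufio.mod" in text and "grub rescue>" in text
```

A match indicates only that the console shows the message from this article; it does not confirm the root cause on its own.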
Relief/Workaround
If you have already downloaded an upgrade image but have not yet upgraded, you may avoid the issue by selecting a Delay the Reboot type of upgrade. Otherwise, defer upgrading until you can upgrade to 18.0.0.1 or a later release.
Resolution
Resolved in DevOps Data Platform 18.0.0.1 and later releases for both the Continuous Data Engine and the Continuous Compliance Engine.
Additional Information
The issue occurs rarely. On the systems thought to be most susceptible, i.e., Delphix Engines deployed on Azure, the issue appears to occur only about once in every ten upgrades.
Although the issue is rare, its impact is severe enough that Delphix has removed the upgrade images for affected releases from the download.delphix.com site.
Related Documents
Upgrading the Delphix Engine: Overview
How to use boot diagnostics to troubleshoot virtual machines in Azure (external link)