Skip to main content
Delphix

Resolving Fault "Next backup to restore cannot be found" (KBA1719)

 

 

KBA

KBA#1719

Applicable Delphix Versions

This article applies to all versions of the Delphix Engine.

Issue

In some environments, the Delphix Engine may detect a gap in the SQL Server backup history tables of a Source Database.

When this occurs, the Delphix Engine will raise the following fault:

Title: Next backup to restore cannot be found

Details: The next backup to be restored for dSource "StackOverflow" was not found in the msdb.dbo.backupset table.

User Action: Perform a sync on the dSource to continue pulling new changes from the source database.

Warning_2017-10-27-17-19-29.png

Troubleshooting Fault

This fault is raised when the Delphix Engine detects a gap in the SQL Server backup history tables: msdb.dbo.backup*.

These history tables are the Delphix Engine's source of truth for backup ingestion, during Snapshots and in all Validated Sync modes.

In the following example, metadata for a log backup of the StackOverflow database was not added to the msdb database due to a blocking issue:

database_name  backup_start_date        type  first_lsn           last_lsn            physical_device_name
StackOverflow  2017-10-27 08:15:01.000  L     286000000875600038  286000000877900001  ..._20171027_081501.trn
StackOverflow  2017-10-27 08:30:00.000  L     286000000877900001  286000000878200001  ..._20171027_083000.trn
StackOverflow  2017-10-27 08:45:01.000  L     286000000878200001  286000000878500001  ..._20171027_084501.trn
StackOverflow  2017-10-27 09:15:00.000  L     286000000878800001  286000000879100001  ..._20171027_091500.trn

The missing log backup is identifiable by the gap between the last_lsn of the 08:45 backup, and the first_lsn of the 9:15 backup.

The cause of missing backup history data will vary between environments, but may include:

  • Blocking or deadlocks on the backup metadata tables, preventing the SQL Server backup routines from inserting backup information
  • Intentional removal of backup information by DBAs to prevent the backup being used (for example, due to a corrupt backup, or the presence of sensitive data)
  • Backups were taken on an Availability Group node that has since been removed as a member of the Availability Group

If you need to investigate the cause of this issue, contact your Database Administrators. The Delphix Engine does not manipulate data in SQL Server backup tables, and this behavior is occurring outside the Delphix Engine.

Resolution

There are two possible resolutions to this issue:

  • Option 1: Perform a Snapshot of the dSource
  • Option 2: Locate and import missing backup metadata

Generally, the recommended approach to resolving this issue is to perform a Snapshot of the dSource.

Ingesting a recent Full or Differential backup will often allow you to "catch up" more quickly with your current log backups, and Validated Sync operations will automatically resume once the snapshot is complete.

Locating and importing missing backup data may be preferable if your Source Database is too large for a Snapshot to be practical, or if you cannot tolerate any "missing" snapshots in your timeflow (for backups that have been skipped).

Option 1: Perform a Snapshot of the dSource 

To take a Snapshot of the dSource, use the dSource's Snapshot button. This is available on the dSource Card, or in the Configuration tab, depending on your version.

2017-10-27-17-22-40.png

This will open a dialog and provide several options for re-synchronizing the Source Database:

2017-10-27-17-24-03.png

Once this snapshot is complete, a new Timeflow Card will be created to represent the new backup. Validated Sync operations will resume and new Timeflow Cards will be created for all backups created since the snapshot.

Depending on your backup schedule, it may be necessary to wait for the next full backup to be taken from the source system, or to initiate one manually.

Option 2: Locate and import missing backup metadata 

Connect to the affected SQL Server using SQL Server Management Studio. For Availability Group sources, it will be necessary to connect to all nodes of the Availability Group.

Use the backupset and backupmediafamily tables to check if metadata about one or more log backups is "missing". The following query can be used, substituting <DATABASE NAME> as appropriate:

SELECT TOP 100
  bs.database_name, bs.backup_start_date, bs.type, bs.first_lsn, bs.last_lsn, bmf.physical_device_name
FROM
  msdb.dbo.backupset bs INNER JOIN
  msdb.dbo.backupmediafamily bmf ON bs.media_set_id = bmf.media_set_id
WHERE
  database_name IN ('<DATABASE NAME>') AND bs.type = 'L'
ORDER BY
  bs.backup_start_date DESC

This will provide output similar to the following:

database_name  backup_start_date        type  first_lsn           last_lsn            physical_device_name
StackOverflow  2017-10-27 09:15:00.000  L     286000000878800001  286000000879100001  ..._20171027_091500.trn
StackOverflow  2017-10-27 08:45:01.000  L     286000000878200001  286000000878500001  ..._20171027_084501.trn
StackOverflow  2017-10-27 08:30:00.000  L     286000000877900001  286000000878200001  ..._20171027_083000.trn
StackOverflow  2017-10-27 08:15:01.000  L     286000000875600038  286000000877900001  ..._20171027_081501.trn

Missing backupsets can be identified by a gap between the last_lsn of one backup and the first_lsn of the next (note that the above results are in descending order, from most recent to earliest). The date of the missing backup should follow the date of the last card in the dSource's Timeflow:

2017-10-27-17-57-47.png

In the case of an Availability Group, run the above query on each of the members of the Availability Group. The server on which the gap is visible is the server on which the following steps should be performed.

Using the physical_device_name of the other rows, check this directory containing the other backups and look for the missing file.

2017-10-27-17-33-59.png

Once this is done, use the RESTORE VERIFY ONLY ... WITH LOADHISTORY command to import the metadata from the missing backup to your Source Instance. It is important that the directory name is consistent with the other records in msdb (using a drive letter or Windows file share as appropriate).

RESTORE VERIFYONLY FROM DISK = '<PATH TO BACKUP>' WITH LOADHISTORY;

Clear any faults relating to the dSources.

Finally, Disable and Enable the affected dSources, using the Disable/Enable Slider. This step is necessary because the default behaviour of this fault is to wait until a Snapshot is performed.

2017-10-27-18-00-51.png