What are SQL Server AlwaysOn Availability Groups?
An Availability Group creates a failover environment for a set of databases. This set of databases is constantly synced across a discrete set of replicas, which are basically instances sitting on separate physical hosts. Of these two to five replicas, one is the primary, and it accepts read/write requests. The other secondary replicas can optionally be configured to provide read-only access. This is one significant advantage over normal failover clustering, since secondary machines can now have their storage leveraged. An Availability Group also syncs its replicas without shared storage, the second significant advantage for customers. Replica instances are in constant communication with one another to stay up to date.
Availability Group Configuration
This diagram from Microsoft also provides a helpful visual for Availability Group Configuration:
Availability Group Object Model
The above object model brings up some important details for what the user will see upon discovering an environment. Currently, the GUI shows an environment as two tabs: Environment Details and Databases. A Windows Cluster environment can work the same way.
Environment Details will show Basic Information about the Windows Cluster, and it will list the cluster nodes. Highlighting a given cluster node will display details about the physical host, much like the display for Oracle clusters.
Currently, databases displays each instance in bold, listing its databases underneath. We will only discover databases that are part of an Availability Group. Databases that are not in an Availability Group cannot use any of the Windows Cluster features, so they will not be shown as part of the Windows Cluster. These standalone databases must be added separately. Clusters will require a little more detail. If a set of databases is part of an Availability Group, then the Availability Group name will be shown in bold, and the databases will be listed underneath. The unique instances in the group will also be shown under the bold group name.
The cluster discovery process looks like this:
1) Make sure that the Cluster IP Address provided is the correct address
2) From the default cluster node, find all the nodes in the cluster (registry)
3) On each node (java layer):
- run AGDiscovery to get all the instances on the node (powershell):
- for each instance in the registry:
- (registry) make sure the instance version is eligible for AG (>= 11.0)
- (registry) make sure the hadr_enabled switch is on
- (sql) query for the AG groups on this instance
- (sql) query for the databases in the AG groups on this instance
- Update Metadata
Standalone databases can be linked as they are now. Databases which are part of an Availability Group can be linked through the Availability Group Listener. We will impose a restriction that all of the identical databases in an Availability Group use a shared backup location. If database “foo” exists on instance1, instance2, and instance3, then it should have the same backup path on each of those instances.
The pre-provisioning process for Availability Groups requires querying all of the nodes in the group. Because backup strategies for Availability Groups can vary greatly, Delphix cannot anticipate where the next backup will be taken. Only the node which took the backup has the information about where to find the backup files. So, on each pre-provisioning run, Delphix queries all of the nodes for a list of backups that were taken since the latest restored backup. Then, it compares these backups to figure out which one is the earliest.
We can detect if a backup is missing by comparing the LSNs of the next backup to be restored and the latest restored backup. Availability Groups do not allow backups to be taken in simple mode, so log chain break is impossible. This means that if there is a gap between the last LSN of the latest restored backup and the first LSN of the next backup to be restored, we know that a backup is missing. If we notice that some nodes are down as well, then we can alert the user and suggest that the node with the next backup is down.
However, pre-provisioning in one job leaves the Delphix Engine vulnerable to losing data. In the time that Delphix spends restoring one backup, another node may take a backup and go down. The best way to allow pre-provisioning to run as fast as possible and miss as few backups as possible is to split pre-provisioning into two asynchronous jobs.
Job1 will query for new backups and cache their locations. This job will run every minute. If the database is not part of an availability group, then Job1 will simply query it for a backup. If the database is part of an availability group, then Job1 will query all of the identical databases in the availability group.
Job2 will look at the cached list of backups that need to be applied, and it will restore each one.
How do you Setup Linking Databases on Windows Failover Cluster?
You can Link to databases on Windows Server Failover Clusters (WSFC).
You can use databases on Windows Server Failover Clusters (WSFC) as data sources (dSources). Add the Environment as described below, based on which WSFC feature the source databases use:
- Failover Cluster Instances
Add the environment as a standalone source using the cluster name or address.
- AlwaysOn Availability Groups
Add the environment as a cluster source using the cluster name or address.
Databases which are part of an Availability Group can be linked through the Availability Group Listener. We will impose a restriction that all of the identical databases in an Availability Group use a shared backup location. If database “foo” exists on instance1, instance2, and instance3, then it should have the same backup path on each of those instances.
Backup location and Autodiscover
The GUI only supports one backup location when linking a dSource.
Some customers indicate that this is problematic:
"This is a concern to us as we could have 4 different hosts in the AG and Delphix is requesting that we write all backups to one central location for all. Previously we requested Delphix look at msdb and extract the backup location from there along with using the MS written function that tells you which is the primary DB location. I have been to various MS seminars and their advice was to backup up each server to their own location and not try and write all server to one central are. How can we overcome this?"
There are a couple of options:
- Use a shared backup location that is an SMB share like \\common\backup, and then each individual host can backup at \\common\backup\host<number>
since we do a recursive search for the file path, this would work as long as they create this directory structure and expose the parent over SMB
- The other option is to use a feature we shipped in 4.2 (DLPX-32466) that takes the backup path from SQL Server backup set metadata tables.
When taking each backup use a SMB share that is accessible by the staging server, something like:
backup database <db> to disk = '\\host1\backups\backup1'
For the other hosts do:
backup database <db> to disk = '\\host2\backups\backup1', etc., and then make sure the backups are taken to an SMB shared path, not a local path like C:\backups\backup1
Delphix will pick up the full path from the SQL Server tables and use that instead. Then there is no need to specify a shared backup location.
This is handled in 4.2 and on by setting "AutoDiscover Backup Path" to enabled in the GUI when linking a dSource.
The Delphix Engine documentation Linking Data Sources with SQL Server discusses the use of the Autodiscover and Backup Path options when linking a dSource.