Postgres Plugin: Provision fails with "The Database instance is down on host source" (KBA6397)

 

 


KBA# 6397

Applicable Delphix Versions

Major Release    All Sub Releases
6.0              6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0
5.3              5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0
5.2              5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1              5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0              5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
4.3              4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0
4.2              4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0, 4.2.5.0, 4.2.5.1
4.1              4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

Troubleshooting dbDown during Postgres Plugin provision

UI Error:

dbDown-postgres.png

Error detail failure text (included for searchability):

The Database instance is down on host source.
Error Code: toolkit.postgres-1-4-0.dbDown

This issue surfaces only when provisioning a Postgres Virtual Database, but its cause lies earlier, at the time of linking or Resync. It typically stems from the Source prerequisite configuration changes not having been applied.

Encountering this issue can indicate one of the following:

1. The postgresql.conf or pg_hba.conf on the Source database was not updated as per the Delphix documentation.

2. The postgresql.conf and pg_hba.conf on the Source were changed, but the database was not restarted before the full backup was created.

3. Synchronization with the source database was enabled (see the Postgres dSource linking wizard below) and incorrect credentials for the Postgres source were provided. The "Replication User" shown in the wizard is the user created as a role within the source database:

CREATE ROLE delphix SUPERUSER LOGIN REPLICATION PASSWORD 'somepassword';

 

Postgres dSource linking wizard:

dSource-wizard.png
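Before entering the Replication User in the wizard, it can help to confirm that the role logs in and carries the REPLICATION attribute. The following is a minimal sketch, not part of the Delphix procedure: the function names, host, and port are hypothetical, and it assumes psql is available on the machine running the check.

```shell
#!/bin/sh
# Sketch: inspect the replication role on the source database.
# Function names, host and port are illustrative, not taken from this article.

# Print the query used to inspect the role.
# $1 = role name (plain identifiers only; no quoting or escaping is attempted)
replication_role_query() {
    printf "SELECT rolname, rolcanlogin, rolreplication FROM pg_roles WHERE rolname = '%s';\n" "$1"
}

# Run the query against the source, logging in as the role itself.
# psql prompts for the password unless PGPASSWORD or ~/.pgpass supplies it.
# $1 = source host, $2 = source port, $3 = role name
check_replication_role() {
    psql -h "$1" -p "$2" -U "$3" -d postgres -At -c "$(replication_role_query "$3")"
}

# Example (hypothetical host and port):
#   check_replication_role 10.0.0.10 5432 delphix
```

A role that can log in and replicate is reported with `t` in both the rolcanlogin and rolreplication columns. Note that if the matching pg_hba.conf entry uses trust, the login succeeds whatever password is supplied, so this exercises the password only when the entry uses a password-based method.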

Symptoms 

The Postgres Virtual Database writes its native log to $DATABASE_MOUNT_DIR/data/log/postgresql-Day.log. The following excerpt shows one style of failure, where the key symptom is "password authentication failed". As noted above, there are several variations on the failure mode:

2020-09-01 03:45:08.744 UTC [28679] LOG:  entering standby mode
2020-09-01 03:45:08.752 UTC [28680] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "delphix"
2020-09-01 03:45:08.758 UTC [28681] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "delphix"
2020-09-01 03:45:13.763 UTC [28682] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "delphix"
...
2020-09-01 03:48:26.754 UTC [29901] LOG:  invalid checkpoint record
2020-09-01 03:48:26.754 UTC [29901] FATAL:  could not locate required checkpoint record
2020-09-01 03:48:26.754 UTC [29901] HINT:  If you are not restoring from a backup, try removing the file "/mnt/provision/mydb/data/backup_label".
2020-09-01 03:48:26.755 UTC [29899] LOG:  startup process (PID 29901) exited with exit code 1
2020-09-01 03:48:26.755 UTC [29899] LOG:  aborting startup due to startup process failure
2020-09-01 03:48:26.758 UTC [29899] LOG:  database system is shut down
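To check a Virtual Database log for these signatures quickly, a small helper along the following lines may be useful. It is a sketch only: the function name is hypothetical, the first three patterns come from the excerpt above, and the fourth ("no pg_hba.conf entry") is an additional PostgreSQL message added here on the assumption that a missing pg_hba.conf line is one of the other variations.

```shell
#!/bin/sh
# Sketch: count known failure signatures in a PostgreSQL log file.
# $1 = path to the log, e.g. "$DATABASE_MOUNT_DIR/data/log/postgresql-Mon.log"
scan_vdb_log() {
    log_file=$1
    for pattern in \
        'password authentication failed' \
        'could not connect to the primary server' \
        'could not locate required checkpoint record' \
        'no pg_hba.conf entry'
    do
        # grep -c prints 0 when nothing matches; -F treats the pattern as a fixed string.
        count=$(grep -c -F -- "$pattern" "$log_file")
        printf '%s\t%s\n' "$count" "$pattern"
    done
}
```

Each output line holds a count, a tab, and the pattern. A non-zero count for the first two patterns points at the replication credentials, while the checkpoint error alone suggests the backup itself should be re-examined.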

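The Postgres plugin log for the Virtual Database is written to $TOOLKIT_DIR/postgres/logs/delphix_postgres_debug.log. The excerpt below shows the dbDown error raised shortly after the startup attempt: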

[2020-08-30T05:28:30][DEBUG][virtual/configure.lua][startVirtual.sh]:[CMD: pg_startDatabase "/usr/pgsql-10/bin" "/mnt/provision/test22/data" "5434" ]
[2020-08-30T05:28:30][DEBUG][virtual/configure.lua][startVirtual.sh]:[Startup ...]
[2020-08-30T05:28:30][DEBUG][virtual/configure.lua][startVirtual.sh]:[CMD: /usr/pgsql-10/bin/pg_ctl -D /mnt/provision/test22/data start]
[2020-08-30T05:28:35][DEBUG][virtual/configure.lua][startVirtual.sh]:[ERROR message JSON: {
  "errorType": "ERROR",
  "messageID": "dbDown",
  "allParams": {
    "param1": "source"
  }
}]
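When the debug log is long, listing the distinct message IDs it contains is a quick way to confirm that dbDown is the error being raised. This is a sketch under the assumption that every error is logged in the JSON shape shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list the distinct "messageID" values found in the plugin debug log.
# $1 = path to delphix_postgres_debug.log
plugin_error_ids() {
    sed -n 's/.*"messageID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Example:
#   plugin_error_ids "$TOOLKIT_DIR/postgres/logs/delphix_postgres_debug.log"
```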

Resolution

Two key points to understand when tackling this type of issue:

1. The configuration files postgresql.conf and pg_hba.conf are included in the backups in the state they were in for the running source database.

2. Changes to these configuration files take effect only if the database is restarted after the edit and the backup is created after that restart.

As per the Delphix documentation, choose one of the two options below for pg_hba.conf (the second is less secure, but can be used when troubleshooting to completely open access for the staging host):

# vi /var/lib/pgsql/10/data/pg_hba.conf 

Add the following:

host    all            all    1.2.3.4/32    trust  # Delphix Engine
host    all            all    5.6.7.8/32    trust  # Staging host
host    replication    all    5.6.7.8/32    trust  # Staging host

OR non-specifically (opening all IP addresses):

host    all            all    0.0.0.0/0            trust
host    replication    all    0.0.0.0/0            trust # Setting this to "password" will mean
                                                         # the password must be correctly entered
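To confirm which replication entries are actually active in a given pg_hba.conf, the file can be filtered for uncommented host lines whose database field is replication. The helper below is a sketch with a hypothetical function name. It assumes the address is written in CIDR form as in the examples above, and it does not handle comma-separated database lists.

```shell
#!/bin/sh
# Sketch: print active "host ... replication" entries from a pg_hba.conf file.
# $1 = path to pg_hba.conf, e.g. /var/lib/pgsql/10/data/pg_hba.conf
hba_replication_entries() {
    # Commented-out lines never match because their first field starts with "#".
    awk '$1 ~ /^host/ && $2 == "replication" { print $1, $2, $3, $4, $5 }' "$1"
}
```

Run against a file containing the first set of entries above, this prints `host replication all 5.6.7.8/32 trust`. An empty result means no replication entry is active in that file.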

Example postgresql.conf:

listen_addresses = '*'
max_wal_senders = 4
wal_level = archive
archive_mode = on
archive_command = 'cp %p /tmp/archivelog/%f'
archive_timeout = 60
wal_keep_segments = 10
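Because the backup carries whatever is in the configuration files, it can be worth reading the values back from the file before restarting and taking the backup. The following sketch uses hypothetical function names and makes several assumptions: settings are written as `name = value`, no include files are used, and nothing has been overridden through ALTER SYSTEM. The running server's own view of its settings remains the authoritative check.

```shell
#!/bin/sh
# Sketch: read replication-related settings back from a postgresql.conf file.

# Print the value of one parameter; when it appears more than once, the last entry is reported.
# $1 = path to postgresql.conf, $2 = parameter name
conf_value() {
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*//p" "$1" |
        sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' |
        tail -n 1
}

# Print name=value for the settings used in the example above.
# $1 = path to postgresql.conf
report_replication_settings() {
    for name in listen_addresses max_wal_senders wal_level archive_mode archive_command; do
        printf '%s=%s\n' "$name" "$(conf_value "$1" "$name")"
    done
}

# Example:
#   report_replication_settings /var/lib/pgsql/10/data/postgresql.conf
```

A parameter that prints with an empty value is either commented out or absent from the file.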

After making the above changes on the Source database, bring them into effect as follows:

1. Stop the database:

# /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data stop
waiting for server to shut down.... done
server stopped

2. Start the database:

# /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

3. On the source, remove the previous full backup and create a new one:

# rm /var/tmp/bk/*
# /usr/pgsql-10/bin/pg_basebackup -p 5432 -D /var/tmp/bk -F t -v -w -P -Xs

4. Zip the backup files (adhering to the correct naming convention) and transfer the archive to the staging host:

# zip PostgreSQL_T20200830.zip base.tar pg_wal.tar
# scp PostgreSQL_T20200830.zip postgres@5.6.7.8:/var/tmp/bk/.

5. Link the new backup file via the staging host and attempt to provision from the Snapshot created. If a dSource is already linked, run a Resync against the new full backup, which may resolve this issue.

 

Diagnosability

The error associated with this Knowledge Article is at best misleading. A bug has been filed to improve the messaging:

POST-268 Provision failure: dbDown is not correct