Skip to main content
Delphix

Masking PostgreSQL - Hangs, Blocks, and Deadlocks (KBA10089)

 

 

KBA

KBA# 10089

At a Glance   

Description This article details possible root causes and solutions to Blocks and Deadlocks on PostgreSQL. 
Root cause PostgreSQL, like other databases, uses locks on selected objects to prevent conflicts between concurrent transactions and to ensure data consistency. 

Since masking uses multiple transactions, locks can, in rare cases, cause the database operations to hang (block). A special case of a block is when PostgreSQL detects the block and terminates a transaction - this is called a Deadlock and will terminate the masking job.
How to identify: Below are a few ways to identify hung jobs on the Masking Engine:
  Mask Hangs On the masking engine, there are some clear signs that the Database is blocked:
  1. Masking performance drops without more records being masked.
  2. The Masked Rows is stuck at a multiple of Commit Size -1 (for example 9,999). 
  3. The logs show no actions except multiple Throughput logs similar to this one:
...
ThroughputLogger:interval 0.000 MB (0 rows) counted in 30.000 sec; throughput = 0.000 MB/sec
...
SQL Blocks To find the block that causes the hang, use the following SQL on the Postgres database (and look for UPDATE and SELECT from the masked table):
SELECT pid, locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, granted
FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid=l.database 
WHERE db.datname IS NULL AND NOT pid = pg_backend_pid();
Deadlocks PostgreSQL will log deadlocks in the database logs (pg_logs). 
ERROR:  deadlock detected
Detail: Process [nnn] waits for [Lock] on transaction [mmm]; blocked by process [ppp].
Solutions: Two possible solutions have been listed below:
1 Indexes & Constraints The Masking Engine has built-in support for dropping and disabling indexes and constraints. This has been updated in version 11. Details:
2 OTF Masking Another way to separate Read and Update on a table is to use On-The-Fly (OTF) masking:
Best practice There are some best practices for an In-Place masking job. These points will reduce the chance of encountering a hang (or deadlock). 

The SELECT statement fetching the data will define the order in which data is read. The best options are when only data rows are accessed in the order the data is stored and when no other database objects are accessed (for example indexes).

Therefore, consider the following: 
  1. Ensure Indexes and Constraints are dropped and disabled on masked columns. 
  2. Try to avoid using WHERE and JOIN which could change the order of the data or cause multiple scans.
  3. Avoid using UNION and ORDER BY.
  4. Ensure there is no external transaction running which will change the data on the masked table. This includes Triggers (which should be disabled in this case). 
More Info More info on how to access Masking logs: 

PostgreSQL documentation  - Concurrency Control with sections about isolation, locking, and consistency: 

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Date Release
Apr 13, 2023 10.0.0.0
Mar 13, 2023 | Mar 20, 2023 9.0.0.0 | 9.0.0.1
Feb 13, 2023 8.0.0.0
Jan 12, 2023 7.0.0.0
Releases Prior to 2023
Major Release All Sub Releases
6.0

6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

Blocks and Deadlocks 

One of the harder issues to resolve is when a database creates a hung masking job due to locks. If the database resolves this by killing one of the processes, the lock is called a deadlock.

In masking, the error happens since the database takes locks on the UPDATE and SELECT and, for some reason, these are not compatible with each other (or another process).  

Possible Scenarios

There are a few possible scenarios for why locks and deadlock happens: 

  • The record order of the fetch is scattered (i.e. due to the order in an index, SORT, JOIN, or WHERE) resulting in scattered locks on SELECT. 
  • Due to the above, the updated pages are also scattered. 
  • The number of locks or limited resources on the databases causes lock escalations. 
  • The number of pages locked increases when WHERE is used. 
  • External processes (queries) are running while the database is masked.

 

Masking Jobs 

Execution hung

There are no errors or warnings when the database is blocked. The masking engine will just wait for the database to return the status from the commit (Update) which is not happening (as it is waiting for a lock to release).
 

That the database queries have become unresponsive can be seen on the masking engine in these three areas: 

Job Monitor - performance 

Masking performance drops without more records being masked.

Job Monitor - masked rows

Masked Rows in the Job Monitor will be stuck at *9,999 (a multiple of Commit Size -1 (for example 19,999)).

Job Logs 

The logs show no actions except multiple Throughput logs with 0 rows - similar to this one:

...
ThroughputLogger:interval 0.000 MB (0 rows) counted in 30.000 sec; throughput = 0.000 MB/sec
...

Execution deadlocked 

A deadlock is when the database deadlock monitor thread has identified an unresolvable block and terminates one of the processes. This will resolve the lock but also terminate the masking job.

The error seen in the Masking Logs is something like this: 

ERROR:  deadlock detected
Detail: Process [nnn] waits for [Lock] on transaction [mmm]; blocked by process [ppp].

Why do lock issues happen? 

Databases use locks to ensure data consistency and maintain transactional integrity. Locks are employed to manage concurrent access to shared resources within the database, such as tables, rows, or even specific fields.

Here are some reasons why the lock can happen on masking.

INDEX on Masked column

Indexes are reference tables created to locate data more efficiently. When an indexed column is masked, both the table and the index need to be updated. Since the index is a reference table (frequently a b-tree or hash table), this table needs to be updated and re-arranged. As such, indexes can take more locks than the masked table.

CONSTRAINT on Masked column

Many Constraints use Indexes to improve performance and for checking data in the constraint. Constraints will therefore also take locks. 

WHERE clause used in Custom SQL 

Where clauses can change the Execution Plan and the order of the fetched data. This is especially true if there is an Index on the column in the predicate (where). If an index scan is used, the SELECT might access records in a 'non-logical' order, which will increase the number of locks.

Note: the best order is a Full Table Scan

ORDER BY in the Rule Set SQL 

Order By will change the order in which the data is fetched. This will result in scattered and fragmented reads and increase the number of locks.  Do not use ORDER BY.

UNION in the Rule Set SQL 

Union will cause multiple passes through the data and hence cause the SELECT and UPDATE locks to intersect. Do not use UNION.

Row count executed while table being masked 

Before the masking job starts, the masking engine will execute a SELECT COUNT query on all tables in the masked Rule Set. Most of the time, the result of this is returned before the job starts, but for large tables the locks on SELECT COUNT can cause issues. 

The SQL for SELECT is the same here, so all points listed above do apply to the SELECT COUNT as well. 

Parallel processing on SELECT 

It is best to not use parallel processing on SELECT. PostgreSQL suggests using this feature with caution. Furthermore, the SELECT is not the slow part in masking - UPDATE is. 

 

There are likely other possible causes as to why locks will cause the masking job to hang. To understand why, the first step is to understand what type of locks and what objects and transactions are locking. How to do that, coming up...

Troubleshooting 

In order to understand why a lock or deadlock happened, we need to collect data from the database. The only time to see this is when the lock is happening.

Use the SQL commands below to collect data: 

1: Get current active processes  

This SQL lists active processes. Make note of the PID.

SELECT pid, datname, usename, state, query FROM pg_stat_activity WHERE state = 'active';
-- Query in multiple lines
-- ------------------------
SELECT pid, datname, usename, state, query 
FROM pg_stat_activity 
WHERE state = 'active';

2: Get lock status on current processes

This SQL provides all current locks on the database (note that this list can be long on a busy database). Check the relevant PIDs from the query above.

SELECT pid, locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, granted FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid=l.database WHERE db.datname IS NULL AND NOT pid = pg_backend_pid();
-- Query in multiple lines
-- ------------------------
SELECT 
   pid, 
   locktype, 
   relation::regclass,
   mode, 
   transactionid AS tid, 
   virtualtransaction AS vtid, 
   granted
FROM pg_catalog.pg_locks l 
LEFT JOIN pg_catalog.pg_database db ON db.oid=l.database 
WHERE db.datname IS NULL AND NOT pid = pg_backend_pid();

3: Find process in PG_LOGS  

More details might be found in the PG_LOGS.

  • Get process ID from queries above.
  • Replace '<pid of the process>' with the process ID.
  • Look in the latest log file.
grep p:<pid of the process> [latest log].log

Summary

The masking engine is a simple ETL process that reads, masks, and updates masked records. The masking engine will depend on how locks on the database are taken. 

If a masking job has hung or deadlocked, the first step is to check the database and determine why.