Masking PostgreSQL - Hangs, Blocks, and Deadlocks (KBA10089)
KBA
KBA# 10089At a Glance
Description | This article details possible root causes and solutions to Blocks and Deadlocks on PostgreSQL. | |
---|---|---|
Root cause | PostgreSQL, like other databases, uses locks on selected objects to prevent conflicts between concurrent transactions and to ensure data consistency. Since masking uses multiple transactions, locks can, in rare cases, cause the database operations to hang (block). A special case of a block is when PostgreSQL detects the block and terminates a transaction - this is called a Deadlock and will terminate the masking job. |
|
How to identify: | Below are a few ways to identify hung jobs on the Masking Engine: | |
Mask Hangs | On the masking engine, there are some clear signs that the Database is blocked:
... ThroughputLogger:interval 0.000 MB (0 rows) counted in 30.000 sec; throughput = 0.000 MB/sec ... |
|
SQL Blocks | To find the block that causes the hang, use the following SQL on the Postgres database (and look for UPDATE and SELECT from the masked table):
SELECT pid, locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, granted FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid=l.database WHERE db.datname IS NULL AND NOT pid = pg_backend_pid(); |
|
Deadlocks | PostgreSQL will log deadlocks in the database logs (pg_logs).
|
|
Solutions: | Two possible solutions have been listed below: | |
1 | Indexes & Constraints | The Masking Engine has built-in support for dropping and disabling indexes and constraints. This has been updated in version 11. Details: |
2 | OTF Masking | Another way to separate Read and Update on a table is to use On-The-Fly (OTF) masking: |
Best practice | There are some best practices for an In-Place masking job. These points will reduce the chance of encountering a hang (or deadlock). The SELECT statement fetching the data will define the order in which data is read. The best options are when only data rows are accessed in the order the data is stored and when no other database objects are accessed (for example indexes). Therefore, consider the following:
|
|
More Info | More info on how to access Masking logs:
PostgreSQL documentation - Concurrency Control with sections about isolation, locking, and consistency:
|
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Date Release Apr 13, 2023 10.0.0.0 Mar 13, 2023 | Mar 20, 2023 9.0.0.0 | 9.0.0.1 Feb 13, 2023 8.0.0.0 Jan 12, 2023 7.0.0.0 Releases Prior to 2023 Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2
5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
Blocks and Deadlocks
One of the harder issues to resolve is when a database creates a hung masking job due to locks. If the database resolves this by killing one of the processes, the lock is called a deadlock.
In masking, the error happens since the database takes locks on the UPDATE and SELECT and, for some reason, these are not compatible with each other (or another process).
Possible Scenarios
There are a few possible scenarios for why locks and deadlock happens:
- The record order of the fetch is scattered (i.e. due to the order in an index, SORT, JOIN, or WHERE) resulting in scattered locks on SELECT.
- Due to the above, the updated pages are also scattered.
- The number of locks or limited resources on the databases causes lock escalations.
- The number of pages locked increases when WHERE is used.
- External processes (queries) are running while the database is masked.
Masking Jobs
Execution hung
There are no errors or warnings when the database is blocked. The masking engine will just wait for the database to return the status from the commit (Update) which is not happening (as it is waiting for a lock to release).
That the database queries have become unresponsive can be seen on the masking engine in these three areas:
Job Monitor - performance
Masking performance drops without more records being masked.
Job Monitor - masked rows
Masked Rows in the Job Monitor will be stuck at *9,999 (a multiple of Commit Size -1 (for example 19,999)).
Job Logs
The logs show no actions except multiple Throughput logs with 0 rows - similar to this one:
... ThroughputLogger:interval 0.000 MB (0 rows) counted in 30.000 sec; throughput = 0.000 MB/sec ...
Execution deadlocked
A deadlock is when the database deadlock monitor thread has identified an unresolvable block and terminates one of the processes. This will resolve the lock but also terminate the masking job.
The error seen in the Masking Logs is something like this:
ERROR: deadlock detected Detail: Process [nnn] waits for [Lock] on transaction [mmm]; blocked by process [ppp].
Why do lock issues happen?
Databases use locks to ensure data consistency and maintain transactional integrity. Locks are employed to manage concurrent access to shared resources within the database, such as tables, rows, or even specific fields.
Here are some reasons why the lock can happen on masking.
INDEX on Masked column
Indexes are reference tables created to locate data more efficiently. When an indexed column is masked, both the table and the index need to be updated. Since the index is a reference table (frequently a b-tree or hash table), this table needs to be updated and re-arranged. As such, indexes can take more locks than the masked table.
CONSTRAINT on Masked column
Many Constraints use Indexes to improve performance and for checking data in the constraint. Constraints will therefore also take locks.
WHERE clause used in Custom SQL
Where clauses can change the Execution Plan and the order of the fetched data. This is especially true if there is an Index on the column in the predicate (where). If an index scan is used, the SELECT might access records in a 'non-logical' order, which will increase the number of locks.
Note: the best order is a Full Table Scan.
ORDER BY in the Rule Set SQL
Order By will change the order in which the data is fetched. This will result in scattered and fragmented reads and increase the number of locks. Do not use ORDER BY.
UNION in the Rule Set SQL
Union will cause multiple passes through the data and hence cause the SELECT and UPDATE locks to intersect. Do not use UNION.
Row count executed while table being masked
Before the masking job starts, the masking engine will execute a SELECT COUNT query on all tables in the masked Rule Set. Most of the time, the result of this is returned before the job starts, but for large tables the locks on SELECT COUNT can cause issues.
The SQL for SELECT is the same here, so all points listed above do apply to the SELECT COUNT as well.
Parallel processing on SELECT
It is best to not use parallel processing on SELECT. PostgreSQL suggests using this feature with caution. Furthermore, the SELECT is not the slow part in masking - UPDATE is.
There are likely other possible causes as to why locks will cause the masking job to hang. To understand why, the first step is to understand what type of locks and what objects and transactions are locking. How to do that, coming up...
Troubleshooting
In order to understand why a lock or deadlock happened, we need to collect data from the database. The only time to see this is when the lock is happening.
Use the SQL commands below to collect data:
1: Get current active processes
This SQL lists active processes. Make note of the PID.
SELECT pid, datname, usename, state, query FROM pg_stat_activity WHERE state = 'active';
-- Query in multiple lines -- ------------------------ SELECT pid, datname, usename, state, query FROM pg_stat_activity WHERE state = 'active';
2: Get lock status on current processes
This SQL provides all current locks on the database (note that this list can be long on a busy database). Check the relevant PIDs from the query above.
SELECT pid, locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, granted FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid=l.database WHERE db.datname IS NULL AND NOT pid = pg_backend_pid();
-- Query in multiple lines -- ------------------------ SELECT pid, locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, granted FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid=l.database WHERE db.datname IS NULL AND NOT pid = pg_backend_pid();
3: Find process in PG_LOGS
More details might be found in the PG_LOGS.
- Get process ID from queries above.
- Replace '<pid of the process>' with the process ID.
- Look in the latest log file.
grep p:<pid of the process> [latest log].log
Summary
The masking engine is a simple ETL process that reads, masks, and updates masked records. The masking engine will depend on how locks on the database are taken.
If a masking job has hung or deadlocked, the first step is to check the database and determine why.