Masking Job Monitor, Negative Row Count, Calculating (KBA6686)
KBA
KBA# 6686
At a Glance
Description: | This article details possible root causes for negative row counts in the Job Monitor for database masking jobs. |
---|---|
Root causes: | The root causes are likely one of the following:
|
Calculating Rows: | The Monitor page shows 'Calculating...' when the row count is unknown. This can be due to: 1. it has not been calculated yet by the database, or 2. it has not been registered by the engine (tracked in DLPX-72404). See C1: No or delayed Row Count below. |
How to identify: | To identify the root cause, look at these values:
|
Investigating logs: | Use grep or select-string with the following filter (more detail in Investigating the Logs below):
"count|Insert.*proc|Update.* proc|Table.* proc" |
More info: | More info on how to access logs:
For masking columns with PK (applies to Clustered Indexes as well): |
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Date Release Feb 13, 2023 8.0.0.0 Jan 12, 2023 7.0.0.0 Releases Prior to 2023 Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2. 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.9.0, 6.0.10.0, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2
5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
4.3
4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0
4.2
4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1
4.1
4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0
Negative Row Count
Negative Row Count is referring to when the Remaining row number is negative.
Most of the time this is a non-critical issue. Though, there are times when this does indicate a critical issue.
This article has been created to assist in investigating this issue.
Troubleshooting
What is needed:
- From the UI:
- The Row Count
- The Remaining rows (negative number)
- The Rows per Minute
- From Database (or File):
- The total number of rows to be masked before masking
- The total number of rows to be masked after masking
- From the Logs:
- Finished processing counts from:
- Table Input
- On-The-Fly: Table Output or DelphixTableInsert
- In-Place: Update or DelphixTableUpdate
- Finished processing counts from:
Causes
There are a few reasons for a negative row count. As seen below, some do not affect the masking job while others are a sign of a critical issue with the masking job.
Not Critical
C1: No or delayed Row Count
When the masking job is initiated, a separate thread is created to count the number of records in each table. This is accomplished by sending queries with 'SELECT COUNT(*)' to the database (the source database using OTF).
Depending on the data, how the data is stored, and WHERE clauses, the row count might take longer than actually masking the table. This will result in the row count being negative.
The time each count was returned is logged in the Masking Engine logs. If this entry is not in the log, then the count never arrived.
2022-10-24 12:16:47,799 ... After calculating count. Total count is :: 12345678
Investigation
- Rows in the UI might show -1 or 0 depending on the Masking Engine version. This indicates that the Row Count has not yet been calculated.
# Cause 1 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP + OTF | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | 0 | 0 | -X | = | = | = | = | | Table count | = | -1 | 0 | -X | = | = | = | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
This is caused by the count query taking a long time to count the rows in the table. If the table is a 'scattered' heap table, then the count can be significantly delayed. Also, check for deadlocks in the database logs.
One solution would be to optimize the table (re-order the data or add a Clustered Index).
C2: Row Count Deadlocked
One reason for the row count arriving too late is that the SELECT COUNT(*) query was deadlocked and was terminated.
There are no errors in the logs when this happens but the event should be recorded in the database logs.
If the following log entry is NOT in the logs for the table then check the database logs.
## This log line is missing 2022-... After calculating count. Total count is ::
Investigation
- The same as with RC-1. The UI might show -1 or 0 depending on the version. This indicates that the Row Count did not arrive.
# Cause 2 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP + OTF | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | 0 | 0 | -X | = | = | = | = | | Table count | = | -1 | 0 | -X | = | = | = | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
The SELECT COUNT query can deadlock if a required object is also accessed by another query. This can be because the data in the table is a 'scattered' heap table and rows are read in a non-optimal way.
- Check the database (error) logs for deadlock.
- If this needs to be resolved - the solution could be to:
- Optimize the table - using Shrink or Rebuild database commands.
- Re-order the table.
- Add a Clustered Index.
Critical
These issues are seen as critical as the masking operation has been affected and the data is no longer consistent.
C3: IP - Masking Primary/Logical Key
Masking a column that is part of the Primary Key or Logical Key when it is also used as the URI (Unique Row Identifier) will cause many strange issues (this includes duplicated records, deadlocks, and additional records added).
Additional records have been observed on DB2 databases.
Investigation
# Cause 3 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | +X | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
To mask a column that is part of a Primary Key, please read KBA1776 about PK masking (link provided in the Related Articles section).
C4: IP - Masking Clustered Index
Masking a column that is part of a Clustered Index (this can happen even if the index is dropped) can in some databases cause 'phantom reads' due to masked data changing the order of the records in the table.
This has been observed on Sybase.
Investigation
# Cause 4 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
Drop the Clustered Index (as it could be part of the primary key, please read KBA1776 about PK masking (link provided in the Related Articles section)). It might be necessary to create a new Clustered Index and Logical Key.
C5: IP - Triggers adding records
Negative Row count can be caused by additional records added to the table during the masking operation. Since the row count was collected prior to the job starting, this will result in a negative row count.
The additional records can be a result of Triggers or other processes, such as PreSQL scripts, configured in the masking job or external processes.
Investigation
# Cause 5 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | +X | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
There should be no additional operation running while masking the data. Please check the table for:
- Triggers.
- SQL Scripts.
- External processes (including replications).
C6: IP/OTF - Duplicates from SELECT
This is caused by the SELECT statement reading or accessing data twice.
- In an In-Place (IP) job, this is usually caused by 'Phantom Reads' (sometimes also referred to as 'Dirty Reads').
- In an On-The-FLy (OTF) job, this could be corruption in the SELECT or JDBC and cause additional records (root cause unknown).
Investigation
- IP job will have the same number of records before and after masking but negative rows Remaining.
- OTF job will have X additional number of records after masking. The same number will show up as negative Remaining.
$ Cause 6 # Values at the End of Execution +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (IP) | = | = | n | -X | = | +X | +X | = | | Table count (OTF) | = | = | n | -X | = | +X | +X | +X | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution - IP
The root cause is likely to be the Isolation Level (Dirty Reads). The SELECT statement read some records twice due to read-consistency issues on the database.
As the issue is not understood, please raise a support case when encountering this issue.
Solutions to try:
- Set the Isolation Level to Read Committed.
- Ensure the table is ordered:
- Create a new Clustered Index on a URI (any existing Clustered Index needs to be dropped).
- Optimize the table (Reorder).
- Use On-The-Fly (OTF).
Comment: Do not use nolock on SELECT.
Resolution - OTF
The issue here is not yet known. One possible solution could be to be cast the input value(s). The solution will depend on the algorithm used - for Secure Lookup, casting the value to HEX (using BINTOHEX) or HASH might resolve the issue.
Otherwise, please raise a support ticket with Delphix Support.
C7: IP/OTF - Duplicates from Masking
There could be an issue with an algorithm (or other) that causes additional records to be created.
Investigation
The issue here is that the masking engine generates additional records. This can be seen by looking at the logs and if only the output has additional records this is the issue.
# Cause 7 # Values at the End of Execution +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (IP) | = | = | n | -X | = | = | +X | = | | Table count (OTF) | = | = | n | -X | = | = | +X | +X | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
- If you are using Custom algorithms, please contact the author of the algorithm.
- If you are using Out-of-the-Box algorithms, please raise a support ticket with Delphix Support.
C8: OTF - Target same as Source
This is very similar to RC-4. This is also a configuration issue. The Target Connector has been set to the same as the Source Connector.
This will result in masked data inserted into the source table adding to the number of records to mask.
- If the number of records is small enough to fit buffers - the output will be twice the size (x+x).
- If the number of records is larger than the buffer size - the SELECT will start fetching records from the output - size increase unknown.
Investigation
x
indicates the number of records on a small table.X
indicates the number of records on a large table.y
is a factor related to how many times the input is fetching the output before the job finishes.
# Cause 8 # Values at the End of Execution +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (s) | x | x | n | 0 | x | x | x | x+x | | Table count (l) | X | X | n | X-y*X | X | y*X | y*X | y*X+X | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ # s = Small table (few records) # l = Large table (a large number of records)
Resolution
This issue is due to the Target being set to the Source.
- Change the Target and Source connectors to connect to the correct databases/instances.
- Job Configuration - set Batch Mode to on (performance) and usually Truncate should be on.
Investigating the Logs
The logs are available from the UI (Home > Admin > Logs) or can be accessed using the API.
To get the key details from the logs:
- Change all
1234
to the execution ID for the job execution.
Linux
grep -E "JOB_ID_.*1234|count.*execution .1234." info.log | grep -E "count|Update.* proc|Insert.*proc|Table.* proc"
Windows PowerShell
Select-String "JOB_ID_.*1234|count.*execution .1234." info.log | Select-String "count|Insert.*proc|Update.* proc|Table.* proc"
Related Articles
The following articles may provide more information or related information to this article:
- Masking Columns with PK and FK Constraints (KBA1776)
- How to Troubleshoot Masking Jobs and Access Masking Logs (KBA7988)