Masking Job Monitor and Negative Row Count (KBA6686)
KBA
KBA# 6686
At a Glance
Description: | This article details possible root causes for negative row counts in the Job Monitor for database masking jobs. |
---|---|
Root causes: | The root causes are likely one of the following:
|
How to identify: | To identify the root cause, look at these values:
|
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2. 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.9.0, 6.0.10.0, 6.0.11.0, 6.0.12.0 5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
4.3
4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0
4.2
4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1
4.1
4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0
Negative Row Count
Negative Row Count is referring to when the Remaining row number is negative.
This occasionally happens on the engine and most of the time this is a non-critical issue. There are times when this indicates there is a critical issue at play, though.
This KBA has been created in order to assist in investigating this issue.
Troubleshooting
What is needed:
- From the UI:
- The Row Count
- The Remaining rows (negative number)
- The Rows per Minute
- From Database (or File):
- The total number of rows to be masked before masking
- The total number of rows to be masked after masking
- From the Logs:
- Finished processing counts from:
- Table Input
- On-The-Fly: Table Output or DelphixTableInsert
- In-Place: Update or DelphixTableUpdate
- Finished processing counts from:
Root Causes
There are a few reasons for a negative row count. As seen below, some do not affect the masking job while others are a sign of a critical issue with the masking job.
Not Critical
RC-1: No or delayed Row Count
When the masking job is initiated, a separate thread is created to count the number of records in each table. This is accomplished by sending queries with 'SELECT COUNT(*)' to the database (the source database using OTF).
Depending on the data, how the data is stored, and WHERE clauses, the row count might take longer than actually masking the table. This will result in the row count being negative.
The time each count was returned is logged in the Masking Engine logs. If this entry is not in the log, then the count never arrived.
2020-10-24 12:16:47,799 ... After calculating count. Total count is :: 12345678
Investigation
- Rows in the UI might show '-1' or '0' depending on the Masking Engine version. This indicates that the Row Count has not yet been calculated.
# RC-1 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP + OTF | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | 0 | 0 | -X | = | = | = | = | | Table count | = | -1 | 0 | -X | = | = | = | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
This is caused by the count query taking a long time to count the rows in the table. If the records in the table is a 'scattered' heap table, then the count can take a long time. Also, check for deadlocks in the database logs.
One solution here can be to optimize the table (re-order the data or add a Clustered Index).
RC-2: Row Count Deadlocked
One reason for the row count arriving too late is that the SELECT COUNT(*) query was deadlocked and was terminated.
There are no errors in the logs when this happens but the event should be recorded in the database logs.
If the following log entry is NOT in the logs for the table then check the database logs.
## This log line is missing 2021-... After calculating count. Total count is ::
Investigation
- The same as with RC-1. The UI might show '-1' or '0 depending on the version. This indicates that the Row Count did not arrive.
# RC-2 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP + OTF | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | 0 | 0 | -X | = | = | = | = | | Table count | = | -1 | 0 | -X | = | = | = | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
The SELECT COUNT query can deadlock if a required object is also accessed by another query. This can be because the data in the table is a 'scattered' heap table and rows are read in a non-optimal way.
- Check the database (error) logs for deadlock.
- If this needs to be resolved - the solution could be to:
- Optimize the table - using Shrink or Rebuild database commands.
- Re-order the table.
- Add a Clustered Index.
Critical
RC-3: In-Place - Masking Primary Key
Masking a column that is part of the Primary Key when it is also used as the URI (Unique Row Identifier) will cause many strange effects (this includes duplicated records, deadlocks, and additional records added).
Additional records have been noted on DB2 databases.
Investigation
# RC-3 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | +X | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
To mask a column that is part of a Primary Key, please read the KBA about PK provided at the end of this page.
RC-4: In-Place - Triggers adding records
Negative Row count can be caused by additional records added to the table during the masking operation. Since the row count was collected prior to the job starting, this will result in a negative row count.
The additional records can be a result of Triggers or other processes, such as PreSQL scripts, configured in the masking job or external processes.
Investigation
# RC-4 # Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | +X | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
There should be no additional operation running while masking the data. Please check the table for:
- Triggers.
- SQL Scripts.
- External processes (including replications).
RC-5: In-Place / On-The-Fly - Duplicates from SELECT
This is caused by the SELECT statement reading or accessing data twice. In an IP job, this is usually caused by 'Dirty Reads'. In an OTF job, this could be corruption in the SELECT or JDBC and cause additional records (root cause unknown).
Investigation
- IP job will have the same number of records before and after masking but negative rows Remaining.
- OTF job will have X additional number of records after masking. The same number will show up as negative Remaining.
$ RC-5 # Values at the End of Execution +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (IP) | = | = | n | -X | = | +X | +X | = | | Table count (OTF) | = | = | n | -X | = | +X | +X | +X | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution - IP
The root cause is likely to be the Isolation Level (Dirty Reads). The SELECT statement read some records twice due to read-consistency issues on the database.
As the issue is not understood, please raise a support case when encountering this issue.
Solutions to try:
- Set the Isolation Level to Read Committed.
- Ensure the table is ordered:
- Create a Clustered Index.
- Optimize the table (Reorder).
- Use On-The-Fly (OTF).
Resolution - OTF
The issue here is not yet known. One possible solution could be to be cast the input value(s). The solution will depend on the algorithm used - for Secure Lookup, casting the value to HEX (using BINTOHEX) might resolve the issue.
Otherwise, please raise a support ticket with Delphix Support.
RC-6: In-Place / On-The-Fly - Duplicates from Masking
There could be an issue with an algorithm (or other) that causes additional records to be created.
Investigation
The issue here is that the masking engine generates additional records. This can be seen by looking at the logs and if only the output has additional records this is the issue.
# RC-6 # Values at the End of Execution +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (IP) | = | = | n | -X | = | = | +X | = | | Table count (OTF) | = | = | n | -X | = | = | +X | +X | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
- If Custom Algorithms, please contact the author of the algorithm.
- If Out-of-the-Box, please raise a support ticket with Delphix Support.
RC-7: On-The-Fly - Target same as Source
This is very similar to RC-4. This is also a configuration issue. The Target Connector has been set to the same as the Source Connector.
This will result in masked data inserted into the source table adding to the number of records to mask.
- If the number of records is small enough to fit buffers - the output will be twice the size (x+x).
- If the number of records is larger than the buffer size - the 'SELECT' will start fetching records from the output - size increase unknown.
Investigation
- 'x' indicates the number of records on a small table.
- 'X' indicates the number of records on a large table.
- 'y' is a factor related to how many times the input is fetching the output before the job finishes.
# RC-7 # Values at the End of Execution +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (s) | x | x | n | 0 | x | x | x | x+x | | Table count (l) | X | X | n | X-y*X | X | y*X | y*X | y*X+X | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ # s = Small table (few records) # l = Large table (a large number of records)
Resolution
This issue is due to the Target being set to the Source.
- Change the Target and Source connectors to connect to the correct databases/instances.
- Job Configuration - set Batch Mode to on (performance) and usually Truncate should be on.
Investigating the Logs
The logs are available from the UI (Home > Admin > Logs) or can be accessed using the API.
To get the key details from the logs:
- Change all '1234' to the execution ID for the job execution.
Linux
grep -E "JOB_ID_.*1234|count.*execution .1234." info.log | grep -E "count|Update.* proc|Insert.*proc|Table.* proc"
Windows PowerShell
Select-String "JOB_ID_.*1234|count.*execution .1234." info.log | Select-String "count|Insert.*proc|Update.* proc|Table.* proc"
Related Articles
The following articles may provide more information or related information to this article: