Masking Job Monitor and Negative Row Count (KBA6686)
KBA
KBA# 6686
At a Glance
Description: | This article details possible root causes for negative row counts in the Job Monitor for database masking jobs. |
---|---|
Root causes: | The root causes are likely one of the following:
|
How to identify: | To identify the Root Cause, look at these values:
|
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2. 6.0.5.0 5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
4.3
4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0
4.2
4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1
4.1
4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0
Negative Row Count
Negative Row Count is referring to when the Remaining row number is negative.
This occasionally happens on the engine and most of the time this is a non-critical issue. There are times when this indicates there is a critical issue at play, though.
This KBA has been created in order to assist in investigating this issue.
Troubleshooting
What is needed:
- From the UI:
- The Row Count.
- The Renaming rows (negative number).
- The Rows per Minute.
- From Database (or file):
- The total number of rows to be masked before masking.
- The total number of rows to be masked after masking.
- From the Logs:
- Finished processing counts from:
- Table Input
- On-The-Fly: Table Output or DelphixTableInsert
- In-Place: Update or DelphixTableUpdate
- Finished processing counts from:
Root Causes
There are a few reasons for a negative row count. As seen below, some do not affect the masking job while others are a sign of a critical issue with the masking job.
Not Critical
RS 1: Row count 'Too Late'
When the masking job is initiated, a separate thread is created to count the number of records in each table. This is accomplished by sending queries with 'SELECT COUNT(*)' to the database (the source database using OTF).
Depending on the data, how the data is stored, and WHERE clauses, the row count might take longer than actually masking the table. This will result with the row count being negative.
The time each count was returned is logged in the Masking Engine logs. If this entry is not in the log, then the count never arrived.
2020-10-24 12:16:47,799 ... After calculating count. Total count is :: 1048576
Investigation
- Rows in the UI might show '-1' or '0' depending on the Masking Engine version. This indicates that the Row Count has not yet been calculated.
# Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP + OTF | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | 0 | 0 | -X | = | = | = | = | | Table count | = | -1 | 0 | -X | = | = | = | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
This is caused by the count query taking a long time to count the rows in the table. If the records in the table is a 'scattered' heap table, then the count can take a long time. Also, check for deadlocks in the database logs.
One solution here can be to optimize the table (reinsert the data or adding a Clustered Index).
RS 2: Row count Deadlocked
One reason for the row count arriving too late is that the SELECT COUNT(*) query deadlocked and was terminated.
There are no errors in the logs when this happens but the event should be recorded in the database logs.
If the following log entry is NOT in the logs for the table then check the database logs.
After calculating count
Investigation
- The same as with RS 1. The UI might show '-1' or '0 depending on the version. This indicates that the Row Count did not arrive.
# Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP + OTF | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | 0 | 0 | -X | = | = | = | = | | Table count | = | -1 | 0 | -X | = | = | = | = | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
The SELECT COUNT query can deadlock if a required object is also accessed by another query. This can be because the data in the table is a 'scattered' heap table and rows are read in a non-optimal way.
- Check the database (error) logs for deadlock.
- If this needs to be resolved - the solution could be to:
- optimize the table - using Shrink or Rebuild database commands.
- add a Clustered Index.
Critical
RS 3: In-Place - Masking Primary Key
Masking a column that is part of the Primary Key when it is also used as the URI (Unique Row Identifier) will cause many strange effects (this includes duplicated records, deadlocks, and additional records added).
Additional records have been noted on DB2 databases.
Investigation
# Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | +X | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
To mask a column that is part of a Primary Key, please read the KBA about PK provided at the end of this page.
RS 4: In-Place - Triggers adding records
Negative Row Count can be caused by additional records added to the table during the masking operation. Since the row count was collected prior to the job starting, this will result in a negative row count.
The additional records can be a result of Triggers or other processes, such as PreSQL scripts, configured in the masking job or external processes.
Investigation
# Values at the End of Execution +-------------+--------+------------------------+------------------------+--------+ | | DB | UI | Logs | DB | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | IP | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count | = | = | n | -X | = | +X | +X | +X | +-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
There should be no additional operation running while masking the data. Please check the table for:
- Triggers
- SQL Scripts
- External processes (including replications)
RS 5: In-Place / On-The-Fly - Duplicates from SELECT
The value stored in a field could cause the SELECT or the JDBC to be corrupt and cause additional records. This could be a sign of a bug or that the character encoding has been incorrectly configured.
Investigation
# Values at the End of Execution +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (IP) | = | = | n | -X | = | +X | +X | = | | Table count (OTF) | = | = | n | -X | = | +X | +X | +X | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
One solution here is that the value in the field can be cast. The solution will depend on the algorithm used - for Secure Lookup, casting the value to HEX (using BINTOHEX) might resolve the issue.
Otherwise, please raise a support ticket with Delphix Support.
RS 6: In-Place / On-The-Fly - Duplicates from Masking
There could be an issue with an algorithm (or other) that causes additional records to be created.
This is a bug and if the algorithm causing the issue is a Custom Algorithm, please contact the author of the algorithm.
Investigation
# Values at the End of Execution +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (IP) | = | = | n | -X | = | = | +X | = | | Table count (OTF) | = | = | n | -X | = | = | +X | +X | +-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
- If Custom Algorithms, please contact the author of the algorithm.
- If Out-of-the-Box, please raise a support ticket with Delphix Support.
RS 7: On-The-Fly - Target same as Source
This is very similar to RS 4. This is also a configuration issue. The Target Connector has been set to the same as the Source Connector.
This will result in masked data inserted into the source table adding to the number of records to mask.
- If the number of records is small enough to fit buffers - the output will be twice the size (x+x).
- If the number of records is larger than the buffer size - the 'SELECT' will start fetching records from the output - size increase unknown.
Investigation
- The size of the table is indicated with 's' (small) and 'l' (large).
- 'x' indicates the number of records on a small table.
- 'X' indicates the number of records on a large table.
- 'y' is a factor related to how many times the input is fetching the output before the job finishes.
# Values at the End of Execution +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | DB | UI | Logs | DB | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | | Source | Rows | rpm | Remaining | Count | Input | Output | Target | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+ | Table count (s) | x | x | n | 0 | x | x | x | x+x | | Table count (l) | X | X | n | X-y*X | X | y*X | y*X | y*X+X | +------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
This issue is due to the Target being set to the Source.
- Change the Target and Source connectors to connect to the correct databases/instances.
- Job Configuration - set Batch Mode to on (performance) and usually Truncate should be on.
Related Articles
The following articles may provide more information or related information to this article: