Skip to main content
Delphix

Masking Job Monitor and Negative Row Count (KBA6686)

 

KBA

KBA# 6686

 

At a Glance

Description: This article details possible root causes for negative row counts in the Job Monitor for database masking jobs. 
Root causes: The root causes are likely one of the following: 
  1. The row count did not return in time (or at all).
  2. Incorrect configuration of masking job (OTF). 
  3. Masking column which is part of the Primary Key. 
  4. The SELECT has corrupted the data and duplicated it.
  5. The Masking operation duplicated data.
How to identify: To identify the Root Cause, look at these values: 
  • UI - check for Row Count, RPM, and Renaming Rows for the table. 
  • Table - check SELECT COUNT(*) before and after masking (including WHERE if used).
  • Logs - check the Finished Processing counts for the job (Table Input and Update/Output).

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2. 6.0.5.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

4.3

4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0

4.2

4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1

4.1

4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

Negative Row Count

Negative Row Count is referring to when the Remaining row number is negative. 

This occasionally happens on the engine and most of the time this is a non-critical issue. There are times when this indicates there is a critical issue at play, though.

This KBA has been created in order to assist in investigating this issue. 

Troubleshooting 

What is needed:

  • From the UI:
    • The Row Count.
    • The Renaming rows (negative number).
    • The Rows per Minute.
  • From Database (or file):
    • The total number of rows to be masked before masking.
    • The total number of rows to be masked after masking.
  • From the Logs
    • Finished processing counts from:
      • Table Input
      • On-The-Fly: Table Output or DelphixTableInsert
      • In-Place: Update or DelphixTableUpdate

Root Causes

There are a few reasons for a negative row count. As seen below, some do not affect the masking job while others are a sign of a critical issue with the masking job. 

Note

Note:

There may be multiple issues at play at the same time, so look at the row numbers and see if they match a given scenario.

 

Not Critical

RS 1: Row count 'Too Late'

When the masking job is initiated, a separate thread is created to count the number of records in each table. This is accomplished by sending queries with 'SELECT COUNT(*)' to the database (the source database using OTF).

Depending on the data, how the data is stored, and WHERE clauses, the row count might take longer than actually masking the table. This will result with the row count being negative.

 

The time each count was returned is logged in the Masking Engine logs. If this entry is not in the log, then the count never arrived.

2020-10-24 12:16:47,799 ... After calculating count. Total count is :: 1048576
Investigation
  • Rows in the UI might show '-1' or '0' depending on the Masking Engine version. This indicates that the Row Count has not yet been calculated. 
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP + OTF    | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    0 |   0 |        -X |     = |     = |      = |      = |
| Table count |      = |   -1 |   0 |        -X |     = |     = |      = |      = |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

This is caused by the count query taking a long time to count the rows in the table. If the records in the table is a 'scattered' heap table, then the count can take a long time. Also, check for deadlocks in the database logs.

One solution here can be to optimize the table (reinsert the data or adding a Clustered Index).

RS 2: Row count Deadlocked

One reason for the row count arriving too late is that the SELECT COUNT(*) query deadlocked and was terminated.

There are no errors in the logs when this happens but the event should be recorded in the database logs. 

If the following log entry is NOT in the logs for the table then check the database logs. 

After calculating count
Investigation
  • The same as with RS 1. The UI might show '-1' or '0 depending on the version. This indicates that the Row Count did not arrive.
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP + OTF    | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    0 |   0 |        -X |     = |     = |      = |      = |
| Table count |      = |   -1 |   0 |        -X |     = |     = |      = |      = |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

The SELECT COUNT query can deadlock if a required object is also accessed by another query. This can be because the data in the table is a 'scattered' heap table and rows are read in a non-optimal way.

  1. Check the database (error) logs for deadlock.
  2. If this needs to be resolved - the solution could be to:
    • optimize the table - using Shrink or Rebuild database commands.
    • add a Clustered Index.
Note

Note:

A fix for this is being investigated and might be available in a future release.

Critical 

RS 3: In-Place - Masking Primary Key

Masking a column that is part of the Primary Key when it is also used as the URI (Unique Row Identifier) will cause many strange effects (this includes duplicated records, deadlocks, and additional records added). 

Additional records have been noted on DB2 databases.

Investigation
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP          | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    = |   n |        -X |     = |    +X |     +X |     +X |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

To mask a column that is part of a Primary Key, please read the KBA about PK provided at the end of this page.

RS 4: In-Place - Triggers adding records

Negative Row Count can be caused by additional records added to the table during the masking operation. Since the row count was collected prior to the job starting, this will result in a negative row count.

The additional records can be a result of Triggers or other processes, such as PreSQL scripts, configured in the masking job or external processes. 

Investigation
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP          | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    = |   n |        -X |     = |    +X |     +X |     +X |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

There should be no additional operation running while masking the data. Please check the table for:

  • Triggers
  • SQL Scripts
  • External processes (including replications)

RS 5: In-Place / On-The-Fly - Duplicates from SELECT

The value stored in a field could cause the SELECT or the JDBC to be corrupt and cause additional records. This could be a sign of a bug or that the character encoding has been incorrectly configured.  

Investigation
# Values at the End of Execution
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   |   DB   |           UI           |          Logs          |   DB   |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count (IP)  |      = |    = |   n |        -X |     = |    +X |     +X |      = |
| Table count (OTF) |      = |    = |   n |        -X |     = |    +X |     +X |     +X |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

One solution here is that the value in the field can be cast. The solution will depend on the algorithm used - for Secure Lookup, casting the value to HEX (using BINTOHEX) might resolve the issue.

Otherwise, please raise a support ticket with Delphix Support.

RS 6: In-Place / On-The-Fly - Duplicates from Masking

There could be an issue with an algorithm (or other) that causes additional records to be created. 

This is a bug and if the algorithm causing the issue is a Custom Algorithm, please contact the author of the algorithm. 

Investigation
# Values at the End of Execution
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   |   DB   |           UI           |          Logs          |   DB   |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count (IP)  |      = |    = |   n |        -X |     = |     = |     +X |      = |
| Table count (OTF) |      = |    = |   n |        -X |     = |     = |     +X |     +X |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
  • If Custom Algorithms, please contact the author of the algorithm. 
  • If Out-of-the-Box, please raise a support ticket with Delphix Support.

RS 7: On-The-Fly - Target same as Source

This is very similar to RS 4. This is also a configuration issue. The Target Connector has been set to the same as the Source Connector. 

This will result in masked data inserted into the source table adding to the number of records to mask. 

  • If the number of records is small enough to fit buffers - the output will be twice the size (x+x).
  • If the number of records is larger than the buffer size - the 'SELECT' will start fetching records from the output - size increase unknown.
Investigation
  • The size of the table is indicated with 's' (small) and 'l' (large).
  • 'x' indicates the number of records on a small table.
  • 'X' indicates the number of records on a large table.
  • 'y' is a factor related to how many times the input is fetching the output before the job finishes. 
# Values at the End of Execution
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                  |   DB   |           UI           |          Logs          |   DB   |
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                  | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count (s)  |      x |    x |   n |         0 |     x |     x |      x |    x+x |
| Table count (l)  |      X |    X |   n |     X-y*X |     X |   y*X |    y*X |  y*X+X |
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

This issue is due to the Target being set to the Source. 

  1. Change the Target and Source connectors to connect to the correct databases/instances.
  2. Job Configuration - set Batch Mode to on (performance) and usually Truncate should be on. 

 

 

Related Articles

The following articles may provide more information or related information to this article: