Skip to main content
Delphix

Masking Job Monitor, Negative Row Count, Calculating (KBA6686)

 

 

KBA

KBA# 6686

 

At a Glance

Description: This article details possible root causes for negative row counts in the Job Monitor for database masking jobs. 
Root causes: The root causes are likely one of the following: 
  1. The Row Count did not return in time (or at all).
  2. Incorrect job and connector configuration on On-The-Fly (OTF) jobs. 
  3. Masking column which is part of the Primary Key (or Clustered Index). 
  4. The SELECT has 'phantom reads' and duplicated rows.
  5. The masking operation duplicated data.
  6. The masking engine is not registering row counts (see Calculating Rows... below).
Calculating Rows: The Monitor page shows 'Calculating...' when the row count is unknown.

This can be due to:
 1. it has not been calculated yet by the database, or
 2. it has not been registered by the engine (tracked in DLPX-72404).

See C1: No or delayed Row Count below.
How to identify: To identify the root cause, look at these values: 
  • UI - check for Row Count, RPM, and Renaming Rows for the table. 
  • Table - check SELECT COUNT(*) before and after masking (including WHERE if used).
  • Logs - check the Finished Processing counts for the job (Table Input and Update/Output).
Investigating logs: Use grep or select-string with the following filter (more detail in Investigating the Logs below):
"count|Insert.*proc|Update.* proc|Table.* proc"
 
More info: More info on how to access logs: 

For masking columns with PK (applies to Clustered Indexes as well):

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Date Release
Feb 13, 2023 8.0.0.0
Jan 12, 2023 7.0.0.0
Releases Prior to 2023
Major Release All Sub Releases
6.0

6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2. 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.9.0, 6.0.10.0, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

4.3

4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0

4.2

4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1

4.1

4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

Negative Row Count

Negative Row Count is referring to when the Remaining row number is negative. 

Most of the time this is a non-critical issue. Though, there are times when this does indicate a critical issue.

This article has been created to assist in investigating this issue. 

Troubleshooting 

What is needed:

  • From the UI:
    • The Row Count
    • The Remaining rows (negative number)
    • The Rows per Minute
  • From Database (or File):
    • The total number of rows to be masked before masking
    • The total number of rows to be masked after masking
  • From the Logs: 
    • Finished processing counts from:
      • Table Input
      • On-The-Fly: Table Output or DelphixTableInsert
      • In-Place: Update or DelphixTableUpdate

Causes

There are a few reasons for a negative row count. As seen below, some do not affect the masking job while others are a sign of a critical issue with the masking job. 

 

Note

Note:

Multiple issues can be occurring at the same time, review the row numbers and see if they match a given scenario.

 

Not Critical

C1: No or delayed Row Count

When the masking job is initiated, a separate thread is created to count the number of records in each table. This is accomplished by sending queries with 'SELECT COUNT(*)' to the database (the source database using OTF).

Depending on the data, how the data is stored, and WHERE clauses, the row count might take longer than actually masking the table. This will result in the row count being negative.

The time each count was returned is logged in the Masking Engine logs. If this entry is not in the log, then the count never arrived.

2022-10-24 12:16:47,799 ... After calculating count. Total count is :: 12345678
Investigation
  • Rows in the UI might show -1 or 0 depending on the Masking Engine version. This indicates that the Row Count has not yet been calculated. 
# Cause 1
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP + OTF    | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    0 |   0 |        -X |     = |     = |      = |      = |
| Table count |      = |   -1 |   0 |        -X |     = |     = |      = |      = |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

This is caused by the count query taking a long time to count the rows in the table. If the table is a 'scattered' heap table, then the count can be significantly delayed. Also, check for deadlocks in the database logs.

One solution would be to optimize the table (re-order the data or add a Clustered Index).

C2: Row Count Deadlocked

One reason for the row count arriving too late is that the SELECT COUNT(*) query was deadlocked and was terminated.

There are no errors in the logs when this happens but the event should be recorded in the database logs. 

If the following log entry is NOT in the logs for the table then check the database logs. 

## This log line is missing 
2022-... After calculating count. Total count is ::
Investigation
  • The same as with RC-1. The UI might show -1 or 0 depending on the version. This indicates that the Row Count did not arrive.
# Cause 2
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP + OTF    | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    0 |   0 |        -X |     = |     = |      = |      = |
| Table count |      = |   -1 |   0 |        -X |     = |     = |      = |      = |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

The SELECT COUNT query can deadlock if a required object is also accessed by another query. This can be because the data in the table is a 'scattered' heap table and rows are read in a non-optimal way.

  1. Check the database (error) logs for deadlock.
  2. If this needs to be resolved - the solution could be to:
    • Optimize the table - using Shrink or Rebuild database commands.
    • Re-order the table.
    • Add a Clustered Index.
Note

Note:

A fix for this is being investigated and may be available in a future release.

Critical 

These issues are seen as critical as the masking operation has been affected and the data is no longer consistent. 

C3: IP - Masking Primary/Logical Key

Masking a column that is part of the Primary Key or Logical Key when it is also used as the URI (Unique Row Identifier) will cause many strange issues (this includes duplicated records, deadlocks, and additional records added). 

Additional records have been observed on DB2 databases.

Investigation
# Cause 3
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP          | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    = |   n |        -X |     = |    +X |     +X |     +X |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

To mask a column that is part of a Primary Key, please read KBA1776 about PK masking (link provided in the Related Articles section).

C4: IP - Masking Clustered Index

Masking a column that is part of a Clustered Index (this can happen even if the index is dropped) can in some databases cause 'phantom reads' due to masked data changing the order of the records in the table. 

This has been observed on Sybase. 

Investigation
# Cause 4
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP          | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    = |   n |        -X |     = |    +X |     +X |      = |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

Drop the Clustered Index (as it could be part of the primary key, please read KBA1776 about PK masking (link provided in the Related Articles section)). It might be necessary to create a new Clustered Index and Logical Key.

C5: IP - Triggers adding records

Negative Row count can be caused by additional records added to the table during the masking operation. Since the row count was collected prior to the job starting, this will result in a negative row count.

The additional records can be a result of Triggers or other processes, such as PreSQL scripts, configured in the masking job or external processes. 

Investigation
# Cause 5
# Values at the End of Execution
+-------------+--------+------------------------+------------------------+--------+
|             |   DB   |           UI           |          Logs          |   DB   |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| IP          | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count |      = |    = |   n |        -X |     = |    +X |     +X |     +X |
+-------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution

There should be no additional operation running while masking the data. Please check the table for:

  • Triggers.
  • SQL Scripts.
  • External processes (including replications).

C6: IP/OTF - Duplicates from SELECT

This is caused by the SELECT statement reading or accessing data twice. 

  • In an In-Place (IP) job, this is usually caused by 'Phantom Reads' (sometimes also referred to as 'Dirty Reads').
  • In an On-The-FLy (OTF) job, this could be corruption in the SELECT or JDBC and cause additional records (root cause unknown).
Investigation
  • IP job will have the same number of records before and after masking but negative rows Remaining
  • OTF job will have X additional number of records after masking. The same number will show up as negative Remaining
$ Cause 6
# Values at the End of Execution
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   |   DB   |           UI           |          Logs          |   DB   |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count (IP)  |      = |    = |   n |        -X |     = |    +X |     +X |      = |
| Table count (OTF) |      = |    = |   n |        -X |     = |    +X |     +X |     +X |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution - IP

The root cause is likely to be the Isolation Level (Dirty Reads). The SELECT statement read some records twice due to read-consistency issues on the database. 

As the issue is not understood, please raise a support case when encountering this issue.

Solutions to try:

  1. Set the Isolation Level to Read Committed
  2. Ensure the table is ordered:
    1. Create a new Clustered Index on a URI (any existing Clustered Index needs to be dropped).
    2. Optimize the table (Reorder).
  3. Use On-The-Fly (OTF).

Comment: Do not use nolock on SELECT. 

Resolution - OTF

The issue here is not yet known. One possible solution could be to be cast the input value(s). The solution will depend on the algorithm used - for Secure Lookup, casting the value to HEX (using BINTOHEX) or HASH might resolve the issue.

Otherwise, please raise a support ticket with Delphix Support.

C7: IP/OTF - Duplicates from Masking

There could be an issue with an algorithm (or other) that causes additional records to be created. 

Investigation

The issue here is that the masking engine generates additional records. This can be seen by looking at the logs and if only the output has additional records this is the issue. 

# Cause 7
# Values at the End of Execution
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   |   DB   |           UI           |          Logs          |   DB   |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                   | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count (IP)  |      = |    = |   n |        -X |     = |     = |     +X |      = |
| Table count (OTF) |      = |    = |   n |        -X |     = |     = |     +X |     +X |
+-------------------+--------+------+-----+-----------+-------+-------+--------+--------+
Resolution
  • If you are using Custom algorithms, please contact the author of the algorithm. 
  • If you are using Out-of-the-Box algorithms, please raise a support ticket with Delphix Support.

C8: OTF - Target same as Source

This is very similar to RC-4. This is also a configuration issue. The Target Connector has been set to the same as the Source Connector. 

This will result in masked data inserted into the source table adding to the number of records to mask. 

  • If the number of records is small enough to fit buffers - the output will be twice the size (x+x).
  • If the number of records is larger than the buffer size - the SELECT will start fetching records from the output - size increase unknown.
Investigation
  • x indicates the number of records on a small table.
  • X indicates the number of records on a large table.
  • y is a factor related to how many times the input is fetching the output before the job finishes. 
# Cause 8
# Values at the End of Execution
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                  |   DB   |           UI           |          Logs          |   DB   |
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
|                  | Source | Rows | rpm | Remaining | Count | Input | Output | Target |
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+
| Table count (s)  |      x |    x |   n |         0 |     x |     x |      x |    x+x |
| Table count (l)  |      X |    X |   n |     X-y*X |     X |   y*X |    y*X |  y*X+X |
+------------------+--------+------+-----+-----------+-------+-------+--------+--------+

# s = Small table (few records)
# l = Large table (a large number of records)
Resolution

This issue is due to the Target being set to the Source. 

  1. Change the Target and Source connectors to connect to the correct databases/instances.
  2. Job Configuration - set Batch Mode to on (performance) and usually Truncate should be on. 

Investigating the Logs

The logs are available from the UI (Home > Admin > Logs) or can be accessed using the API.

To get the key details from the logs: 

  • Change all 1234 to the execution ID for the job execution.

Linux

grep -E "JOB_ID_.*1234|count.*execution .1234." info.log | grep -E "count|Update.* proc|Insert.*proc|Table.* proc"

Windows PowerShell

Select-String "JOB_ID_.*1234|count.*execution .1234." info.log | Select-String "count|Insert.*proc|Update.* proc|Table.* proc"

 

Related Articles

The following articles may provide more information or related information to this article: