Skip to main content
Delphix

Troubleshooting Masking Performance In-Place Generic DB (KBA6409)

 

 

KBA

KBA# 6409

 

At a Glance

Summary: This KBA describes step-by-step troubleshooting procedures for In-Place Masking jobs on generic database. 

Special steps are available for Oracle and MS SQL Server (see links below).

First: Before investigating performance, ensure Triggers are dropped (or disabled) and Constraints and Indexes are Dropped on masked columns. 
DIY Investigation: Masking performance greatly depends on the database (DB) and the network. This includes how the data is stored and accessed.  

These are the initial steps to verify performance:
  1. Measure latency using ping.
  2. On the DB, execute a test and measure the time it took:
    • SELECT COUNT(*) FROM [slow_table]
    • UPDATE [slow_table] SET [maskcol]=[maskcol] WHERE [keycol] = [value]
  3. Execution Plans for the above statements (SELECT and UPDATE).
  4. Check if the UPDATE triggers additional operations on the database through Triggers, Constraints, or Indexes.
  5. Check that the UPDATE or SELECT statement on the DB is not blocked (this includes logs and journals).
Support Investigation: Steps:
  1. Upload the details collected above.
  2. Temporarily set:
    • Feedback Size = Commit Size = 10,000.
    • Row Limit = 200,000.

      This is for investigation only. Reset these values after bundle upload.
       
  3. Upload a Support Bundle. 
  4. Provide the Masking Job ID and Execution ID.
More info: More info on how to access logs: 

Casting using SQL:

Performance recommendations:

Troubleshoot Performance in Oracle and MS SQL Server:

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Date Release
Aug 24, 2023 14.0.0.0
Jul 24, 2023 13.0.0.0
Jun 21, 2023 12.0.0.0
May 25, 2023 11.0.0.0
Apr 13, 2023 10.0.0.0 | 10.0.0.1
Mar 13, 2023 | Mar 20, 2023 9.0.0.0 | 9.0.0.1
Feb 13, 2023 8.0.0.0
Jan 12, 2023 7.0.0.0
Releases Prior to 2023
Major Release All Sub Releases
6.0

6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

DB Masking Performance  

This article looks at how to troubleshoot masking performance. There are specific Oracle and MS SQL Server articles which go into more detail for those databases. See Related Articles.

It will only look at the rpm (rows per minute) of the actual masking operation (In-Place). It will not include the runtime of Disabling Constraints, Dropping Indexes, or Executing Pre/Post SQL Scripts. 

The steps in this guide can be performed with available tools and will help to determine and answer what operation may be causing the bottleneck:

  • Reading the data (SELECT)
  • Masking operation (data size, algorithms, or the number of columns)
  • Updating the data (UPDATE)
  • The database is blocked (SELECT, UPDATE, logs, or journaling) 

 

Note

Note:

Often the performance bottleneck is the UPDATE step.

 

Performance factors

Some key factors to slow performance are:

Database Factors:  - Database Write Performance
- Database Read Performance
- Execution Plan and Indexes (and Triggers)
- The number of records per block/page
- DB fragmentation
Note:  A frequent cause of slow performance stems from the number of blocks/pages accessed and updated. This can be due to execution plans, what data and how it is stored, and fragmentation. It can also be caused by Triggers and Indexes.
Network Factors: - Latency
- Throughput
Note: This is rarely an issue as most masking engines are in the same Data Centre as the Database.
Masking Operation Factors: - Number of algorithms
- Data type (for example, Large Text Object)
- Algorithm type
Note: The new algorithms (Alg v2) are highly tuned and can mask at a rate of many millions of rows per minute.

 

The diagram below shows where the delays are accumulating to cause slow performance. 

KBA6409 - Generic - Timing Events v2.png

The diagram excludes the throughput and the size of the data as this is normally not a performance-limiting factor. The time tc (time to count) is only included for reference.

  • COUNT: tc = tlc + tdbc + tother 
  • MASKING: tm
  • SELECT: ts = tls + tdbs + tother
  • UPDATE: tu = tlu + tdbu + tother

tother = Java, JDBC. 

Masking performance in rpm will be dictated by compounding and capping factors since some of the processing is done in parallel. 

To improve masking performance and rpm, the bottleneck needs to be found and resolved. 

Masking Performance at different stages

The performance will differ at different stages of the masking job. Unique stages are at play when the masking job starts, and this makes it easy to measure them. Measuring these stages can help determine where the bottleneck is located.

 

Note

Note:

The performance can also be measured from the Finished log entries. However, this is usually not practical as the job has to finish (successfully) in order to collect these timing values.

 

An In-Place masking job execution has the following characteristics.

SELECT

  • The SELECT reads the dataset to be masked.
  • The first 10,000 to 20,000 records are normally read fast as the records are read into the masking engine buffers.
  • This is usually visible as a performance peak at the start of the job and shows the performance of the SELECT statement.

Masking Transformation  

  • The SELECT is usually faster than the masking operation.
  • How fast depends on how many algorithms are used and the masked data.

UPDATE

  • The last step is UPDATE.
  • This is usually the performance bottleneck. 
  • The likely causes are the number of blocks/pages accessed and updated and the write performance of the database. 

 

Generic Moving v1 (fast input w gauge).gif

Performance Graph 

By temporarily setting the Feedback Size = Commit Size to 10,000 and Row Limit = 200,000, the execution can be illustrated in a graph. This graph shows the performance as the buffers fill up and can show where the bottlenecks are located.

An example of this can be seen below, which shows the job in the illustration above. The graph shows the relative performance compared with the RPM of the overall masking job.

The characteristics in this example are: 

  • The SELECT performance is initially the max performance that can be fetched from the database. 
  • As soon as the records start to appear and are masked they will be sent to the UPDATE step.
  • The UPDATE performance is seen as soon as records are available to be updated.
  • The SELECT will be at max speed until the initial buffers are full (around 20,000 - 30,0000 rows).
  • The performance is then (in this case) capped to the performance of the masking operation.
  • When all (relevant) buffers are full the SELECT step will be forced to the same performance as the UPDATE step.

Masking Relative Performance - SELECT Mask UPDATE.png

Examples and Actions  

No masking job is the same and the characteristics will differ. With that said, the bottleneck can be identified in one of four different groups based on the bottleneck. 

These are:

Update 
KBA6409 - Performance Graph - Update.png The SELECT quickly peaks at max performance, then drops by masking operation (duration depends on Row Limit), and then capped by the Update performance. 

Database Actions:

  • Investigate all details around UPDATE.
  • Check network and storage latencies.
  • Investigate Triggers, Indexes, and Constraints. 
Signature in the logs:
ThroughputLogger:interval 42.43 MB (1204439 rows) counted in 30.000 sec
... at the start fast and then continues slow (here 40 Krpm) ...
ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ...
... continues ...

 
Hung Job (DB lock)
KBA6409 - Performance Graph - Hung.png This is technically not a Performance issue - but frequently reported as such since the RPM is dropping (no records masked and the time duration increases).

Database Actions:
  • Check for locks on SELECT and/or UPDATE.
  • Check Transaction Logs/Journaling and free space.
Signature in the logs:
...
ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ...
... continues ...
Masking operation  
KBA6409 - Performance Graph - Masking.png The SELECT quickly peaks at max performance and then is capped by the masking operation.

This is rare. If there are a moderate number of OOTB (Out of the Box) algorithms (say 10 per table), this is likely not the issue. Algorithms with RegEx or the old version of the Segment Mapping algorithm (and 5 were used per table) could hit this.

Actions:
  • Investigate the number of algorithms, data type, and data size.
  • Check the load on the masking engine.
Signature in the logs (very similar to Update):
ThroughputLogger:interval 42.43 MB (1204439 rows) counted in 30.000 sec
... at the start fast and then continues slow (here 40 Krpm) ...
ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ...
... continues ...
Select  
KBA6409 - Performance Graph - SELECT.png This is a rare case. If SELECT is slow, then the performance will not be faster than the Input Step can read data and hence capped on SELECT.

Actions:
  • Investigate the Execution Plan for SELECT 
  • Check network latency.
Signature in the logs:
ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ...
...  starts slow and continues slow (here 40 Krpm) ...

Data Collection

To find the bottleneck, you need to look at stats from each of these areas. There is no single report or query that will give the answer here. To find the root cause, you need all the data.

Some points to remember:  

  • On the database, make sure to flush buffers and the cache before testing. 
  • To test masking performance, try masking using a fast algorithm (Character Mapping (CM) on Null SL). 
  • Running the job with Stream = 1 will help to isolate the table and the issue.

What to collect  

The following information needs to be collected: 

  • Masking Rule Set and Inventory.
    • For example: how many masked columns, algorithms, and column size.
  • Database performance.
    • The number of rows in the table.
    • How much time does it take to run SELECT COUNT(*), SELECT, and UPDATE statements on the database? 
    • Any execution plan.
  • Testing the masking performance.
    • Temporarily set Feedback Size = Commit Size to 10,000 and Row Limit to 200,000.
    • Test run the masking job.
    • Remember to reset the Feedback Size to the original value (10,000 is too small for large jobs and will cause the logs to grow).

Database performance tests  

Network performance   

Ping the Masking Engine and collect latency data. Run this a few times to get a good average value for the latency.

ping Your_Masking_Engine

SELECT COUNT Performance 

This will show how fast the database can count the records in the table. This is usually the max performance as there is no network latency. 

Actions

  • Change TABLE to the table to test.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in seconds). 
SELECT COUNT(*) AS COUNT_TEST FROM [TABLE]

Database performance estimates 

With the number of records and the duration, the select read performance of all records can be calculated: 

  • Count Rows per Minute: RowCount / Duration_Secs * 60

Question

Is the SELECT COUNT performance the same (or close to) the Masking Job Performance?

  • Yes - On the database, investigate the root cause as to why reading data in the SELECT COUNT is slow.
  • No - keep reading.

SELECT Performance 

The SELECT is here encapsulated in a SELECT COUNT to reduce the output size. The duration is expected to be the same as above. 

Actions

  • Change TABLE to the table to test.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in seconds). 
SELECT count(*) AS SELECT_TEST FROM (select * from [TABLE]) as tst

Database performance estimates 

With the metadata above, the select fetch performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • Select Rows per Minute: RowCount / Duration_Secs * 60

UPDATE Performance 

The UPDATE performance is usually slower and a likely bottleneck. This will measure how long time it will take to write the same data as read (out = in). 

Actions:

  • Change TABLE to the table to test.
  • Change msk1, msk2,... to the masked columns (use as many as being masked).
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in seconds). 
UPDATE [TABLE] SET msk1 = msk1, msk2 = msk2, ...

Database performance estimates 

With the number of updated records and the duration, the UPDATE performance can be calculated: 

  • Update Rows per Minute: RowCount / Duration_Secs * 60

Question

Is the SELECT COUNT performance the same (or close to) the Masking Job Performance?

  • Yes -on the database, investigate as to why updating the table is slow.
  • No - open a support case.

Table Metadata 

Some data about the table is also useful. 

Actions

  • Take note of the table description.
  • Take note of any triggers and indexes on the table.
  • If possible - collect the number of pages/blocks required to store this table on the database. 

Example Output

The example below is from a small table called 'MyTable'. The SQL to collect this will differ from database to database. 

+-----------+----------+------------+-----------+-------------+
| TableName | RowCount | TotalPages | UsedPages | RowsPerPage |
+-----------+----------+------------+-----------+-------------+
| MyTable   |  509,223 |      1,937 |     1,933 |         263 |
+-----------+----------+------------+-----------+-------------+

Example calculations 

Crunching the numbers above you get the following result.

  • The UPDATE is 7.8 times slower than SELECT (650/83).
  • The UPDATE performance will cap the performance of the masking job.
+---------------+--------------+----------+----------+---------+
|               |              |    COUNT |   SELECT |  UPDATE |
+---------------+--------------+----------+----------+---------+
|               |     Metadata |    47 ms |    47 ms |  360 ms |
+---------------+--------------+----------+----------+---------+
| Rows per Min  | 509,223 rows | 650 Mrpm | 650 Mrpm | 83 Mrpm |
+---------------+--------------+----------+----------+---------+

Collect data for Support Investigation 

When opening a Masking Performance Support Case - please include: 

  1. Table description (with Indexes and Triggers).
  2. SELECT and UPDATE statistics collected from the database. 
  3. Run a Masking Job with Feedback Size = Commit Size = 10,000 and Row Limit = 200,000.
  4. Upload a Support Bundle. 
  5. Detail the Masking Job ID, Execution ID, and the table to investigate. 

Related Articles

The following articles may provide more information or related information to this article: