Troubleshooting Masking Performance In-Place Generic DB (KBA6409)

Last updated
Save as PDF
Share
1. Share
2. Tweet
3. Share

KBA

KBA# 6409

At a Glance

Summary:	This KBA describes step-by-step troubleshooting procedures for In-Place Masking jobs on generic database. Special steps are available for Oracle and MS SQL Server (see links below).
First:	Before investigating performance, ensure Triggers are dropped (or disabled) and Constraints and Indexes are Dropped on masked columns.
DIY Investigation:	Masking performance greatly depends on the database (DB) and the network. This includes how the data is stored and accessed. These are the initial steps to verify performance: Measure latency using ping. On the DB, execute a test and measure the time it took: SELECT COUNT(*) FROM [slow_table] UPDATE [maskcol]=[maskcol] WHERE [keycol] = [value] Execution Plans for the above statements (SELECT and UPDATE). Check if the UPDATE triggers additional operations on the database through Triggers, Constraints, or Indexes. Check that the UPDATE or SELECT statement on the DB is not blocked (this includes logs and journals).
Test Update:	The likely performance issue is the time it takes to UPDATE a record on the database. To verify this you can instruct the DB to not Update any records using the followign trick. Use Custom SQL in the Rule Set and change the PK/LK/ID/ROWID column to NULL "[PK/LK/ID col]". Change back after the test. If this is fast then the bottleneck is on the DB side. Note 1: Important, when used and working as intended, no data will be masked. Note 2: For Oracle see special KBA. Oracle uses ROWID, so this needs to be set to NULL. Example - the PK is here called 'MY_PK': SELECT NULL "MY_PK", [masked columns] FROM [table]
Support Investigation:	Steps: Upload the details collected above. Temporarily set: Feedback Size = Commit Size = 10,000. Row Limit = 200,000. This is for investigation only. Reset these values after bundle upload. Upload a Support Bundle. Provide the Masking Job ID and Execution ID.
More info:	More info on how to access logs: KBA: How to Troubleshoot Masking Jobs and Access Masking Logs (KBA7988) Casting using SQL: KBA: Algorithm: Casting Values Before Masking (KBA1580) KBA: Using the Custom SQL Rule Set Feature (KBA4455) Performance recommendations: KBA: Masking Commit Size, Row Limit and Feedback Size (KBA6387) KBA: Best Practice: Masking Performance (Database) (KBA1561) Troubleshoot Performance in Oracle and MS SQL Server: KBA: Troubleshooting Masking Performance In-Place Oracle (KBA5985) KBA: Troubleshooting Masking Performance In-Place MS SQL Server (KBA6371)

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies

Date	Release
Aug 24, 2023	14.0.0.0
Jul 24, 2023	13.0.0.0
Jun 21, 2023	12.0.0.0
May 25, 2023	11.0.0.0
Apr 13, 2023	10.0.0.0 \| 10.0.0.1
Mar 13, 2023 \| Mar 20, 2023	9.0.0.0 \| 9.0.0.1
Feb 13, 2023	8.0.0.0
Jan 12, 2023	7.0.0.0
Releases Prior to 2023
Major Release	All Sub Releases
6.0	6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2
5.3	5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0
5.2	5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1	5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0	5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

DB Masking Performance

This article looks at how to troubleshoot masking performance. There are specific Oracle and MS SQL Server articles which go into more detail for those databases. See Related Articles.

It will only look at the rpm (rows per minute) of the actual masking operation (In-Place). It will not include the runtime of Disabling Constraints, Dropping Indexes, or Executing Pre/Post SQL Scripts.

The steps in this guide can be performed with available tools and will help to determine and answer what operation may be causing the bottleneck:

Reading the data (SELECT)
Masking operation (data size, algorithms, or the number of columns)
Updating the data (UPDATE)
The database is blocked (SELECT, UPDATE, logs, or journaling)

Performance factors

Some key factors to slow performance are:

Database	Factors:	- Database Write Performance - Database Read Performance - Execution Plan and Indexes (and Triggers) - The number of records per block/page - DB fragmentation
Database	Note:	A frequent cause of slow performance stems from the number of blocks/pages accessed and updated. This can be due to execution plans, what data and how it is stored, and fragmentation. It can also be caused by Triggers and Indexes.
Network	Factors:	- Latency - Throughput
Network	Note:	This is rarely an issue as most masking engines are in the same Data Centre as the Database.
Masking Operation	Factors:	- Number of algorithms - Data type (for example, Large Text Object) - Algorithm type
Masking Operation	Note:	The new algorithms (Alg v2) are highly tuned and can mask at a rate of many millions of rows per minute.

The diagram below shows where the delays are accumulating to cause slow performance.

KBA6409 - Generic - Timing Events v2.png

The diagram excludes the throughput and the size of the data as this is normally not a performance-limiting factor. The time t_c (time to count) is only included for reference.

COUNT: t_c = t_lc + t_dbc+ t_other
MASKING: t_m
SELECT: ts = t_ls + t_dbs+ t_other
UPDATE: tu = t_lu + t_dbu+ t_other

t_other= Java, JDBC.

Masking performance in rpm will be dictated by compounding and capping factors since some of the processing is done in parallel.

To improve masking performance and rpm, the bottleneck needs to be found and resolved.

Masking Performance at different stages

The performance will differ at different stages of the masking job. Unique stages are at play when the masking job starts, and this makes it easy to measure them. Measuring these stages can help determine where the bottleneck is located.

An In-Place masking job execution has the following characteristics.

SELECT

The SELECT reads the dataset to be masked.
The first 10,000 to 20,000 records are normally read fast as the records are read into the masking engine buffers.
This is usually visible as a performance peak at the start of the job and shows the performance of the SELECT statement.

Masking Transformation

The SELECT is usually faster than the masking operation.
How fast depends on how many algorithms are used and the masked data.

UPDATE

The last step is UPDATE.
This is usually the performance bottleneck.
The likely causes are the number of blocks/pages accessed and updated and the write performance of the database.

Generic Moving v1 (fast input w gauge).gif

Performance Graph

By temporarily setting the Feedback Size = Commit Size to 10,000 and Row Limit = 200,000, the execution can be illustrated in a graph. This graph shows the performance as the buffers fill up and can show where the bottlenecks are located.

An example of this can be seen below, which shows the job in the illustration above. The graph shows the relative performance compared with the RPM of the overall masking job.

The characteristics in this example are:

The SELECT performance is initially the max performance that can be fetched from the database.
As soon as the records start to appear and are masked they will be sent to the UPDATE step.
The UPDATE performance is seen as soon as records are available to be updated.
The SELECT will be at max speed until the initial buffers are full (around 20,000 - 30,0000 rows).
The performance is then (in this case) capped to the performance of the masking operation.
When all (relevant) buffers are full the SELECT step will be forced to the same performance as the UPDATE step.

Masking Relative Performance - SELECT Mask UPDATE.png

Examples and Actions

No masking job is the same and the characteristics will differ. With that said, the bottleneck can be identified in one of four different groups based on the bottleneck.

These are:

Update

The SELECT quickly peaks at max performance, then drops by masking operation (duration depends on Row Limit), and then capped by the Update performance.

Database Actions:

Investigate all details around UPDATE.
Check network and storage latencies.
Investigate Triggers, Indexes, and Constraints.

Signature in the logs:
ThroughputLogger:interval 42.43 MB (1204439 rows) counted in 30.000 sec
... at the start fast and then continues slow (here 40 Krpm) ...
ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ...
... continues ...

Hung Job (DB lock)

	This is technically not a Performance issue - but frequently reported as such since the RPM is dropping (no records masked and the time duration increases). Database Actions: Check for locks on SELECT and/or UPDATE. Check Transaction Logs/Journaling and free space.
	Signature in the logs: ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ... continues ...

Masking operation

KBA6409 - Performance Graph - Masking.png

The SELECT quickly peaks at max performance and then is capped by the masking operation.

This is rare. If there are a moderate number of OOTB (Out of the Box) algorithms (say 10 per table), this is likely not the issue. Algorithms with RegEx or the old version of the Segment Mapping algorithm (and 5 were used per table) could hit this.

Actions:

Investigate the number of algorithms, data type, and data size.
Check the load on the masking engine.

Signature in the logs (very similar to Update):
ThroughputLogger:interval 42.43 MB (1204439 rows) counted in 30.000 sec
... at the start fast and then continues slow (here 40 Krpm) ...
ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ...
ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ...
... continues ...

Select

	This is a rare case. If SELECT is slow, then the performance will not be faster than the Input Step can read data and hence capped on SELECT. Actions: Investigate the Execution Plan for SELECT Check network latency.
	Signature in the logs: ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ... ... starts slow and continues slow (here 40 Krpm) ...

Data Collection

To find the bottleneck, you need to look at stats from each of these areas. There is no single report or query that will give the answer here. To find the root cause, you need all the data.

Some points to remember:

On the database, make sure to flush buffers and the cache before testing.
To test masking performance, try masking using a fast algorithm (Character Mapping (CM) on Null SL).
Running the job with Stream = 1 will help to isolate the table and the issue.

What to collect

The following information needs to be collected:

Masking Rule Set and Inventory.
- For example: how many masked columns, algorithms, and column size.
Database performance.
- The number of rows in the table.
- How much time does it take to run SELECT COUNT(*), SELECT, and UPDATE statements on the database?
- Any execution plan.
Testing the masking performance.
- Temporarily set Feedback Size = Commit Size to 10,000 and Row Limit to 200,000.
- Test run the masking job.
- Remember to reset the Feedback Size to the original value (10,000 is too small for large jobs and will cause the logs to grow).

Database performance tests

Network performance

Ping the Masking Engine and collect latency data. Run this a few times to get a good average value for the latency.

ping Your_Masking_Engine

SELECT COUNT Performance

This will show how fast the database can count the records in the table. This is usually the max performance as there is no network latency.

Actions

Change TABLE to the table to test.
Take note of the number of records.
Take note of the duration to complete this statement (in seconds).

SELECT COUNT(*) AS COUNT_TEST FROM [TABLE]

Database performance estimates

With the number of records and the duration, the select read performance of all records can be calculated:

Count Rows per Minute: RowCount / Duration_Secs * 60

Question

Is the SELECT COUNT performance the same (or close to) the Masking Job Performance?

Yes - On the database, investigate the root cause as to why reading data in the SELECT COUNT is slow.
No - keep reading.

SELECT Performance

The SELECT is here encapsulated in a SELECT COUNT to reduce the output size. The duration is expected to be the same as above.

Actions

Change TABLE to the table to test.
Take note of the number of records.
Take note of the duration to complete this statement (in seconds).

SELECT count(*) AS SELECT_TEST FROM (select * from [TABLE]) as tst

Database performance estimates

With the metadata above, the select fetch performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times):

Select Rows per Minute: RowCount / Duration_Secs * 60

UPDATE Performance

The UPDATE performance is usually slower and a likely bottleneck. This will measure how long time it will take to write the same data as read (out = in).

Actions:

Change TABLE to the table to test.
Change msk1, msk2,... to the masked columns (use as many as being masked).
Take note of the number of records.
Take note of the duration to complete this statement (in seconds).

UPDATE [TABLE] SET msk1 = msk1, msk2 = msk2, ...

Database performance estimates

With the number of updated records and the duration, the UPDATE performance can be calculated:

Update Rows per Minute: RowCount / Duration_Secs * 60

Question

Is the SELECT COUNT performance the same (or close to) the Masking Job Performance?

Yes -on the database, investigate as to why updating the table is slow.
No - open a support case.

Table Metadata

Some data about the table is also useful.

Actions

Take note of the table description.
Take note of any triggers and indexes on the table.
If possible - collect the number of pages/blocks required to store this table on the database.

Example Output

The example below is from a small table called 'MyTable'. The SQL to collect this will differ from database to database.

+-----------+----------+------------+-----------+-------------+
| TableName | RowCount | TotalPages | UsedPages | RowsPerPage |
+-----------+----------+------------+-----------+-------------+
| MyTable   |  509,223 |      1,937 |     1,933 |         263 |
+-----------+----------+------------+-----------+-------------+

Example calculations

Crunching the numbers above you get the following result.

The UPDATE is 7.8 times slower than SELECT (650/83).
The UPDATE performance will cap the performance of the masking job.

+---------------+--------------+----------+----------+---------+
|               |              |    COUNT |   SELECT |  UPDATE |
+---------------+--------------+----------+----------+---------+
|               |     Metadata |    47 ms |    47 ms |  360 ms |
+---------------+--------------+----------+----------+---------+
| Rows per Min  | 509,223 rows | 650 Mrpm | 650 Mrpm | 83 Mrpm |
+---------------+--------------+----------+----------+---------+

Collect data for Support Investigation

When opening a Masking Performance Support Case - please include:

Table description (with Indexes and Triggers).
SELECT and UPDATE statistics collected from the database.
Run a Masking Job with Feedback Size = Commit Size = 10,000 and Row Limit = 200,000.
Upload a Support Bundle.
Detail the Masking Job ID, Execution ID, and the table to investigate.

The following articles may provide more information or related information to this article:

KBA

At a Glance

Applicable Delphix Versions

DB Masking Performance

Performance factors

Masking Performance at different stages

SELECT

Masking Transformation

UPDATE

Performance Graph

Examples and Actions

Update

Hung Job (DB lock)

Masking operation

Select

Data Collection

What to collect

Database performance tests

Network performance

SELECT COUNT Performance

SELECT Performance

UPDATE Performance

Table Metadata

Example calculations

Collect data for Support Investigation

Related Articles

More info on how to access logs

Casting using SQL

Performance recommendations

Troubleshoot Performance in Oracle and MS SQL Server