Troubleshooting Masking Performance In-Place Generic DB (KBA6409)
KBA
KBA# 6409
At a Glance
Summary: | This KBA describes step-by-step troubleshooting procedures for In-Place Masking jobs on generic database.
Special steps are available for Oracle and MS SQL Server (see links below). |
---|---|
First: | Before investigating performance, ensure Triggers are dropped (or disabled) and Constraints and Indexes are Dropped on masked columns. |
DIY Investigation: | Masking performance greatly depends on the database (DB) and the network. This includes how the data is stored and accessed. These are the initial steps to verify performance:
|
Support Investigation: | Steps:
|
More info: | More info on how to access logs:
Casting using SQL:
Troubleshoot Performance in Oracle and MS SQL Server: |
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Date Release Aug 24, 2023 14.0.0.0 Jul 24, 2023 13.0.0.0 Jun 21, 2023 12.0.0.0 May 25, 2023 11.0.0.0 Apr 13, 2023 10.0.0.0 | 10.0.0.1 Mar 13, 2023 | Mar 20, 2023 9.0.0.0 | 9.0.0.1 Feb 13, 2023 8.0.0.0 Jan 12, 2023 7.0.0.0 Releases Prior to 2023 Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0, 6.0.17.0, 6.0.17.1, 6.0.17.2
5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
DB Masking Performance
This article looks at how to troubleshoot masking performance. There are specific Oracle and MS SQL Server articles which go into more detail for those databases. See Related Articles.
It will only look at the rpm (rows per minute) of the actual masking operation (In-Place). It will not include the runtime of Disabling Constraints, Dropping Indexes, or Executing Pre/Post SQL Scripts.
The steps in this guide can be performed with available tools and will help to determine and answer what operation may be causing the bottleneck:
- Reading the data (SELECT)
- Masking operation (data size, algorithms, or the number of columns)
- Updating the data (UPDATE)
- The database is blocked (SELECT, UPDATE, logs, or journaling)
Performance factors
Some key factors to slow performance are:
Database | Factors: | - Database Write Performance - Database Read Performance - Execution Plan and Indexes (and Triggers) - The number of records per block/page - DB fragmentation |
Note: | A frequent cause of slow performance stems from the number of blocks/pages accessed and updated. This can be due to execution plans, what data and how it is stored, and fragmentation. It can also be caused by Triggers and Indexes. | |
Network | Factors: | - Latency - Throughput |
Note: | This is rarely an issue as most masking engines are in the same Data Centre as the Database. | |
Masking Operation | Factors: | - Number of algorithms - Data type (for example, Large Text Object) - Algorithm type |
Note: | The new algorithms (Alg v2) are highly tuned and can mask at a rate of many millions of rows per minute. |
The diagram below shows where the delays are accumulating to cause slow performance.
The diagram excludes the throughput and the size of the data as this is normally not a performance-limiting factor. The time tc (time to count) is only included for reference.
- COUNT: tc = tlc + tdbc + tother
- MASKING: tm
- SELECT: ts = tls + tdbs + tother
- UPDATE: tu = tlu + tdbu + tother
tother = Java, JDBC.
Masking performance in rpm will be dictated by compounding and capping factors since some of the processing is done in parallel.
To improve masking performance and rpm, the bottleneck needs to be found and resolved.
Masking Performance at different stages
The performance will differ at different stages of the masking job. Unique stages are at play when the masking job starts, and this makes it easy to measure them. Measuring these stages can help determine where the bottleneck is located.
An In-Place masking job execution has the following characteristics.
SELECT
- The SELECT reads the dataset to be masked.
- The first 10,000 to 20,000 records are normally read fast as the records are read into the masking engine buffers.
- This is usually visible as a performance peak at the start of the job and shows the performance of the SELECT statement.
Masking Transformation
- The SELECT is usually faster than the masking operation.
- How fast depends on how many algorithms are used and the masked data.
UPDATE
- The last step is UPDATE.
- This is usually the performance bottleneck.
- The likely causes are the number of blocks/pages accessed and updated and the write performance of the database.
Performance Graph
By temporarily setting the Feedback Size = Commit Size to 10,000 and Row Limit = 200,000, the execution can be illustrated in a graph. This graph shows the performance as the buffers fill up and can show where the bottlenecks are located.
An example of this can be seen below, which shows the job in the illustration above. The graph shows the relative performance compared with the RPM of the overall masking job.
The characteristics in this example are:
- The SELECT performance is initially the max performance that can be fetched from the database.
- As soon as the records start to appear and are masked they will be sent to the UPDATE step.
- The UPDATE performance is seen as soon as records are available to be updated.
- The SELECT will be at max speed until the initial buffers are full (around 20,000 - 30,0000 rows).
- The performance is then (in this case) capped to the performance of the masking operation.
- When all (relevant) buffers are full the SELECT step will be forced to the same performance as the UPDATE step.
Examples and Actions
No masking job is the same and the characteristics will differ. With that said, the bottleneck can be identified in one of four different groups based on the bottleneck.
These are:
Update
The SELECT quickly peaks at max performance, then drops by masking operation (duration depends on Row Limit), and then capped by the Update performance.
Database Actions:
|
|
Signature in the logs: ThroughputLogger:interval 42.43 MB (1204439 rows) counted in 30.000 sec ... at the start fast and then continues slow (here 40 Krpm) ... ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ... ... continues ... |
Hung Job (DB lock)
This is technically not a Performance issue - but frequently reported as such since the RPM is dropping (no records masked and the time duration increases). Database Actions:
|
|
Signature in the logs: ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.00 MB (0 rows) counted in 30.000 sec ... ... continues ... |
Masking operation
The SELECT quickly peaks at max performance and then is capped by the masking operation. This is rare. If there are a moderate number of OOTB (Out of the Box) algorithms (say 10 per table), this is likely not the issue. Algorithms with RegEx or the old version of the Segment Mapping algorithm (and 5 were used per table) could hit this. Actions:
|
|
Signature in the logs (very similar to Update): ThroughputLogger:interval 42.43 MB (1204439 rows) counted in 30.000 sec ... at the start fast and then continues slow (here 40 Krpm) ... ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ... ... continues ... |
Select
This is a rare case. If SELECT is slow, then the performance will not be faster than the Input Step can read data and hence capped on SELECT. Actions:
|
|
Signature in the logs: ThroughputLogger:interval 0.53 MB (20012 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.51 MB (19285 rows) counted in 30.000 sec ... ThroughputLogger:interval 0.52 MB (19663 rows) counted in 30.000 sec ... ... starts slow and continues slow (here 40 Krpm) ... |
Data Collection
To find the bottleneck, you need to look at stats from each of these areas. There is no single report or query that will give the answer here. To find the root cause, you need all the data.
Some points to remember:
- On the database, make sure to flush buffers and the cache before testing.
- To test masking performance, try masking using a fast algorithm (Character Mapping (CM) on Null SL).
- Running the job with Stream = 1 will help to isolate the table and the issue.
What to collect
The following information needs to be collected:
- Masking Rule Set and Inventory.
- For example: how many masked columns, algorithms, and column size.
- Database performance.
- The number of rows in the table.
- How much time does it take to run SELECT COUNT(*), SELECT, and UPDATE statements on the database?
- Any execution plan.
- Testing the masking performance.
- Temporarily set Feedback Size = Commit Size to 10,000 and Row Limit to 200,000.
- Test run the masking job.
- Remember to reset the Feedback Size to the original value (10,000 is too small for large jobs and will cause the logs to grow).
Database performance tests
Network performance
Ping the Masking Engine and collect latency data. Run this a few times to get a good average value for the latency.
ping Your_Masking_Engine
SELECT COUNT Performance
This will show how fast the database can count the records in the table. This is usually the max performance as there is no network latency.
Actions
- Change TABLE to the table to test.
- Take note of the number of records.
- Take note of the duration to complete this statement (in seconds).
SELECT COUNT(*) AS COUNT_TEST FROM [TABLE]
Database performance estimates
With the number of records and the duration, the select read performance of all records can be calculated:
- Count Rows per Minute: RowCount / Duration_Secs * 60
Question
Is the SELECT COUNT performance the same (or close to) the Masking Job Performance?
- Yes - On the database, investigate the root cause as to why reading data in the SELECT COUNT is slow.
- No - keep reading.
SELECT Performance
The SELECT is here encapsulated in a SELECT COUNT to reduce the output size. The duration is expected to be the same as above.
Actions
- Change TABLE to the table to test.
- Take note of the number of records.
- Take note of the duration to complete this statement (in seconds).
SELECT count(*) AS SELECT_TEST FROM (select * from [TABLE]) as tst
Database performance estimates
With the metadata above, the select fetch performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times):
- Select Rows per Minute: RowCount / Duration_Secs * 60
UPDATE Performance
The UPDATE performance is usually slower and a likely bottleneck. This will measure how long time it will take to write the same data as read (out = in).
Actions:
- Change TABLE to the table to test.
- Change msk1, msk2,... to the masked columns (use as many as being masked).
- Take note of the number of records.
- Take note of the duration to complete this statement (in seconds).
UPDATE [TABLE] SET msk1 = msk1, msk2 = msk2, ...
Database performance estimates
With the number of updated records and the duration, the UPDATE performance can be calculated:
- Update Rows per Minute: RowCount / Duration_Secs * 60
Question
Is the SELECT COUNT performance the same (or close to) the Masking Job Performance?
- Yes -on the database, investigate as to why updating the table is slow.
- No - open a support case.
Table Metadata
Some data about the table is also useful.
Actions
- Take note of the table description.
- Take note of any triggers and indexes on the table.
- If possible - collect the number of pages/blocks required to store this table on the database.
Example Output
The example below is from a small table called 'MyTable'. The SQL to collect this will differ from database to database.
+-----------+----------+------------+-----------+-------------+ | TableName | RowCount | TotalPages | UsedPages | RowsPerPage | +-----------+----------+------------+-----------+-------------+ | MyTable | 509,223 | 1,937 | 1,933 | 263 | +-----------+----------+------------+-----------+-------------+
Example calculations
Crunching the numbers above you get the following result.
- The UPDATE is 7.8 times slower than SELECT (650/83).
- The UPDATE performance will cap the performance of the masking job.
+---------------+--------------+----------+----------+---------+ | | | COUNT | SELECT | UPDATE | +---------------+--------------+----------+----------+---------+ | | Metadata | 47 ms | 47 ms | 360 ms | +---------------+--------------+----------+----------+---------+ | Rows per Min | 509,223 rows | 650 Mrpm | 650 Mrpm | 83 Mrpm | +---------------+--------------+----------+----------+---------+
Collect data for Support Investigation
When opening a Masking Performance Support Case - please include:
- Table description (with Indexes and Triggers).
- SELECT and UPDATE statistics collected from the database.
- Run a Masking Job with Feedback Size = Commit Size = 10,000 and Row Limit = 200,000.
- Upload a Support Bundle.
- Detail the Masking Job ID, Execution ID, and the table to investigate.
Related Articles
The following articles may provide more information or related information to this article: