Skip to main content
Delphix

Troubleshooting Masking Performance In-Place MS SQL Server (KBA6371)

 

 

KBA

KBA# 6371

At a Glance

Summary: This KBA describes step-by-step troubleshooting procedures for MS SQL Server executing In-Place Masking jobs. 
DIY Investigation: Masking performance greatly depends on the Database and the Network. This includes how the data is stored and accessed. 

These are the initial steps to verify performance:

  1. Measure latency using ping.
  2. Measure timing and collect Execution Plans (or Live Query Stats) for:
    • SELECT
    • UPDATE
  3. Check if the UPDATE causes additional operations on the database through Triggers, Constraints, or Indexes. 
Support Investigation: Steps:
  1. Upload the details collected above.
  2. Run a Masking Job with the following temporary configuration:
    • Feedback Size = Commit Size = 10,000
    • Row Limit = 200,000
  3. Upload a Support Bundle. 
  4. Detail the Masking Job ID and Execution ID.

Note: Remember to reset the Feedback Size after creating the bundle.

Bottlenecks and actions: When the investigation of the bottleneck has been completed, the following are possible next actions: 

UPDATE (likely):

  • Check database Recovery Model - use Simple.
  • Investigate the details around UPDATE and other operations on the database. 
  • Check if there are Indexes on Masked Columns and, if there are, these should be dropped. 
  • Check if there are any Triggers on masked tables, these need to be disabled.
SELECT (rare):
  • Investigate the execution plan for SELECT and network latency.
    • How much data (bytes) is fetched per row? 
    • If there are Large Text or Binary objects consider using CAST in Custom SQL. 
  • If a WHERE clause is used, consider indexing these column(s) or use algorithms that can filter data (i.e., Regex Decomp). 

Masking (rare):

  • Check the number of masked columns.
  • Check the algorithms used (are they optimal for performance).
  • Check the capacity of the masking engine.
How to test if Update is bottleneck: The likely performance issue is the time it takes to UPDATE a record on the database.

To verify this you can tell SQL Server to not Update using this easy trick. Use Custom SQL in the Rule Set and change PK/LK/ID column to NULL "[PK/LK/ID col]". Change back after the test.

If this is fast then the bottleneck is on the SQL Server side 
Note: When used, no data will be masked.

Example - PK is here called 'MY_PK':
SELECT NULL "MY_PK", [masked columns] FROM [table]
More info:  More info on how to access logs: 

Casting using SQL:

MS SQL Driver Support Plugin (Enable Tasks - Indexes, Constraints, and Triggers)

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

4.3

4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0

4.2

4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1

4.1

4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

MS SQL Server Masking Performance 

This Knowledge Article looks at how to troubleshoot Masking Performance on MS SQL Server. 

It will only look at the rpm (rows per minute) of the actual masking operation (In-Place). It will not include the runtime of Disabling Constraints, Dropping Indexes, or Executing Pre/Post SQL Scripts. 

The steps in this guide can be performed with available tools and will help to determine and answer the following.

Is the bottleneck:

  • Reading the data (SELECT)?
  • Masking operation (algorithms and number of columns)?
  • Updating the data (UPDATE)?

Note: 99% of the time, the performance bottleneck is when SQL Server is performing the UPDATE.

Performance factors

Some key factors to slow performance are:

Database Factors:  - Database IO Performance
- Execution Plan and Indexes (for masking, the best is Full Table Scan)
- The number of records per page
- DB fragmentation
- Triggers and Indexes on masked tables and columns
Note:  A frequent cause of slow performance stems from the number of blocks/pages accessed and updated. This can be due to execution plans, what data and how it is stored, and fragmentation. It can also be caused by Triggers and Indexes.
Network Factors: - Latency
- Throughput
Note: This is rarely an issue as most masking engines are in the same Data Centre as the Database.
Masking Operation Factors: - Number Algorithms
- Data Type (e.i. large Text Object)
- Algorithm Type
Note: The new algorithms (Alg v2) are highly tuned and can mask at a rate of many millions of rows per minute.

 

The diagram below shows where the delays are accumulating to become slow performance. 

KBA6371 MS SQL-Timing-Evetns v2.png

The diagram excludes the throughput and the size of the data as this is normally not a performance-limiting factor. The time tc (time to count) is only included for reference as the fetch is normally the same as for SELECT.

  • COUNT: tc = tlc + tdbc + tother 
  • MASKING: tm
  • SELECT: ts = tls + tdbs + tother
  • UPDATE: tu = tlu + tdbu + tother

tother = Java, JDBC. 

Masking performance will be dictated by compounding and capping factors based on the timing events above.

To improve masking performance, the current bottleneck needs to be found and resolved (and the process repeated if needed).

Masking Performance at different stages 

The performance will differ at different stages of the masking job. These stages can be measured at the start of the masking job. Measuring these stages can help determine where the bottleneck is located.

The performance can be measured when the masking job finishes as well but it is usually not practical to measure these as one has to wait until the job finishes successfully. 

 

A normal In-Place masking job execution has the following characteristics:

  1. SELECT
    • The SELECT reads the dataset to be Masked.
    • The first 10,000 to 20,000 records are normally read fast as the records are read into the masking engine buffers.
    • This is usually visible as a performance peak at the start of the job and shows the performance of the SELECT statement.
  2. Masking Transformation  
    • The SELECT is usually faster than the Masking operation.
    • How fast depends on how many algorithms are used and the masked data.
  3. Update
    • The last step is UPDATE.
    • This is usually the performance bottleneck. 
    • The likely causes are the number of blocks/pages accessed and updated and the write performance of the database. 

 

MS SQL Moving v3 (fast input w gauge).gif

SELECT and UPDATE Performance Graph 

By temporarily setting the Feedback Size = Commit Size, usually 10,000, the execution can be illustrated in a graph. This graph shows the performance as the buffers fill up and this will indicate the bottleneck.

An example of this can be seen below, which shows the job above. The graph shows the relative performance compared with the rpm of the overall masking job.

 

The characteristics are: 

  • The SELECT performance is initially the max performance that can be fetched from the database. 
  • As soon as the records start to appear in the masking engine they will be sent to the UPDATE step.
  • The UPDATE performance is seen as soon as records are available to be updated.
  • The SELECT will be at max speed until the initial buffers are full (around 20,000 - 30,0000 rows).
  • The performance is then (in this case) capped to the performance of the masking operation.
  • When all (relevant) buffers are full the SELECT step will be forced to the same performance as the UPDATE step.

Masking Relative Performance - SELECT Mask UPDATE.png

Example of Bottlenecks  

No masking job is exactly the same and the characteristics will differ. With that said, the bottleneck can be identified in one of three areas. 

These areas are (with example graphs below):

SELECT

In this case, the performance will not be faster than the SELECT step and hence capped on SELECT.

Actions: investigate the execution plan for SELECT and network latency.

Masking operation 

In this case, the SELECT will reach max performance until the initial buffers are full and then be capped by the masking operation.

Actions: investigate the algorithms used and the capacity of the masking engine.

UPDATE 

This is the performance case detailed above. 

Actions: investigate the details around UPDATE and other operations on the database. 

 

Masking Relative Performance - SELECT Cap.png Masking Relative Performance - Masking Cap.png Masking Relative Performance - UPDATE Cap.png

Data Collection

To find the bottleneck, we need to look at stats from each of these areas. There is no single report or query that will give the answer here. To find the root cause, we need all data.

Some points to remember:  

  • On the database, make sure to flush buffers and the cache before testing. 
  • To test masking performance, try masking using a fast algorithm (Character Mapping (CM) on Null SL). 
  • Running the job with Stream = 1 will help to isolate the table and the issue.

What to collect 

The following information needs to be collected: 

  • Masking Rule Set and Inventory.
    • For example: how many masked columns, algorithms, size of the data.
  • Database performance.
    • The number of rows in the table.
    • The durations from the database tests. 
    • Any execution plan.
  • Testing the masking performance.
    • Temporarily set Feedback Size = Commit Size to 10,000 and Row Limit to 200,000.
    • Test run the masking job (and the table of interest) for around 5 min.
    • Remember to reset the Feedback Size to the original value (10,000 is too small for large jobs and will cause the logs to grow).

Commands to test performance 

Check 1 - Network performance

Ping the Masking Engine and collect latency data. Run this a few times to get a good average value for the latency.

ping Your_Masking_Engine

Check 2 - Table Page Count

The following data is very useful.

Actions

  • Change 'TABLE' to the table with perf issue.
  • Take note of the table description (DDL).
  • Take note of any triggers and indexes on the table.
  • Collect the number of pages required to store this table on the database (see SQL below). 
 SELECT 
    t.NAME AS TableName,
    p.rows AS RowCount,
    SUM(a.total_pages) AS TotalPages, 
    SUM(a.used_pages) AS UsedPages, 
    (SUM(a.total_pages) - SUM(a.used_pages)) AS UnusedPages,
p.rows/SUM(a.used_pages) AS RowsPerPage,
SUM(a.used_pages)*10000/p.rows AS PagesPerCommit
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
WHERE 
    t.NAME = '[TABLE]' 
GROUP BY 
    t.Name, p.Rows

Example Output

The example below is from a small table called 'MyTable'. It shows: 

  • RowCount: Total number of rows in the table.
  • TotalPages: Total number of Pages assigned for the table. 
  • UsedPages: The number of Pages used by the table.
  • UnusedPages: The number of Pages currently not used. 
  • RowsPerPage: The average number of Rows per Page (if this number is small then the database will need to read many pages to get to and update the data which will affect the performance).
  • PagesPerCommit: The average number of pages updated on a standard COMMIT SIZE (10,000).
    • This value is related to 'RowsPerPage'.
    • A larger number here will affect the performance as UPDATE will need to update more pages.  
    • Under 50 is a small number.
+------------+----------+------------+-----------+-------------+-------------+----------------+
| TableName  | RowCount | TotalPages | UsedPages | UnusedPages | RowsPerPage | PagesPerCommit |
+------------+----------+------------+-----------+-------------+-------------+----------------+
| LargeTable | 509223   | 1937       | 1933      | 4           | 263         | 37             |
+------------+----------+------------+-----------+-------------+-------------+----------------+

Check 3 - SELECT COUNT Performance

This will show how fast the database can count the number of records in the table. This is frequently done in the same way as SELECT below. The latency will not affect the duration of this query as only one value is returned.

Actions

  • Change 'TABLE' to the table with perf. issue.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in milliseconds).
DECLARE @stm datetime
DECLARE @etm datetime
SELECT @stm=GETDATE()  

SELECT COUNT(*) AS COUNT_TEST FROM [TABLE]

SELECT @etm=GETDATE()
SELECT DATEDIFF(MS,@stm,@etm) AS [Duration in millisecs]

Database performance estimates 

With the metadata above, the select fetch performance for counting the records can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • Count rpm: RowCount / Duration * 1000 * 60
  • Count ppmUsedPages / Duration * 1000 * 60

SELECT Performance

The SELECT is here encapsulated in a SELECT COUNT to reduce the output size. The duration is expected to be the same as above. 

Actions

  • Change 'TABLE' to the table with perf. issue.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in milliseconds).
DECLARE @stm datetime
DECLARE @etm datetime
SELECT @stm=GETDATE() 

SELECT COUNT(*) AS SELECT_TEST FROM (SELECT * FROM [TABLE]) as tst

SELECT @etm=GETDATE()
SELECT DATEDIFF(MS,@stm,@etm) AS [Duration in millisecs]

Database performance estimates 

With the metadata above, the select fetch performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • SELECT rpm: RowCount / Duration * 1000 * 60
  • SELECT ppmUsedPages / Duration * 1000 * 60

Check 4 - UPDATE Performance

The UPDATE performance is usually slower and a likely bottleneck. It is also harder to measure this as there is no generic way to perform this test.

Note that only 100000 records have been selected (SELECT TOP 100000) and that all masked columns need to be included in the query.

Actions

  • Modify and use the SQL below.
    • Change 'TABLE' to the table with perf. issue.
    • Change Col1, Col2,... to the masked columns.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in milliseconds). 
DECLARE @stm datetime
DECLARE @etm datetime
SELECT @stm=GETDATE() 

;WITH CTE AS
(SELECT TOP 100000 Col1, Col2,... from [TABLE])
UPDATE CTE SET Col1 = Col1, Col2 = Col2, ...

SELECT @etm=GETDATE()
SELECT DATEDIFF(MS,@stm,@etm) AS [Duration in millisecs]

Database performance estimates 

With the metadata above, the UPDATE performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • UPDATE rpm: RowCount / Duration * 1000 * 60
  • UPDATE ppm (only applicable if all records are updated).

Example calculations

Crunching the number above we get the following result (here presented in a matrix). The details below are from a small Virtual SQL Server. 

  • 'Pages per MIn' can only be calculated if all rows are updated in UPDATE.
  • The UPDATE is 7.8 times slower than SELECT (650/83). 
  • If the UPDATE is the bottleneck, this difference will be reflected in the masking job which should have the same difference.
+---------------+-------------+----------+----------+---------+
|               |             |    COUNT |   SELECT |  UPDATE |
+---------------+-------------+----------+----------+---------+
|               |    Metadata |    47 ms |    47 ms |  360 ms |
+---------------+-------------+----------+----------+---------+
| Rows per Min  | 509223 rows | 650 Mrpm | 650 Mrpm | 83 Mrpm |
+---------------+-------------+----------+----------+---------+
| Pages per Min |  1933 pages | 2.5 Mrpm | 2.5 Mrpm |       - |
+---------------+-------------+----------+----------+---------+

Execution Plans / Live Query Statistics

SELECT COUNT

Live Query Statistics for SELECT COUNT statement generated by the Masking Engine to calculate the number of rows to mask. 

SELECT COUNT with PK

MS SQL Live Query Stats - COUNT w PK.png

SELECT COUNT with PK and Index on Mask

MS SQL Live Query Stats - COUNT w PK and IX on Mask.png

SELECT COUNT with no PK or Index

MS SQL Live Query Stats - COUNT wo PK or IX.png

SELECT

Live Query Statistics for SELECT statement generated by the Masking operation to retrieve all records to mask.

SELECT with PK

MS SQL Live Query Stats - SELECT w PK.png

SELECT with PK and Index on Mask

MS SQL Live Query Stats - SELECT w PK and IX on Mask.png

SELECT with no PK or Index

MS SQL Live Query Stats - SELECT wo PK or IX.png

UPDATE

Live Query Statistics for UPDATE statement generated by the Masking Engine. The examples exclude the batch process.

The second example in each group shows the Live Query Statistics from the performance query shown above. These are slightly different but not significantly so.  

UPDATE with PK

MS SQL Live Query Stats - UPDATE w PK.png

MS SQL Live Query Stats - UPDATE_w_SELECT w PK.png

UPDATE with PK and Index on Mask

MS SQL Live Query Stats - UPDATE w PK and IX on Mask.png

MS SQL Live Query Stats - UPDATE_w_SELECT w PK and IX on Mask.png

UPDATE with no PK or Index on Mask

MS SQL Live Query Stats - UPDATE wo PK or IX.png

MS SQL Live Query Stats - UPDATE_w_SELECT wo PK or IX.png

 

Collect data for Support Investigation

When opening a Masking Performance Support Case - please include: 

  1. Table description (with Indexes and Triggers).
  2. SELECT and UPDATE statistics collected from the database. 
  3. Run a masking job with Feedback Size = Commit Size = 10,000 and Row Limit = 200,000.
  4. Upload a Support Bundle. 
  5. Detail the Masking Job ID, Execution ID, and the table to investigate. 

 

 

Related Articles

The following articles may provide more information or related information to this article: