Skip to main content
Delphix

Troubleshooting Masking Performance In-Place MS SQL Server (KBA6371)

 

KBA

KBA# 6371

At a Glance

Summary: Masking Performance troubleshooting steps for MS SQL Server.
DIY Investigation: Steps: 
  1. Measure latency using ping.
  2. Measure and collect Execution Plans (or Live Query Stats) for:
    • SELECT COUNT
    • SELECT
    • UPDATE
Support Investigation: Steps:
  1. Upload the details collected above.
  2. Set Feedback Size to 10,000 and run one job with Streams = 1.
  3. Upload a Support Bundle. 
  4. Detail the Masking Job ID, Execution ID, and the Table.

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

4.3

4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0

4.2

4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1

4.1

4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

MS SQL Server Masking Performance 

This Knowledge Article will look at how to troubleshoot MS SQL Server Masking Performance. With masking performance, we will only look at the rpm (rows per minute) of the actual masking operation (In-Place only). It will not include the duration it takes to execute SQL for Pre/Post-Scripts, Constraints, and Indexes. 

 

The steps in this guide can be performed with available tools and will help to determine and answer the following questions:

  • Is the bottleneck reading the data (SELECT)?
  • Is the bottleneck the masking operation (algorithms and number of columns)?
  • Is the bottleneck the output (UPDATE)?

Performance factors

The groups and the key factors to performance are: 

  1. Masking Engine - the number of masked columns, data types, and algorithms.
  2. Network - latency and throughput.
  3. Database - read and write performance, the number of records per block, and fragmentation.  

 

The diagram below shows where the delays (in milliseconds) are accumulating to become slow performance. 

KBA6371 MS SQL-Timing-Evetns v2.png

The diagram excludes the throughput and the size of the data as this is normally not a performance-limiting factor. The time tc (time to count) is only included for reference as the fetch is normally the same as for SELECT.

  • COUNT: tc = tlc + tdbc + tother 
  • MASKING: tm
  • SELECT: ts = tls + tdbs + tother
  • UPDATE: tu = tlu + tdbu + tother

tother = Java, JDBC. 

Masking performance will be dictated by compounding and capping factors based on the timing events above.

To improve masking performance, the current bottleneck needs to be found and resolved (and the process repeated if needed).

Masking Performance at different stages 

The performance will differ at different stages of the masking job. These stages can be measured at the start of the masking job. Measuring these stages can help determine where the bottleneck is located.

The performance can be measured when the masking job finishes as well but it is usually not practical to measure these as one has to wait until the job finishes successfully. 

 

A normal In-Place masking job execution has the following characteristics:

  1. Initial SELECT
    • The first 10,000 to 20,000 records are normally read fast as the records are read into the buffers of the masking job.
    • This is usually visible at the start of the job and will show the performance of the SELECT before masking any records.
  2. Masking Transformation  
    • There are also buffers to be filled at the end of the job. 
    • The masking performance usually drops when records are masked. How much depends on the masking factors (see above).
  3. Update
    • The last step that can dictate the performance is the UPDATE step.
    • If this step is the slowest (normal case) all buffers will be full and the job is capped by the UPDATE performance. 

 

MS SQL Moving v3 (fast input w gauge).gif

SELECT and UPDATE Performance Graph 

By temporarily setting the Feedback Size to 10,000, the execution can be illustrated in a graph. This graph shows the performance as the buffers fill up and this will indicate the bottleneck.

An illustration of this can be seen below, which shows the job above. The graph shows the relative performance compared with the rpm of the overall masking job.

 

The characteristics are: 

  • The SELECT performance is initially the max performance that can be fetched from the database. 
  • As soon as the records start to appear in the masking engine they will be sent to the UPDATE step.
  • Hence, the UPDATE performance can also be seen at the beginning of the job. 
  • The SELECT will be at max speed until the initial buffers are full (around 20,000 - 30,0000 rows).
  • The performance will then be (in this case) capped to the performance of the masking operation.
  • When all buffers (relevant) are full the SELECT step will be forced to the same performance as the UPDATE step.

Masking Relative Performance - SELECT Mask UPDATE.png

Example of Bottlenecks  

No masking job is exactly the same and the characteristics will differ. With that said, there are three key bottleneck cases based on where the main bottleneck is located. 

These cases are represented below (with example graphs): 

SELECT

In this case, the performance will not be faster than the SELECT step and hence capped on SELECT.

Actions: investigate the execution plan for SELECT and network latency.

Masking operation 

In this case, the SELECT will reach max performance until the initial buffers are full and then be capped by the masking operation.

Actions: investigate the algorithms used and the capacity of the masking engine.

UPDATE 

This is the performance case detailed above. 

Actions: investigate the details around UPDATE and other operations on the database. 

 

Masking Relative Performance - SELECT Cap.png Masking Relative Performance - Masking Cap.png Masking Relative Performance - UPDATE Cap.png

Data Collection

To find the bottleneck, we need to look at a few areas. Many of the reports will only show a limited amount of information. To see the complete picture, we need all data.

Some points to remember:  

  • Masking jobs:
    • Run one job with Stream = 1.
    • If multiple tables are executed, indicate which table should be focused on. 
  • Testing performance on the database:
    • Make sure to flush buffers and the cache before testing. 
  • (Optional) Testing masking algorithm performance:
    • Make sure only one column is masked and set the algorithm to a fast algorithm (for example, TK) to see if that makes a difference. 

What to collect 

The following information needs to be collected: 

  • Masking Rule Set and Inventory.
    • For example: how many masked columns, algorithms, size of the data.
  • Database performance.
    • The number of rows in the table.
    • The durations from the database tests. 
    • Any execution plan.
  • Testing the masking performance.
    • Temporarily set Feedback Size to 10,000.
    • Test run the masking job (and the table of interest) for around 5 min.
    • Remember to reset the Feedback Size to the original value (10,000 is too small for large jobs and will cause the logs to grow).

Commands to test performance 

Network performance to Masking Engine 

Ping the Masking Engine and collect latency data. Run this a few time to get a good average value for the latency.

ping Your_Masking_Enging

Table Metadata

The data about the table is useful.

Actions

  • Take note of the table description (DDL).
  • Take note of any triggers and indexes on the table.
  • Collect the number of pages required to store this table on the database (see SQL below). 
 SELECT 
    t.NAME AS TableName,
    p.rows AS RowCount,
    SUM(a.total_pages) AS TotalPages, 
    SUM(a.used_pages) AS UsedPages, 
    (SUM(a.total_pages) - SUM(a.used_pages)) AS UnusedPages,
p.rows/SUM(a.used_pages) AS RowsPerPage,
SUM(a.used_pages)*10000/p.rows AS PagesPerCommit
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
WHERE 
    t.NAME = 'YOUR_TABLE' 
GROUP BY 
    t.Name, p.Rows

Example Output

The example below is from a small table called 'MyTable'. It shows: 

  • RowCount: Total number of rows in the table.
  • TotalPages: Total number of Pages assigned for the table. 
  • UsedPages: The number of Pages used by the table.
  • UnusedPages: The number of Pages currently not used. 
  • RowsPerPage: The average number of Rows per Page (if this number is small then the database will need to read many pages to get to and update the data which will affect the performance).
  • PagesPerCommit: The average number of pages updated on a standard COMMIT SIZE (10,000). This value is related to 'RowsPerPage'. A larger number here will affect the performance as UPDATE will need to update more pages.  
+------------+----------+------------+-----------+-------------+-------------+----------------+
| TableName  | RowCount | TotalPages | UsedPages | UnusedPages | RowsPerPage | PagesPerCommit |
+------------+----------+------------+-----------+-------------+-------------+----------------+
| LargeTable | 509223   | 1937       | 1933      | 4           | 263         | 37             |
+------------+----------+------------+-----------+-------------+-------------+----------------+

SELECT COUNT Performance

This will show how fast the database can count the number of records in the table. This is frequently done in the same way as SELECT below. The latency will not affect the duration of this query as only one value is returned.

Actions

  • Change YOUR_TABLE to the table to test.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in milliseconds).
DECLARE @stm datetime
DECLARE @etm datetime
SELECT @stm=GETDATE()  

SELECT COUNT(*) AS COUNT_TEST FROM YOUR_TABLE

SELECT @etm=GETDATE()
SELECT DATEDIFF(MS,@stm,@etm) AS [Duration in millisecs]

Database performance estimates 

With the metadata above, the select fetch performance for counting the records can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • Count rpm: RowCount / Duration * 1000 * 60
  • Count ppmUsedPages / Duration * 1000 * 60

SELECT Performance

The SELECT is here encapsulated in a SELECT COUNT to reduce the output size. The duration is expected to be the same as above. 

Actions

  • Change YOUR_TABLE to the table to test.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in milliseconds).
DECLARE @stm datetime
DECLARE @etm datetime
SELECT @stm=GETDATE() 

SELECT COUNT(*) AS SELECT_TEST from (SELECT * from YOUR_TABLE) as tst

SELECT @etm=GETDATE()
SELECT DATEDIFF(MS,@stm,@etm) AS [Duration in millisecs]

Database performance estimates 

With the metadata above, the select fetch performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • SELECT rpm: RowCount / Duration * 1000 * 60
  • SELECT ppmUsedPages / Duration * 1000 * 60

UPDATE Performance

The UPDATE performance is usually slower and a likely bottleneck. It is also harder to measure this as there is no generic way to perform this test.

Note that only 100000 records have been selected (SELECT TOP 100000) and that all masked columns need to be included in the query.

Actions

  • Modify and use the SQL below.
  • Change YOUR_TABLE to the table to test.
  • Change Col1, Col2,... to the masked columns.
  • Take note of the number of records.
  • Take note of the duration to complete this statement (in milliseconds). 
DECLARE @stm datetime
DECLARE @etm datetime
SELECT @stm=GETDATE() 

;WITH CTE AS
(SELECT TOP 100000 Col1, Col2,... from YOUR_TABLE)
UPDATE CTE SET Col1 = Col1, Col2 = Col2, ...

SELECT @etm=GETDATE()
SELECT DATEDIFF(MS,@stm,@etm) AS [Duration in millisecs]

Database performance estimates 

With the metadata above, the UPDATE performance can be calculated (note that this is the performance on the database and excludes network latency and java processing times): 

  • UPDATE rpm: RowCount / Duration * 1000 * 60
  • UPDATE ppm (only applicable if all records are updated).

Example calculations

Crunching the number above we get the following result (here presented in a matrix). The details below are from a small Virtual SQL Server. 

  • 'Pages per MIn' can only be calculated if all rows are updated in UPDATE.
  • The UPDATE is 7.8 times slower than SELECT (650/83). 
  • If the UPDATE is the bottleneck, this difference will be reflected in the masking job which should have the same difference.
+---------------+-------------+----------+----------+---------+
|               |             |    COUNT |   SELECT |  UPDATE |
+---------------+-------------+----------+----------+---------+
|               |    Metadata |    47 ms |    47 ms |  360 ms |
+---------------+-------------+----------+----------+---------+
| Rows per Min  | 509223 rows | 650 Mrpm | 650 Mrpm | 83 Mrpm |
+---------------+-------------+----------+----------+---------+
| Pages per Min |  1933 pages | 2.5 Mrpm | 2.5 Mrpm |       - |
+---------------+-------------+----------+----------+---------+

Execution Plans / Live Query Statistics

SELECT COUNT

Live Query Statistics for SELECT COUNT statement generated by the Masking Engine to calculate the number of rows to mask. 

SELECT COUNT with PK

MS SQL Live Query Stats - COUNT w PK.png

SELECT COUNT with PK and Index on Mask

MS SQL Live Query Stats - COUNT w PK and IX on Mask.png

SELECT COUNT with no PK or Index

MS SQL Live Query Stats - COUNT wo PK or IX.png

SELECT

Live Query Statistics for SELECT statement generated by the Masking operation to retrieve all records to mask.

SELECT with PK

MS SQL Live Query Stats - SELECT w PK.png

SELECT with PK and Index on Mask

MS SQL Live Query Stats - SELECT w PK and IX on Mask.png

SELECT with no PK or Index

MS SQL Live Query Stats - SELECT wo PK or IX.png

UPDATE

Live Query Statistics for UPDATE statement generated by the Masking Engine. The examples exclude the batch process.

The second example in each group shows the Live Query Statistics from the performance query shown above. These are slightly different but not significantly so.  

UPDATE with PK

MS SQL Live Query Stats - UPDATE w PK.png

MS SQL Live Query Stats - UPDATE_w_SELECT w PK.png

UPDATE with PK and Index on Mask

MS SQL Live Query Stats - UPDATE w PK and IX on Mask.png

MS SQL Live Query Stats - UPDATE_w_SELECT w PK and IX on Mask.png

UPDATE with no PK or Index on Mask

MS SQL Live Query Stats - UPDATE wo PK or IX.png

MS SQL Live Query Stats - UPDATE_w_SELECT wo PK or IX.png

 

Collect data for Support Investigation

If needed, the performance can be investigated by support. 

  1. Upload the details collected above:
    • Table description.
    • Details about Indexes and Triggers.
    • SELECT and UPDATE statistics collected from the database. 
       
  2. Set Feedback Size to 10,000 and run one job with Streams = 1.
    • Remember to reset the Feedback Size to the original size (10,000 is too small for large jobs and will cause the logs to grow).
  3. Upload a Support Bundle. 
  4. Detail the Masking Job ID, Execution ID, and the table to investigate. 

Details to collect from the UI:

  • The Job ID, Execution ID, and the table name. 
  • The number of masked rows.
  • The rpm for the table. 

 

 

Related Articles

The following articles may provide more information or related information to this article: