Skip to main content

Best Practice: Masking Performance (Database) (KBA1561)



Masking Performance - Background

This article looks at masking performance and more specifically the overall time it takes to perform a masking job. The page will, therefore, go beyond looking at masking rows per minute (rpm) and include startup time as well. 

Masking Job Performance.png

It addresses the following performance areas: 

  • Startup Time - could be a cause if using algorithms with a large number of lookup values. 
  • Read Performance - unlikely an issue. 
  • Transformation Performance - frequent issue if there are many algorithms per table using heavy SM or Custom Algorithms. 
  • Update Performancefrequent issue related to database configuration or storage issues (also poor network).
  • Post-Script Time - could be a cause if post-scripts are used or indexes/constraints need to be recreated. 


Performance and this page should only be looked at when the masking job has been successfully executed. Any error should be addressed first. 



Startup Time

This is a silent performance killer as it is before we see any masked rows. It is the time it takes to prepare masking algorithms:

  • Cause: Large Secure Lookup and Mapping Algorithms.
    • Remedy: Try to use a smaller number of lookup values in these algorithms. 
Read Performance

The is the SELECT statement that reads the data from the database. Mostly, this is a minor factor in the whole masking operation. 

  • Cause 1: Large number of columns with large data objects.
    • Remedy: Use Custom SQL to reduce the amount of data read from the database (ex. SELECT 1 as MaskCol....)
  • Cause 2: Use of complex Custom SQL.
    • Remedy: If used, simplify the Custom SQL as much as possible. Look out for nested SELECT statements.
  • Cause 3: Network performance - especially latency
    • Remedy: Check with your Network team to improve latency. 
Transformation Performance

This is usually a key part in performance investigation. The key factors here are: 

  • Cause 1: The number of fields masked together with Algorithms used.
    • Algorithms with fewer lookups perform better. 
    • Remedy 1: Try to use simple algorithms with few lookups as possible.
    • Remedy 2: Lookout for Segment Mapping - use few segments and Numeric lookups
  • Cause 2: Masking engine performance.
    • Remedy 1: Ensure sufficient number of CPUs on the masking engine. 
    • Remedy 2: Use Streams in the Job Configuration to perform masking of tables in parallel. 
  • Cause 3: Java Garbage Collection (GC). 
    • Remedy 1: Ensure optimal algorithms (see Cause 1) and an appropriate amount of Job Memory.
    • GC will always happen. It is part of JAVA, but: 
      • Too much Job Memory = longer time to perform GC.
      • Too little Job Memory = more frequent GC. 
Update Performance

This is another key aspect of performance: 

  • Cause 1: Database Performance
    • Remedy: Ensure that the database has enough power to change all columns and rows with the masked data.
      • Factor 1: Storage IO 
      • Factor 2: CPUs
      • Factor 3: RAM to cache data.
  • Cause 2: Indexes and Triggers on Masked Columns and Tables
    • Always look out for masked columns with indexes (in the Inventory and on the database) and Triggers (on the database). 
    • Remedy: In Job Configuration, select Drop Indexes and Disable Triggers. It might be needed to perform this using Pre-Scripts or on the database.  
  • Cause 3: Index for Update statement 
    • A Unique Index is needed to look up the row in the Update statement. It should be Unique, Indexed, and not be masked. 
    • Remedy: It might be needed to create one if the normal Unique ID is being masked.
      • MS SQL: The masking engine will create DMS_ROW_ID.
      • Oracle: Use ROWID.
  • Cause 4: Use parallelism in Job Configuration;
    • Remedy 1: Set Updates Threads to 2 or 4
    • Remedy 2: Set Streams to 2, 4 or 8.
    • Remedy 3: Run concurrent masking Jobs. 
    • Watch out for Deadlocks. 
  • Cause 5: Network performance - especially latency
    • Remedy: Check with your Network team to improve latency. 
  • Cause 6: Casting of values
    • Casting doesn't usually have an impact. If it happens on large objects and they are indexed, this can slow down the job significantly. 
    • Remedy: Check execution plan on the database. Make changes to improve casting and use of indexes. 


More Details

Below are more details about:

  • Time to complete the Masking Job
    • Startup Time - Duration 
    • Masking Time - Duration
      • Performance and number of records
        • Read Performance
        • Masking Performance 
        • Update Performance
    • Pre and Post Scripts
  • Factors that affect performance
    • Masking Engine 
    • Database

Time to complete the Masking Job

Perf_Algorithm_Loadtime rpm.png

The graph above shows examples of time to complete a masking job as a function of:

  • Startup time (here 15 min)
  • Rows per minute (rpm)
  • Number of records 

In order to finish the masking job in time, it might be needed to look at all these factors. For a small number of records, the startup time plays a significant role. For a large number of records, the rows per minute is a big factor. 

Startup Time - Duration

For most of the time, this is not large enough to affect the overall performance of a masking job. The startup time depends on the algorithm used and the number of rows in the lookup data.

Perf_Algorithm_Loadtime (SL).png 

Remedy and consideration 

Most algorithms are finely tuned and have been defined with a small set of lookup values, most are not exceeding 50,000 values, and as such, only take a few seconds to load. 

When creating a custom algorithm, consider the following:

  • Selection of algorithm (minimize lookup algorithms).
  • Select as small lookup pools as possible. 


Masking Time - Duration

Usually, the number of records to mask is fixed and as such, we need to consider how fast we can perform the masking operation - that is the number of rows per minute (rpm). Usually, this number is specified in hundreds of thousands or millions. 

What is acceptable performance depends greatly on how many fields are being masked and how many tables at the same time.

In most cases, good performance starts around 500,000 rpm, a normal optimal performance is around 1 million rpm (or 1 Mrpm). With a very fast database server, highly optimized networks and fine-tuned masking jobs, we have seen jobs running in the excess of 8 million rpm.

Performance and number of records

How fast the masking job, in rpm, is affected by three components:

  • Read Performance
  • Masking Performance 
  • Update Performance

See above, in the Summary, for specific details. 

Pre and Post Scripts

Pre and Post scripts are also affecting performance and these will add masking time at the start of a job (pre-script) and at the end of a job (post-script).

Usually, these scripts can be tested prior to adding them to the masking job and it is important to ensure that these are as effective as possible. 

Remedy and Considerations

Masking Engine

On the masking engine, make sure that the following items are as optimized as possible:

  • Masking Server - raw power:
    • Ensure that the server has plenty of fast cores (CPUs) - it is better to have a few fast CPUs than many slow CPUs.
    • Lots of cache memory.
    • Memory.
  • Masking Jobs - efficiency
    • Masking algorithms:
      • Fast algorithms or custom ones from Professional Services. 
      • Mask only key columns as each masking operation adds to the degradation of the performance.
    • Rule Set:
      • Select an index for row lookup (for Oracle and DB2 use rowid).
      • Be mindful of long nested Custom SQL statement and filters.
    • Job configuration:
      • Masking method: In Place
      • Batch Update - selected.
      • Bulk Data - not selected. 
      • Drop Indexes - selected or do this manually.
      • Disable Triggers and Constraints - selected.

Note: Min, Max Memory, Commit Size and Feedback Size have not been selected above. Memory should only be defined to load the lookup values (more is NOT always faster), Commit Size is likely optimal or near optimal using 10,000 (default value), and Feedback Size is not affecting performance - it sets the logging frequency. 

  • Job Configuration and parallelism: 
    • Update Threads - important: 
      • Defines the number of Update Statements running at the same time.
      • The database and the masking engine need to be able to handle the extra load.
      • Verify what works (2, 3, 4...).
      • Larger is not always the fastest - 1 can sometimes be the fastest. 
    • No. of Streams - important:
      • Defines the max number of Tables being masked at the same time.
      • The database and the masking engine need to be able to handle the extra load.
      • Verify what works (2, 3, 4...).
      • Larger is not always the fastest - 1 can sometimes be the fastest. 
    • Multiple Jobs
      • The job can be broken down into multiple jobs.
  • Database Server  - raw power
    • Check with DBA for optimal settings for
      • Storage performance
      • RAM, CPU and Cache.
    • On Oracle: check AWR report for tuning.
    • For deep dive - run one or more SQL traces.
  • Network Performance
    • Latency - minimize for small and large packets.
      • Use ping from database server to the masking server.
    • Hops - minimize the number of hops to the server. 
      • Use traceroute from database server to the masking server to check. 
    • Throughput - Use Delphix Engine CLI throughput test.
  • Database Configurations
    • Turn off database logging.
    • For Oracle:
      • Provision with 3 x 5GB online redo logs (minimum) to avoid pause when transaction logs wraparound
      • Provision in NOARCHIVELOG mode to reduce transaction log IO. 
      • Always check initialization parameters inherited from the parent, remove any expensive checks or irrelevant settings:
        • DB_BLOCK_CHECKSUM default is TYPICAL (default)
        • DB_BLOCK_CHECKING default is OFF (default)
        • DB_LOST_WRITE_PROTECT default is NONE (default)
        • DB_ULTRA_SAFE default value is OFF (default)
        • PARALLEL_MAX_SERVERS to default, which is PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5
        • PARALLEL_EXECUTION_MESSAGE_SIZE to maximum (32768)
        • Use non-blocking commits for Masking: 

Links to related documentation  

Best Practice articles

This article is an article in a series of articles. 

  1. Best Practice: Rule Set and Inventory settings.
  2. Best Practice: Algorithm usage and configurations.
  3. Best Practice: Job configuration settings on the Masking Engine.
  4. Best Practice: Masking Performance (database).
  5. Best Practice: Masking algorithm optimization. 

When errors are encountered in the execution of the job, please look at "Troubleshooting Masking Jobs" and "Troubleshooting Profiling Jobs".

Other Documents

Useful external links: