Skip to main content
Delphix

KBA1561 Best Practice: Masking Performance (Database)

 

Masking Performance - Background

This article looks at masking performance and more specifically the overall time it takes to perform a masking job. The page will, therefore, go beyond looking at masking rows per minute (rpm) and include startup time as well. 

This article describes:

  • Time to complete the Masking Job
    • Startup Time - Duration 
    • Masking Time - Duration
      • Performance and number of records
        • Read Performance
        • Masking Performance 
        • Update Performance
    • Pre and Post Scripts
  • Factors that affect performance
    • Masking Engine 
    • Database

Time to complete the Masking Job

Perf_Algorithm_Loadtime rpm.png

The graph above shows examples of time to complete a masking job as a function of:

  • Startup time (here 15 min)
  • Rows per minute (rpm)
  • Number of records 

In order to finish the masking job in time, it might be needed to look at all these factors. For a small number of records, the startup time plays a significant role. For a large number of records, the rows per minute is a big factor. 

Startup Time - Duration

For most of the time, this is not large enough to affect the overall performance of a masking job. The startup time depends on the algorithm used and the number of rows in the lookup data.

Perf_Algorithm_Loadtime (SL).png 

Remedy and consideration 

Most algorithms are finely tuned and have been defined with a small set of lookup values, most are not exceeding 50,000 values, and as such, only take a few seconds to load. 

When creating a custom algorithm, consider the following:

  • Selection of algorithm (minimize lookup algorithms).
  • Select as small lookup pools as possible. 

Troubleshooting 

Masking Time - Duration

Usually, the number of records to mask is fixed and as such, we need to consider how fast we can perform the masking operation - that is the number of rows per minute (rpm). Usually, this number is specified in hundreds of thousands or millions. 

What is acceptable performance depends greatly on how many fields are being masked and how many tables at the same time.

In most cases, good performance starts around 500,000 rpm, a normal optimal performance is around 1 million rpm (or 1 Mrpm). With a very fast database server, highly optimized networks and fine-tuned masking jobs, we have seen jobs running in the excess of 8 million rpm.

Performance and number of records

How fast the masking job, in rpm, is affected by three components:

  • Read Performance
  • Masking Performance 
  • Update Performance
Read Performance

For most masking jobs the database read performance is a minor factor in the whole masking operation. The performance here is usually affected by:

  • Select statement (complex custom SQL will slow down the performance)
  • Network performance 
Masking Performance

This is usually a key part in performance investigation. The key factors here are: 

  • The number of fields masked.
  • Algorithms used (ones with fewer lookups perform better).
  • Masking engine performance.
  • Java Garbage Collection (GC).
Update Performance

Key factors here are: 

  • Network performance:
    • Latency.
    • Throughput.
  • Indexes (some are needed and others need to be removed):
    • A properly configured index for row id lookup. 
    • No indexes on masked columns.
  • Constraints and Triggers.
  • Database server performance.
  • Parallelism:
    • Update Threads.
    • Concurrent Masking Jobs. 
Pre and Post Scripts

Pre and Post scripts are also affecting performance and these will add masking time at the start of a job (pre-script) and at the end of a job (post-script).

Usually, these scripts can be tested prior to adding them to the masking job and it is important to ensure that these are as effective as possible. 

Remedy and Considerations

Masking Engine

On the masking engine, make sure that the following items are as optimized as possible:

  • Masking Server - raw power:
    • Ensure that the server has plenty of fast cores (CPUs) - it is better to have a few fast CPUs than many slow CPUs.
    • Lots of cache memory.
    • Memory.
       
  • Masking Jobs - efficiency
    • Masking algorithms:
      • Fast algorithms or custom ones from Professional Services. 
      • Mask only key columns as each masking operation adds to the degradation of the performance.
    • Rule Set:
      • Select an index for row lookup (for Oracle and DB2 use rowid).
      • Be mindful of long nested Custom SQL statement and filters.
    • Job configuration:
      • Masking method: In Place
      • Batch Update - selected.
      • Bulk Data - not selected. 
      • Drop Indexes - selected or do this manually.
      • Disable Triggers and Constraints - selected.

Note: Min, Max Memory, Commit Size and Feedback Size have not been selected above. Memory should only be defined to load the lookup values (more is NOT always faster), Commit Size is likely optimal or near optimal using 10,000 (default value), and Feedback Size is not affecting performance - it sets the logging frequency. 

  • Job Configuration and parallelism: 
    • Update Threads - important: 
      • Defines the number of Update Statements running at the same time.
      • The database and the masking engine need to be able to handle the extra load.
      • Verify what works (2, 3, 4...).
      • Larger is not always the fastest - 1 can sometimes be the fastest. 
    • No. of Streams - important:
      • Defines the max number of Tables being masked at the same time.
      • The database and the masking engine need to be able to handle the extra load.
      • Verify what works (2, 3, 4...).
      • Larger is not always the fastest - 1 can sometimes be the fastest. 
    • Multiple Jobs
      • The job can be broken down into multiple jobs.
         
  • Database Server  - raw power
    • Check with DBA for optimal settings for
      • Storage performance
      • RAM, CPU and Cache.
    • On Oracle: check AWR report for tuning.
    • For deep dive - run one or more SQL traces.
       
  • Network Performance
    • Latency - minimize for small and large packets.
      • Use ping from database server to the masking server.
    • Hops - minimize the number of hops to the server. 
      • Use traceroute from database server to the masking server to check. 
    • Throughput - Use Delphix Engine CLI throughput test.
       
  • Database Configurations
    • Turn of database logging.
    • For Oracle:
      • Provision with 3 x 5GB online redo logs (minimum) to avoid pause when transaction logs wraparound
      • Provision in NOARCHIVELOG mode to reduce transaction log IO. 
      • Always check initialization parameters inherited from the parent, remove any expensive checks or irrelevant settings:
        • DB_BLOCK_CHECKSUM default is TYPICAL (default)
        • DB_BLOCK_CHECKING default is OFF (default)
        • DB_LOST_WRITE_PROTECT default is NONE (default)
        • DB_ULTRA_SAFE default value is OFF (default)
        • PARALLEL_DEGREE_POLICY to AUTO
        • PARALLEL_MAX_SERVERS to default, which is PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5
        • PARALLEL_EXECUTION_MESSAGE_SIZE to maximum (32768)
        • Use non-blocking commits for Masking: 
          • COMMIT_WAIT=NOWAIT
          • COMMIT_LOGGING=BATCH

Links to related documentation  

Best Practice articles

This article is an article in a series of articles. 

  1. Best Practice: Rule Set and Inventory settings.
  2. Best Practice: Algorithm usage and configurations.
  3. Best Practice: Job configuration settings on the Masking Engine.
  4. Best Practice: Masking Performance (database).
  5. Best Practice: Masking algorithm optimization. 

When errors are encountered in the execution of the job, please look at "Troubleshooting Masking Jobs" and "Troubleshooting Profiling Jobs".

Other Documents

Useful external links: