Delphix

Best Practice: Job configuration settings on the Masking Engine

Masking Configuration - Background

The requirements for job configurations have changed over time. Originally, the Masking Engine was mostly located on the same server as the database, so the requirements and limiting factors were different. Today, remote connections are more common and the databases being masked are much larger.

As a result, the requirements for job and engine configuration settings have changed. This article has been created to address the optimal initial configuration.  

Job and Engine Configuration Settings

This article defines the best practice initial configuration for masking and profiling jobs and the engine in general. For best practice performance, please refer to this article. 

Detailed here are: 

  • Profiling Job configuration
  • Masking Job configuration 
  • Masking Engine configuration 

Profiling Job configuration 

There are two different types of Profiling - Column Level and Data Level. The type is defined in the Profiling Set.

Column Level profiling applies a regular expression (RegEx) to the column name in the Inventory. Data Level profiling applies a RegEx to a sample of the data in the column. If Column Level profiling finds no match and a Data Level expression is defined, a profile job is started for that column.

When using Data Level profiling there can be many thousands of columns to profile. The best practice initial configuration is, therefore: 

  • Streams: 1
  • Memory (JVM xms and xmx)
    • Min:  1024 MB (needs to be defined)
    • Max: 2048 MB

Masking Job configuration

There are two different types of masking jobs - Database and File. The configurations are fairly similar. 

Job configuration - Database 

The recommended initial settings for a database masking job are:

  • No. of Streams: 1
  • Update Threads: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: see table below
  • Commit Size: leave blank (default)
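The initial database job settings above can be captured as a small record and sanity-checked before a job is created. This is only a sketch: the `JobSettings` class and its field names are hypothetical and are not taken from the Masking Engine API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobSettings:
    """Hypothetical container for the recommended initial settings."""
    streams: int = 1
    update_threads: int = 1
    min_memory_mb: Optional[int] = 2048  # Min must be defined explicitly
    max_memory_mb: int = 4096            # increase as needed
    commit_size: Optional[int] = None    # None means "leave blank (default)"

    def problems(self) -> List[str]:
        """List deviations from the recommended starting point."""
        found = []
        if self.min_memory_mb is None:
            found.append("Min memory is not defined")
        elif self.min_memory_mb > self.max_memory_mb:
            found.append("Min memory exceeds Max memory")
        if self.streams != 1:
            found.append("Streams differs from the initial value of 1")
        if self.update_threads != 1:
            found.append("Update Threads differs from the initial value of 1")
        if self.commit_size is not None:
            found.append("Commit Size is set; the recommendation is to leave it blank")
        return found
```

Calling `JobSettings().problems()` returns an empty list, because the defaults match the recommended values. A deviation is not necessarily an error, since these are starting values that may be tuned later.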

Job configuration - File

The recommended initial settings for a file masking job are: 

  • No. of Streams: 1
  • Memory (JVM xms and xmx )
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: see table below

Feedback Size

The Feedback Size defines how frequently progress entries are written to the log files. The values below are guidelines; one way to choose a size is to aim for the logs of a job fitting into a single log file.

| Database Size | Max number of Records in a table | Feedback Size value |
|---|---|---|
| Small to medium | Up to 10,000,000 | 50,000 (default) |
| Large | Up to 500,000,000 | 500,000 |
| Very large | Over 500,000,000 | 5,000,000 |
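The guideline values above can be expressed as a simple lookup. This is a minimal sketch; the function name `feedback_size` is an illustrative choice, and the thresholds are only the guideline values from this article.

```python
def feedback_size(max_records_in_table: int) -> int:
    """Map the record count of the largest table to a Feedback Size guideline."""
    if max_records_in_table <= 10_000_000:
        return 50_000       # small to medium (default)
    if max_records_in_table <= 500_000_000:
        return 500_000      # large
    return 5_000_000        # very large
```

For example, a database whose largest table holds 200,000,000 records falls into the "Large" row, so `feedback_size(200_000_000)` returns `500000`.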

Max Memory (xmx)

The amount of memory needed is determined by the number of columns masked and the masking algorithms used. The size of the table (the number of records) has no impact on the memory requirements, because the data is processed as a stream.

When would the Max need to be increased?

  • A large number of masked columns per table (more than 10).
  • A large number of lookup values in Secure Lookup algorithm.
  • A large number of lookup values in Mapping algorithm.
  • Complex Segmented Mapping algorithms.
  • Complex custom mapplets.

Masking Engine configuration

The Masking Engine VM is configured on the VMware vSphere Hypervisor (ESX). The minimum requirements can be found [here](https://docs.delphix.com/docs/archit...es-for-masking).

The minimum memory configuration for the Masking Engine is 16 GB; however, more memory may be needed.

How much memory is needed depends on:

  • The OS (2 GB)
  • The Masking Engine (ME)  (default: 1 GB)
  • Extra headroom (1 GB)
  • Each job executed at the same time (see section above for size).

The sum of these items is the memory the engine will use, and it must never exceed the memory available to the VM. If it does, the Java process aborts, which restarts the complete stack.

The amount of RAM needed is, therefore: OS + ME + Extra + Sum(Max for each Job running).

Example: 2 GB + 1 GB + 1 GB + (4 GB + 4 GB + 8 GB) = 20 GB
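The same arithmetic can be sketched as a small helper. The constants mirror the values listed above; the function names are illustrative, and applying the 16 GB minimum as a floor is an assumption about how the two rules combine.

```python
OS_GB = 2           # operating system
ENGINE_GB = 1       # Masking Engine (default)
EXTRA_GB = 1        # extra headroom
MINIMUM_VM_GB = 16  # minimum memory configuration for the engine


def required_ram_gb(job_max_gb):
    """OS + ME + Extra + Sum(Max for each job running at the same time)."""
    return OS_GB + ENGINE_GB + EXTRA_GB + sum(job_max_gb)


def vm_ram_gb(job_max_gb):
    """Assumption: never size the VM below the documented 16 GB minimum."""
    return max(required_ram_gb(job_max_gb), MINIMUM_VM_GB)
```

With three concurrent jobs whose Max values are 4 GB, 4 GB, and 8 GB, `required_ram_gb([4, 4, 8])` returns `20`, matching the example above. A single 4 GB job needs only 8 GB by the formula, so `vm_ram_gb([4])` returns the 16 GB minimum.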

Diagnosis - Error Signatures 

Signs that the engine is misconfigured are: 

  • Masking Engine (dmsuite/debug) logs contain:
    • "null - [STRING]" in the masking/profiling job execution.
    • "OutOfMemoryError" such as "OutOfMemoryError: GC overhead limit exceeded".
    • "ERROR: insert or update on table 'xxx' violates foreign key constraint...".
  • The Masking/Profiling job exits with an incorrect status.
  • Hanging Jobs - the status stays showing Running.
  • The Masking Engine restarts.
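The log signatures listed above can be searched for with a few lines of code. This is a minimal sketch, not a Delphix tool: the function name is hypothetical, the matching is plain substring search, and the only strings it looks for are the ones quoted in this article.

```python
from collections import Counter
from typing import Iterable

# Substrings quoted in the list of error signatures above.
SIGNATURES = (
    "null - [STRING]",
    "OutOfMemoryError",
    "violates foreign key constraint",
)


def count_signatures(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each known error signature."""
    counts = Counter()
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts
```

Any iterable of text lines can be passed in, for example an open log file. A non-zero count for any signature suggests reviewing the job memory settings and the engine RAM calculation described earlier.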