KBA1307 Best Practice: Job Configuration on the Masking Engine


This article defines the best practice for initial configuration of masking and profiling jobs and for configuration of the engine in general.

Job and Engine Configuration

Detailed here are: 

  • Profiling Job configuration
  • Masking Job configuration 
  • Masking Engine configuration 

Background Information

Originally, the Masking Engine usually ran on the same server as the database, so the requirements and limiting factors were different. Today, remote connections and larger databases are more common.

As a result, the configuration requirements have changed. This article describes the recommended initial configuration. Once a job has been confirmed to work, these settings can be tuned to increase performance.

Profiling Job configuration 

There are two different types of Profiling:

  1. Column Level; and
  2. Data Level.

The type is defined in the Profiling Set.

Column Level profiling applies a regular expression (RegEx) to the column name in the Inventory. Data Level profiling applies a RegEx to a sample of the data in the column. If Column Level finds no match and a Data Level expression is defined, a profile job is started for that column.

When profiling with Column Level only (the default), there is no need to change the job configuration.
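
The two-stage behaviour described above can be sketched in Python. This is only an illustration of the matching order, not the engine's implementation: the expressions, the domain name `EMAIL`, the function `profile_column`, and the rule that a single matching sample value is enough are all assumptions made for the example.

```python
import re

# Hypothetical expressions; a real Profiling Set defines its own.
COLUMN_LEVEL = {"EMAIL": re.compile(r"e[-_]?mail", re.IGNORECASE)}
DATA_LEVEL = {"EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")}


def profile_column(column_name, sample_values):
    """Return (domain, level) for the first match, or (None, None)."""
    # Stage 1: Column Level, which looks at the column name only.
    for domain, pattern in COLUMN_LEVEL.items():
        if pattern.search(column_name):
            return domain, "column"
    # Stage 2: Data Level, reached only when the name did not match.
    # Assumption: one matching sample value counts as a match.
    for domain, pattern in DATA_LEVEL.items():
        if any(pattern.match(str(v)) for v in sample_values if v is not None):
            return domain, "data"
    return None, None
```

For example, `profile_column("EMAIL_ADDR", [])` is resolved by name alone, while `profile_column("CONTACT", ["bob@example.com"])` falls through to the data sample. The second stage is the expensive one, which is why Data Level profiling is the case that needs job tuning.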

When using Data Level profiling there can be many thousands of columns to profile. The best practice initial configuration is to start with 1 stream: 

  • Streams: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB
    • Max: 4096 MB

Masking Job configuration

There are two different types of masking jobs - Database and File. The configurations are fairly similar. 

Job configuration - Database 

The recommended initial settings for a database masking job are:

  • No. of Streams: 1
  • Update Threads: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: see table below
  • Commit Size: leave blank (default)

Job configuration - File

The recommended initial settings for a file masking job are: 

  • No. of Streams: 1
  • Memory (JVM xms and xmx )
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: see table below

Feedback Size

The Feedback Size defines how frequently progress is written to the log files. The values below are guidelines. One way to choose the size is to aim for the logs of a job fitting into a single log file.

Database Size    | Max number of Records in a table | Feedback Size value
Small to medium  | Up to 10,000,000                 | 50,000 (default)
Large            | Up to 500,000,000                | 500,000
Very large       | Over 500,000,000                 | 5,000,000
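
As a quick aid, the guideline table can be written as a small lookup. This is a sketch only; the function name `feedback_size` is made up for this example, and the thresholds are copied from the table rather than taken from any engine API.

```python
def feedback_size(max_records_in_table):
    """Suggest a Feedback Size from the largest table's record count."""
    if max_records_in_table < 0:
        raise ValueError("record count cannot be negative")
    if max_records_in_table <= 10_000_000:
        return 50_000        # small to medium (default)
    if max_records_in_table <= 500_000_000:
        return 500_000       # large
    return 5_000_000         # very large
```

For a ruleset whose largest table holds 120 million records, `feedback_size(120_000_000)` suggests 500,000. Treat the result as a starting value, as with the table itself.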

Max Memory (xmx)

The amount of memory needed is determined by the number of columns masked and the masking algorithms used. The number of records has no impact on the memory requirements because the data is processed as a stream.

When would the Max need to be increased?

  • A large number of masked columns per table (more than 10).
  • A large number of lookup values in Secure Lookup algorithm.
  • A large number of lookup values in Mapping algorithm.
  • Complex Segmented Mapping algorithms.
  • Complex custom mapplets.

Masking Job Editing Safety

Masking Engine configuration

Details about requirements can be found here:

Memory Configuration

Remember to look at the above requirements documentation. Some memory needs to be reserved for the host. 

The minimum memory configuration for the Masking Engine is 16 GB. More memory may be needed, depending on the workload.

How much memory is needed depends on:

  • The OS (2 GB)
  • The Masking Engine (ME)  (default: 1 GB)
  • Extra headroom (1 GB)
  • Each job executed at the same time (see section above for size).

The sum of these items is the memory used, and it must never exceed the memory available. If it does, the Java process aborts, which restarts the complete stack.

The amount of RAM needed is, therefore: OS + ME + Extra + Sum(Max for each Job running).

Example: 2 GB + 1 GB + 1 GB + (4 GB + 4 GB + 8 GB) = 20 GB
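
The same arithmetic can be sketched in Python to check a planned set of concurrent jobs. The constants come from the list above; the function names and the idea of taking the larger of the computed value and the 16 GB minimum are assumptions made for this illustration.

```python
OS_GB = 2               # reserved for the operating system
ENGINE_GB = 1           # Masking Engine default
EXTRA_GB = 1            # headroom
MIN_ENGINE_RAM_GB = 16  # minimum memory configuration for the engine


def required_ram_gb(job_max_memory_gb):
    """OS + ME + Extra + Sum(Max for each job running at the same time)."""
    return OS_GB + ENGINE_GB + EXTRA_GB + sum(job_max_memory_gb)


def recommended_ram_gb(job_max_memory_gb):
    """Never go below the engine minimum, even for small workloads."""
    return max(MIN_ENGINE_RAM_GB, required_ram_gb(job_max_memory_gb))
```

Calling `required_ram_gb([4, 4, 8])` reproduces the 20 GB from the example. A single 4 GB job needs only 8 GB by the formula, so the 16 GB minimum applies instead.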

CPU Configuration

The minimum requirement is 8 CPUs.

Diagnosis - Error Signatures 

Signs that the engine is misconfigured are: 

  • Masking Engine (dmsuite/debug) logs contain:
    • "null - [STRING]" in the masking/profiling job execution.
    • "OutOfMemoryError" such as "OutOfMemoryError: GC overhead limit exceeded".
    • "ERROR: insert or update on table 'xxx' violates foreign key constraint...".
  • The Masking/Profiling Job exits with an incorrect status.
  • Hanging Jobs - the status stays showing Running.
  • The Masking Engine restarts.
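
A log file that has already been retrieved can be checked for the log signatures listed above with a short script. This is a rough sketch: the signature names and the function `scan_log_lines` are invented for the example, and "null - [STRING]" is matched literally, which is an assumption about how that text appears in the logs.

```python
import re

# Signatures taken from the list above; the key names are illustrative.
SIGNATURES = {
    "null_string": re.compile(r"null - \[STRING\]"),
    "out_of_memory": re.compile(r"OutOfMemoryError"),
    "fk_violation": re.compile(r"violates foreign key constraint"),
}


def scan_log_lines(lines):
    """Count how many lines contain each known error signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Any iterable of strings works as input, for example an open file handle: `with open(path) as fh: print(scan_log_lines(fh))`, where `path` is wherever the log was saved. A non-zero count is a prompt to review the job and engine memory settings, not a diagnosis in itself.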