Delphix

Best Practice: Job Configuration Settings (KBA1048)

 

Masking Configuration

The requirements for job configurations have changed over time. Originally, the Masking Engine was mostly located on the same server as the database, so the configuration requirements and limiting factors were different. Today, remote connections are more common and the databases being masked are much larger.

As a result, the recommended job and engine configuration settings have changed. This article describes the optimal initial configuration.

Job and Engine Configuration Settings 

This article defines the best practice initial configuration for masking and profiling jobs and for the engine in general. Performance best practices are covered in a separate article.

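Detailed here are:

  • Profiling Job Configuration
  • Masking Job Configuration
  • Feedback Size
  • Max Memory (xmx)
  • Masking Engine Configuration
  • Considerations - Misconfiguration Signs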

Profiling Job Configuration  

There are two different types of Profiling - Column Level and Data Level. The type is defined in the Profiling Set.

Column Level profiling applies a regular expression (RegEx) to the column name in the Inventory. Data Level profiling applies a RegEx to a sample of the data in the column. If Column Level profiling finds no match and a Data Level expression is defined, a profiling job is started for that column.

When using Data Level profiling, there can be many thousands of columns to profile. The best practice initial configuration is, therefore:
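The order of the two checks can be sketched in a few lines of Python. This is an illustration only, not the engine's implementation: the two expressions, the function name `profile_column`, and the rule that a single matching sample value is enough are all assumptions made for the example.

```python
import re

# Hypothetical expressions for illustration; real Profiling Sets define their own.
COLUMN_LEVEL = re.compile(r"(?i)e[-_]?mail")
DATA_LEVEL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_column(column_name, sample_values):
    """Return "column", "data", or None depending on which check matched."""
    # Column Level: test the column name first.
    if COLUMN_LEVEL.search(column_name):
        return "column"
    # Data Level: only reached when the column name did not match.
    # Assumption: one matching sample value is enough to flag the column.
    if any(DATA_LEVEL.match(str(value)) for value in sample_values):
        return "data"
    return None
```

Because the data check runs only for columns whose names did not match, the number of Data Level checks grows with the number of unmatched columns.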

  • Streams: 1
  • Memory (JVM xms and xmx)
    • Min:  1024 MB (needs to be defined)
    • Max: 2048 MB

Masking Job Configuration 

There are two different types of masking jobs - Database and File. The configurations are fairly similar. 

Job configuration - Database  

The recommended initial settings for a database masking job are:

  • No. of Streams: 1
  • Update Threads: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: see table below
  • Commit Size: leave blank (default)  

Job configuration - File 

The recommended initial settings for a file masking job are: 

  • No. of Streams: 1
  • Memory (JVM xms and xmx )
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: see table below
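For reference, the initial settings for the three job types can be collected in one structure. This is a sketch only: the class and field names are hypothetical and are not the Masking Engine's API or UI field names. Only the values are taken from this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InitialJobSettings:
    """Hypothetical container for the recommended starting values."""
    streams: int
    min_memory_mb: int                     # JVM xms, must be set explicitly
    max_memory_mb: int                     # JVM xmx, raise as needed
    update_threads: Optional[int] = None   # database masking jobs only
    commit_size: Optional[int] = None      # None means leave blank (default)


PROFILING = InitialJobSettings(streams=1, min_memory_mb=1024, max_memory_mb=2048)
DATABASE_MASKING = InitialJobSettings(
    streams=1, min_memory_mb=2048, max_memory_mb=4096, update_threads=1
)
FILE_MASKING = InitialJobSettings(streams=1, min_memory_mb=2048, max_memory_mb=4096)
```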

Feedback Size 

The Feedback Size defines how frequently entries are written to the log files. The values below are guidelines. One way to choose the size is that the logs for a job should preferably fit into one log file.

Database Size      Max number of records in a table    Feedback Size value
Small to medium    Up to 10,000,000                    50,000 (default)
Large              Up to 500,000,000                   500,000
Very large         Over 500,000,000                    5,000,000
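The table can be expressed as a small lookup. The function name `feedback_size` is hypothetical. The thresholds and values are the ones in the table, treating each "Up to" limit as inclusive.

```python
def feedback_size(max_records_in_table: int) -> int:
    """Suggest a Feedback Size from the largest table's record count."""
    if max_records_in_table <= 10_000_000:       # small to medium
        return 50_000                            # engine default
    if max_records_in_table <= 500_000_000:      # large
        return 500_000
    return 5_000_000                             # very large
```

For example, a database whose largest table holds 120,000,000 records falls in the "Large" row, so `feedback_size(120_000_000)` returns 500,000.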

Max Memory (xmx) 

The amount of memory needed is determined by the number of columns masked and the masking algorithms used. The size of the database (the number of records) has no impact on the memory requirements, because the data is processed as a stream.

When would the Max need to be increased?

  • A large number of masked columns per table (more than 10).
  • A large number of lookup values in a Secure Lookup algorithm.
  • A large number of lookup values in a Mapping algorithm.
  • Complex Segmented Mapping algorithms.
  • Complex custom mapplets.

Masking Engine Configuration 

The Masking Engine VM is configured on the VMware vSphere Hypervisor (ESX). For more information, see the minimum requirements.

The minimum memory configuration for the Masking Engine is 16 GB; however, more memory may be needed.

How much memory is needed depends on:

  • The OS (2 GB)
  • The Masking Engine (ME)  (default: 1 GB)
  • Extra headroom (1 GB)
  • Each job executed at the same time (see section above for size).

The sum of these items is the memory the engine will use, and it must never exceed the memory available to the VM. If it does, the Java process aborts, which restarts the complete stack.

The amount of RAM needed is, therefore: OS + ME + Extra + Sum(Max for each Job running).

Example: 2 GB + 1 GB + 1 GB + (4 GB + 4 GB + 8 GB) = 20 GB
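The same arithmetic as a short Python sketch. The function and parameter names are hypothetical. The default values are the OS, ME and Extra figures listed above, and the 16 GB floor is the minimum memory configuration stated earlier.

```python
MIN_ENGINE_RAM_GB = 16  # minimum memory configuration for the Masking Engine


def required_ram_gb(job_max_gb, os_gb=2, engine_gb=1, extra_gb=1):
    """OS + ME + Extra + Sum(Max for each job running at the same time)."""
    return os_gb + engine_gb + extra_gb + sum(job_max_gb)


def recommended_ram_gb(job_max_gb):
    """Never go below the documented minimum, even for small workloads."""
    return max(MIN_ENGINE_RAM_GB, required_ram_gb(job_max_gb))


# Three concurrent jobs with Max set to 4 GB, 4 GB and 8 GB:
print(required_ram_gb([4, 4, 8]))  # 20
```

With a single 4 GB job the formula gives 8 GB, so the 16 GB minimum applies instead.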

Considerations - Misconfiguration Signs  

Some signs of engine misconfiguration are:

  • Masking Engine (dmsuite/debug) logs contain:
    • "null - [STRING]" in the masking/profiling job execution
    • "OutOfMemoryError" such as "OutOfMemoryError: GC overhead limit exceeded"
    • "ERROR: insert or update on table 'xxx' violates foreign key constraint..."
  • The Masking/Profiling Job exits with an incorrect status
  • Hanging Jobs
  • The Masking Engine restarts
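A quick way to check a log extract for the three log signatures is a plain substring scan. This is a minimal sketch, assuming the log has already been read into a list of lines and that the quoted strings appear literally in the log. The function name is hypothetical.

```python
# Substrings taken from the list of signs above.
# Assumption: "null - [STRING]" appears in the log exactly as quoted.
SIGNS = (
    "null - [STRING]",
    "OutOfMemoryError",
    "violates foreign key constraint",
)


def find_misconfiguration_signs(log_lines):
    """Return (line_number, sign) pairs for every sign found in the log."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for sign in SIGNS:
            if sign in line:
                hits.append((number, sign))
    return hits
```

An empty result does not rule out misconfiguration, since hanging jobs, incorrect exit statuses and engine restarts do not necessarily leave these strings in the log.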

 
