Delphix

Best Practice: Job Configuration on the Masking Engine (KBA1307)

 

 

This article defines the best practice for:

  • initial configuration of masking and profiling jobs
  • general engine configuration

Job and Engine Configuration 

Detailed here are: 

  • Profiling Job configuration
  • Masking Job configuration 
  • Masking Engine configuration 

Background Information  

Originally, the Masking Engine was usually located on the same server as the database, so the requirements and limiting factors were different. Today, we see more remote connections and larger databases, and as a result the configuration requirements have changed.

This article addresses the optimal initial configuration. Once a job has been confirmed to work, these settings can be tuned to increase performance.

Profiling Job Configuration  

There are two different types of Profiling:

  • Column Level
  • Data Level

The type is defined in the Profiling Set.

Column Level profiling applies RegEx to the column name in the Inventory. Data Level profiling applies RegEx to a sample of the data in the column. If a column has no Column Level match and a Data Level expression is defined, Data Level profiling is started for that column.
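
The decision order described above can be sketched in Python. This is a minimal illustration, assuming a hypothetical profiling set with one EMAIL expression at each level and a hypothetical `min_match_ratio` threshold for deciding a Data Level match. The engine's actual expressions and matching rules may differ.

```python
import re
from typing import Iterable, Optional

# Hypothetical profiling set: these expressions are illustrative only.
COLUMN_LEVEL = {"EMAIL": re.compile(r"e_?mail", re.IGNORECASE)}
DATA_LEVEL = {"EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)}


def profile_column(name: str, sample: Iterable[str],
                   min_match_ratio: float = 0.5) -> Optional[str]:
    """Return the matched domain for a column, or None.

    1. Column Level: RegEx on the column name.
    2. Data Level (only if step 1 found nothing): RegEx on a data sample.
    min_match_ratio is a hypothetical threshold used only in this sketch.
    """
    for domain, pattern in COLUMN_LEVEL.items():
        if pattern.search(name):
            return domain  # name matched, so the data is never sampled

    values = list(sample)
    if not values:
        return None
    for domain, pattern in DATA_LEVEL.items():
        hits = sum(1 for v in values if pattern.search(v))
        if hits / len(values) >= min_match_ratio:
            return domain
    return None


if __name__ == "__main__":
    print(profile_column("CUST_EMAIL", []))                          # EMAIL (name match)
    print(profile_column("CONTACT", ["a@b.com", "x@y.org", "n/a"]))  # EMAIL (2 of 3 values)
    print(profile_column("NOTES", ["hello"]))                        # None
```

The sample is only read for columns whose names do not match, so every such column adds a data read.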

When profiling with Column Level only (the default), there is no need to change the job configuration.

When using Data Level profiling, there can be many thousands of columns to profile. The best practice initial configuration is to start with 1 stream:

  • Streams: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB
    • Max: 4096 MB

Masking Job Configuration 

There are two types of masking jobs: Database and File. Their configurations are fairly similar. The objective of these initial settings is a successful first execution.

Job Configuration: Database  

Best practice for initial masking job configuration: 

  • Masking Method: In-Place
  • No. of Streams: 1
  • Update Threads: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback and Commit Size: leave blank (default) 
  • Drop Indexes, Disable Constraints, and Disable Triggers:
    • Optional. Setting these improves performance, and indexes, constraints, and triggers are a common cause of issues.
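
To make the settings above easier to review, they can be captured as a small Python dataclass. This is a sketch only. The class and field names are hypothetical, mirror the labels used in this article, and are not the Masking Engine's API. `None` stands for "leave blank (default)", and `validate()` applies this article's rule that Min memory must be less than Max.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DatabaseMaskingJobSettings:
    # Hypothetical field names mirroring this article's labels, not the engine API.
    masking_method: str = "In-Place"
    num_streams: int = 1
    update_threads: int = 1
    min_memory_mb: int = 2048             # must be defined
    max_memory_mb: int = 4096             # or larger as needed
    feedback_size: Optional[int] = None   # None = leave blank (default)
    commit_size: Optional[int] = None     # None = leave blank (default)
    drop_indexes: bool = False            # optional
    disable_constraints: bool = False     # optional
    disable_triggers: bool = False        # optional

    def validate(self) -> None:
        """Raise ValueError if the settings break the initial-configuration rules."""
        if self.num_streams < 1 or self.update_threads < 1:
            raise ValueError("streams and update threads must be at least 1")
        if self.min_memory_mb >= self.max_memory_mb:
            raise ValueError("Min memory must be less than Max memory")


if __name__ == "__main__":
    initial = DatabaseMaskingJobSettings()
    initial.validate()
    print(asdict(initial))
```

Keeping the defaults at the initial values means any tuning done later shows up as an explicit override, for example `DatabaseMaskingJobSettings(num_streams=2)`.
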
Note:

Key parameters are: No. of Streams, Feedback Size, and Max Memory.

'Min' must be less than 'Max'. It defines the initial amount of memory allocated to the masking job.

If Java fails to allocate this memory, it may report an error message, crash the masking job, or cause the masking stack to be restarted.
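
The Min and Max fields correspond to the standard JVM heap options `-Xms` (initial heap) and `-Xmx` (maximum heap). The sketch below shows that mapping together with the Min < Max rule from this note. It is only an illustration. The function name is hypothetical, and this is not how the Masking Engine itself launches jobs.

```python
from typing import List


def jvm_memory_options(min_mb: int, max_mb: int) -> List[str]:
    """Build JVM heap flags from a job's Min/Max memory in MB (hypothetical helper)."""
    if min_mb <= 0 or max_mb <= 0:
        raise ValueError("memory values must be positive")
    if min_mb >= max_mb:
        raise ValueError(f"Min ({min_mb} MB) must be less than Max ({max_mb} MB)")
    # -Xms = initial heap (Min), -Xmx = maximum heap (Max); "m" means megabytes.
    return [f"-Xms{min_mb}m", f"-Xmx{max_mb}m"]


if __name__ == "__main__":
    print(jvm_memory_options(2048, 4096))  # ['-Xms2048m', '-Xmx4096m']
```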

 

Job Configuration: File 

Best practice for initial masking job configuration: 

  • Masking Method: On-the-fly
  • No. of Streams: 1
  • Memory (JVM xms and xmx)
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback Size: leave blank (default)
Feedback Size 

The Feedback Size defines how frequently progress is written to the log files: a message is logged each time this number of rows has been processed. The values below are guidelines. One way to choose the size is to aim for the job's log output to fit in a single log file.

Database size     Max records in a table   Feedback Size value
Small to medium   Up to 10,000,000         50,000 (default)
Large             Up to 500,000,000        500,000
Very large        Over 500,000,000         5,000,000

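
The guideline table can be expressed as a small lookup function. This sketch assumes that one progress message is written per Feedback Size rows. `progress_messages` is a hypothetical helper that estimates how many messages a table of a given size produces, which shows why larger tables call for a larger Feedback Size.

```python
def recommended_feedback_size(max_rows_in_table: int) -> int:
    """Pick a Feedback Size from the guideline table, using the largest table's row count."""
    if max_rows_in_table <= 10_000_000:
        return 50_000      # small to medium (default)
    if max_rows_in_table <= 500_000_000:
        return 500_000     # large
    return 5_000_000       # very large


def progress_messages(rows: int, feedback_size: int) -> int:
    """Approximate number of progress messages, assuming one per feedback_size rows."""
    return rows // feedback_size


if __name__ == "__main__":
    print(recommended_feedback_size(250_000_000))     # 500000
    print(progress_messages(500_000_000, 50_000))     # 10000
    print(progress_messages(500_000_000, 500_000))    # 1000
```

For example, a 500,000,000-row table at the default of 50,000 produces about 10,000 progress messages, compared with about 1,000 at 500,000.
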
Max Memory (xmx) 

The amount of memory needed is determined by the number of columns masked and the masking algorithms used. The size of the data (the number of records) has no impact on memory requirements, because the data is processed as a stream.
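
The following sketch illustrates why row count does not drive memory. It masks rows from a generator one at a time. It assumes a hypothetical lookup-style algorithm that hashes the input value and uses the result to index a list of replacement values. This is not the Masking Engine's Secure Lookup implementation. Memory use is the current row plus the replacement list, so it grows with the number of masked columns and lookup values rather than with the number of rows.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List


def lookup_mask(value: str, replacements: List[str]) -> str:
    """Deterministically map a value to one replacement value (hypothetical algorithm)."""
    if not replacements:
        raise ValueError("replacement list must not be empty")
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replacements)
    return replacements[index]


def mask_stream(rows: Iterable[Dict[str, str]], columns: List[str],
                replacements: List[str]) -> Iterator[Dict[str, str]]:
    """Yield masked rows one at a time; previous rows are not kept in memory."""
    for row in rows:
        masked = dict(row)
        for col in columns:
            masked[col] = lookup_mask(row[col], replacements)
        yield masked


if __name__ == "__main__":
    lookup = ["Alice", "Bob", "Carol"]
    # The source is a generator, so the full table is never materialized.
    source = ({"id": str(i), "name": f"user{i}"} for i in range(5))
    for out in mask_stream(source, ["name"], lookup):
        print(out)
```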

When would the Max need to be increased?

  • A large number of masked columns per table (more than 10).
  • A large number of lookup values in Secure Lookup algorithm.
  • A large number of lookup values in Mapping algorithm.
  • Complex Segmented Mapping algorithms.
  • Complex custom mapplets.

Troubleshooting Diagnosis (Error Signatures)

Some signs that the engine is not configured correctly are: 

  • Masking Engine (dmsuite/debug) logs contain:
    • "OutOfMemoryError" such as "OutOfMemoryError: GC overhead limit exceeded".
    • "ERROR: insert or update on table 'xxx' violates foreign key constraint...".
  • The Masking/Profiling job exits with an incorrect status.
  • Jobs hang while their status shows "Running".
  • The Masking Engine restarts.
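
One quick way to check for the first two signatures is to scan a copy of the log for the exact strings quoted above. This is a minimal sketch, and the signature names are hypothetical. It cannot detect hung jobs, incorrect exit statuses, or engine restarts. Those have to be checked through the job status and engine behavior.

```python
import re
from typing import Dict, Iterable, List

# Strings taken from the error signatures listed above; names are hypothetical.
SIGNATURES = {
    "out_of_memory": re.compile(r"OutOfMemoryError"),
    "foreign_key_violation": re.compile(r"violates foreign key constraint"),
}


def scan_log(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return the 1-based line numbers where each signature appears."""
    hits: Dict[str, List[int]] = {name: [] for name in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits


if __name__ == "__main__":
    sample = [
        "INFO job started",
        "java.lang.OutOfMemoryError: GC overhead limit exceeded",
        "ERROR: insert or update on table 'xxx' violates foreign key constraint",
    ]
    print(scan_log(sample))  # {'out_of_memory': [2], 'foreign_key_violation': [3]}
```

To scan a file instead, pass an open file object. For example, use `with open(path, encoding="utf-8", errors="replace") as f: print(scan_log(f))`, where `path` points to your copy of the dmsuite/debug log.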

Related Articles