KBA1048 Best Practice: Job Configuration Settings for Masking
Masking Configuration - Background
The requirements for job configurations have changed over time. Originally, the Masking Engine was usually located on the same server as the database, so the requirements and limiting factors were different. Today, remote connections are more common and the databases being masked are much larger.
As a result, the requirements for job and engine configuration settings have changed. This article has been created to address the optimal initial configuration.
Job and Engine Configuration Settings
This article defines the best practice initial configuration for masking and profiling jobs and the engine in general. For best practice performance, please refer to this article.
Detailed here are:
- Profiling Job configuration
- Masking Job configuration
- Masking Engine configuration
Profiling Job configuration
There are two different types of Profiling - Column Level and Data Level. The type is defined in the Profiling Set.
Column Level profiling applies a RegEx to the column name in the Inventory. Data Level profiling applies a RegEx to a sample set of the data in the column. If Column Level profiling finds no match and a Data Level expression is defined, a profile job is started for that column.
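The two-stage lookup described above can be sketched roughly as follows. This is only an illustration of the order of evaluation, not the engine's implementation: the expressions, the `profile_column` function, and the rule that a single matching sample value counts as a hit are all assumptions.

```python
import re

# Hypothetical expressions; a real Profiling Set defines its own.
COLUMN_LEVEL = {"EMAIL": re.compile(r"e.?mail", re.IGNORECASE)}
DATA_LEVEL = {"EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")}


def profile_column(column_name, sample_values):
    """Return (domain, level) for the first match, or None if nothing matches."""
    # Column Level: match the expression against the column name.
    for domain, pattern in COLUMN_LEVEL.items():
        if pattern.search(column_name):
            return domain, "column"
    # Data Level: only reached when no Column Level expression matched.
    # Assumption: one matching sample value is enough to count as a hit.
    for domain, pattern in DATA_LEVEL.items():
        if any(pattern.search(str(value)) for value in sample_values):
            return domain, "data"
    return None
```

For example, `profile_column("EMAIL_ADDR", [])` matches on the column name alone, while `profile_column("CONTACT", ["a@b.com"])` falls through to the sampled data.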
When using Data Level profiling there can be many thousands of columns to profile. The best practice initial configuration is, therefore:
- Streams: 1
- Memory (JVM xms and xmx)
  - Min: 1024 MB (needs to be defined)
  - Max: 2048 MB
Masking Job configuration
There are two different types of masking jobs - Database and File. The configurations are fairly similar.
Job configuration - Database
The recommended initial settings for a database masking job are:
- No. of Streams: 1
- Update Threads: 1
- Memory (JVM xms and xmx)
  - Min: 2048 MB (needs to be defined)
  - Max: 4096 MB (or larger as needed)
- Feedback Size: see table below
- Commit Size: leave blank (default)
Job configuration - File
The recommended initial settings for a file masking job are:
- No. of Streams: 1
- Memory (JVM xms and xmx)
  - Min: 2048 MB (needs to be defined)
  - Max: 4096 MB (or larger as needed)
- Feedback Size: see table below
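As a rough aid, the initial settings listed above can be captured in a small structure with a sanity check. The class name `JobSettings` and its field names are hypothetical and are not the engine's own setting names; only the values come from this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSettings:
    """Hypothetical container for the recommended initial job settings."""
    streams: int = 1
    update_threads: Optional[int] = 1   # database jobs only; None for file jobs
    min_memory_mb: int = 2048           # JVM xms, needs to be defined
    max_memory_mb: int = 4096           # JVM xmx, or larger as needed
    feedback_size: int = 50_000         # default value
    commit_size: Optional[int] = None   # None means "leave blank (default)"

    def validate(self) -> None:
        """Raise ValueError if the memory bounds are inconsistent."""
        if self.min_memory_mb <= 0:
            raise ValueError("Min memory must be defined and positive")
        if self.min_memory_mb > self.max_memory_mb:
            raise ValueError("Min memory must not exceed Max memory")


database_job = JobSettings()
file_job = JobSettings(update_threads=None)
```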
Feedback Size
The Feedback Size defines how frequently entries are written to the log files. These values are guidelines; one way to choose a size is to aim for the logs to fit into a single log file.
| Database Size | Max number of records in a table | Feedback Size value |
| --- | --- | --- |
| Small to medium | Up to 10,000,000 | 50,000 (default) |
| Large | Up to 500,000,000 | 500,000 |
| Very large | Over 500,000,000 | 5,000,000 |
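The table above can be read as a simple lookup on the largest table's record count. A minimal sketch, where the function name `feedback_size` is hypothetical and the thresholds are taken directly from the table:

```python
def feedback_size(max_records_in_table: int) -> int:
    """Return the guideline Feedback Size for the largest table in the job."""
    if max_records_in_table < 0:
        raise ValueError("record count cannot be negative")
    if max_records_in_table <= 10_000_000:      # small to medium
        return 50_000
    if max_records_in_table <= 500_000_000:     # large
        return 500_000
    return 5_000_000                            # very large
```

Because these values are guidelines, the result is a starting point and not a hard rule.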
Max Memory (xmx)
The amount of memory needed is determined by the number of columns masked and the masking algorithms used. The database size (the number of records) has no impact on the memory requirements because the data is processed as a stream.
When would the Max need to be increased?
- A large number of masked columns per table (more than 10).
- A large number of lookup values in Secure Lookup algorithm.
- A large number of lookup values in Mapping algorithm.
- Complex Segmented Mapping algorithms.
- Complex custom mapplets.
Masking Engine configuration
The Masking Engine VM is configured on the VMware vSphere Hypervisor (ESX). The minimum requirements can be found [here](https://docs.delphix.com/docs/archit...es-for-masking).
The minimum memory configuration for the Masking Engine is 16 GB; however, more memory may be needed.
How much memory is needed depends on:
- The OS (2 GB)
- The Masking Engine (ME) (default: 1 GB)
- Extra headroom (1 GB)
- Each job executed at the same time (see section above for size).
Together these define the memory used, which must never exceed the amount of memory available. If it does, the Java process aborts, which restarts the complete stack.
The amount of RAM needed is, therefore: OS + ME + Extra + Sum(Max for each Job running).
Example: 2 GB + 1 GB + 1 GB + (4 GB + 4 GB + 8 GB) = 20 GB
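The same arithmetic can be sketched as a small helper. The function names are hypothetical; the constants are the values given above, and applying the 16 GB minimum as a floor on the result is an assumption about how the two statements combine.

```python
OS_GB = 2               # operating system
ENGINE_GB = 1           # Masking Engine (default)
EXTRA_GB = 1            # extra headroom
MIN_ENGINE_RAM_GB = 16  # minimum memory configuration for the engine


def required_ram_gb(job_max_gb):
    """OS + ME + Extra + Sum(Max for each job running at the same time)."""
    return OS_GB + ENGINE_GB + EXTRA_GB + sum(job_max_gb)


def recommended_vm_ram_gb(job_max_gb):
    """Assumption: never size the VM below the 16 GB minimum."""
    return max(MIN_ENGINE_RAM_GB, required_ram_gb(job_max_gb))


print(required_ram_gb([4, 4, 8]))  # 20, matching the example above
```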
Links to related documentation
Useful external links:
- https://docs.delphix.com/docs/delphi...m-requirements
- https://docs.delphix.com/docs/delphi...filer-settings
- https://docs.delphix.com/docs/delphi...ties/mask-data
For Best Practice Performance:
- To be created
Considerations - Misconfiguration Signs
Signs that the engine is misconfigured are:
- The Masking Engine (dmsuite/debug) logs contain:
  - "null - [STRING]" in the masking/profiling job execution
  - "OutOfMemoryError" such as "OutOfMemoryError: GC overhead limit exceeded"
  - "ERROR: insert or update on table 'xxx' violates foreign key constraint..."
- The Masking/Profiling Job exits with an incorrect status
- Hanging Jobs
- The Masking Engine restarts
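A quick way to check collected log lines for the log-based signs is a plain substring scan. This is a minimal sketch: the function name `find_signs` and the labels are hypothetical, the quoted strings are treated as literal text (which is an assumption), and how the log lines are obtained from the engine is left open.

```python
# Substrings taken from the signs listed above; labels are hypothetical.
SIGNS = {
    "null_string": "null - [STRING]",
    "out_of_memory": "OutOfMemoryError",
    "foreign_key": "violates foreign key constraint",
}


def find_signs(log_lines):
    """Count how many lines contain each sign; omit signs that never occur."""
    hits = {name: 0 for name in SIGNS}
    for line in log_lines:
        for name, needle in SIGNS.items():
            if needle in line:
                hits[name] += 1
    return {name: count for name, count in hits.items() if count}
```

An empty result only means none of these strings were found; hanging jobs or engine restarts still need to be checked separately.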