Delphix

KBA1307 Best Practice: Profiling and Masking Job Configuration

 

This article defines the best practice for initial configuration of masking and profiling jobs.

Document Scope

This article describes the recommended initial configuration. Once a job has been confirmed to work, these settings can be tuned to improve performance.

Detailed here are: 

  • Profiling Job configuration.
  • Masking Job configuration - Database. 
  • Masking Job configuration - File. 

Profiling Job Configuration 

There are two different types of Profiling:

  1. Column Level
  2. Data Level

Column Level

When profiling at Column Level only (the default), there is no need to change the job configuration.

Best Practice for Initial Profiling Job Configuration: 

  • No. of Streams: 1
  • Job Memory:
    • Min:  blank (default, 1024 MB)
    • Max: blank (default, 1024 MB)

Data Level

When using Data Level profiling there can be many thousands of columns to profile. The best practice initial configuration is to start with 1 stream.

Best Practice for Initial Profiling Job Configuration: 

  • No. of Streams: 1
  • Job Memory:
    • Min:  2048 MB
    • Max: 4096 MB

 

 

Note:

'Min' must be less than 'Max'. It defines the amount of memory initially allocated to the job.

If Java fails to allocate this memory, it may report an error message, the job may crash, or the masking stack may be restarted.
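The 'Min' and 'Max' rule in the note above can be checked before a job is submitted. The following is a minimal sketch, not part of the product: the function names are hypothetical, `None` stands for a field left blank (default), and the mapping of 'Min' and 'Max' to the standard JVM `-Xms` and `-Xmx` options is an assumption based on this article's "Max Memory (xmx)" heading.

```python
from typing import List, Optional


def validate_job_memory(min_mb: Optional[int], max_mb: Optional[int]) -> None:
    """Raise ValueError if the values break the 'Min' < 'Max' rule.

    None means the field is left blank, so the engine default applies
    and no comparison is made (hypothetical convention for this sketch).
    """
    for name, value in (("Min", min_mb), ("Max", max_mb)):
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be a positive number of MB")
    if min_mb is not None and max_mb is not None and min_mb >= max_mb:
        raise ValueError(f"Min ({min_mb} MB) must be less than Max ({max_mb} MB)")


def jvm_heap_flags(min_mb: int, max_mb: int) -> List[str]:
    """Return the JVM heap options these values would correspond to.

    Assumption: 'Min' maps to -Xms and 'Max' maps to -Xmx.
    """
    validate_job_memory(min_mb, max_mb)
    return [f"-Xms{min_mb}m", f"-Xmx{max_mb}m"]


# Initial Data Level profiling values recommended in this article.
flags = jvm_heap_flags(2048, 4096)
```

Validating the pair up front surfaces a swapped or mistyped value as a clear error instead of a failed memory allocation at job start.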

Masking Job Configuration

There are two different types of masking jobs - Database and File. The configurations are similar. 

Masking Job Configuration - Database 

The objective of these initial masking settings is a successful first execution.

Best Practice for Initial Masking Job Configuration:

  • Masking Method: In Place
  • No. of Streams: 1
  • Update Threads: 1
  • Job Memory:
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback and Commit Size: leave blank (default) 
  • Drop Indexes, Disable Constraints and Triggers:
    • Optional - enable these. Doing so improves performance, and indexes, constraints and triggers are a common cause of issues.

Note:

Key parameters are: No. of Streams, Feedback Size and Max Memory.

Note:

'Min' must be less than 'Max'. It defines the amount of memory initially allocated to the masking job.

If Java fails to allocate this memory, it may report an error message, the masking job may crash, or the masking stack may be restarted.

Masking Job Configuration - File

The objective of these initial masking settings is a successful first execution.

Best Practice for Initial Masking Job Configuration:

  • Masking Method: On The Fly
  • No. of Streams: 1
  • Memory:
    • Min:  2048 MB (needs to be defined)
    • Max: 4096 MB (or larger as needed)
  • Feedback: leave blank (default) 

 

Note:

Key parameters are: No. of Streams, Feedback Size and Max Memory.

Note:

'Min' must be less than 'Max'. It defines the amount of memory initially allocated to the masking job.

If Java fails to allocate this memory, it may report an error message, the masking job may crash, or the masking stack may be restarted.

Key Configuration Parameters 

Feedback Size

The Feedback Size defines how frequently entries are written to the log files. The values below are guidelines. One way to choose a size is that the logs for a job should preferably fit into one log file.

Database Size    Max number of records in a table    Feedback Size value
Small to medium  Up to 10,000,000                    50,000 (default)
Large            Up to 500,000,000                   500,000
Very large       Over 500,000,000                    5,000,000
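The guideline table above can be expressed as a small lookup. This is a minimal sketch: the function name `suggest_feedback_size` is hypothetical, and the thresholds are taken directly from the table rather than from any engine API.

```python
def suggest_feedback_size(max_records_in_table: int) -> int:
    """Return the guideline Feedback Size for the largest table in the job.

    Thresholds follow the guideline table in this article:
      up to 10,000,000 records  -> 50,000 (default)
      up to 500,000,000 records -> 500,000
      over 500,000,000 records  -> 5,000,000
    """
    if max_records_in_table < 0:
        raise ValueError("record count cannot be negative")
    if max_records_in_table <= 10_000_000:
        return 50_000
    if max_records_in_table <= 500_000_000:
        return 500_000
    return 5_000_000


# Example: a table with 120 million records falls in the "Large" row.
large_table_feedback = suggest_feedback_size(120_000_000)
```

Because these values are guidelines, the result is a starting point to adjust if the logs for a job do not fit into one log file.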

Max Memory (xmx)

The amount of memory needed is determined by the number of columns masked and the masking algorithms used. The number of records has no impact on the memory requirements because the data is processed as a stream.

When would the Max need to be increased?

When there are: 

  • A large number of masked columns per table (more than 10).
  • A large number of lookup values in Secure Lookup algorithm.
  • A large number of lookup values in Mapping algorithm.
  • Complex Segmented Mapping algorithms.
  • Complex custom mapplets.

Diagnosis - Error Signatures 

Signs that the engine is not configured correctly are: 

  • Masking Engine (dmsuite/debug) logs contain:
    • "OutOfMemoryError" such as "OutOfMemoryError: GC overhead limit exceeded".
    • "ERROR: insert or update on table 'xxx' violates foreign key constraint...".
  • The Masking/Profiling Job exits with an incorrect status.
  • Hanging Jobs - the status remains Running.
  • The Masking Engine restarts.
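
The two log signatures listed above can be searched for automatically. The following is a minimal sketch that counts matching lines in any iterable of log lines. The function name, the signature labels and the sample lines are illustrative assumptions. Only the quoted error strings come from this article.

```python
import re
from typing import Dict, Iterable

# Error strings quoted in this article; the dictionary keys are illustrative labels.
SIGNATURES = {
    "out_of_memory": re.compile(r"OutOfMemoryError"),
    "foreign_key_violation": re.compile(r"violates foreign key constraint"),
}


def scan_log_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines match each known error signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "ERROR: insert or update on table 'xxx' violates foreign key constraint",
    "INFO: job step finished",
]
result = scan_log_lines(sample)
```

An open file object is also an iterable of lines, so a downloaded debug log can be passed to `scan_log_lines` directly. A non-zero "out_of_memory" count points to the Max Memory setting, while "foreign_key_violation" points to the constraint options of the masking job.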
