Delphix

Masking Engine and Job Memory, Size, and Errors (KBA1036)

 

 

KBA

KBA#1036

At a Glance

Description This article offers guidance on how much memory to allocate to a Masking Engine and a Masking Job.
Engine Memory The Engine Memory should be based on the requirements of your specific platform.
  Size The documented requirement is:
  • Recommended: 128 GB vRAM  
  • Minimum: 64 GB vRAM 

Memory needs to be reserved.

  Calculate If the Engine Memory needs to be calculated, use this formula: 
  • Engine Memory = (6 GB + MaxTotJobMem) / 75%
     
  • MaxTotJobMem (estimated) = Max Job Mem * Number of Simultaneous Jobs
Job Memory The Job Memory is configured in the job and by default (min, max) is 1 GB. Multiple Streams, large algorithms, and large data sets (per record) might need more memory. 
  Default Most masking jobs will run with:
  • Min = Max = 1 GB 

Too little Job Memory can result in "OutOfMemory ... GC overhead limit exceeded". 

  Optimal Mem How much Job Memory is used can be monitored from the logs: The sweet spot for Java memory management is around 15% to 25% Heap. Most jobs run with 0%. 
  • Use the logs to check Heap usage (UI: Admin > Logs).
  • Look for 'JOB_ID_... JobMemoryManager ... Heap nn%'.
  Tuning If the Heap is 0%:
  • Reduce (halve) Job Memory.

If the Heap is more than 25%:

  • Increase (double) Job Memory.
  • Or reduce (halve) Row Limit.
  Performance  More memory will not make the job run faster. 

If Heap usage approaches 100%, the job will slow down and could crash.

It is best to size the Job Memory correctly.
More info For more information: 

For requirements (lookup applicable platform): 

How much memory?

The memory requirements have been greatly improved by Job Features (such as Row Limit and Memory Guardrails) and newly updated algorithms, which use much less memory.

This article has been updated to reflect memory usage from version 6.0.12.0 onwards but will also cover usage on earlier versions. 

Masking Engine

The requirements differ slightly from hosting platform to platform. In general, from version 6.0.11.0, the engine requirements are detailed as:

  • 128 GB vRAM or higher (recommended).
  • 64 GB vRAM (minimum).

 


Important:

It is a requirement that the memory is Reserved. For some hosting platforms, the hosting subsystem might kill the host if the host/container needs more memory than is available.

 

Calculate required memory

To calculate how much memory is needed:

  • Engine Memory = (6 GB + MaxTotJobMem) / 75%
     
  • MaxTotJobMem (estimated) = Max Job Mem * Number of Simultaneous Jobs
Example

First estimate MaxTotJobMem:

  • MaxTotJobMem = 10 GB (max job mem) * 7 (number of jobs running at the same time)
  • MaxTotJobMem = 70 GB

Then calculate the required Engine Memory:

  • Engine Memory = (6 GB + 70 GB) / 75%
  • Engine Memory = 76 GB / 75%
  • Engine Memory ≈ 101 GB (rounded)

To run 7 jobs at the same time with Job Mem set to 10 GB, you need approximately 101 GB of Engine Memory.
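
The calculation above can be reproduced with a short script. This is only a sketch of the article's formula; the function name `engine_memory_gb` and its parameters are hypothetical names chosen for illustration and are not part of any Delphix tool.

```python
def engine_memory_gb(max_job_mem_gb: float, simultaneous_jobs: int) -> float:
    """Engine Memory = (6 GB + MaxTotJobMem) / 75%, as given in this article."""
    # MaxTotJobMem (estimated) = Max Job Mem * Number of Simultaneous Jobs
    max_tot_job_mem_gb = max_job_mem_gb * simultaneous_jobs
    return (6 + max_tot_job_mem_gb) / 0.75


# The worked example: 7 simultaneous jobs, each with Max Job Mem = 10 GB.
required = engine_memory_gb(max_job_mem_gb=10, simultaneous_jobs=7)
print(f"{required:.0f} GB")  # prints "101 GB"
```

Because the result is a sizing estimate, it is reasonable to round up when choosing the actual Engine Memory.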

Calculate available job memory

To calculate how much memory is available to run jobs:

  • Max_Avail_JobMem = Engine Memory * 75% - 6 GB
Example

Calculate Max_Avail_JobMem for an engine with 128 GB of memory:

  • Max_Avail_JobMem = 128 GB (engine memory) * 75% - 6 GB
  • Max_Avail_JobMem = 96 GB - 6 GB
  • Max_Avail_JobMem = 90 GB 

There is 90 GB to share between all jobs running at the same time.
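
The same formula can be used as a quick planning check. The sketch below assumes that the sum of the Max memory of all jobs running at the same time should stay within Max_Avail_JobMem; the function names are hypothetical, and the check is a planning aid only, not the engine's own logic.

```python
def max_avail_job_mem_gb(engine_mem_gb: float) -> float:
    """Max_Avail_JobMem = Engine Memory * 75% - 6 GB, as given in this article."""
    return engine_mem_gb * 0.75 - 6


def jobs_fit(engine_mem_gb: float, job_max_mem_gb: list[float]) -> bool:
    """Return True if the Max memory of all simultaneous jobs fits in the available memory."""
    return sum(job_max_mem_gb) <= max_avail_job_mem_gb(engine_mem_gb)


print(max_avail_job_mem_gb(128))  # prints 90.0
print(jobs_fit(128, [10] * 7))    # 70 GB <= 90 GB, prints True
print(jobs_fit(64, [10] * 7))     # 70 GB > 42 GB, prints False
```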

Masking Jobs

The amount of memory needed depends on:

  • the number of records processed (managed by Row Limit)
  • the size of the data in each record (OTF will require more as data that is not masked is included in each record)
  • algorithms used

With the new Algorithm Framework (v2) (fully transitioned from 6.0.15) the memory requirements have been reduced significantly. 

Guidelines

Most jobs (using multiple algorithms and streams) run safely using:

  • Min = Max = 1 GB.
  • RowLimit = 20,000

Job memory factors

  • Job Configuration:
    • RowLimit - is a memory feature and sets the max number of rows to buffer in the engine. The safe range is 5,000 to 100,000. Reducing the RowLimit will lower the memory used. 
    • Streams - 'streams' indicates how many tables/files to mask at the same time. This feature shares the Job Memory. Using a large number of 'Streams' will require more memory.
       
  • Masked data - the data masked will be loaded into memory (see RowLimit above). More memory is needed for large data.
    • In-Place (IP): The size is the size of the data in masked columns + key (PK, LK, or other).
    • On-The-Fly (OTF) and File Masking: The size of all the data in a record.
       
    • Masking a simple field that is on average 10 characters compared to a text object that is 8,000 or more characters is a huge difference. Note that this can be managed by casting large values and making them shorter. 
       
  • Algorithms:
    • Algorithm Lookup values - these are loaded in memory and large files will require additional memory (these are loaded per column masked).
    • The number of algorithms used at the same time (if around 10 algorithms then you might need to double the memory). 
    • Custom Algorithms - please check memory requirements with the author.
       

Notes:

  • Job memory Min (the Java -Xms option) allocates the startup memory.
  • Job memory Max (the Java -Xmx option) reserves the upper virtual address space.
  • Setting Min = Max will allocate Max memory when the job is starting and could improve performance slightly. 
  • Setting Min = Max / 2 will enable the job to start with less memory and expand to Max.

    The Max value is always used to simplify calculations and to define the safe upper limit.

 

Profiling Jobs

There are different types of Profiling Expressions. Data Level Profiling requires memory because the values to be profiled need to be loaded into memory. 

Guidelines

Because a sample of rows from all columns/fields needs to be loaded, memory should be sized accordingly. Most jobs (using a few algorithms and streams) run safely using:

  • Min = Max = 1 GB.
  • RowLimit = 20,000

Job memory factors

  • Job Configuration:
    • Streams - 'streams' indicates how many tables/files to profile at the same time, sharing the Job Memory. Using a large number of 'Streams' will require more memory.
       
  • Profiling Sets:
    • The number of Data Level Profiling Expressions defined in a Profile Set will have an impact on memory. 

Troubleshooting Out of Memory Error

To find out if a job needs more (or less) memory, the best place to look is in the Application Log (UI: Admin > Logs).

  • The Job will report periodically how much Heap has been used.
    • This value is usually 0% 
  • At the end of the job the Heap usage is specified in bytes together with how much was allocated. The log entry also details how many Garbage Collections (GC) were done.
    • On a long job, the allocated memory is usually Max Memory, because Java uses all the free space available to it. This is another reason not to allocate too much memory.

Log Examples

While a job is running:

  • The number of log entries depends on Feedback Size.
  • In this case, 0% of the Job Memory is used.
JobMemoryManager: Pause 0/5ms (0%) Heap 0%

End of execution:

  • In this example, the job used 36,171,272 bytes (about 34 MB) of the 8,232,370,176 bytes allocated (about 7.7 GB).
  • This means only 0.4% of the memory was used (the optimal range is 15% to 25%).
  • Min and Max on this job would be better set to 1 GB (the lowest value possible).
     
  • There was 1 Garbage Collection (GC is only a problem if this number is much larger)
JobMemoryManager: Total Pause 0/316934ms (0%) Heap 36171272b of 8232370176b (0%) GC Count 1
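
The end-of-execution entry can also be checked with a small script. The sketch below is an illustration only: the regular expression is modeled on the single example line above, so real log lines may need a different pattern, and the function names are hypothetical. The tuning thresholds follow this article (reduce at 0% Heap, increase above 25%); treating anything below 1% as the 0% case is an assumption.

```python
import re

# Pattern modeled on the one example line in this article (an assumption).
END_OF_JOB = re.compile(r"Heap (\d+)b of (\d+)b \(\d+%\) GC Count (\d+)")


def heap_usage_percent(log_line: str) -> float:
    """Return used heap as a percentage of allocated heap for an end-of-job line."""
    match = END_OF_JOB.search(log_line)
    if match is None:
        raise ValueError("not an end-of-job JobMemoryManager line")
    used, allocated = int(match.group(1)), int(match.group(2))
    return 100.0 * used / allocated


def tuning_hint(heap_percent: float) -> str:
    """Map Heap usage to the tuning guidance given in this article."""
    if heap_percent > 25:
        return "increase (double) Job Memory, or reduce (halve) Row Limit"
    if heap_percent < 1:  # assumed to correspond to a logged Heap of 0%
        return "reduce (halve) Job Memory"
    return "keep the current Job Memory"


example = ("JobMemoryManager: Total Pause 0/316934ms (0%) "
           "Heap 36171272b of 8232370176b (0%) GC Count 1")
percent = heap_usage_percent(example)
print(f"{percent:.1f}%")     # prints "0.4%"
print(tuning_hint(percent))  # prints "reduce (halve) Job Memory"
```

Job Memory cannot be set below 1 GB, so the "reduce" hint only applies while the job is configured above that value.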

Errors  

Masking or Profiling job fails with the error messages:

  • Masking Engine: "java.lang.OutOfMemoryError: GC overhead limit exceeded".
  • Oracle's Java documentation describes OutOfMemoryError as: "Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector. OutOfMemoryError objects may be constructed by the virtual machine as if suppression were disabled and/or the stack trace was not writable."

How to change (UI)

To change the memory and how memory is used there are two sections in the Job Configuration.

  • Min and Max Memory:
    • Configure how much memory is allocated to run the job. 
  • Streams:
    • This affects how memory is used.
      • Streams number - how many tables are processed at the same time. Doubling the number of tables will require double the amount of memory.
      • RowLimit - how many records will be buffered in the masking engine. Can be decreased to 1,000 if needed.

 


Note:

The Job Monitor always checks how much memory is used. If there is not enough memory available, the job will not be able to start.