Masking Engine and Job Memory, Size, and Errors (KBA1036)
At a Glance

Description | This article offers guidance on how much memory to allocate to a Masking Engine and a Masking Job.
---|---
Engine Memory | The Engine Memory should be based on the requirements of your specific platform.
Size | The documented requirement is 128 GB vRAM (recommended) or 64 GB vRAM (minimum). This memory needs to be reserved.
Calculate | If the Engine Memory needs to be calculated, use this formula: Engine Memory = (6 GB + MaxTotJobMem) / 75%.
Job Memory | The Job Memory is configured in the job and by default (min, max) is 1 GB. Multiple streams, large algorithms, and large data sets (per record) might need more memory.
Default | Most masking jobs will run with Min = Max = 1 GB and RowLimit = 20,000. Too much Job Memory can result in "OutOfMemory ... GC overhead limit exceeded".
Optimal Mem | How much Job Memory is used can be monitored from the logs. The sweet spot for Java memory management is around 15% to 25% Heap; most jobs run with 0%.
Tuning | If the Heap stays at 0%, Min and Max Memory can be lowered (1 GB is the lowest value). If the Heap is more than 25%, allocate more memory or reduce RowLimit/Streams.
Performance | More memory will not make the job run faster. If the Heap % reaches up towards 100%, the job will slow down and could crash. The best approach is to size the memory correctly.
More info | For more information, see the sections below. For engine requirements, look up the applicable hosting platform in the product documentation.
How much memory?
The memory requirements have been greatly improved by job features (such as Row Limit and Memory Guardrails) and by newly updated algorithms, which use much less memory.
This article has been updated to reflect memory usage from version 6.0.12.0 onwards, but it also covers earlier versions.
Masking Engine
The requirements differ slightly from hosting platform to platform. In general, from version 6.0.11.0, the engine requirements are:
- 128 GB vRAM or higher (recommended).
- 64 GB vRAM (minimum).
Calculate required memory
To calculate how much memory is needed:
- Engine Memory = (6 GB + MaxTotJobMem) / 75%
- MaxTotJobMem (estimated) = Max Job Mem * Number of Simultaneous Jobs
Example
First estimate MaxTotJobMem:
- MaxTotJobMem = 10 GB (max job mem) * 7 (number of jobs running at the same time)
- MaxTotJobMem = 70 GB
Then calculate the required Engine Memory:
- Engine Memory = (6 GB + 70 GB) / 75%
- Engine Memory = 76 GB / 75%
- Engine Memory = 101 GB
To run 7 jobs at the same time with Job Mem set to 10 GB, you need 101 GB of Engine Memory.
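As a quick check, the same calculation can be scripted. The following is a minimal Python sketch of the formula above; the function name and rounding are illustrative, not part of the product:

```python
def required_engine_memory_gb(max_job_mem_gb, simultaneous_jobs):
    """Engine Memory = (6 GB + MaxTotJobMem) / 75%,
    where MaxTotJobMem = Max Job Mem * Number of Simultaneous Jobs."""
    max_tot_job_mem_gb = max_job_mem_gb * simultaneous_jobs
    return (6 + max_tot_job_mem_gb) / 0.75

# 7 simultaneous jobs with Job Mem set to 10 GB:
print(round(required_engine_memory_gb(10, 7)))  # 101
```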
Calculate available job memory
To calculate how much memory is available to run jobs:
- Max_Avail_JobMem = Engine Memory * 75% - 6 GB
Example
Calculate the available job memory for a 128 GB engine:
- Max_Avail_JobMem = 128 GB (engine memory) * 75% - 6 GB
- Max_Avail_JobMem = 96 GB - 6 GB
- Max_Avail_JobMem = 90 GB
There is 90 GB to share between all jobs running at the same time.
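The inverse calculation can be sketched the same way (again, an illustrative helper rather than a product feature):

```python
def available_job_memory_gb(engine_memory_gb):
    """Max_Avail_JobMem = Engine Memory * 75% - 6 GB."""
    return engine_memory_gb * 0.75 - 6

# A 128 GB engine leaves 90 GB to share between simultaneous jobs:
print(available_job_memory_gb(128))  # 90.0
```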
Masking Jobs
The amount of memory needed depends on:
- the number of records processed (managed by Row Limit)
- the size of the data in each record (On-The-Fly jobs require more memory, as data that is not masked is also included in each record)
- algorithms used
With the new Algorithm Framework (v2), fully transitioned from 6.0.15, the memory requirements have been reduced significantly.
Guidelines
Most jobs (using multiple algorithms and streams) run safely using:
- Min = Max = 1 GB.
- RowLimit = 20,000
Job memory factors
- Job Configuration (illustrated in the sketch after this list):
  - RowLimit - a memory feature that sets the maximum number of rows to buffer in the engine. The safe range is 5,000 to 100,000. Reducing the RowLimit lowers the memory used.
  - Streams - how many tables/files to mask at the same time. All streams share the Job Memory, so a large number of Streams requires more memory.
- Masked data - the data to be masked is loaded into memory (see RowLimit above). More memory is needed for large data.
  - In-Place (IP): the size of the data in the masked columns plus the key (PK, LK, or other).
  - On-The-Fly (OTF) and File Masking: the size of all the data in a record.
  - Masking a simple field averaging 10 characters is very different from masking a text object of 8,000 or more characters. This can be managed by casting large values to make them shorter.
- Algorithms:
  - Algorithm lookup values are loaded into memory, and large lookup files require additional memory (they are loaded per masked column).
  - The number of algorithms used at the same time (with around 10 algorithms, you might need to double the memory).
  - Custom Algorithms - check the memory requirements with the author.
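To make these factors concrete, here is a rough back-of-envelope sketch. The row-buffer model below only illustrates how RowLimit, record size, and Streams interact; it is not an official sizing formula, and real usage also depends on algorithms and JVM overhead:

```python
def rough_row_buffer_mb(row_limit, avg_record_bytes, streams):
    """Illustrative only: rows buffered * average record size * parallel streams.
    Not an official formula; actual memory use also depends on algorithms."""
    return row_limit * avg_record_bytes * streams / (1024 * 1024)

# 20,000 rows of ~200-byte records across 4 streams -> ~15 MB of row buffers
print(round(rough_row_buffer_mb(20_000, 200, 4)))  # 15
```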
Profiling Jobs
There are different types of Profiling Expressions. Data Level Profiling in particular requires memory, as the profiled values need to be loaded into memory to be profiled.
Guidelines
Since a sample of rows from all columns/fields needs to be loaded, most profiling jobs (using a few algorithms and streams) run safely using:
- Min = Max = 1 GB.
- RowLimit = 20,000
Job memory factors
- Job Configuration:
  - Streams - how many tables/files to profile at the same time, sharing the memory. Using a large number of Streams will require more memory.
- Profiling Sets:
  - The number of Data Level Profiling Expressions defined in a Profile Set will have an impact on memory.
Troubleshooting Out of Memory Error
To find out if a job needs more (or less) memory, the best place to look is in the Application Log (UI: Admin > Logs).
- The job will periodically report how much Heap has been used. This value is usually 0%.
- At the end of the job, the Heap usage is specified in bytes together with how much was allocated. The log entry also details how many Garbage Collections (GC) were done.
- On a long job, the allocated memory is usually Max Memory, as Java uses all free space (another reason not to allocate too much memory).
Log Examples
While a job is running:
- The number of log entries depends on Feedback Size.
- In this case, 0% of the Job Memory is used.
JobMemoryManager: Pause 0/5ms (0%) Heap 0%
End of execution:
- In this example, the job used 36,171,272 bytes (34 MB) of the 8,232,370,176 bytes (7.6 GB) allocated.
- This means only 0.4% of the memory was used (the optimal range is 15% to 25%).
- Min and Max on this job would be better set to 1 GB (the lowest value possible).
- There was 1 Garbage Collection (GC is only a problem if this number is much larger).
JobMemoryManager: Total Pause 0/316934ms (0%) Heap 36171272b of 8232370176b (0%) GC Count 1
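To compute the exact percentage from such a line, a small parsing sketch can help (the log format is copied from the example above and may vary between versions):

```python
import re

line = ("JobMemoryManager: Total Pause 0/316934ms (0%) "
        "Heap 36171272b of 8232370176b (0%) GC Count 1")

# Extract used and allocated heap in bytes from the end-of-job entry.
match = re.search(r"Heap (\d+)b of (\d+)b", line)
if match:
    used, allocated = map(int, match.groups())
    print(f"Heap usage: {used / allocated:.1%}")  # Heap usage: 0.4%
```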
Errors
A Masking or Profiling job fails with the following error messages:
- Masking Engine: "java.lang.OutOfMemoryError: GC overhead limit exceeded".
- Oracle message: "Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory and no more memory could be made available by the garbage collector. OutOfMemoryError objects may be constructed by the virtual machine as if suppression were disabled and/or the stack trace was not writable."
How to change (UI)
To change how much memory is allocated and how it is used, there are two sections in the Job Configuration.
- Min and Max Memory:
  - Configures how much memory is allocated to run the job.
- Streams:
  - This affects how memory is used.
  - Streams number - how many tables are masked at the same time. Doubling the number of tables will require roughly double the amount of memory.
  - RowLimit - how many records will be buffered in the masking engine. This can be decreased to 1,000 if needed.