Masking Commit Size, Row Limit and Feedback Size (KBA6387)
KBA
KBA# 6739At a Glance
Available in versions | Commit Size, Feedback Size: All versions Row Limit: From 6.0.4.0 forward |
---|---|
Commit Size and Batch Update | Commit Size and Batch Update (DB only) The Commit Size sets the number of records between Commits (in each Transaction). The Batch Update is a JDBC feature and enables each Transaction to be sent in a 'batch'. This will minimize the number of packets required to be transferred over the network to the DB.
|
Row Limit | Row Limit Sets the maximum number of records that will be in the masking engine for that object at any specific time. This will manage the amount of memory used by the job.
|
Feedback Size | Feedback Size This has no impact on performance. It will set how many rows are processed for each step before writing an entry into the logs.
Too many log entries can cause the engine to run out of storage space. On large tables/jobs adjust this value. |
More Info | For more information: |
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0 5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
Overview
This article details three Masking Job properties:
- Commit Size - the number of rows/records in each batch before commits.
- Row Limit - the number of max rows/records 'in flight' for each masked object.
- Feedback Size - sets the number of rows processed before writing a feedback entry into the logs.
Commit Size
The Commit Size determines how many rows there are in an Update or Insert batch before it is committed. To use the JDBC batch feature, Batch Update needs to be enabled.
The Batch Update is a JDBC feature and enables each Transaction to be sent in one batch over the network, minimizing the number of packets required to transfer the data to the database. If disabled, each record will be sent in one (or more) packet.
Default value
The default value (10,000) is for most masking jobs the optimal value and does not normally need to be changed. For some versions, the value will be blank which is the same as 10,000.
Impact on performance
The graph below shows the impact of Commit Size on Performance and what is the Sweet Spot.
The Sweet Spot is 10,000 and it can be safely reduced without a larger negative impact on performance (between 500 to 50,000 is normally safe). If the Commit Size is increased, usage of resources (such as Transaction Log on SQL Server) increases, as does the chance of job errors such as deadlock.
Illustration
The best way to illustrate the Commit Size and the Row Limit is to show the masking job's interaction with a database. The illustration below is a simplified view and shows a masking job in progress.
Databases store records in blocks or pages (there can be other terms), these are illustrated by the rectangles. The white ones contain records that are not yet masked. The grey ones are records that are read but not yet updated. These are buffered in the masking engine and the max number of records that can be in the engine is (from version 6.0.4 forward) controlled by the Row Limit feature.
The Batch Update and Commit Size controls the Update (IP) (Insert for OTF). The Batch Update enables the JDBC to send the data in a batch and the Commit Size defines how many rows are in each batch before it is committed.
Effects
The effects of changing Commit Size (as also seen above):
- Smaller Commit Size:
- Reduces the number of resources required on the database.
- Degrades the performance.
- Reduces the chance of errors (such as deadlocks).
- Larger Commit Size:
- Increases the number of resources required in the database.
- No or little impact on performance.
- To eliminate Unique Constraints if this can not be dropped/disabled.
When to change?
This is a guide to when this value might need to be adjusted:
- Resources fill up - reduce Commit Size:
- MS SQL: the 'logfile' size will determine how many updates/inserts can be held before a commit.
- Oracle: the undo segment will fill up.
- Other databases have different requirements and limitations.
- Reduce the chance of deadlocks - reduce Commit Size.
- Resolve Unique Constraint error - set Commit Size to the same size as the number of rows in the table.
- If the constraint cannot be dropped/disabled this might be the only option.
- If the constraint cannot be dropped/disabled this might be the only option.
- Performance - a smaller or larger value might improve performance.
- Test and plot to determine optimal value.
Row Limit
The Row Limit is a new feature (available from 6.0.4.0 forward). This value sets the max number of records there can be in the masking engine for a specific masked object at any given moment.
The best Row Limit value depends on:
- Memory usage requirements (row data size in bytes).
- Masking job Type:
- File / Main Frame / DB
- In-Place / On-The-Fly
The Row Limit should not be confused with the amount of data being fetched by the Input Step. This is managed by the JDBC.
Default value
For most masking jobs the default value is optimal and does not need to be changed. The default value is blank (20,000).
Impact on Performance
The performance will be significantly affected if the Row Limit is set too low. This is because each step will wait 1 ms for more records and the number of records is limited by the Row Limit.
Effects
The effects of changing Row Limit:
- Smaller Row Limit:
- Reduces the amount of data in the engine per masked object.
- This reduces the chance of Out of Memory.
- Could also reduce the chance of errors (such as deadlocks).
- Too small (less than aprox. 2,000) will drop the performance.
- Larger Row Limit:
- Increase the chance of Out of Memory.
- Increase in performance (usually no improvement after 5,000)
When to change?
This is a guide to when this value might need to be adjusted:
- Masking Out Of Memory errors - reduce the Row Limit.
- This is especially important on Large file masking jobs where each row has a large number of characters.
- Large OTF DB jobs might also benefit from a reduced Row Limit.
- It might be better to look at the size of each row - does all that data need to be transferred?
- Start dropping the value by 10 times.
- Performance Investigation - see below for more information for specific investigation settings.
- Locks and Deadlocks:
- Note: for MS SQL, check KBA6319 (see below for link).
- A lower value could potentially reduce the chances of locks.
Feedback Size
Default value
For most masking jobs the default value is optimal and does not need to be changed. The default value is blank (50,000).
When to change?
The Feedback Size should be adjusted in these scenarios:
- If the number of masked rows is large, then increase the Feedback Size to 500,000 or even 5,000,000.
- Performance testing - when specifically collecting data for performance tuning - set this value to 10,000.
- The job execution should be around 5 min (or too many logs are created).
- Make sure to set the Feedback Size back to the original value.
The Feedback Size defines how frequently logs are written to the log files. These values are a guideline and one way to determine the size is that the logs should preferably fit into one log file.
Database Size | Number of Records | Feedback Size |
Performance Test | - | 10,000 |
Small to medium | ~ 5,000,000 | 50,000 (default) |
Large | Up to 500,000,000 | 500,000 |
Very large | Up to 5,000,000,000 | 5,000,000 |
Super large | Over 5,000,000,000 | 50,000,000 |
Effects
This value does NOT affect performance. It is only related to how frequent logs are written to the logs. Values set too low will cause a large number of logs to be collected, which could ultimately affect the masking engine (in worst-case scenarios crash the engine).
Issues
Known issues are:
- Row Limit (Fetch Size) set too low on Oracle causing performance issues on low latency networks is fixed in 6.0.3.0.
- Feedback Size cause logs to be too large and slow start of masking jobs is fixed in 6.0.2.0. Feedback Size still needs to be correctly set.
Resolution
Most solutions to any issue related to these configurations are usually remedied by setting the value to its default.