Skip to main content
Delphix

Masking Commit Size, Row Limit and Feedback Size (KBA6387)

 

 

KBA

KBA# 6739

At a Glance  

Available in versions Commit Size, Feedback Size: All versions
Row Limit: From 6.0.4.0 forward
Commit Size and Batch Update Commit Size and Batch Update (DB only)

The Commit Size sets the number of records between Commits (in each Transaction).

The Batch Update is a JDBC feature and enables each Transaction to be sent in a 'batch'. This will minimize the number of packets required to be transferred over the network to the DB. 
  • Commit Size:
    • Default: 10,000
  • Batch Update:
    • Default: Enabled
    • Needs to be enabled for best performance.
Row Limit Row Limit

Sets the max number of records that will be in the masking engine for that object at any specific time. This will manage the amount of memory used by the job. 
  • Default: 20,000
Feedback Size Feedback Size

This has no impact on performance. It will set how many rows are processed for each step before writing an entry into the logs.
  • Default: 50,000
    • Reduce log size: Set to 500,000 or for even larger jobs 5,000,000 (a good estimate is: max rows/500).
    • Analyzing performance: 
      • Set Feedback = Commit Size and run for around 5 min (or at least 1 M rows).
      • Then cancel the job. Don't forget to reset the Feedback Size to the larger value).

Too many log entries can cause the engine to run out of storage space. On large tables/jobs adjust this value.

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

Overview

This article details three Masking Job properties:

  • Commit Size - the number of rows/records in each batch before commits.
  • Row Limit - the number of max rows/records 'in flight' for each masked object. 
  • Feedback Size - sets the number of rows processed before writing a feedback entry into the logs.

Commit Size

The Commit Size determines how many rows there are in an Update or Insert batch before it is committed. To use the JDBC batch feature, Batch Update needs to be enabled. 

The Batch Update is a JDBC feature and enables each Transaction to be sent in one batch over the network, minimizing the number of packets required to transfer the data to the database. If disabled, each record will be sent in one (or more) packet. 

Default value

The default value (10,000) is for most masking jobs the optimal value and does not normally need to be changed. For some versions, the value will be blank which is the same as 10,000. 

Impact on performance 

The graph below shows the impact of Commit Size on Performance and what is the Sweet Spot.  

The Sweet Spot is 10,000 and it can be safely reduced without a larger negative impact on performance (between 500 to 50,000 is normally safe). If the Commit Size is increased, usage of resources (such as Transaction Log on SQL Server) increases, as does the chance of job errors such as deadlock. 

KBA6387 - Commit Size vs Performance.png

 

Note

Notes:

  • The Transaction Log size is linear - since the Y-axis is linear and the X-axis is logarithmical the graph will show this spike at the end.
  • For the Performance (rpm) - the performance when Commit Size is 1 is in this example 42 times slower than peak performance. 

 

Illustration 

The best way to illustrate the Commit Size and the Row Limit is to show the masking job's interaction with a database. The illustration below is a simplified view and shows a masking job in progress.

Databases store records in blocks or pages (there can be other terms), these are illustrated by the rectangles. The white ones contain records that are not yet masked. The grey ones are records that are read but not yet updated. These are buffered in the masking engine and the max number of records that can be in the engine is (from version 6.0.4 forward) controlled by the Row Limit feature.

The Batch Update and Commit Size controls the Update (IP) (Insert for OTF). The Batch Update enables the JDBC to send the data in a batch and the Commit Size defines how many rows are in each batch before it is committed. 

KBA6387 - Commit Size and Row Limit.png

Effects

The effects of changing Commit Size (as also seen above):

  • Smaller Commit Size:
    • Reduces the number of resources required on the database. 
    • Degrades the performance.
    • Reduces the chance of errors (such as deadlocks).
  • Larger Commit Size:
    • Increases the number of resources required in the database.
    • No or little impact on performance. 
    • To eliminate Unique Constraints if this can not be dropped/disabled. 

When to change?

This is a guide to when this value might need to be adjusted: 

  • Resources fill up - reduce Commit Size:
    • MS SQL: the 'logfile' size will determine how many updates/inserts can be held before a commit. 
    • Oracle: the undo segment will fill up. 
    • Other databases have different requirements and limitations. 
       
  • Reduce the chance of deadlocks - reduce Commit Size. 
     
  • Resolve Unique Constraint error - set Commit Size to the same size as the number of rows in the table. 
    • If the constraint cannot be dropped/disabled this might be the only option. 
       
  • Performance - a smaller or larger value might improve performance.
    • Test and plot to determine optimal value.

Row Limit

The Row Limit is a new feature (available from 6.0.4.0 forward). This value sets the max number of records there can be in the masking engine for a specific masked object at any given moment. 

The best Row Limit value depends on:

  • Memory usage requirements (row data size in bytes).
  • Masking job Type:
    • File / Main Frame / DB
    • In-Place / On-The-Fly
tip

Technical
Info:

There is a second upper limit on the engine which is 10,000 rows per step in the masking job. This value is not configurable and hence the introduction of Row Limit. It is implemented in the Input step and it checks with the Output step and never exceeds the number of rows in the engine as defined by the Row Limit. 

The Row Limit should not be confused with the amount of data being fetched by the Input Step. This is managed by the JDBC. 

Default value

For most masking jobs the default value is optimal and does not need to be changed. The default value is blank (20,000). 

Effects

The effects of changing Row Limit:

  • Smaller Row Limit:
    • Reduces the amount of data in the engine per masked object. 
    • This reduces the chance of Out of Memory. 
    • Could also reduce the chance of errors (such as deadlocks).
  • Larger Row Limit:
    • Could increase the chance of Out of Memory.
    • Could increase performance. 

When to change?

This is a guide to when this value might need to be adjusted:

  • Masking Out Of Memory errors - reduce the Row Limit.
    • This is especially important on Large file masking jobs where each row has a large number of characters.
    • Large OTF DB jobs might also benefit from a reduced Row Limit.
    • It might be better to look at the size of each row - does all that data need to be transferred?
    • Start dropping the value by 10 times. 
       
  • Performance - larger value might improve performance. 
    • If the performance is bottlenecked on SELECT.
    • The adjustment depends on the current value and the root cause. 
    • Network latency has an impact here. 
       
  • Performance Investigation - see below for more information for specific investigation settings. 
     
  • Locks and Deadlocks:
    • Note: for MS SQL, check KBA6319 (see below for link).
    • A lower value could potentially reduce the chances of locks. 

Feedback Size

Default value

For most masking jobs the default value is optimal and does not need to be changed. The default value is blank (50,000). 

When to change?

The Feedback Size should be adjusted in these scenarios: 

  • If the number of masked rows is large, then increase the Feedback Size to 500,000 or even 5,000,000.
  • Performance testing - when specifically collecting data for performance tuning - set this value to 10,000.
    • The job execution should be around 5 min (or too many logs are created).
    • Make sure to set the Feedback Size back to the original value. 

The Feedback Size defines how frequently logs are written to the log files. These values are a guideline and one way to determine the size is that the logs should preferably fit into one log file. 

Database Size Number of Records Feedback Size
Performance Test - 10,000
Small to medium ~ 5,000,000  50,000 (default)
Large Up to 500,000,000 500,000
Very large Up to  5,000,000,000 5,000,000
Super large Over 5,000,000,000 50,000,000

Effects

This value does NOT affect performance. It is only related to how frequent logs are written to the logs. Values set too low will cause a large number of logs to be collected, which could ultimately affect the masking engine (in worst-case scenarios crash the engine). 

 

Warning

Warning:

If Feedback Size is set too small on large tables. This can cause very large logs and can cause the engine to crash.

 

User Interface

Masking Job Configuration Dialog   

The Job Configuration is set when the Masking Job is either:

  • Created
  • Edited

Example

The image below is from a 6.0.4.0 engine. The fields detailed in the KBA are highlighted with their default values. These are normally blank when a job is created.  

Masking UI - Commit, Feedback, Row Limit.png

Issues

Known issues are:

  • Row Limit (Fetch Size) set too low on Oracle causing performance issues on low latency networks is fixed in 6.0.3.0.
  • Feedback Size cause logs to be too large and slow start of masking jobs is fixed in 6.0.2.0. Feedback Size still needs to be correctly set.

Resolution

Most solutions to any issue related to these configurations are usually remedied by setting the value to its default.