Masking Method: On-The-Fly and In-Place (KBA1774)

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies

Major Release	All Sub Releases
6.0	6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1
5.3	5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0
5.2	5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1	5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0	5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

At a Glance

Description:

This article describes the two masking methods:

In-Place - IP will overwrite the data in a table or file with masked content.
On-The-Fly - OTF will read from a source and write masked content to a target.

Use Cases:

The most common use cases (Best Practice) for each method are:

IP for Database masking.
OTF for File masking.

Database:

	SQL	URI¹	Comment
IP	SELECT>>Mask > UPDATE	Needed	Generally Faster.
OTF	SELECT>>Mask > INSERT	No	Best for Heap Tables (no URI), to resolve Deadlocks, etc.
Profiling	SELECT>>Profiling	No

¹ Unique Row Identifier - The unique Update Key (can be one or multiple columns).

File:

	File Access	Comment
OTF	Read>>Mask > Write	Much faster.
IP	Read>>Mask > Write (tmp) > Read (tmp) > Write	Uses an intermediary temp file (*.msk)
Profiling	Read>>Profiling

Masked vDB:

Masked vDBs are only available on IP Database masking.

Performance:

For database masking - IP and OTF:

Always verify Indexes, Constraints, and Triggers.
Drop and Disable as required.

Read more:

Other useful sources:

In-Place

When using the In-Place method, the data is read from the Target, masked, and then Updated back to the Target.

How to create

Create a VDB or a copy of the Database to be masked. This will be the target.
Create an Environment (for example called Target). View the environment.
Create a Connector (for example called Target_CON).
Create a Rule Set.
Open the Inventory and define Masked Columns, alternatively use Profiling.
Click Overview and create a Masking Job.

Create using API

The job created above looks like this using the API.

Environment: Target
- Connector: Target_Con
- Rule Set: MY_RS
- Job: IP_MASK

1. Environments	3. Rule Set
[ { "environmentId": 1, "environmentName": "Target", "application": "App", "purpose": "MASK", "isWorkflowEnabled": false } ]	[ { "databaseRulesetId": 1, "rulesetName": "MY_RS", "databaseConnectorId": 1, "refreshDropsTables": false } ]
2. Connectors	4. Job
[ { "databaseConnectorId": 1, "connectorName": "Target_CON", "databaseType": "MSSQL", "environmentId": 1, "databaseName": "Target", "host": "10.43.97.248", "instanceName": "SQL2016", "port": 1433, "schemaName": "dbo", "username": "sa", "kerberosAuth": false } ]	[ { "maskingJobId": 1, "jobName": "IP_MASK", "rulesetId": 1, "feedbackSize": 50000, "jobDescription": "", "maxMemory": 1024, "minMemory": 1024, "multiTenant": false, "numInputStreams": 1, "onTheFlyMasking": false, "databaseMaskingOptions": { "batchUpdate": true, "bulkData": false, "commitSize": 10000, "disableConstraints": false, "dropIndexes": false, "disableTriggers": false, "numOutputThreadsPerStream": 1, "truncateTables": false }, "failImmediately": false } ]

On-The-Fly

The one main rule when configuring On-The-Fly masking is that the Source can Never be masked. The On-The-Fly Rule Set and Job are, therefore, defined against the Target.

Configuration

Requirements	Source	Target
Masked Object: Profiling - Target needs data. Masking - DB w schema needs to exist. Tokenizatiion - DB w schema needs to exist.	Object: Database or File	Object: Database or File
Masking Engine: 2 x Environments (Source and Target) 2 x Connectors (Source and Target) 1 x Rule Set and Inventory 1 x Job Configured as On-The-Fly Source Environment and Source Connector	Objects: Environment Connector	Objects: Environment Connector Rule Set Job

How to create

To mask the Target:
It is recommended to start with the target.

Create a copy of the Database to be masked
This will be the target.
Create an Environment (for example called Target).
View the environment.
Create a Connector to the Target (for example called Target_CON).
Create a Rule Set.
Open the Inventory and define Masked Columns, alt use Profiling.

To mask the Source:

Create an Environment (for example called Source).
View the environment.
Create a Connector to the Source (for example called Source_CON).

To create the masking job:

Navigate to the Target Environment.
Click Overview and create a Masking Job.
It is recommended to use a VDB as the source.

Create using API

The job created above looks like this using the API.

Environment: Target
- Connector: Target_Con
- Rule Set: MY_RS
- Job: IP_MASK
Environment: Source
- Connector: Source_CON

1. Environments	3. Rule Set
[ { "environmentId": 1, "environmentName": "Target", "application": "App", "purpose": "MASK", "isWorkflowEnabled": false }, { "environmentId": 2, "environmentName": "Source", "application": "App", "purpose": "MASK", "isWorkflowEnabled": false } ]	[ { "databaseRulesetId": 1, "rulesetName": "MY_RS", "databaseConnectorId": 1, "refreshDropsTables": false } ]
2. Connectors	4. Job
[ { "databaseConnectorId": 1, "connectorName": "Target_CON", "databaseType": "MSSQL", "environmentId": 1, "databaseName": "Target", "host": "[Target Host]", "instanceName": "SQL2016", "port": 1433, "schemaName": "dbo", "username": "sa", "kerberosAuth": false }, { "databaseConnectorId": 2, "connectorName": "Source_CON", "databaseType": "MSSQL", "environmentId": 2, "databaseName": "Source", "host": "[Source Host]", "instanceName": "SQL2016", "port": 1433, "schemaName": "dbo", "username": "sa", "kerberosAuth": false } ]	[ { "maskingJobId": 1, "jobName": "OTF_MASK", "rulesetId": 1, "feedbackSize": 50000, "jobDescription": "", "maxMemory": 1024, "minMemory": 1024, "multiTenant": false, "numInputStreams": 1, "onTheFlyMasking": true, "databaseMaskingOptions": { "batchUpdate": true, "bulkData": false, "commitSize": 10000, "disableConstraints": false, "dropIndexes": false, "disableTriggers": false, "numOutputThreadsPerStream": 1, "truncateTables": false }, "onTheFlyMaskingSource": { "connectorId": 2, "connectorType": "DATABASE" }, "failImmediately": false } ]

More Information

In-Place

Requires a Unique Row Identifier (URI) - Update Key, usually the Primary Key (PK), to mask the data.
Uses UPDATE SQL Statement to change masked data.
Only the URI and the masked data are needed in the masking job (unless explicitly detailed in the Custom SQL or Algorithm).
Transfers less data but can have problems masking indexes.
The URI CANNOT be masked. Therefore:
- Masking jobs will pick PK or IDENTITY column as the URI - Update Key.
- If a URI does not exist - a URI has to be created manually (on some databases this process is automatic).
- Oracle uses ROWID.

DB2 has a similar column as Oracle but this is configurable so check with the DBA.

On-The-Fly

Reads the 'Source' Database.
Does not require a Unique Row Identifier (URI).
Uses INSERT SQL statement to insert data into the Target.
The Source, Target schema needs to match and match with the Rule Set (see OTF Error below).
As the masked data is inserted, the tables need to be empty:
- run the job with Truncate.
- Or manually delete (or create fresh Target).
All data from each row is read so this method transfers more data.
- Large fields can slow down the job significantly (on Database Cast can be used to reduce the size).

Requirements

Both Masking Methods require the following on any Masked columns:

Constraints to be disabled (or dropped).
Primary Keys (PK) to be dropped.

Note:
The constraint issue (also on PK) means that the masked data needs to be Unique and for In-Place the masked value CANNOT already be in the column, or the update will violate the Constraint.

If the masked data is not unique, the Constraint and the PK cannot be re-established.

Performance Notes

For best Database performance:

IP and OTF:
- Disable triggers.
- Drop indexes on masked columns.
OTF:
- Large fields requires a lot of bandwidth - use Cast to reduce the size.

Troubleshooting

OTF Error: Rule Set not matching Source/Target

Method: OTF
Version: 5.3.6.0
Cause: This error is cryptic. The Source schema and the Rule Set are not matching. This generates the NullPointerException.
Resolution:
1. Ensure that the schema (table and columns) match the Source and Target.
2. Refresh the Rule Set.

[JOB_ID_xx_yy] ERROR 17-09 21:35:03,956 - Table input - You need to specify a database connection.
[JOB_ID_xx_yy] ERROR 17-09 21:35:03,956 - Table input - Error initializing step [Table input]
[JOB_ID_xx_yy] ERROR 17-09 21:35:03,957 - TableOutput - Error initializing step [TableOutput]
[JOB_ID_xx_yy] ERROR 17-09 21:35:03,958 - TableOutput - java.lang.NullPointerException
[JOB_ID_xx_yy]  at org.pentaho.di.trans.steps.tableoutput.TableOutput.init(TableOutput.java:559)
[JOB_ID_xx_yy]  at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62)
[JOB_ID_xx_yy]  at java.lang.Thread.run(Thread.java:748)

KBA

Applicable Delphix Versions

At a Glance

In-Place

How to create

Create using API

On-The-Fly

Configuration

How to create

Create using API

More Information

In-Place

On-The-Fly

Requirements

Performance Notes

Troubleshooting

OTF Error: Rule Set not matching Source/Target

Related Articles