Skip to main content
Delphix

KBA1774 In-Place and On-The-Fly Masking Job: How To Guide

 

 

At a Glance 

Available in:

Available in all known versions.: 4.x, 5.0, 5.1. 5.2, and 5.3.

Description:

This article describes two methods for creating the masked database. 

  • In-Place - The data in the masked table is read, masked and changed (updated).

  • On-The-Fly - The data is read from a 'source' table, masked and inserted into a fresh table in another database.

Characteristics:
 
SQL

URI1
Masked VDB2
Comment
In-Place SELECT + UPDATE Needed Yes Generally Faster. 
On-The-Fly SELECT + INSERT No No Best for Heap Tables with no URI.

1 Unique Row Identifier - This can be a single column or multiple columns. 
2 Masked VDB - This is a feature using Delphix Virtualization Engine.

Considerations:
  1. Always verify and check: Indexes, Primary Keys, Constraints, and Triggers. These will affect the masking job performance and the ability to execute the masking job successfully.
  2. All columns on the source, the target, and the Rule Set need to exist and match. 

 

How to Guide

Masking Method: In-Place

When masking a database using In-Place, the data is read from the 'Target' and when masked the masked data is Updated back on the 'Target'.

The best idea, with regards to simplicity and security, is to always mask a VDB from the Virtualization Engine. This also adds another level of security to not mask the source and if you use Masked VDB the VDB will be tagged that it is masked. You can also easily rollback when you are testing your masking job.

Masking In-Place v2.png

Requirements

For In-Place, this is what is needed:

Requirements Source  Target
  • A Target database.
    • For profiling, the Target needs to have data.
    • For masking, the Target needs to have data.
 
  • Database
  • 1 x Environment.
  • 1 x Connector.
  • 1 x Rule Set and Masking Rules.
  • 1 x Job
    • Configured as In-Place method.
    • Rule Set is the Rule Set, defined above.
 
  • Environment
  • Connector
  • Rule Set
  • Job
Steps

The steps are:

  1. Create a VDB or a copy of the Database to be masked. 
  2. Create an Environment, and view it.
  3. Create a Connector.
  4. Create a Rule Set.
  5. Open the Inventory and define Masked Columns, alternatively use Profiling
  6. Click on Overview and create a Masking Job
     

Masking Method: On-The-Fly

The one main rule and thing to think about when configuring On-The-Fly masking is that the Source can Never Ever be masked. The On-The-Fly is, therefore, defined against the Target.

Note that the Target Environment is configured more or less exactly as the In-Place Environment, only the Source Connector is different. 

The best idea is to create a Source Environment, for Sources only. This Environment will only have Connectors

Masking On-The-Fly.png
Requirements

For On-The-Fly, this is what is needed:

Requirements Source  Target
  • A Source and a Target database.
    • For profiling, the Target needs to have data.
    • For masking, the Target needs to exist and be empty. 
  • Database
  • Database
  • 2 x Environments (one Source - one Target).
  • 2 x Connectors (one for the Source - one for the Target).
  • 1 x Rule Set and Masking Rules.
  • 1 x Job
    • Configured as On-The-Fly method.
    • Rule Set is the Target Rule Set, defined above.
    • Source Environment and Source Connector.
  • Environment
  • Connector
  • Environment
  • Connector
  • Rule Set
  • Job


Note: with this configuration, there is no chance of masking the Source and the method can change from On-The-Fly to In-Place. 

Steps

The steps for the Target are (Note: the best procedure is to start with the target):

  1. Create a copy of the Database to be masked (this will be the target). 
  2. Create an Environment, and view it.
  3. Create a Connector to the Target.
  4. Create a Rule Set.
  5. Open the Inventory and define Masked Columns, alt use Profiling

The steps for the Source are:

  1. Create an Environment, and view it
  2. Create a Connector to the Source

Go back to the Target Environment to create the Job:

  1. Click on Overview and create a Masking Job

Note: it is recommended to use a VDB as the source.

 

Some technical points

Some salient technical points about the two Masking Methods.

In-Place
  • Requires a Unique Row Identifier (URI), usually the Primary Key (PK), to mask the data.
  • Uses UPDATE SQL Statement to change masked data.
  • Only the URI and the masked data is ingested into the masking job (unless explicitly detailed in the Custom SQL). 
  • Transfers less data but can have problems masking indexes. 
  • The URI CANNOT be masked. Therefore:
    • On Oracle use ROWID.
    • On other databases, a non-masked URI has to be created manually.

DB2 has a similar column as Oracle but this configurable so check with the DBA.

On-The-Fly
  • Reads the 'Source' Database.
  • Does not require a Unique Row Identifier (URI).
  • Uses INSERT SQL Statement to insert data into the Masked Copy. 
  • All columns on the source, on the target, and in the Rule Set needs to match.
  • As the masked data is inserted, the tables need to be empty (or the rows to be masked removed prior).
  • All data from each row is read - so this method transfers more data.
Requirements on both

Both Masking Methods requires the following on Masked columns (if any):

  • Constraints to be disabled (or dropped).
  • Primary Keys (PK) to be dropped.

Note: The constraint issue (also on PK) means that the masked data needs to be Unique and for In-Place the masked value CANNOT already be in the column, or the update will violate the Constraint. 

If the masked data is not unique, the Constraint and the PK cannot be re-established. 

Performance Notes

For best performance:

  • Disable triggers.
  • Drop indexes on masked columns. 

Troubleshooting - Investigating Logs

Error: You need to specify a database connection - TableOutput - NullPointerException

  • Method: OTF
  • Cause: This error is cryptic but the clue is at the TableOutput. The source, the target, and the Rule Set are not matching and it generates a NullPointerException. 
  • Resolution: Ensure that columns on the target match the source and then Refresh the Rule Set.
[JOB_ID_168_233] ERROR 17-09 21:35:03,956 - Table input - You need to specify a database connection.
[JOB_ID_168_233] ERROR 17-09 21:35:03,956 - Table input - Error initializing step [Table input]
[JOB_ID_168_233] ERROR 17-09 21:35:03,957 - TableOutput - Error initializing step [TableOutput]
[JOB_ID_168_233] ERROR 17-09 21:35:03,958 - TableOutput - java.lang.NullPointerException
[JOB_ID_168_233]  at org.pentaho.di.trans.steps.tableoutput.TableOutput.init(TableOutput.java:559)
[JOB_ID_168_233]  at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62)
[JOB_ID_168_233]  at java.lang.Thread.run(Thread.java:748)
[JOB_ID_168_233]
---

[JOB_ID_168_233] ERROR 17-09 21:35:04,173 - MaskedTable - Step [Table input.0] failed to initialize!
[JOB_ID_168_233] ERROR 17-09 21:35:04,173 - MaskedTable - Step [TableOutput.0] failed to initialize!
[JOB_ID_168_233] ERROR 17-09 21:35:04,174 - MaskedTable - Unable to prepare for execution of the transformation
[JOB_ID_168_233] ERROR 17-09 21:35:04,174 - MaskedTable - org.pentaho.di.core.exception.KettleException:
[JOB_ID_168_233] We failed to initialize at least one step.  Execution can not begin!
[JOB_ID_168_233]
[JOB_ID_168_233]  at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:932)
[JOB_ID_168_233]  at org.pentaho.di.trans.Trans.execute(Trans.java:504)
[JOB_ID_168_233]  at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:1034)
...