Masking Method: On-The-Fly and In-Place (KBA1774)
This topic details creating masked databases or files using masking methods: In-Place and On-The-Fly. The page uses the term database mostly but the details applies to files too. For more details on File Masking see the Related Articles.
At a Glance
Available in: |
Available in all known versions.: 4.x, 5.0, 5.1. 5.2, and 5.3. |
|||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Description: |
This article describes two methods for creating the masked database.
|
|||||||||||||||
Recommendation: |
The Best Practice recommendation is to use:
|
|||||||||||||||
Characteristics: |
1 Unique Row Identifier - This can be a single column or multiple columns. |
|||||||||||||||
Considerations: |
|
In-Place
When masking a database using In-Place, the data is read from the 'Target' and when masked the masked data is Updated back to the 'Target'.
Requirements
Requirements | Source | Target |
|
|
|
|
|
How to create an In-Place masking job
- Create a VDB or a copy of the Database to be masked.
- Create an Environment (for example called Target).
View the environment. - Create a Connector (for example called Target_CON).
- Create a Rule Set.
- Open the Inventory and define Masked Columns, alternatively use Profiling.
- Click Overview and create a Masking Job.
How does an In-Place masking job look like using the API
The job created above looks like this using the API.
- Job 1 > Rule Set 1 > Connector 1 > Environment 1
1. Environments | 3. Rule Set |
{ "_pageInfo": { "numberOnPage": 1, "total": 1 }, "responseList": [ { "environmentId": 1, "environmentName": "Target", "application": "App", "purpose": "MASK", "isWorkflowEnabled": false } ] } |
{ "_pageInfo": { "numberOnPage": 1, "total": 1 }, "responseList": [ { "databaseRulesetId": 1, "rulesetName": "OTF_RS", "databaseConnectorId": 1, "refreshDropsTables": false } ] } |
2. Connectors | 4. Job |
{ "_pageInfo": { "numberOnPage": 1, "total": 1 }, "responseList": [ { "databaseConnectorId": 1, "connectorName": "Target_CON", "databaseType": "MSSQL", "environmentId": 1, "databaseName": "Target", "host": "10.43.97.248", "instanceName": "SQL2016", "port": 1433, "schemaName": "dbo", "username": "sa", "kerberosAuth": false } ] } |
{ "_pageInfo": { "numberOnPage": 1, "total": 1 }, "responseList": [ { "maskingJobId": 1, "jobName": "IP_MASK", "rulesetId": 1, "createdBy": "admin", "createdTime": "2019-07-09T02:36:33.555+0000", "email": "anders@delphix.com", "feedbackSize": 50000, "jobDescription": "", "maxMemory": 1024, "minMemory": 1024, "multiTenant": false, "numInputStreams": 1, "onTheFlyMasking": false, "databaseMaskingOptions": { "batchUpdate": true, "bulkData": false, "commitSize": 10000, "disableConstraints": false, "dropIndexes": false, "disableTriggers": false, "numOutputThreadsPerStream": 1, "truncateTables": false }, "failImmediately": false } ] } |
On-The-Fly
The one main rule when configuring On-The-Fly masking is that the Source can Never Ever be masked. The On-The-Fly is, therefore, defined against the Target.
Requirements
Requirements | Source | Target |
|
|
|
|
|
|
How to create an On-The-Fly masking job
To mask the Target:
It is recommended to start with the target.
- Create a copy of the Database to be masked
This will be the target. - Create an Environment (for example called Target).
View the environment. - Create a Connector to the Target (for example called Target_CON).
- Create a Rule Set.
- Open the Inventory and define Masked Columns, alt use Profiling.
To mask the Source:
- Create an Environment (for example called Source).
View the environment. - Create a Connector to the Source (for example called Source_CON).
To create the masking job:
- Navigate to the Target Environment.
- Click Overview and create a Masking Job.
It is recommended to use a VDB as the source.
How does an On-The-Fly masking job look like using the API
The job created above looks like this using the API.
- Job 1
- Target: Rule Set 1 > Connector 1 > Environment 1
- Source: Connector 2 > Environment 2
1. Environments | 3. Rule Set |
{ "_pageInfo": { "numberOnPage": 2, "total": 2 }, "responseList": [ { "environmentId": 1, "environmentName": "Target", "application": "App", "purpose": "MASK", "isWorkflowEnabled": false }, { "environmentId": 2, "environmentName": "Source", "application": "App", "purpose": "MASK", "isWorkflowEnabled": false } ] } |
{ "_pageInfo": { "numberOnPage": 1, "total": 1 }, "responseList": [ { "databaseRulesetId": 1, "rulesetName": "OTF_RS", "databaseConnectorId": 1, "refreshDropsTables": false } ] } |
2. Connectors | 4. Job |
{ "_pageInfo": { "numberOnPage": 2, "total": 2 }, "responseList": [ { "databaseConnectorId": 1, "connectorName": "Target_CON", "databaseType": "MSSQL", "environmentId": 1, "databaseName": "Target", "host": "10.43.97.248", "instanceName": "SQL2016", "port": 1433, "schemaName": "dbo", "username": "sa", "kerberosAuth": false }, { "databaseConnectorId": 2, "connectorName": "Source_CON", "databaseType": "MSSQL", "environmentId": 2, "databaseName": "Source", "host": "10.43.97.248", "instanceName": "SQL2016", "port": 1433, "schemaName": "dbo", "username": "sa", "kerberosAuth": false } ] } |
{ "_pageInfo": { "numberOnPage": 1, "total": 1 }, "responseList": [ { "maskingJobId": 1, "jobName": "OTF_MASK", "rulesetId": 1, "createdBy": "admin", "createdTime": "2019-07-09T02:36:33.555+0000", "email": "anders@delphix.com", "feedbackSize": 50000, "jobDescription": "", "maxMemory": 1024, "minMemory": 1024, "multiTenant": false, "numInputStreams": 1, "onTheFlyMasking": true, "databaseMaskingOptions": { "batchUpdate": true, "bulkData": false, "commitSize": 10000, "disableConstraints": false, "dropIndexes": false, "disableTriggers": false, "numOutputThreadsPerStream": 1, "truncateTables": false }, "onTheFlyMaskingSource": { "connectorId": 2, "connectorType": "DATABASE" }, "failImmediately": false } ] } |
More Information: In-Place & On-The-Fly
In-Place
- Requires a Unique Row Identifier (URI), usually the Primary Key (PK), to mask the data.
- Uses UPDATE SQL Statement to change masked data.
- Only the URI and the masked data is ingested into the masking job (unless explicitly detailed in the Custom SQL).
- Transfers less data but can have problems masking indexes.
- The URI CANNOT be masked. Therefore:
- On Oracle use ROWID.
- On other databases, a non-masked URI has to be created manually.
DB2 has a similar column as Oracle but this configurable so check with the DBA.
On-The-Fly
- Reads the 'Source' Database.
- Does not require a Unique Row Identifier (URI).
- Uses INSERT SQL statement to insert data into the Masked Copy.
- All columns on the source, target, and Rule Set need to match.
- As the masked data is inserted, the tables need to be empty (or the rows to be masked must be removed prior).
- All data from each row is read so this method transfers more data.
Requirements
Both Masking Methods require the following on any Masked columns:
- Constraints to be disabled (or dropped).
- Primary Keys (PK) to be dropped.
Performance Notes
For best performance:
- Disable triggers.
- Drop indexes on masked columns.
Troubleshooting - Investigating Logs
Error: You need to specify a database connection - TableOutput - NullPointerException
- Method: OTF
- Cause: This error is cryptic but the clue is at the TableOutput. The source, the target, and the Rule Set are not matching and it generates a NullPointerException.
- Resolution: Ensure that columns on the target match the source and then Refresh the Rule Set.
[JOB_ID_168_233] ERROR 17-09 21:35:03,956 - Table input - You need to specify a database connection. [JOB_ID_168_233] ERROR 17-09 21:35:03,956 - Table input - Error initializing step [Table input] [JOB_ID_168_233] ERROR 17-09 21:35:03,957 - TableOutput - Error initializing step [TableOutput] [JOB_ID_168_233] ERROR 17-09 21:35:03,958 - TableOutput - java.lang.NullPointerException [JOB_ID_168_233] at org.pentaho.di.trans.steps.tableoutput.TableOutput.init(TableOutput.java:559) [JOB_ID_168_233] at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62) [JOB_ID_168_233] at java.lang.Thread.run(Thread.java:748) [JOB_ID_168_233] --- [JOB_ID_168_233] ERROR 17-09 21:35:04,173 - MaskedTable - Step [Table input.0] failed to initialize! [JOB_ID_168_233] ERROR 17-09 21:35:04,173 - MaskedTable - Step [TableOutput.0] failed to initialize! [JOB_ID_168_233] ERROR 17-09 21:35:04,174 - MaskedTable - Unable to prepare for execution of the transformation [JOB_ID_168_233] ERROR 17-09 21:35:04,174 - MaskedTable - org.pentaho.di.core.exception.KettleException: [JOB_ID_168_233] We failed to initialize at least one step. Execution can not begin! [JOB_ID_168_233] [JOB_ID_168_233] at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:932) [JOB_ID_168_233] at org.pentaho.di.trans.Trans.execute(Trans.java:504) [JOB_ID_168_233] at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:1034) ...
Related Articles
Knowledge Base Links: