
Masking Method: On-The-Fly and In-Place (KBA1774)

 

This topic details how to create masked databases or files using the two masking methods: In-Place and On-The-Fly. The page mostly uses the term database, but the details apply to files too. For more details on file masking, see the Related Articles.

At a Glance  

Available in:

Available in all known versions: 4.x, 5.0, 5.1, 5.2, and 5.3.

Description:

This article describes two methods for creating the masked database. 

  • In-Place - The data in the masked table is read, masked, and updated in place.

  • On-The-Fly - The data is read from a 'source' table, masked, and inserted into a fresh, empty table in another database.

Recommendation:

The Best Practice recommendation is to use:

  • In-Place for Database masking.
  • On-The-Fly for File masking.  
Characteristics:

  Method        SQL               URI (1)   Masked VDB (2)   Comment
  In-Place      SELECT + UPDATE   Needed    Yes              Generally faster.
  On-The-Fly    SELECT + INSERT   No        No               Best for heap tables with no URI.

1 Unique Row Identifier - This can be a single column or multiple columns.
2 Masked VDB - This is a feature of the Delphix Virtualization Engine.
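The SQL patterns in the table can be sketched as follows. This Python sketch is illustrative only (the engine issues SELECT + UPDATE or SELECT + INSERT SQL, not Python); `mask_value` is a hypothetical masking function, and the dict key stands in for the URI.

```python
def mask_value(value):
    # Hypothetical masking algorithm: here, simply redact.
    return "XXXX"

def mask_in_place(target, masked_columns):
    # In-Place: read each row from the Target by its URI, mask, UPDATE back.
    for uri, row in target.items():          # SELECT uri, cols FROM target
        for col in masked_columns:
            row[col] = mask_value(row[col])  # UPDATE target SET col = ? WHERE uri = ?

def mask_on_the_fly(source, target, masked_columns):
    # On-The-Fly: read every row from the Source, mask, INSERT into the empty Target.
    assert not target, "Target tables must be empty before an On-The-Fly job"
    for row in source:                       # SELECT * FROM source
        masked = dict(row)
        for col in masked_columns:
            masked[col] = mask_value(masked[col])
        target.append(masked)                # INSERT INTO target VALUES (...)

# In-Place needs a URI (here, the dict key); On-The-Fly does not.
target = {1: {"name": "Alice", "city": "Oslo"}}
mask_in_place(target, ["name"])

source = [{"name": "Bob", "city": "Kiel"}]
masked_copy = []
mask_on_the_fly(source, masked_copy, ["name"])
```

Note how `mask_on_the_fly` refuses to run against a non-empty target, mirroring the requirement that On-The-Fly targets must be empty.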

Considerations:
  1. Always verify: Indexes, Primary Keys, Constraints, and Triggers. These affect masking job performance and the ability to execute the masking job successfully.
  2. All columns on the source, the target, and the Rule Set need to exist and match. 

In-Place 

When masking a database using In-Place, the data is read from the Target and, once masked, the masked data is updated back to the Target.

 

Tip:

For simplicity and security, always mask a VDB provisioned from the Virtualization Engine.
This avoids masking the source, and if you use Masked VDB, the VDB is tagged as masked.
You can also easily roll back while testing your masking job.

 

(Figure: In-Place masking)

Requirements 

For In-Place, everything is defined against the Target; no Source is needed:

  • A Target database or file.
    • For profiling, the Target needs to have data.
    • For masking, the Target needs to have data.
  • 1 x Environment.
  • 1 x Connector.
  • 1 x Rule Set and Masking Rules.
  • 1 x Job:
    • Configured as the In-Place method.
    • Using the Rule Set defined above.

How to create an In-Place masking job

  1. Create a VDB or a copy of the Database to be masked. 
  2. Create an Environment (for example called Target).
    View the environment.
  3. Create a Connector (for example called Target_CON).
  4. Create a Rule Set.
  5. Open the Inventory and define Masked Columns; alternatively, use Profiling.
  6. Click Overview and create a Masking Job.
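The GUI steps above can also be scripted. The following minimal Python sketch builds an In-Place job payload and POSTs it to the Masking API; the engine URL, the `/masking-jobs` path, and the authorization header are assumptions for illustration, while the field names (`jobName`, `rulesetId`, `onTheFlyMasking`) are taken from the API output shown in this article.

```python
import json
from urllib.request import Request, urlopen

ENGINE = "http://engine.example.com/masking/api"   # hypothetical engine URL
HEADERS = {"Authorization": "<api-key>",           # token from a prior login call
           "Content-Type": "application/json"}

def in_place_job(name, ruleset_id):
    # onTheFlyMasking=False makes this an In-Place job; the Rule Set
    # already carries the Connector and Environment references.
    return {"jobName": name, "rulesetId": ruleset_id, "onTheFlyMasking": False}

def create_job(payload):
    # POST the job definition to the masking engine.
    req = Request(ENGINE + "/masking-jobs",
                  data=json.dumps(payload).encode(),
                  headers=HEADERS)
    return json.load(urlopen(req))

payload = in_place_job("IP_MASK", 1)
# create_job(payload)  # uncomment only against a real engine
```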

How an In-Place masking job looks using the API

The job created above looks like this using the API.

  • Job 1 > Rule Set 1 > Connector 1 > Environment 1
1. Environments
{
  "_pageInfo": {
    "numberOnPage": 1,
    "total": 1
  },
  "responseList": [
    {
      "environmentId": 1,
      "environmentName": "Target",
      "application": "App",
      "purpose": "MASK",
      "isWorkflowEnabled": false
    }
  ]
}
2. Rule Set
{
  "_pageInfo": {
    "numberOnPage": 1,
    "total": 1
  },
  "responseList": [
    {
      "databaseRulesetId": 1,
      "rulesetName": "OTF_RS",
      "databaseConnectorId": 1,
      "refreshDropsTables": false
    }
  ]
}
3. Connectors
{
  "_pageInfo": {
    "numberOnPage": 1,
    "total": 1
  },
  "responseList": [
    {
      "databaseConnectorId": 1,
      "connectorName": "Target_CON",
      "databaseType": "MSSQL",
      "environmentId": 1,
      "databaseName": "Target",
      "host": "10.43.97.248",
      "instanceName": "SQL2016",
      "port": 1433,
      "schemaName": "dbo",
      "username": "sa",
      "kerberosAuth": false
    }
  ]
}
4. Job
{
  "_pageInfo": {
    "numberOnPage": 1,
    "total": 1
  },
  "responseList": [
    {
      "maskingJobId": 1,
      "jobName": "IP_MASK",
      "rulesetId": 1,
      "createdBy": "admin",
      "createdTime": "2019-07-09T02:36:33.555+0000",
      "email": "anders@delphix.com",
      "feedbackSize": 50000,
      "jobDescription": "",
      "maxMemory": 1024,
      "minMemory": 1024,
      "multiTenant": false,
      "numInputStreams": 1,
      "onTheFlyMasking": false,
      "databaseMaskingOptions": {
        "batchUpdate": true,
        "bulkData": false,
        "commitSize": 10000,
        "disableConstraints": false,
        "dropIndexes": false,
        "disableTriggers": false,
        "numOutputThreadsPerStream": 1,
        "truncateTables": false
      },
      "failImmediately": false
    }
  ]
}

On-The-Fly 

The one cardinal rule when configuring On-The-Fly masking is that the Source must never be masked. The On-The-Fly job is therefore defined against the Target.

Note:

The Target Environment is configured almost exactly like the In-Place Environment; only the Source Connector is different.

 

Tip:

Create a dedicated Source Environment for Sources only. This Environment will only contain Connectors.

(Figure: On-The-Fly masking)

Requirements 

On-The-Fly needs both a Source and a Target:

  • A Source and a Target database or file.
    • For profiling, the Target needs to have data.
    • For masking, the Target needs to exist and be empty.
  • 2 x Environments (one Source, one Target).
  • 2 x Connectors (one for the Source, one for the Target).
  • 1 x Rule Set and Masking Rules.
  • 1 x Job:
    • Configured as the On-The-Fly method.
    • Using the Target Rule Set defined above.
    • Referencing the Source Environment and Source Connector.

 

Note:

With this configuration, there is no risk of masking the Source, and the method can be changed from On-The-Fly to In-Place.

How to create an On-The-Fly masking job

To mask the Target:
It is recommended to start with the target.

  1. Create a copy of the Database to be masked
    This will be the target. 
  2. Create an Environment (for example called Target).
    View the environment.
  3. Create a Connector to the Target (for example called Target_CON).
  4. Create a Rule Set.
  5. Open the Inventory and define Masked Columns; alternatively, use Profiling.

To mask the Source:

  1. Create an Environment (for example called Source).
    View the environment.
  2. Create a Connector to the Source (for example called Source_CON).

To create the masking job:

  1. Navigate to the Target Environment.
  2. Click Overview and create a Masking Job
    It is recommended to use a VDB as the source.
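In the API, the only structural difference from an In-Place job is `onTheFlyMasking` set to true plus an `onTheFlyMaskingSource` block naming the Source Connector. Below is a minimal payload sketch; the field names come from the API output in this article, and the helper function itself is purely illustrative.

```python
def on_the_fly_job(name, target_ruleset_id, source_connector_id):
    # The Rule Set belongs to the Target; the Source is attached
    # via its Connector in onTheFlyMaskingSource.
    return {
        "jobName": name,
        "rulesetId": target_ruleset_id,
        "onTheFlyMasking": True,
        "onTheFlyMaskingSource": {
            "connectorId": source_connector_id,
            "connectorType": "DATABASE",
        },
    }

# Connector 2 is the Source Connector created in the Source Environment above.
payload = on_the_fly_job("OTF_MASK", 1, 2)
```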

How an On-The-Fly masking job looks using the API

The job created above looks like this using the API.

  • Job 1 
    • Target: Rule Set 1 > Connector 1 > Environment 1
    • Source: Connector 2 > Environment 2
1. Environments
{
  "_pageInfo": {
    "numberOnPage": 2,
    "total": 2
  },
  "responseList": [
    {
      "environmentId": 1,
      "environmentName": "Target",
      "application": "App",
      "purpose": "MASK",
      "isWorkflowEnabled": false
    },
    {
      "environmentId": 2,
      "environmentName": "Source",
      "application": "App",
      "purpose": "MASK",
      "isWorkflowEnabled": false
    }
  ]
}
2. Rule Set
{
  "_pageInfo": {
    "numberOnPage": 1,
    "total": 1
  },
  "responseList": [
    {
      "databaseRulesetId": 1,
      "rulesetName": "OTF_RS",
      "databaseConnectorId": 1,
      "refreshDropsTables": false
    }
  ]
}
3. Connectors
{
  "_pageInfo": {
    "numberOnPage": 2,
    "total": 2
  },
  "responseList": [
    {
      "databaseConnectorId": 1,
      "connectorName": "Target_CON",
      "databaseType": "MSSQL",
      "environmentId": 1,
      "databaseName": "Target",
      "host": "10.43.97.248",
      "instanceName": "SQL2016",
      "port": 1433,
      "schemaName": "dbo",
      "username": "sa",
      "kerberosAuth": false
    },
    {
      "databaseConnectorId": 2,
      "connectorName": "Source_CON",
      "databaseType": "MSSQL",
      "environmentId": 2,
      "databaseName": "Source",
      "host": "10.43.97.248",
      "instanceName": "SQL2016",
      "port": 1433,
      "schemaName": "dbo",
      "username": "sa",
      "kerberosAuth": false
    }
  ]
}
4. Job
{
  "_pageInfo": {
    "numberOnPage": 1,
    "total": 1
  },
  "responseList": [
    {
      "maskingJobId": 1,
      "jobName": "OTF_MASK",
      "rulesetId": 1,
      "createdBy": "admin",
      "createdTime": "2019-07-09T02:36:33.555+0000",
      "email": "anders@delphix.com",
      "feedbackSize": 50000,
      "jobDescription": "",
      "maxMemory": 1024,
      "minMemory": 1024,
      "multiTenant": false,
      "numInputStreams": 1,
      "onTheFlyMasking": true,
      "databaseMaskingOptions": {
        "batchUpdate": true,
        "bulkData": false,
        "commitSize": 10000,
        "disableConstraints": false,
        "dropIndexes": false,
        "disableTriggers": false,
        "numOutputThreadsPerStream": 1,
        "truncateTables": false
      },
      "onTheFlyMaskingSource": {
        "connectorId": 2,
        "connectorType": "DATABASE"
      },
      "failImmediately": false
    }
  ]
}

More Information: In-Place & On-The-Fly

In-Place 

  • Requires a Unique Row Identifier (URI), usually the Primary Key (PK), to mask the data.
  • Uses UPDATE SQL Statement to change masked data.
  • Only the URI and the masked data are ingested into the masking job (unless explicitly detailed in the Custom SQL).
  • Transfers less data but can have problems masking indexes. 
  • The URI CANNOT be masked. Therefore:
    • On Oracle use ROWID.
    • On other databases, a non-masked URI has to be created manually.

DB2 has a column similar to Oracle's ROWID, but it is configurable, so check with the DBA.
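To illustrate why a non-masked URI matters, here is a small Python/sqlite3 sketch: SQLite's implicit rowid plays the same role as Oracle's ROWID. Rows are read together with their rowid, masked, and updated back keyed by that rowid, so the masked column itself never has to identify the row. The table and the trivial 'XXXX' mask are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Alice", "Oslo"), ("Bob", "Kiel")])

# In-Place pattern: SELECT the implicit rowid plus the column to mask...
rows = conn.execute("SELECT rowid, name FROM customers").fetchall()
for rid, name in rows:
    masked = "XXXX"  # stand-in for a real masking algorithm
    # ...then UPDATE back, keyed by the non-masked rowid (the URI).
    conn.execute("UPDATE customers SET name = ? WHERE rowid = ?", (masked, rid))

names = [r[0] for r in conn.execute("SELECT name FROM customers")]
cities = [r[0] for r in conn.execute("SELECT city FROM customers")]
```

Because the rowid is never masked, the UPDATE always finds its row; this is exactly what breaks if the URI column itself is in the masking inventory.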

On-The-Fly 

  • Reads the 'Source' Database.
  • Does not require a Unique Row Identifier (URI).
  • Uses INSERT SQL statement to insert data into the Masked Copy. 
  • All columns on the source, target, and Rule Set need to match.
  • As the masked data is inserted, the tables need to be empty (or the rows to be masked must be removed prior).
  • All data from each row is read, so this method transfers more data.

Requirements 

Both Masking Methods require the following on any Masked columns:

  • Constraints to be disabled (or dropped).
  • Primary Keys (PK) to be dropped.

 

Note:

The constraint issue (which also applies to PKs) means that the masked data needs to be unique, and for In-Place the masked value cannot already exist in the column, or the update will violate the constraint.

If the masked data is not unique, the Constraint and the PK cannot be re-established.

Performance Notes 

For best performance:

  • Disable triggers.
  • Drop indexes on masked columns. 

Troubleshooting - Investigating Logs 

Error: You need to specify a database connection - TableOutput - NullPointerException 
  • Method: OTF
  • Cause: This error is cryptic, but the clue is the TableOutput step: the source, the target, and the Rule Set do not match, which generates a NullPointerException.
  • Resolution: Ensure that columns on the target match the source and then Refresh the Rule Set.
[JOB_ID_168_233] ERROR 17-09 21:35:03,956 - Table input - You need to specify a database connection.
[JOB_ID_168_233] ERROR 17-09 21:35:03,956 - Table input - Error initializing step [Table input]
[JOB_ID_168_233] ERROR 17-09 21:35:03,957 - TableOutput - Error initializing step [TableOutput]
[JOB_ID_168_233] ERROR 17-09 21:35:03,958 - TableOutput - java.lang.NullPointerException
[JOB_ID_168_233]  at org.pentaho.di.trans.steps.tableoutput.TableOutput.init(TableOutput.java:559)
[JOB_ID_168_233]  at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:62)
[JOB_ID_168_233]  at java.lang.Thread.run(Thread.java:748)
[JOB_ID_168_233]
---

[JOB_ID_168_233] ERROR 17-09 21:35:04,173 - MaskedTable - Step [Table input.0] failed to initialize!
[JOB_ID_168_233] ERROR 17-09 21:35:04,173 - MaskedTable - Step [TableOutput.0] failed to initialize!
[JOB_ID_168_233] ERROR 17-09 21:35:04,174 - MaskedTable - Unable to prepare for execution of the transformation
[JOB_ID_168_233] ERROR 17-09 21:35:04,174 - MaskedTable - org.pentaho.di.core.exception.KettleException:
[JOB_ID_168_233] We failed to initialize at least one step.  Execution can not begin!
[JOB_ID_168_233]
[JOB_ID_168_233]  at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:932)
[JOB_ID_168_233]  at org.pentaho.di.trans.Trans.execute(Trans.java:504)
[JOB_ID_168_233]  at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:1034)
...