Delphix

Best Practice File Masking - Rule Set and Job Configuration (KBA1821)

 

 

KBA

KBA#1821

This article details the best practices for File Masking using FTP and SFTP. 

Applicable Delphix Versions

This article applies to the following versions of the Delphix engine:

Major Release   All Sub Releases
6.0             6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1
5.3             5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0
5.2             5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1             5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0             5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

File Masking and Tokenization Recommendations 

File Masking and Database Masking process data in the masking job differently. File Masking jobs behave more like On-The-Fly (OTF) masking, and a few key configuration choices will improve them.

This article is also applicable to Tokenization jobs. For simplicity, the term Masking will be used from now on.

Creation Process

To create a File Masking job, the best practice is to use On-The-Fly (OTF) masking. OTF can, however, be tricky to configure, so the recommended approach is to start with an In-Place (IP) job and then switch it to On-The-Fly. An In-Place job sees all the files, and the Rule Set is created where the masked files are, or will be, located. The last step is to change the job to point to the source folder, and the On-The-Fly job is ready.

  1. Copy the files to the target (FTP/SFTP) folder.
  2. Create Rule Set and In-Place masking job.
  3. When the In-Place job works, create a Source Environment and a Source Connector.
  4. Change the masking job to On-The-Fly.
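The four steps above can be sketched as a small, hypothetical object model: the job and its Rule Set are defined once against the Target, and switching to On-The-Fly only attaches a Source Environment and Source Connector. The class and field names (`Connector`, `Environment`, `MaskingJob`, `switch_to_on_the_fly`) and the folder paths are illustrative assumptions, not the Delphix Masking API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model for illustration only; not Delphix API objects.


@dataclass
class Connector:
    name: str
    folder: str  # (S)FTP folder the connector points at


@dataclass
class Environment:
    name: str
    connectors: List[Connector] = field(default_factory=list)


@dataclass
class MaskingJob:
    target_env: Environment  # the Rule Set is always defined against the Target
    on_the_fly: bool = False  # False means In-Place
    source_env: Optional[Environment] = None
    source_connector: Optional[Connector] = None

    def switch_to_on_the_fly(self, env: Environment, connector: Connector) -> None:
        """Keep the Target Rule Set and only attach the Source side."""
        if env is self.target_env:
            raise ValueError("the Source Environment must not be the Target")
        if connector not in env.connectors:
            raise ValueError("the Source Connector must belong to the Source Environment")
        self.on_the_fly = True
        self.source_env = env
        self.source_connector = connector


# Steps 1-2: files copied to the Target folder, In-Place job defined there.
target_conn = Connector("TGT_sftp", "/masked")
target_env = Environment("TGT_env", [target_conn])
job = MaskingJob(target_env)

# Step 3: once the In-Place job works, create the Source side.
source_conn = Connector("SRC_sftp", "/unmasked")
source_env = Environment("SRC_env", [source_conn])

# Step 4: switch the same job to On-The-Fly.
job.switch_to_on_the_fly(source_env, source_conn)
print(job.on_the_fly, job.source_connector.folder)  # True /unmasked
```

The design point is that nothing about the Rule Set changes during the switch; only the input side moves to the Source.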

This process also makes it possible to run Profiling jobs, which are then run against the Target.

Masking Process

File Masking jobs read all data (all rows and fields). This means that all data will be transferred over the wire. 

Note:

On older engines, the memory needed equals the size of the largest file or, for patterns, the combined size of all files in the pattern.

This was resolved in version 6.0.4.0 and later.

Version 6.0.4.0 introduced a feature called Row Limit, which caps the number of rows held in the engine at any point in time (and hence limits the memory used).
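The difference between the old behavior (the whole file, or every file in a pattern, held in memory) and a row cap can be illustrated with a bounded buffer: rows are read, masked, and handed on in batches, so at most `row_limit` rows are held at once regardless of file size. This is a conceptual sketch, not the engine's implementation; `mask_in_batches`, `mask_name`, and the two-column rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = List[str]


def mask_in_batches(rows: Iterable[Row], mask_row: Callable[[Row], Row],
                    row_limit: int, stats: Dict[str, int]) -> Iterator[Row]:
    """Yield masked rows while buffering at most row_limit input rows."""
    if row_limit < 1:
        raise ValueError("row_limit must be at least 1")
    buffer: List[Row] = []
    stats["peak_rows"] = 0
    for row in rows:
        buffer.append(row)
        stats["peak_rows"] = max(stats["peak_rows"], len(buffer))
        if len(buffer) == row_limit:
            # Buffer full: mask and hand the batch on before reading more input.
            for buffered in buffer:
                yield mask_row(buffered)
            buffer.clear()
    for buffered in buffer:  # flush the last, partially filled batch
        yield mask_row(buffered)


def mask_name(row: Row) -> Row:
    # Hypothetical rule: keep the id column, replace the name column.
    return [row[0], "XXXX"]


source = ([str(i), f"name{i}"] for i in range(10_000))  # stands in for a large file
stats: Dict[str, int] = {}
written = sum(1 for _ in mask_in_batches(source, mask_name, row_limit=500, stats=stats))
print(written, stats["peak_rows"])  # 10000 500
```

Memory use in this model is bounded by `row_limit`, not by the number of rows in the file.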

 

Files and Patterns 

Each file and each pattern in the Rule Set has its own Transformation, with one connector for Input and one for Output. This means that all files matching a pattern are processed as a single object (the Output handles the creation of each file).

Note:

If the files are created specifically for masking, the best practice is to create files with few fields and without large amounts of text or BLOBs.

 

The following can be configured in a Rule Set: 

  • List of individual files
  • List of RegEx patterns to mask multiple files
Example (figure): Masking UI - Edit Rule Set - File Pattern
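To check which files a RegEx pattern will pick up before adding it to a Rule Set, a quick test with Python's `re` module can help. This sketch assumes each pattern must match the whole file name and that a file is assigned to the first matching pattern; both are illustrative assumptions rather than documented engine behavior, and Python's regex dialect may differ in details from the engine's. The file names and patterns are hypothetical.

```python
import re
from typing import Dict, List


def group_by_pattern(file_names: List[str], patterns: List[str]) -> Dict[str, List[str]]:
    """Group file names under the first pattern that matches the whole name.
    Full-name matching and first-match-wins are assumptions for illustration."""
    compiled = [(pattern, re.compile(pattern)) for pattern in patterns]
    groups: Dict[str, List[str]] = {pattern: [] for pattern in patterns}
    for name in file_names:
        for pattern, regex in compiled:
            if regex.fullmatch(name):
                groups[pattern].append(name)
                break  # a file belongs to one pattern only
    return groups


files = ["customers_20240101.csv", "customers_20240102.csv",
         "orders_20240101.csv", "readme.txt"]
patterns = [r"customers_\d{8}\.csv", r"orders_\d{8}\.csv"]
for pattern, matched in group_by_pattern(files, patterns).items():
    print(pattern, matched)
```

Here `readme.txt` matches no pattern and would not be masked, while each non-empty group corresponds to one Rule Set entry processed as a single object.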

Masking Job Configuration (On-The-Fly)

The two file masking methods, In-Place and On-The-Fly, mask data in the same way. The difference is that In-Place reads from and writes to the same location, overwriting the original file, while On-The-Fly reads from one location and writes to another.

To overwrite the original file, In-Place masking reads and writes the data twice over the network (this is due to the file's User and Group properties). The file is first read, masked, and written to a temporary file (*.msk). The temporary file is then read and written a second time to overwrite the original file. This almost doubles the time it takes to mask a file compared to On-The-Fly, which reads and writes the data only once.
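The read/write passes described above can be counted with simple arithmetic. This sketch models only bytes moved over the network (one full read and one full write per pass) and ignores protocol overhead and masking CPU time, which is why real runtimes come out at almost double rather than exactly double. The function name and file size are illustrative.

```python
def bytes_over_network(file_size: int, method: str) -> int:
    """Approximate bytes transferred to mask one file (protocol overhead ignored)."""
    if method == "in-place":
        # read original, write *.msk, read *.msk, overwrite original
        return 4 * file_size
    if method == "on-the-fly":
        # read from the Source folder, write to the Target folder
        return 2 * file_size
    raise ValueError(f"unknown method: {method}")


size = 2 * 1024 ** 3  # example: a 2 GiB file
ip = bytes_over_network(size, "in-place")
otf = bytes_over_network(size, "on-the-fly")
print(ip // 1024 ** 3, otf // 1024 ** 3, ip / otf)  # 8 4 2.0
```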

Best Practice

The best practice is to use On-The-Fly.

The reasons are: 

  • On-The-Fly is faster.
    • OTF masking runtime is almost half that of IP.
  • On-The-Fly is much more secure.
    • Separate Masking Environments and (S)FTP folders for the unmasked Source and the masked Target keep unmasked and masked files apart.
  • On-The-Fly works with more (S)FTP server versions, including older ones.

How it Works: On-The-Fly

The one rule to remember when configuring On-The-Fly masking is that the Source can never, ever be masked. Therefore, the masking (Masking Job, Rule Set, Algorithms, etc.) is always defined against the Target.

Note that the Target Environment is configured almost exactly like an In-Place Environment; the only difference is the Source Connector.

Pro-Tip: Name the Source Environment and the Source Connector so that they are easy to identify as the Source. The Source Environment should contain only Connectors.

Figure: Masking On-The-Fly

For On-The-Fly, this is what is needed:

Requirements (and where each one lives, Source or Target):

  • A Source and a Target folder (Source: Folder; Target: Folder).
    • For profiling, the Target needs to have data.
    • For masking, the Target will be overwritten.
  • 2 x Environments, one Source and one Target (Source: Environment; Target: Environment).
  • 2 x Connectors, one for the Source and one for the Target (Source: Connector; Target: Connector).
  • 1 x Rule Set and Masking Rules (Target: Rule Set).
  • 1 x Job (Target: Job).
    • Configured as On-The-Fly method.
    • Rule Set is the Target Rule Set, defined above.
    • Source Environment and Source Connector.
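The requirements above, together with the rule that the Source is never masked and should contain only Connectors, can be expressed as a validation check. The sketch below is a hypothetical model (`Environment`, `Job`, `validate_otf`, and all names and folders are assumptions, not Delphix objects); an empty result means the setup matches the list. It also shows why changing the method to In-Place stays safe: the Rule Set still points at the Target.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical model for illustration only; not Delphix API objects.


@dataclass
class Environment:
    name: str
    folder: str
    connectors: List[str] = field(default_factory=list)
    rule_sets: List[str] = field(default_factory=list)
    jobs: List[str] = field(default_factory=list)


@dataclass
class Job:
    name: str
    method: str       # "on-the-fly" or "in-place"
    env: Environment  # Target Environment holding the Rule Set
    rule_set: str
    source_env: Optional[Environment] = None
    source_connector: Optional[str] = None


def validate_otf(job: Job) -> List[str]:
    """Return a list of problems; an empty list means the setup looks right."""
    problems = []
    if job.rule_set not in job.env.rule_sets:
        problems.append("Rule Set must be defined in the Target Environment")
    if job.method == "on-the-fly":
        src = job.source_env
        if src is None or job.source_connector not in src.connectors:
            problems.append("On-The-Fly job needs a Source Environment and Connector")
        elif src is job.env or src.folder == job.env.folder:
            problems.append("Source and Target must be separate environments and folders")
        elif src.rule_sets or src.jobs:
            problems.append("Source Environment should contain only Connectors")
    return problems


target = Environment("TGT_env", "/masked", ["TGT_conn"], ["TGT_ruleset"], ["mask_files"])
source = Environment("SRC_env", "/unmasked", ["SRC_conn"])
job = Job("mask_files", "on-the-fly", target, "TGT_ruleset", source, "SRC_conn")
print(validate_otf(job))  # []

# Switching the method to In-Place keeps the masking on the Target.
job.method = "in-place"
print(validate_otf(job))  # []
```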

 

Note:

With this configuration, there is no chance of masking the Source, and the job method can be changed from On-The-Fly to In-Place without risk.

 

Steps

Step 1 - Create File Format

This step creates the File Format, unless it has already been created.

  1. Go to Settings, then File Formats.
  2. Click Import File Format.
Step 2 - Create Target Environment with Connector and Rule Set

The steps for the Target are as follows (note: the best procedure is to start with the Target):

  1. Copy the files to be masked to the Target folder.
  2. Create an Environment.
  3. View the Environment.
  4. Create a Connector to the Target.
  5. Create a Rule Set.
  6. Open the Inventory and define the masked columns, or alternatively use Profiling.
Step 3 - Create Masking Job (for now as IP) and Test

This step is only to create the job and test it.

  1. Click on Overview.
  2. Create the Masking Job (use In-Place).
  3. Test and make sure files are masked as desired. 
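For the test in this step, a small script can compare a copy of the unmasked file with the masked result: row counts should match, masked columns should change, and all other columns should be untouched. This sketch assumes delimited files with a header row and loads both files into memory, so it is meant for test-sized samples; `check_masked`, the column names, and the data are hypothetical. Some algorithms can legitimately return an unchanged value for a given input, so a flagged row is a prompt to check, not proof of a failure.

```python
import csv
import io
from typing import List, Set, TextIO


def check_masked(original: TextIO, masked: TextIO, masked_columns: Set[str],
                 delimiter: str = ",") -> List[str]:
    """Return problems found when comparing the unmasked copy with the masked
    file (assumes a header row in both)."""
    orig_rows = list(csv.DictReader(original, delimiter=delimiter))
    mask_rows = list(csv.DictReader(masked, delimiter=delimiter))
    problems = []
    if len(orig_rows) != len(mask_rows):
        problems.append(f"row count changed: {len(orig_rows)} -> {len(mask_rows)}")
    for i, (orig, mask) in enumerate(zip(orig_rows, mask_rows), start=1):
        for column, value in orig.items():
            changed = mask.get(column) != value
            if column in masked_columns and not changed:
                problems.append(f"row {i}: column {column!r} was not masked")
            elif column not in masked_columns and changed:
                problems.append(f"row {i}: column {column!r} changed unexpectedly")
    return problems


# Hypothetical sample: "name" should be masked, "id" and "city" left as-is.
original = io.StringIO("id,name,city\n1,Alice,Oslo\n2,Bob,Rome\n")
masked = io.StringIO("id,name,city\n1,Kxqz,Oslo\n2,Bob,Rome\n")
print(check_masked(original, masked, {"name"}))
# ["row 2: column 'name' was not masked"]
```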
Step 4 - Create Source Environment with Connector

The steps for the Source are:

  1. Create a Source Environment.
  2. View the Environment.
  3. Create a Source Connector to the Source.
Step 5 - Last thing - Change Job to On-The-Fly

Go back to the Target Environment:

  1. Click on Overview.
  2. Change the Masking Job to On-The-Fly and set the Source Environment and Source Connector.

Windows SFTP Server - OpenSSH

Windows now supports SFTP. The OpenSSH client and server are available as a supported Feature-on-Demand in Windows Server 2019 and Windows 10.

Best Practice

The best practice on Windows is to use the built-in OpenSSH Server.
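One quick way to confirm which SSH server a Windows host is running before creating the connector is to read the identification line it sends on connect (the "SSH-2.0-..." string defined in RFC 4253). This is a hypothetical helper, not a Delphix tool. It assumes the server listens on port 22 and sends its identification line first (RFC 4253 allows other lines before it, which this sketch does not handle). Windows builds of OpenSSH typically report a software version beginning with `OpenSSH_for_Windows`, and the host name in the usage line is a placeholder.

```python
import socket
from typing import Optional


def read_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to an SSH/SFTP server and return its first line (the banner)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # RFC 4253 limits the identification line to 255 characters.
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(256)
            if not chunk:
                break
            data += chunk
    return data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", "replace")


def software_version(banner: str) -> Optional[str]:
    """Extract <software> from an 'SSH-2.0-<software> [comments]' line."""
    prefix = "SSH-2.0-"
    if not banner.startswith(prefix):
        return None
    return banner[len(prefix):].split(" ", 1)[0]


# Usage (placeholder host):
# print(software_version(read_ssh_banner("sftp.example.com")))
```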