Delphix

Best Practice: File Masking Job Configuration (KBA1821)

 

This article details the best practices for File Masking using FTP and SFTP. 

File Masking Recommendations 

File Masking and Database Masking process the data in the masking job differently. File masking is more like On-The-Fly masking and there are some key configurations that will improve File Masking jobs.

Masking Process

File Masking jobs read all data (all rows and fields) in a file, so the job must have enough memory to hold it. Plan for this first. The memory needed equals the size of the largest file or, if patterns are used, the largest combined size of the files matched by a single pattern.

Recommendation 

  1. Files (in general):
    1. If files are created for masking, the best practice is to create files that are:  
      • small in size. 
      • with few fields. 
      • without large texts/blobs.
    2. If the file size is fixed and cannot be reduced, make sure the correct amount of memory is allocated.
       
  2. Patterns:
    1. Ensure:
      1. There aren't too many (large) files in each pattern (you can have as many patterns as you like).
      2. The correct amount of memory is allocated.
        Note that incorrect memory allocation will affect performance. 
    2. If there are a large number of files, split them into multiple patterns.
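
As a rough illustration of splitting a large set of files into several patterns, the sketch below groups files so that the combined size of each group stays within a memory budget. It is a minimal sketch, not part of the product: the function name `split_into_groups`, the file names, and the sizes are all hypothetical, and the grouping uses a simple greedy first-fit approach.

```python
def split_into_groups(file_sizes, budget):
    """Group files so each group's combined size stays within `budget`.

    Greedy first-fit on files sorted from largest to smallest.
    A file that is larger than the budget ends up alone in its own group.
    Hypothetical helper for planning patterns; not a Delphix API.
    """
    groups = []  # each entry is [combined_size, [file names]]
    ordered = sorted(file_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        for group in groups:
            if group[0] + size <= budget:
                group[0] += size
                group[1].append(name)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([size, [name]])
    return [names for _, names in groups]


# Hypothetical file sizes in MB and a 700 MB budget per pattern.
sizes_mb = {
    "orders_1.csv": 400,
    "orders_2.csv": 350,
    "orders_3.csv": 300,
    "orders_4.csv": 250,
    "orders_5.csv": 100,
}
groups = split_into_groups(sizes_mb, budget=700)
for names in groups:
    print(names)
```

Each resulting group would then need its own pattern (or explicit file list) in the Rule Set, with the budget chosen to match the memory allocated to the job.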

 

Tip:

If performance is slow for files in a pattern but fast for a single file, the job is very likely paging.
Solution: Increase the Job Max Memory.

Masking Job Configuration (On-The-Fly)

The two file masking methods, In-Place and On-The-Fly, mask data in the same way. The difference is that In-Place reads and writes files in the same location and overwrites the original file, while On-The-Fly reads from one location and writes to another.

To overwrite the original file, In-Place masking reads and writes the file twice over the network: the first time to mask the file and the second time to overwrite the original. This almost doubles the time it takes to mask a file compared with On-The-Fly, which reads and writes the data only once.
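
The effect of the extra network pass can be estimated with simple arithmetic. The sketch below is a simplified model under stated assumptions: it counts only network transfer time, treats each pass as one full read plus one full write of the file, and uses a hypothetical file size and throughput. Time spent on the masking itself is ignored, which is why the real saving is "almost" rather than exactly half.

```python
def transfer_seconds(file_mb, network_mb_per_s, passes):
    """Estimate network transfer time for a masking job.

    Each pass is modelled as one read plus one write of the whole file.
    Simplified model: masking CPU time and protocol overhead are ignored.
    """
    return passes * 2 * file_mb / network_mb_per_s


# Hypothetical values: a 1000 MB file over a 50 MB/s link.
in_place = transfer_seconds(1000, 50, passes=2)    # mask, then overwrite original
on_the_fly = transfer_seconds(1000, 50, passes=1)  # read source, write target

print(f"In-Place:   {in_place:.0f} s")
print(f"On-The-Fly: {on_the_fly:.0f} s")
```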

Best Practice

The best practice is to use On-The-Fly and use the masked *.msk file in downstream processes. 

The reasons are: 

  • On-The-Fly is faster.
    • Masking can take as little as half the time of In-Place.
  • On-The-Fly is much more secure.
    • Separate Masking Environments and (S)FTP folders for the unmasked Source and the masked Target keep unmasked and masked files apart.
  • On-The-Fly is compatible with older versions of (S)FTP servers.

How it Works: On-The-Fly

The one rule to remember when configuring On-The-Fly masking is that the Source can Never Ever be masked. Therefore, the masking rules (Masking Job, Rule Set, Algorithms, etc.) are always defined against the Target.

Note that the Target Environment is configured almost exactly like an In-Place Environment; only the Source Connector is different.

The best approach is to create an Environment called Source. This Environment is for Sources only and will contain only Connectors.

[Figure: Masking On-The-Fly]

For On-The-Fly, this is what is needed:

Requirements:

  • A Source and a Target folder.
    • For profiling, the Target needs to have data.
    • For masking, the Target will be overwritten.
  • 2 x Environments (one Source - one Target).
  • 2 x Connectors (one for the Source - one for the Target).
  • 1 x Rule Set and Masking Rules.
  • 1 x Job
    • Configured as On-The-Fly method.
    • Rule Set is the Target Rule Set, defined above.
    • Source Environment and Source Connector.

Source:

  • Folder
  • Environment
  • Connector

Target:

  • Folder
  • Environment
  • Connector
  • Rule Set
  • Job


Note: With this configuration, there is no risk of masking the Source, and the method can later be changed from On-The-Fly to In-Place.

Steps - File Format

This step creates the File Format. Skip it if the File Format already exists.

  1. Go to Settings and File Formats.
  2. Click Import File Format.

Steps - Target Environment with Connector and Rule Set

The steps for the Target are as follows (the best procedure is to start with the Target):

  1. Copy the files to be masked into the Target folder.
  2. Create an Environment.
  3. View environment.
  4. Create a Connector to the Target.
  5. Create a Rule Set.
  6. Open the Inventory and define Masked Columns, or alternatively use Profiling.

Steps - Source Environment with Connector

The steps for the Source are:

  1. Create an Environment.
  2. View environment.
  3. Create a Connector to the Source.

Last thing - Create the Masking Job

Go back to the Target Environment to create the Job:

  1. Click on Overview.
  2. Create the Masking Job
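
For readers who script job creation instead of using the UI, the settings described above can be collected into a request body. The sketch below only builds and prints a JSON payload; it does not call any engine. The field names (`jobName`, `rulesetId`, `maxMemory`, `onTheFlyMasking`, `onTheFlyMaskingSource`, `connectorId`, `connectorType`) are assumptions modelled on the Masking API and must be verified against the API documentation for your engine version. The function name and all IDs and values are hypothetical.

```python
import json


def build_on_the_fly_job_payload(job_name, target_ruleset_id,
                                 source_connector_id, max_memory_mb):
    """Build a request body for an On-The-Fly file masking job.

    ASSUMPTION: field names are modelled on the Masking API and are not
    confirmed by this article. Check them against your engine's API docs.
    """
    return {
        "jobName": job_name,
        # The Rule Set always belongs to the Target, never the Source.
        "rulesetId": target_ruleset_id,
        # Sized to the largest file or largest group of files in a pattern.
        "maxMemory": max_memory_mb,
        "onTheFlyMasking": True,
        # The Source is referenced only through its Connector.
        "onTheFlyMaskingSource": {
            "connectorId": source_connector_id,
            "connectorType": "FILE",
        },
    }


# Hypothetical IDs and memory size, for illustration only.
payload = build_on_the_fly_job_payload(
    job_name="mask_orders_otf",
    target_ruleset_id=7,
    source_connector_id=3,
    max_memory_mb=2048,
)
print(json.dumps(payload, indent=2))
```

The point of the sketch is the shape of the configuration: the Rule Set comes from the Target, while the Source appears only as a Connector reference, so the Source itself is never masked.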

Rule Set Configuration 

The objective of these recommendations is to limit the amount of memory needed to run the File Masking Job and to improve masking performance.  

 

Note:

The memory issue only affects large files and patterns that contain a large number of files.

The following can be configured in a Rule Set: 

  • List of individual files
  • List of RegEx patterns to mask multiple files

Example

[Figure: Masking UI - Rule Set - File and Pattern]

What happens here?

When the job reads the file, it needs to read the complete file, including all fields. If the file is large, then more memory allocation is needed in order to hold all the data in the masking job. 

For patterns, the job reads all files matched by a pattern and masks them as one unit. If the pattern matches many files and these files are large, more memory is needed.

If there is not enough memory, the job will start paging data to disk, which slows down performance. In the worst case, the job will crash.

Best Practice

  • Keep masked objects (files and patterns) as small as possible.
  • Split masked objects into multiple objects (patterns or files).
  • Set memory according to the size of the largest file or group of files in a pattern.

Memory requirement

The amount of memory required is the larger of the following two values:

  • The size of the largest single file.
  • The total size of the largest group of files in a pattern.
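
The rule above can be worked through with a short calculation. The sketch below is a minimal illustration, assuming that a pattern must match the whole file name (how the masking engine applies a RegEx may differ) and that the memory needed equals the raw file size, as stated in this article. The function name `required_memory`, the file names, and the sizes are hypothetical.

```python
import re


def required_memory(file_sizes, single_files, patterns):
    """Return the larger of: the largest single file, or the largest
    combined size of the files matched by any one pattern.

    ASSUMPTION: a pattern must match the full file name.
    Hypothetical helper for sizing; not a Delphix API.
    """
    largest_single = max((file_sizes[name] for name in single_files), default=0)

    largest_group = 0
    for pattern in patterns:
        regex = re.compile(pattern)
        group_total = sum(
            size for name, size in file_sizes.items() if regex.fullmatch(name)
        )
        largest_group = max(largest_group, group_total)

    return max(largest_single, largest_group)


# Hypothetical file sizes in MB.
sizes_mb = {
    "customers.csv": 900,
    "orders_2023.csv": 400,
    "orders_2024.csv": 600,
    "notes.txt": 50,
}
needed = required_memory(
    sizes_mb,
    single_files=["customers.csv"],
    patterns=[r"orders_\d{4}\.csv"],
)
print(f"Memory needed: {needed} MB")
```

In this example the two files matched by the pattern total 1000 MB, which exceeds the 900 MB single file, so the pattern determines the memory requirement.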

Windows SFTP Server - OpenSSH

Windows now supports SFTP. The OpenSSH client and server are available as a supported Feature-on-Demand in Windows Server 2019 and Windows 10.

Best Practice

The best practice on Windows is to use the built-in OpenSSH Server.

 

 

