Delphix

KBA1328 Mapping Algorithm (MA) Technical Overview

 

At a Glance

Versions: Applicable Delphix Masking versions: 4.x, 5.0.x, 5.1.x, 5.2.x, 5.3.x
Description:

This commonly used algorithm is very similar to Secure Lookup, with the difference that it generates a guaranteed 1:1 mapping.
The lookup data is stored in the internal masking repository (database).
The content of the MA must be imported, and the mappings stay with the algorithm until it is recreated (5.3).

Characteristics:

Algorithm  Type    Unique Lookup (1)  Referential Integrity (2)  1:1 Mapping (3)  Strength  Comment
Mapping    Lookup  Yes                Yes                        Yes              Strong

1 Unique Lookup - Lookup values are all unique within the algorithm (new data is deduplicated at each load).
2 Referential Integrity - The masked value will be the same between job executions as well as between tables.
3 1:1 Mapping - Each input value is mapped to a unique masked value within the masked column.

New data can be loaded, but only values that have not been loaded before are added to the pool.
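
The load rule above can be illustrated with a short Python sketch. It only models the described behaviour and is not the Masking Engine's implementation; `top_up_pool` and the `loaded_before` set (a stand-in for the masking repository) are hypothetical names.

```python
def top_up_pool(loaded_before, new_values):
    """Return the values that are actually added to the pool.

    loaded_before: set of every lookup value loaded so far (hypothetical
    stand-in for the masking repository). It is updated in place.
    """
    added = []
    for value in new_values:
        if value in loaded_before:
            continue  # loaded before (or duplicated in this load): skipped
        loaded_before.add(value)
        added.append(value)
    return added


loaded = {"Johnson", "Williams"}
added = top_up_pool(loaded, ["Williams", "Brown", "Brown", "Anderson"])
print(added)  # ['Brown', 'Anderson']
```

Only 'Brown' and 'Anderson' are new, so only those two values extend the pool.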

Character Encodings:

Works with all character encodings (some bugs are known).
See below for the workaround.

Lookup Pool Size:

The size of the lookup pool depends on the amount of memory. For performance and memory, the recommended size is up to around 5,000,000.

Note: there are two processes to consider:
1. Ingestion - the time it takes to load the data into the masking repository.
2. Load time - at runtime, the time it takes to load the lookup values into the job.

If a very large number of values need to be uniquely masked, it is better to use Segmented Mapping or Tokenization.

Limitations:

* The load time and pre-job preparation time (checking distinctness and uniqueness) can be long.
* In versions before 5.3, all values in the uploaded file need to be unique.
* Two (or more) masking jobs using MA can't be started at the same time (DLPX-55612).

Version Updates: In version 5.2 the load time was improved (DLPX-43992). 
In version 5.3 the dedup was improved.

High-Level Overview 

This algorithm is called MA for short. No MA algorithms are shipped out of the box with the Masking Engine; an MA algorithm needs to be created.

Use this algorithm when there cannot be any duplicate values and the masked data need to be 1:1 with the original. The MA algorithm has a pool of unmapped lookup values, which can be topped up when needed. 
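
As a rough illustration of the idea, the following Python sketch keeps every mapping once it is made and draws new masked values from a pool of unmapped lookup values. The class `MappingSketch` and its methods are hypothetical, and taking the next unused value in load order is an assumption; the article does not describe how the engine picks a value.

```python
class MappingSketch:
    """Toy 1:1 mapping: each input gets its own lookup value, which is retained."""

    def __init__(self, lookup_values):
        # dict.fromkeys removes duplicates while preserving order
        self.pool = list(dict.fromkeys(lookup_values))
        self.mappings = {}

    def top_up(self, lookup_values):
        """Add lookup values that are neither in the pool nor already mapped."""
        known = set(self.pool) | set(self.mappings.values())
        for value in lookup_values:
            if value not in known:
                self.pool.append(value)
                known.add(value)

    def mask(self, value):
        """Return the retained mapping, or assign the next unmapped value."""
        if value not in self.mappings:
            if not self.pool:
                raise RuntimeError("lookup pool exhausted; top up needed")
            # Assumption: take the next unused value in load order
            self.mappings[value] = self.pool.pop(0)
        return self.mappings[value]


ma = MappingSketch(["Johnson", "Williams"])
first = ma.mask("Smith")        # new input: takes a value from the pool
second = ma.mask("Oconnor")     # different input: gets a different value
again = ma.mask("Smith")        # seen before: same masked value as `first`
ma.top_up(["Johnson", "Brown"])  # 'Johnson' is already mapped, only 'Brown' is added
third = ma.mask("Jones")
```

Because a lookup value leaves the pool as soon as it is assigned, no two inputs can share a masked value, and repeated inputs always resolve to the retained mapping.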

UI - Creation and Modification 

The algorithms are accessed from the Settings tab, under the Algorithm submenu. From the Algorithm page, an MA algorithm can be created and modified (edited):

  • To create click "Add Algorithm".
  • To modify click the green pen in the "Edit" column.

UI - Create  

The following popup is accessed when creating and modifying the algorithm:

[Image: UI_MappingAlg_Desc.png]

  1. Algorithm Properties
    • Algorithm Name
    • Description
  2. Lookup Details
    • Lookup File Name
    • Delimiter - the character used in the file to separate search and lookup value

UI - Modification  

The following popup is presented when modifying the Mapping algorithm.

Note: Versions before 5.3 have the Append tick-box at the bottom. In 5.3, append is the default behavior.

[Image: UI Mapping Algorithm (edit).png]

Editable parameters are:

  • Description
  • List of lookup values
    • The list of lookup values can be appended to the existing list or can replace the existing list.
    • For versions before 5.3, select Append if append is needed; otherwise the mappings will change.
  • Ignore Characters

Note - To change the Name, the algorithm must be deleted and then recreated.

Considerations 

Since the algorithm retains all mappings, new mappings are taken from a lookup pool. Therefore, the pool of lookup values needs to be larger than the distinct set of new values to mask.
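
A quick pre-check of this condition might look like the Python sketch below. `pool_shortfall` and its arguments are hypothetical; it assumes the existing mappings, the unmapped lookup values, and the incoming source values are all available as plain Python collections.

```python
def pool_shortfall(existing_mappings, unmapped_pool, incoming_values):
    """Return how many lookup values are missing (0 means the pool is enough).

    Only distinct inputs that do not already have a mapping consume
    a value from the pool.
    """
    new_inputs = set(incoming_values) - set(existing_mappings)
    return max(0, len(new_inputs) - len(unmapped_pool))


shortfall = pool_shortfall(
    existing_mappings={"Smith": "Johnson"},
    unmapped_pool=["Williams"],
    incoming_values=["Smith", "Jones", "Jones", "Brown"],
)
print(shortfall)  # 1: two new distinct inputs, only one unmapped lookup value
```

A non-zero result means the pool should be topped up before the job is run.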

Version 5.3 recommended

Version 5.3 has some key improvements in the preparation process, and it is recommended to use this or a later version.

Mapping Values - Ingestion and Prepare Process

Below is a diagram showing how the map values are loaded and prepared.

[Image: Algorithm_MA_Create_Prepare_Process.png]

Memory Requirements 

The Mapping Algorithm requires memory to load the lookup values when the masking job starts. For large pool sizes, the Min/Max Memory settings in the Job Configuration need to be increased. The size required depends on the length of the data in the lookup and the number of distinct lookup values.

As a ballpark figure, set max to 1 GB + 1 GB per 3 million rows and min 1-2 GB below the max value. It might be required to add more memory to cater for larger data objects and multiple masked columns. 
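
The ballpark rule can be written out as a small Python helper. This is only a sketch of the arithmetic: the function name `estimate_job_memory_gb` is hypothetical, rounding the per-row part up to whole gigabytes and never letting the minimum drop below 1 GB are assumptions, and the result does not include the extra memory that larger data objects or multiple masked columns may need.

```python
import math


def estimate_job_memory_gb(lookup_rows, min_gap_gb=1):
    """Return (min_gb, max_gb) following the rule of thumb above.

    max = 1 GB + 1 GB per 3 million rows (rounded up, an assumption)
    min = max minus 1-2 GB (min_gap_gb), floored at 1 GB (an assumption)
    """
    max_gb = 1 + math.ceil(lookup_rows / 3_000_000)
    min_gb = max(1, max_gb - min_gap_gb)
    return min_gb, max_gb


print(estimate_job_memory_gb(5_000_000))                # (2, 3)
print(estimate_job_memory_gb(9_000_000, min_gap_gb=2))  # (2, 4)
```

The values are in GB and need converting to whatever unit the Min/Max Memory fields expect.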

Ignore Characters

The Mapping Algorithm has a feature called 'Ignore Characters'. It is applied to the source data to be masked and ensures that a value is masked to the same result even when its representation differs because of added auxiliary characters.

Example - without and with "Ignore Characters"

The example below shows masking without and with "Ignore Characters" (IC):

  • Without IC - the masked results are different even though the surnames are very similar (same).
  • With IC - shows the masked results with the following two characters added in "Ignore Characters": _ and '
    • '_Smith' is converted to 'Smith' and both are masked to 'Johnson'.
    • 'O'Connor' is converted to 'Oconnor' and both are masked to 'Williams'.

The added string in "Ignore Characters" is shown below - each character is separated by a comma.  

_,'
Ref  Input     Masked without IC  Masked with IC
1    Smith     Johnson            Johnson
2    _Smith    Anderson           Johnson
3    Oconnor   Williams           Williams
4    O'Connor  Brown              Williams
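
The effect of "Ignore Characters" on the input can be sketched in Python as a normalization step applied before the lookup. `strip_ignored` is a hypothetical helper, not the engine's code; it assumes the setting is the comma-separated string shown above and it only removes characters, without changing letter case.

```python
def strip_ignored(value, ignore_spec):
    """Remove every character listed in the comma-separated ignore_spec."""
    ignored = set(ignore_spec.split(","))
    return "".join(ch for ch in value if ch not in ignored)


ignore_spec = "_,'"
plain = strip_ignored("Smith", ignore_spec)
prefixed = strip_ignored("_Smith", ignore_spec)
print(plain, prefixed)  # Smith Smith
```

Since both inputs normalize to the same string, they resolve to the same mapping and therefore to the same masked value.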

Additional Information 

Similar algorithms:

Delphix documentation: