Skip to main content
Delphix

Algorithm: Mapping Algorithm (MA) (KBA1328)

 

This algorithm is called MA for short. There are no MA algorithms shipped out of the boxes with the Masking Engine. An MA algorithm needs to be created.

Use this algorithm when there cannot be any duplicate values and the masked data need to be 1:1 with the original. The MA algorithm has a pool of unmapped lookup values, which can be topped up when needed. 

At a Glance 

Versions: Applicable Delphix Masking versions: 4.x, 5.0.x, 5.1.x, 5.2.x, 5.3.x
Description:

This is a fairly commonly used algorithm very similar to Secure Lookup with the difference that this generates a guaranteed 1:1 mapping.
The lookup data is stored in the internal masking repository (database).
The content in the MA needs to be imported and the mapping 'stays' with the algorithm until recreated (5.3).

Characteristics:
 
Type
Unique
Lookup1
Referential
Integrity2

1:1 Mapping3

Comment
Mapping Lookup Yes Yes  Yes  

1 Unique Lookup - Lookup values are all unique within the algorithm (dedup of new data each load time).
2 Referential Integrity - The masked value will be the same between job executions as well as tables.
3 1:1 Mapping - The masked value will be mapped uniquely to the input value within masked column.

New data can be loaded (need to be new (not loaded before) or it will not be added to the pool).

Character 
Encodings:

Works with all characters encodings (some bugs are known).
See below for the workaround.  

Lookup Pool Size:

The size of the lookup pool depends on the amount of memory. For performance and memory, the recommended size is up to around 5,000,000.

Note: there are two processes to consider:
1. Ingestion - this is the time it takes 
to load the data into the masking repository.
2. Loadtime - this is at runtime and the time to load the lookup values into the job.

If a very large number of values are needed to be uniquely masked it would be better to use Segmented Mapping or Tokenization.

Limitations:

* The load time and pre-job preparation time (check distinct and uniqueness) can be long.
* In pre 5.3, all values in the uploaded file needs to be unique.
* Two (or more) masking jobs with MA can't be started at the same time (DLPX-55612).

Version Updates: In version 5.2 the load time was improved (DLPX-43992). 
In version 5.3 the dedup was improved.

Creating and Modifying Algorithms using the User Interface 

The algorithms are accessed from the tab Settings and under the submenu Algorithm. From the Algorithm page, an MA algorithm can be created and also modified (edit):

  • To create click "Add Algorithm".
  • To modify click the green pen in the "Edit" column.

User Interface

The following popup screens are accessed when creating and modifying the algorithm:

Masking UI - Mapping Algorithm Create.png

  1. When creating an algorithm, lookup values are loaded when the algorithm is created from a text file. You need to provide details for Algorithm Properties:
    • Algorithm Name
    • Description
  2. When modifying an algorithm, the Lookup details (description and lookup value) can be modified:
    • Lookup File Name
    • Delimiter - the character used in the file to separate search and lookup value 

The following popup is presented when modifying the Mapping algorithm.

Note

Note:

Prior to Version 5.3, the Append tick-box was at the bottom, and has been removed. 
Append is the default behavior from Version 5.3. 

 

Masking UI - Mapping Algorithm Edit.png

The editable parameters are:

  • Description.
  • Append list of lookup values.
    • The list of lookup values can be appended to the existing list or replacing the existing list. 
    • Note that for versions prior to 5.3. you need to select Append if necessary,  else mappings will change.
  • Ignore Characters
Note

Note:

 To change the Name the algorithm, you need to delete the Algorithm and recreate it. 

Considerations  

Since the algorithm is retaining all mappings, new mappings are taken from a lookup pool. Therefore, the pool of lookup values needs to be larger than the distinct set of new values to mask. 

Recommendation 

Version 5.3. has some key improvements in the preparation process. We recommend the use of 5.3 or later versions. 

Mapping Values - Load and Map Processes 

The diagram below shows how the Mapping Algorithm use three storage containers in the masking engine. These are used in two different processes:  

  1. Load Values - when algorithm is Created or Edited.
  2. Map Values - where new data to be masked is mapped to available mapping values.  

Mapping Algorithm - Load and Map Process2.png

Load Values (into MA_map_value)

This process is started when new values are loaded in to Mapping Algorithm. 

  1. Remove duplicated values from loaded values. 
  2. Remove values already mapped (already in MA_values)
  3. Uniquely insert values into MA_map_value.
Note

Note:

 Only values not used before will be imported.

Map Values (into MA_values)

This process is started when the masking job is started and will run before any tables are masked. It is Preparing the Mapping values for the masking job. 

  1. Retrieve all values to be masked from source column (distinct values).
  2. Remove values already mapped (already in MA_values).
  3. Add the values from 1 and 2 (‘distinct and unmapped’) into MA_staging_tbl.
    • Count number of values in MA_map_value and MA_staging_tbl and compare (see below).
  4. Randomly map (pair) values in MA_staging_tbl with values in MA_map_value.
  5. Insert new mapped values into MA_values.
  6. Remove used values from MA_map_value.

Error 'Insufficient mapping values'

If there are not enough unique values left to mask the data in the field - that is not enough values in MA_map_value to map all values in MA_staging_tbl - the job is terminated with an error.

The error message when this happens is: 

  • 'x' below represent the number of unique values missing.
[JOB_ID_xx_yy] ... WriteToLogTrue.0 - ------------> Linenr 1------------------------------
[JOB_ID_10_48] ... WriteToLogTrue.0 - mapping_algo_id = 1.0
[JOB_ID_10_48] ... WriteToLogTrue.0 - proc_true = Difference is < 0
[JOB_ID_10_48] ... WriteToLogTrue.0 - difference = -x.0
[JOB_ID_10_48] ... WriteToLogTrue.0 -
[JOB_ID_10_48] ... WriteToLogTrue.0 - ====================
[JOB_ID_10_48] ... Filter rows.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
[JOB_ID_10_48] ... Abort.0 - ERROR : Row nr 1 causing abort : [1.0], [-x.0], [Difference is < 0]
[JOB_ID_10_48] ... Abort.0 - ERROR : ---------------Insufficient mapping values, please add more mapping values.[MAPPING_ID:1]
Warning

Warning:

If there are not enough unique values left to mask the data in the field - then the job will terminate with an error.

Solution 

Add additional unique and not already loaded values to the algorithm. 

Memory Requirements  

The Mapping algorithm requires memory to load the lookup values when the masking job starts. For large pool sizes, it is required to increase the Min/Max Memory settings in the Job Configuration. The size required depends on the lengths of the data in the lookup and number of distinct lookup values. 

As a ballpark figure, set max to 1 GB + 1 GB per 3 million rows and min 1-2 GB below the max value. It might be required to add more memory to cater for larger data objects and multiple masked columns. 

For more information, see Best Practice: Job Configuration on the Masking Engine

Ignore Characters 

Mapping Algorithm has a feature called 'Ignore Characters'.

The feature "Ignore Characters" is applied on the source data to be masked. It is used to ensure that even if the representation of the data is different due to added auxiliary characters, the value is masked to the same value.  

Example with and without "Ignore Characters" 

The example below shows masking with and without "Ignore Characters" (IC):

  • Without IC: Masked result is different even if surnames (last names) are very similar or the same.
  • With IC: It shows masked result with the following two characters added in "Ignore Characters" _and ':
    • '_Smith' is converted to 'Smith' and both masked to 'Johnson'.
    • 'O'Connor' is converted to 'Oconnor' and both masked to 'Williams'.

The added string in "Ignore Characters" is shown below - each character is separated by a comma.  

_,'
Ref Input Masked without IC  Masked with IC
1 Smith Johnson Johnson
2 _Smith Anderson Johnson
3 Oconnor Williams Williams
4 O'Connor Brown Williams

 


Related Articles

The following articles may provide more information or related information to this article:

Knowledge Base:

Documentation: