This algorithm is called MA for short. There are no MA algorithms shipped out of the boxes with the Masking Engine. An MA algorithm needs to be created.
Use this algorithm when there cannot be any duplicate values and the masked data need to be 1:1 with the original. The MA algorithm has a pool of unmapped lookup values, which can be topped up when needed.
At a Glance
|Versions:||Applicable Delphix Masking versions: 4.x, 5.0.x, 5.1.x, 5.2.x, 5.3.x|
This is a fairly commonly used algorithm very similar to Secure Lookup with the difference that this generates a guaranteed 1:1 mapping.
1 Unique Lookup - Lookup values are all unique within the algorithm (dedup of new data each load time).
New data can be loaded (need to be new (not loaded before) or it will not be added to the pool).
Works with all characters encodings (some bugs are known).
|Lookup Pool Size:||
The size of the lookup pool depends on the amount of memory. For performance and memory, the recommended size is up to around 5,000,000.
Note: there are two processes to consider:
If a very large number of values are needed to be uniquely masked it would be better to use Segmented Mapping or Tokenization.
* The load time and pre-job preparation time (check distinct and uniqueness) can be long.
|Version Updates:||In version 5.2 the load time was improved (DLPX-43992).
In version 5.3 the dedup was improved.
Creating and Modifying Algorithms using the User Interface
The algorithms are accessed from the tab Settings and under the submenu Algorithm. From the Algorithm page, an MA algorithm can be created and also modified (edit):
- To create click "Add Algorithm".
- To modify click the green pen in the "Edit" column.
The following popup screens are accessed when creating and modifying the algorithm:
- When creating an algorithm, lookup values are loaded when the algorithm is created from a text file. You need to provide details for Algorithm Properties:
- Algorithm Name
- When modifying an algorithm, the Lookup details (description and lookup value) can be modified:
- Lookup File Name
- Delimiter - the character used in the file to separate search and lookup value
The following popup is presented when modifying the Mapping algorithm.
The editable parameters are:
- Append list of lookup values.
- The list of lookup values can be appended to the existing list or replacing the existing list.
- Note that for versions prior to 5.3. you need to select Append if necessary, else mappings will change.
- Ignore Characters
Since the algorithm is retaining all mappings, new mappings are taken from a lookup pool. Therefore, the pool of lookup values needs to be larger than the distinct set of new values to mask.
Version 5.3. has some key improvements in the preparation process. We recommend the use of 5.3 or later versions.
Mapping Values - Load and Map Processes
The diagram below shows how the Mapping Algorithm use three storage containers in the masking engine. These are used in two different processes:
- Load Values - when algorithm is Created or Edited.
- Map Values - where new data to be masked is mapped to available mapping values.
Load Values (into MA_map_value)
This process is started when new values are loaded in to Mapping Algorithm.
- Remove duplicated values from loaded values.
- Remove values already mapped (already in MA_values)
- Uniquely insert values into MA_map_value.
Map Values (into MA_values)
This process is started when the masking job is started and will run before any tables are masked. It is Preparing the Mapping values for the masking job.
- Retrieve all values to be masked from source column (distinct values).
- Remove values already mapped (already in MA_values).
- Add the values from 1 and 2 (‘distinct and unmapped’) into MA_staging_tbl.
- Count number of values in MA_map_value and MA_staging_tbl and compare (see below).
- Randomly map (pair) values in MA_staging_tbl with values in MA_map_value.
- Insert new mapped values into MA_values.
- Remove used values from MA_map_value.
Error 'Insufficient mapping values'
If there are not enough unique values left to mask the data in the field - that is not enough values in MA_map_value to map all values in MA_staging_tbl - the job is terminated with an error.
The error message when this happens is:
- 'x' below represent the number of unique values missing.
[JOB_ID_xx_yy] ... WriteToLogTrue.0 - ------------> Linenr 1------------------------------ [JOB_ID_10_48] ... WriteToLogTrue.0 - mapping_algo_id = 1.0 [JOB_ID_10_48] ... WriteToLogTrue.0 - proc_true = Difference is < 0 [JOB_ID_10_48] ... WriteToLogTrue.0 - difference = -x.0 [JOB_ID_10_48] ... WriteToLogTrue.0 - [JOB_ID_10_48] ... WriteToLogTrue.0 - ==================== [JOB_ID_10_48] ... Filter rows.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0) [JOB_ID_10_48] ... Abort.0 - ERROR : Row nr 1 causing abort : [1.0], [-x.0], [Difference is < 0] [JOB_ID_10_48] ... Abort.0 - ERROR : ---------------Insufficient mapping values, please add more mapping values.[MAPPING_ID:1]
Add additional unique and not already loaded values to the algorithm.
The Mapping algorithm requires memory to load the lookup values when the masking job starts. For large pool sizes, it is required to increase the Min/Max Memory settings in the Job Configuration. The size required depends on the lengths of the data in the lookup and number of distinct lookup values.
As a ballpark figure, set max to 1 GB + 1 GB per 3 million rows and min 1-2 GB below the max value. It might be required to add more memory to cater for larger data objects and multiple masked columns.
For more information, see Best Practice: Job Configuration on the Masking Engine
Mapping Algorithm has a feature called 'Ignore Characters'.
The feature "Ignore Characters" is applied on the source data to be masked. It is used to ensure that even if the representation of the data is different due to added auxiliary characters, the value is masked to the same value.
Example with and without "Ignore Characters"
The example below shows masking with and without "Ignore Characters" (IC):
- Without IC: Masked result is different even if surnames (last names) are very similar or the same.
- With IC: It shows masked result with the following two characters added in "Ignore Characters"
- '_Smith' is converted to 'Smith' and both masked to 'Johnson'.
- 'O'Connor' is converted to 'Oconnor' and both masked to 'Williams'.
The added string in "Ignore Characters" is shown below - each character is separated by a comma.
|Ref||Input||Masked without IC||Masked with IC|
The following articles may provide more information or related information to this article:
- Algorithm: Understanding Secure Lookup (SL) Algorithms
- Algorithm: Segmented Mapping
- Best Practice: Job Configuration on the Masking Engine
- Docs 5.3: Configuring Your Own Algorithms