At a Glance
|Versions:||Applicable Delphix Masking versions: 4.x, 5.0.x, 5.1.x, 5.2.x, 5.3.x|
This is a fairly commonly used algorithm. It is very similar to Secure Lookup, except that it guarantees a 1:1 mapping.
1:1 Unique Lookup - lookup values are all unique within the algorithm (new data is deduplicated at each load).
New lookup data can be loaded, but only values that have not been loaded before are added to the pool.
Works with all character encodings (some bugs are known).
|Lookup Pool Size:||
The practical size of the lookup pool depends on the available memory. For performance and memory reasons, the recommended size is up to around 5,000,000 values.
Note: there are two processes to consider:
* The load time and the pre-job preparation time (checking for distinct and unique values) can both be long.
If a very large number of values need to be masked uniquely, it is better to use Segmented Mapping or Tokenization.
|Version Updates:||In version 5.2 the load time was improved (DLPX-43992).
In version 5.3 the dedup was improved.
This algorithm is called MA for short. No MA algorithms are shipped out of the box with the Masking Engine, so an MA algorithm needs to be created before it can be used.
Use this algorithm when there cannot be any duplicate masked values and the masked data needs to map 1:1 to the original. The MA algorithm has a pool of unmapped lookup values, which can be topped up when needed.
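To make the 1:1 behavior concrete, here is a minimal Python sketch, assuming a simplified in-memory model. The class `MappingAlgorithmSketch` and its methods `top_up` and `mask` are hypothetical names, not part of the Masking Engine, and handing out pool values in load order is an assumption (the engine may assign them differently). It shows the properties described above: each lookup value is used at most once, existing mappings are retained, and a value that was loaded before is not added to the pool again.

```python
from collections import deque


class MappingAlgorithmSketch:
    """Hypothetical model of MA-style 1:1 masking; not the Masking Engine's code."""

    def __init__(self, lookup_values=()):
        self._loaded = set()    # every lookup value ever loaded (used for dedup)
        self._pool = deque()    # loaded values not yet assigned to any input
        self._mapping = {}      # retained mappings: original value -> masked value
        self.top_up(lookup_values)

    def top_up(self, lookup_values):
        """Append new lookup values; values loaded before are skipped."""
        added = 0
        for value in lookup_values:
            if value not in self._loaded:
                self._loaded.add(value)
                self._pool.append(value)
                added += 1
        return added

    def mask(self, value):
        """Reuse an existing mapping, or assign the next unused pool value."""
        if value in self._mapping:
            return self._mapping[value]
        if not self._pool:
            raise RuntimeError("lookup pool exhausted; top up the pool")
        masked = self._pool.popleft()  # each pool value is handed out at most once
        self._mapping[value] = masked
        return masked

    @property
    def unused(self):
        """Number of lookup values still available for new mappings."""
        return len(self._pool)


ma = MappingAlgorithmSketch(["Johnson", "Williams", "Brown"])
print(ma.mask("Smith"), ma.mask("Jones"), ma.mask("Smith"))  # Johnson Williams Johnson
print(ma.top_up(["Brown", "Davis"]), ma.unused)              # 1 2
```

Because a pool value is removed as soon as it is assigned, two different inputs can never receive the same masked value, which is the 1:1 guarantee; the cost is that the pool shrinks with every new distinct input.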
UI - Creation and Modification
Algorithms are accessed from the Settings tab, under the Algorithm submenu. From the Algorithm page, an MA algorithm can be created and modified (edited):
- To create an algorithm, click "Add Algorithm".
- To modify an algorithm, click the green pen in the "Edit" column.
UI - Create
The following popup is shown when creating and modifying the algorithm:
- Algorithm Properties
  - Algorithm Name
- Lookup Details
  - Lookup File Name
  - Delimiter: the character used in the file to separate the search and lookup values
UI - Modification
The following popup is presented when modifying the Mapping algorithm.
Note: Pre-5.3 versions have the Append tick-box at the bottom. In 5.3, append is the default behavior.
Editable parameters are:
- List of lookup values
  - New lookup values can either be appended to the existing list or replace it.
  - For pre-5.3 versions, select Append if appending is needed; otherwise the mappings will change.
- Ignore Characters
Note: To change the name, the algorithm needs to be deleted and then recreated.
Because the algorithm retains all existing mappings, new mappings are taken from the pool of unused lookup values. Therefore, the pool of unused lookup values needs to be at least as large as the distinct set of new values to mask.
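The capacity rule can be checked before a run with a few lines of Python. This is a minimal sketch under stated assumptions: `check_pool_capacity` is a hypothetical helper, and it assumes you can obtain the source values to mask, the set of values that already have a mapping, and the number of unused lookup values; it does not account for Ignore Characters normalization.

```python
def check_pool_capacity(new_values, mapped_values, unused_pool_size):
    """Return (values_needed, pool_is_sufficient) for a hypothetical masking run.

    new_values: source values about to be masked (duplicates allowed)
    mapped_values: source values that already have a retained mapping
    unused_pool_size: lookup values not yet assigned to any input
    """
    # Duplicates and already-mapped values do not consume new lookup values.
    needed = len(set(new_values) - set(mapped_values))
    return needed, needed <= unused_pool_size


print(check_pool_capacity(["Smith", "Jones", "Smith", "Lee"], {"Smith"}, 2))  # (2, True)
print(check_pool_capacity(["Smith", "Jones", "Smith", "Lee"], {"Smith"}, 1))  # (2, False)
```

Only the distinct, not-yet-mapped values count against the pool, which is why the requirement is stated in terms of the distinct set of new values rather than the total row count.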
Version 5.3 recommended
Version 5.3 has some key improvements in the preparation process, so it is recommended to use this version or later.
Mapping Values - Ingestion and Prepare Process
Below is a diagram showing how the map values are loaded and prepared.
The Mapping Algorithm requires memory to load the lookup values when the masking job starts. For large pool sizes, the Min/Max Memory settings in the Job Configuration will need to be increased. The memory required depends on the length of the lookup values and the number of distinct lookup values.
As a ballpark figure, set the max memory to 1 GB plus 1 GB per 3 million rows, and the min memory to 1-2 GB below the max value. More memory might be needed for larger data objects and multiple masked columns.
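The ballpark rule can be written as a small calculator. This is a sketch only: `ballpark_memory_gb` is a hypothetical helper, it assumes the row count refers to lookup values, and rounding up to whole GB and never setting min below 1 GB are assumptions added here, not part of the guidance.

```python
import math


def ballpark_memory_gb(lookup_rows, gap_gb=2):
    """Rule of thumb: max = 1 GB + 1 GB per 3 million rows, min = 1-2 GB below max.

    Rounding up to whole GB and the 1 GB floor for min are assumptions of this sketch.
    """
    if lookup_rows < 0 or gap_gb not in (1, 2):
        raise ValueError("lookup_rows must be >= 0 and gap_gb must be 1 or 2")
    max_gb = 1 + math.ceil(lookup_rows / 3_000_000)
    min_gb = max(1, max_gb - gap_gb)
    return min_gb, max_gb


print(ballpark_memory_gb(5_000_000))     # (1, 3)
print(ballpark_memory_gb(9_000_000, 1))  # (3, 4)
```

Treat the result as a starting point; as noted above, larger data objects and multiple masked columns can push the real requirement higher.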
The Mapping Algorithm has a feature called 'Ignore Characters'. It is applied to the source data to be masked and ensures that values whose representation differs only by added auxiliary characters are still masked to the same value.
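The effect of Ignore Characters can be sketched as "normalize first, then look up the mapping". In this minimal Python sketch, `apply_ignore_characters` and `mask_with_ignore_characters` are hypothetical helpers. The sketch assumes the Ignore Characters string is comma-separated single characters (so the comma itself cannot be listed), assigns pool values in order, and does not change letter case.

```python
def apply_ignore_characters(value, ignore_chars):
    """Strip every character listed in a comma-separated Ignore Characters string.

    Assumes each comma-separated entry is a single character, so this sketch
    cannot express the comma itself as a character to ignore.
    """
    to_ignore = {entry for entry in ignore_chars.split(",") if entry}
    return "".join(ch for ch in value if ch not in to_ignore)


def mask_with_ignore_characters(values, ignore_chars, pool):
    """Map each normalized value to one pool value, reusing earlier mappings."""
    mapping, remaining, result = {}, list(pool), []
    for value in values:
        key = apply_ignore_characters(value, ignore_chars)
        if key not in mapping:
            if not remaining:
                raise RuntimeError("lookup pool exhausted")
            mapping[key] = remaining.pop(0)
        result.append(mapping[key])
    return result


print(mask_with_ignore_characters(
    ["_Smith", "Smith", "O'Connor", "OConnor"], "_,'", ["Johnson", "Williams"]))
# ['Johnson', 'Johnson', 'Williams', 'Williams']
```

With an empty Ignore Characters string, "_Smith" and "Smith" would be treated as different values and would consume two pool entries.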
Example - without and with "Ignore Characters"
The example below shows masking without and with "Ignore Characters" (IC):
- Without IC: the masked results differ even though the surnames are essentially the same.
- With IC: the masked results were produced with the following two characters added to "Ignore Characters":
  - "_Smith" is converted to "Smith", and both are masked to "Johnson".
  - "O'Connor" is converted to "Oconnor", and both are masked to "Williams".
The string added to "Ignore Characters" is shown below; each character is separated by a comma.
|Ref||Input||Masked without IC||Masked with IC|
- Managing Algorithm Settings (including Mapping Algorithm)