Skip to main content
Delphix

Algorithm: Secure Lookup (SL) (KBA1243)

 

 

KBA

KBA#1243

 

Secure Lookup (SL) is a very commonly used algorithm. The algorithm is fast and lightweight. Out of the box, Delphix Data Platform has approximately 20 predefined secure lookup algorithms. 

Applicable Delphix Versions 

From 6.0.4 the Secure Lookup moved to a new Framework. All upgraded engines will have the old SL upgraded to the new Framework automatically. 

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

At a Glance

Summary: This document describes how Secure Lookup works and how to create a new one. The Secure Lookup algorithm is a simple algorithm that uses a hashed code of the input to map to a lookup value. These values are imported during the creation of the lookup algorithm. This method ensures that the lookup value returns the same result each time and that the number of lookup values never runs out.
Characteristics:
  FrmWrk UL1 RI2 1:13 NSV4 AC5 Ex6
SL New SL v2 No Yes  No No Yes Yes
SL Legacy SL No Yes No No No No
  • 1 Unique values in Lookup - The Lookup Values are Deduped before Load.  
  • 2 Referential Integrity - The masked value will be the same between tables, databases, and jobs.
  • 3 1:1 Mapping - The masked value will be mapped uniquely to the input value.
  • 4 Never mask to Same Value - The framework is designed so that the masked value will never be the same as the Input value.
  • 5 Algorithm Chaining - Indicates if the algorithm can be chained (see Doc). 
  • 6 Extensible - Supports the installation of plugins, written in Java (See Doc).

Due to the randomness in the mapping allocation, there is no way to guarantee that the same masked value will not appear even if the number of lookup values exceeds the number of unique values masked. To have 1:1 mapping please use the Mapping Algorithm. 

Output: The masked output value will be cut to the length of the column (as defined by the data type). 

Warning: Some Special Characters can be extended on the database causing the length to be longer than max allowed for the field. See KBA5285 (link at the end).
Character 
Encodings:
UTF-8
Case Sensitivity: The new SL Framework enables SL with Case Sensitivity. 
  • Pre 6.0.4 - SL is Case Sensitive. (Anders and ANDERS are different).
  • 6.0.4 - Introduce a new SL Framework which enables the sensitivity to be configured. 

See "Case Sensitivity" section below for the workaround.  

PK Warning: Warning: This algorithm is NOT suitable for Primary Key columns or columns where the values need to be unique. The masked values will repeat. 
 
Lookup Pool Size: Note: Recommended size is to limit the number of lookup values to under 500,000.
Limitations: Large pool size (rows in the lookup file) will take a long time to load into the Transformation 
Engine. Please see below for more information.
Customization:

Creation: Lookup values are uploaded from a text file to the Masking Engine when the algorithm is created.

Modify: The Description and the set of Lookup Values can be changed. 

Creating and Modifying Algorithms using the User Interface

The algorithms are accessed from the Settings tab >  Algorithm. A custom algorithm can be created or modified.

  • To create click Add Algorithm.
  • To modify click the edit icon the Edit column.

User Interface: 6.0.11 and Newer

The new create Secure Lookup Algorithm dialog looks like this.

Masking_UI_-_6.0.11.0_-_Create_Algorithm_SL_.png

Creation 

When creating a Secure Lookup Algorithm, you need to provide details for Algorithm Properties:

  • Algorithm Name.
  • Description.
  • Output (Masked) Case. 
  • Hash Method (how the lookup id is calculated).
  • Case Sensitive Lookup.
  • The algorithm lookup file (loaded from a text file).
     

Modification  

When modifying an algorithm, the following details can be modified:

  • Description.
  • Output (Masked) Case. 
  • Hash Method (how the lookup id is calculated).
  • Case Sensitive Lookup.
  • The algorithm lookup file (loaded from a text file).

 

 

Note

Note:

To modify the algorithm name, the algorithm needs to be deleted and recreated.

User Interface: Previous to 6.0.11

The following dialog is displayed when creating and modifying the algorithm.

image.png

Creation 

When creating a Secure Lookup Algorithm, you need to provide details for Algorithm Properties:

  • Algorithm Name.
  • Description.
  • The algorithm lookup file (loaded from a text file).

Modification

When modifying an algorithm, the following details can be modified:

  • Description.
  • The algorithm lookup file (loaded from a text file).
Note

Note:

To modify the algorithm name, the algorithm needs to be deleted and recreated.

Considerations

There are a few considerations:

  • Pool Size and Load Time 
  • Memory Requirements
  • Case Sensitivity
  • Masked Data Encoded with Non-Standard Code Page

Pool Size and Load Time 

The pool size (number of rows in the lookup file) affects both the creation time and the load time. An estimate is that the increase in load time is 3 : 10 - increase the number of values to load by 3 times and it will take 10 times longer. On an average system, it will take 4 min to load 500,000 values. Loading 1,500,00 will take 40 min. 

The graph below shows the load time (duration) in hh:mm:ss for a Secure Lookup with x number of rows.  

Perf_Algorithm_Loadtime (SL).png

Memory Requirements  

The ingested values are loaded into RAM when the masking job starts. For large pool sizes, it might be required to increase the Min/Max Memory settings in the Job Configuration. The size required depends on the lengths of the data in the lookup and the number of lookup values. 

Best Practice: Job Configuration on the Masking Engine

Case Sensitivity   

New: From 6.0.4

From 6.0.4 the case sensitivity can be configured in the Secure Lookup algorithm. 

For details, the best is to look here: 

Old: Pre 6.0.4

The Secure Lookup is case sensitive. This can cause an issue if the same result is expected, independent of the case. There is a workaround below or you can ask for our Technical Services group to create a custom algorithm that masks the database and retains the case. We are currently investigating future enhancements around case sensitivity.

Workaround:

To resolve this issue;

  1. Open and edit the Rule Set.
  2. Add a Custom SQL statement with the following amendment:

Change: ..., maskMeCol, ...

 to: ..., UPPER(maskMeCol) as maskMeCol, ...

Example:
SELECT ID, UPPER(maskMeCol) as maskMeCol from myTable;

Masked Data Encoded with Non-Standard Code Page  

The data to be masked might be encoded and contain characters that are not converted correctly to UTF-8. The masked data is masked as "????". This issue is resolved in version 5.1.

The algorithms which have characters outside UTF-8 are ADDRESS LINE SL, ADDRESS LINE2 SL, and US_COUNTIES_SL.

If experiencing this issue please create a case with Delphix support. 

Use cases outside the feature scope 

There are two use cases frequently requested:

  • Retain case
  • Mask multiple columns values (Full Name, First Name, and Last Name)

These use cases are not covered with the Out-Of-The-Box Secure Lookup Algorithms. Technical Services can assist in creating a custom algorithm that masks the data based on specific requirements.

We are also currently investigating future enhancements and new algorithms.

Examples   

The example below displays an example of a masked result.

FIRST NAME SL was used in this example on a column defined with varchar(8).

  • Note 1: NULL and white space strings. 
  • Note 2: case sensitivity. 
  • Note 3: spaces are trimmed.
  • Note 4: 1:N mapping - the masked values 'collide'.
  • Note 5: strings are cut (cropped) to fit the column (database masking only).
+-----+--------+----------+
| Ref | Source | Masked   |
+=====+========+==========+
| 1   | NULL   | NULL     | << Note 1: NULL and white space strings(' used to indicate space)
| 2   | ''     | ''       | <<    "
| 3   | ' '    | ' '      | <<    "
| 4   | '   '  | ' '      | <<    "
| 5   | Peter  | Flo      |
| 6   | PETER  | Yoshiko  | << Note 2: Case insensitivity examples.
| 7   | peter  | Paul     | <<   "
| 8   |  Peter | Flo      | << Note 3: Leading and/or trailing whitespaces are trimmed.
| 9   | Bar    | Yoshiko  | << Note 4: 1:N Mapping - no unique mapping.
| 10  | Foo    | Christop | << Note 5: Masked value ('Christopher') cropped.
+-----+--------+----------+

 

 

Related Articles

The following articles may provide more information or related information to this article: