Skip to main content
Delphix

Algorithm: Secure Lookup (SL) (KBA1243)

 

KBA

KBA#1243

 

Secure Lookup (SL) is a very commonly used algorithm. When used correctly, the algorithm is fast and lightweight. Out of the box, Delphix Data Platform has approximately 20 predefined secure lookup algorithms. 

This document describes how Secure Lookup works and how to create a new one.  The Secure Lookup algorithm is a simple algorithm that uses a hashed code of the input to map to a lookup value. These values are imported during the creation of the lookup algorithm. This method ensures that the lookup value returns the same result each time and that the number of lookup values never runs out.

 

Warning

Warning:

Since values in the output are likely to repeat, this algorithm is not suitable for Primary Key columns or columns where the values need to be unique.

 

Note

Note:

It is recommended to limit the number of items in the Secure Lookup file to under 500,000.  

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

4.3

4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0

4.2

4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1

4.1

4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

At a Glance

Characteristics:
 
Type
Unique
Lookup1
Referential
Integrity2

1:1 Mapping3

Comment
SL General Lookup No Yes  No3  
NULL-SL Lookup No Yes No Masks everything to NULL.

Unique Lookup - The loaded lookup values can be duplicated in SL. If a value needs to be represented more frequent - add more records. 
Referential Integrity - The masked value will be the same between tables, databases, and jobs.
1:1 Mapping - The masked value will be mapped uniquely to the input value.

3 Due to the randomness in the mapping allocation, there is no way to guarantee that same masked value will not appear even if the number of lookup values exceeds the number of unique values masked. To have 1:1 mapping please use the Mapping Algorithm. 

Output: The masked output value will be cut to the length of the column (as defined by the data type). 
Character 
Encodings:

Works with all character encodings. Encoding should give the same result. Data is trimmed. 
SL is Case Sensitive (Anders and ANDERS are different). See "Case Sensitivity" section below for the workaround.  

Lookup Pool Size: Recommended size up to 500,000
Limitations: Large pool size (rows in the lookup file) will take a long time to load into the Transformation 
Engine. Please see below for more information.
Customization:

Creation: Lookup values are uploaded from a text file to the Masking Engine when the algorithm is created.

Modify: The Description and the set of Lookup Values can be changed. 

Creating and Modifying Algorithms using the User Interface

The algorithms are accessed from the Settings tab >  Algorithm. A custom algorithm can be created or modified.

  • To create click Add Algorithm.
  • To modify click the edit icon the Edit column.

User Interface

The following popup is displayed when creating and modifying the algorithm.

image.png

  1. When creating an algorithm, lookup values are loaded from a text file you supply when the algorithm is created. You need to provide details for Algorithm Properties:
    • Algorithm Name
    • Description
  2. When modifying an algorithm, the Lookup details (description and lookup value) can be modified:
    • Lookup File Name

 

Note

Note:

By default, editable parameters are Description and List of Lookup Values only. To modify any other parameter, the algorithm needs to be deleted and recreated.

Considerations

There are a few considerations:

  • Pool Size and Load Time 
  • Memory Requirements
  • Case Sensitivity
  • Masked Data Encoded with Non-Standard Code Page

Pool Size and Load Time 

The pool size (number of rows in the lookup file) affects both the creation time and the load time. An estimate is that the increase in load time is 3 : 10 - increase the number of values to load by 3 times and it will take 10 times longer. On an average system, it will take 4 min to load 500,000 values. Loading 1,500,00 will take 40 min. 

The graph below shows the load time (duration) in hh:mm:ss for a Secure Lookup with x number of rows.  

Perf_Algorithm_Loadtime (SL).png

Memory Requirements  

The ingested values are loaded into RAM when the masking job starts. For large pool sizes, it might be required to increase the Min/Max Memory settings in the Job Configuration. The size required depends on the lengths of the data in the lookup and the number of lookup values. 

Best Practice: Job Configuration on the Masking Engine

Case Sensitivity   

The Secure Lookup is case sensitive. This can cause an issue if the same result is expected, independent of the case. There is a workaround below or you can ask for our Technical Services group to create a custom algorithm that masks the database and retains the case. We are currently investigating future enhancements around case sensitivity.

Workaround:

To resolve this issue;

  1. Open and edit the Rule Set.
  2. Add a Custom SQL statement with the following amendment:

Change: ..., maskMeCol, ...

 to: ..., UPPER(maskMeCol) as maskMeCol, ...

Example:
SELECT ID, UPPER(maskMeCol) as maskMeCol from myTable;

Masked Data Encoded with Non-Standard Code Page  

The data to be masked might be encoded and contain characters that are not converted correctly to UTF-8. The masked data is masked as "????". This issue is resolved in version 5.1.

The algorithms which have characters outside UTF-8 are ADDRESS LINE SL, ADDRESS LINE2 SL, and US_COUNTIES_SL.

If experiencing this issue please create a case with Delphix support. 

Use cases outside the feature scope 

There are two use cases frequently requested:

  • Retain case
  • Mask multiple columns values (Full Name, First Name, and Last Name)

These use cases are not covered with the Out-Of-The-Box Secure Lookup Algorithms. Technical Services can assist in creating a custom algorithm that masks the data based on specific requirements.

We are also currently investigating future enhancements and new algorithms.

Examples   

The example below displays an example of a masked result.

FIRST NAME SL was used in this example on a column defined with varchar(8).

  • Note 1: NULL and white space strings. 
  • Note 2: case sensitivity. 
  • Note 3: spaces are trimmed.
  • Note 4: 1:N mapping - the masked values 'collide'.
  • Note 5: strings are cut (cropped) to fit the column (database masking only).
+-----+--------+----------+
| Ref | Source | Masked   |
+=====+========+==========+
| 1   | NULL   | NULL     | << Note 1: NULL and white space strings(' used to indicate space)
| 2   | ''     | ''       | <<    "
| 3   | ' '    | ' '      | <<    "
| 4   | '   '  | ' '      | <<    "
| 5   | Peter  | Flo      |
| 6   | PETER  | Yoshiko  | << Note 2: Case insensitivity examples.
| 7   | peter  | Paul     | <<   "
| 8   |  Peter | Flo      | << Note 3: Leading and/or trailing whitespaces are trimmed.
| 9   | Bar    | Yoshiko  | << Note 4: 1:N Mapping - no unique mapping.
| 10  | Foo    | Christop | << Note 5: Masked value ('Christopher') cropped.
+-----+--------+----------+

 

 

Related Articles

The following articles may provide more information or related information to this article: