Skip to main content
Delphix

Algorithm - Segmented Mapping

 At a Glance

Available in:

Available in all known versions. .., 4.x, 5.0.x, 5.1.x and 5.2.x.

Characteristics:
 
Type
Referential
Integrity1

1:1 Mapping2

Strength

Comment
Numeric Lookup Yes  Yes/No3 Strong  
Alpha-Numeric Lookup Yes Yes/No3 Weak Slower than numeric.

1 Referential Integrity - The masked value will be the same between job executions as well as tables.
2 1:1 Mapping - The masked value will be mapped uniquely to the input value within masked column.

3 If the number of entries in 'Real Values' and 'Masked Values' are the same  - Yes the algorithm produces 1:1 Mapping. If not or the 'Masked Values' overlaps 'Real Values'  there will be masked duplicates (1:N Mapping) of some Masked values. 

Character 
Encodings:

In: US-ASCII 7 bit Alphabetic characters (letters and numbers)
Out: US-ASCII 7 bit alphabetic characters.

Lookup Pool Size:

Each segment (or sub-segment) can be Alpha-Numeric or Numeric. The pool sizes are:
Alpha-Numeric: 36 
Numeric: 2 digits > 100 - 3 digits > 1,000 - 4 digits > 10,000

When Min, Max and Range is selected the pool is smaller.

Limitations:

Does not work on any other character set than US-ASCII 7 bit.  
Patterns need to match (see below) or data is unmasked.

A large number of segments and Segmented Mapping algorithms loaded at the same time can cause internal database connection issues. 

Configuration:

The values in the mappings are generated when the algorithm is created and are NOT editable. 
To change segments and mappings the algorithm needs to be recreated.

The following fields are editable: DescriptionIgnore Characters, and Preserve Original Values.

 

High-Level Overview

Segmented Mapping (SM) is one of the more common algorithms. It is lightweight,  fast and it creates a 1:1 unique mapping. 

The mapping in the segments are stored in the Masking Engine's internal repository and the values are not editable. From version 5.2 of the Masking Engine - this algorithm will be part of the Engine Synchronization feature. 

UI - Creation and Modification

The algorithms are accessed from the tab Settings and under the submenu Algorithm. There are no built-in SM algorithms and the algorithm need to be created. Some fields can be modified after creation. 

  • To create click "Add Algorithm".
  • To modify click the green pen in the "Edit" column.

Note 1: Mapping Segments cannot be changed. If these need to be changed the algorithm needs to be recreated. 
Note 2: Recreating the algorithm will generate new mappings and masking result. 

User Interface 

The following popup is accessed when creating (and modifying) the algorithm:

UI Algorithm - Segmented Mapping Popup notes.png

  1. Algorithm Properties
    • Algorithm Name
    • *Description (editable)
  2. Masked Segments Details
    • See below
  3. *Ignore Characters (editable)
  4. *Unmasked Segments (editable)
    • Start position 
    • Length

* These fields are editable. 

Numeric Segment

The numeric segment has the following segment configuration. The number of mappings defined depends on the number of characters (digits) selected in the segment.

  • 2 digits > 100
  • 3 digits > 1,000
  • 4 digits > 10,000

UI_Segmented Mapping Numeric_Detailed.png

Configuration examples

Using length 2 as an example, the following configurations will generate these values:

  1. Empty - leaving all fields empty will generate 100 values:
    • 00, 01, 02, 03... ,10,11,..., 97, 98, 99
  2. Min #, Max# - setting the min and max to 11 and 45 will generate 35 values: 
    • 11, 12, 13, 14, 15, ... , 43, 44, 45
  3. Range - defining specific values in the range will generate those values - for example, 21, 25, 33, 24:
    • 21, 25, 33, 24

Alpha-Numeric Segment

The Alpha-Numeric is defined per character in the segment according to settings below: 

UI_Segmented Mapping Alpha-Numeric_Detailed.png

Configuration examples

Alpha-Numeric is configured per characters and the following configurations will generate these values:

  1. Empty - leaving all fields empty will generate 36 characters:     
    • 1,2,3,4,5,6,7,8,9,0,A,B,C,D, ... ,X,Y,Z
  2. Min #, Max# - setting the min and max to 1 and 4, and A and E will generate 9 values: 
    • 1,2,3,4,A,B,C,D,E
  3. Range - defining specific values in the range will generate those values - for example 2,1,5,3,A,F:
    • 2,1,5,3,A,F

Ignore Characters

The Segment Mapping algorithm has a feature called 'Ignore Characters'. In the Segment Mapping algorithm, this feature is used to remove specified characters before masking and then insert these back at the same location to ensure that all control and other characters are as the input.

Note: Ignore characters are only applied to Mapped Segments! They are NOT applied to Preserved Values.

Mapping Segments

Some salient points:

  • Segments are filled in full, from the left to right.
  • Segments are padded with 0 from the left.
  • Only Mapped Segments are applying Ignore Characters. 

Examples

Below are some examples of masked source data using different configurations of the Segmented Mapping algorithm.

The examples illustrated are:

  • Ignore Characters
  • Numeric 2 char, Numeric 2 char (N2N2)
  • Preserve 3 char, Numeric 2 char, Numeric 2 char, ignore '-' (P3N2N2_'-')

Ignore Characters 

The examples in the table below are showing Ignore Characters when using a simple segment configuration (P3N2N2).

Two cases are shown:

  1. No Ignore Character
  2. Ignore Character set to "-"

Segment configuration:

  • Preserver Values length 3
  • Numeric length 2
  • Numeric length 2
! Note! If the data is not fitting the pattern the data is note masked. 

+-------------+-------------+----------+-------------+-----------------------+
| Input Value | Ignore Char | Pattern  | Mask Result | Masked w Ignore Chars |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123     | [none]      | Match    | Masked      | AB_9048               |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0-1-2-3  | [none]      | Mismatch | Unmasked    | AB_0-1-2-3            |
+-------------+-------------+----------+-------------+-----------------------+
| AB-0-1-2-3  | [none]      | Mismatch | Unmasked    | AB-0-1-2-3            |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123     | -           | Match    | Masked      | AB_9048               |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0-1-2-3  | -           | Match    | Masked      | AB_9-0-4-8            |
+-------------+-------------+----------+-------------+-----------------------+
| AB-0-1-2-3  | -           | Match    | Masked      | AB-9-0-4-8            |
+-------------+-------------+----------+-------------+-----------------------+

Numeric 2 char, Numeric 2 char (N2N2)

Uses basic numeric configuration with all values empty (default). 

  • Segment 1: Numeric length 2 (nn)
  • Segment 2: Numeric length 2 (mm)
  • Pattern: nn|mm
+--------------+-------+--------+----------+-------+--------+
| Case         | Input |  nn|mm |  Pattern | xx|yy | Output |
+==============+=======+========+==========+=======+========+
| NULL         |  NULL |        | Mismatch |     - |   NULL |
+--------------+-------+--------+----------+-------+--------+
| Empty string |       |        | Mismatch |     - |        |
+--------------+-------+--------+----------+-------+--------+
| Short Alpha  |     A |        | Mismatch |     - |      A |
+--------------+-------+--------+----------+-------+--------+
| Normal       |  1234 |  12|34 |   Match  | 33|38 |   3338 |
+--------------+-------+--------+----------+-------+--------+
| Short        |   123 |  12|03 |   Match  | 33|48 |   3348 |
+--------------+-------+--------+----------+-------+--------+
| Short        |    12 |    12| |   Match  |   33| |     33 |
+--------------+-------+--------+----------+-------+--------+
| Short        |     1 |    01| |   Match  |   99| |     99 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |    01 |    01| |   Match  |   99| |     99 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |   001 |  00|01 |   Match  | 98|13 |   9813 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |  0001 |  00|01 |   Match  | 98|13 |   9813 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |   012 |  01|02 |   Match  | 99|91 |   9991 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |  0012 |  00|12 |   Match  | 98|44 |   9844 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |  0123 |  01|23 |   Match  | 99|60 |   9960 |
+--------------+-------+--------+----------+-------+--------+
| Too long     | 12345 | X23|45 | Mismatch |     - |  12345 |
+--------------+-------+--------+----------+-------+--------+
| Too long     | 1234b | X23|XX | Mismatch |     - |  1234b |
+--------------+-------+--------+----------+-------+--------+
| Invalid Type |  123b |  12|XX | Mismatch |     - |   123b |
+--------------+-------+--------+----------+-------+--------+
| Invalid Char |  -123 |        | Mismatch |     - |   -123 |
+--------------+-------+--------+----------+-------+--------+
| White space  |     1 |     01||   Match  |   99| |     99 |
+--------------+-------+--------+----------+-------+--------+
| White space  |    01 |  XX|01 | Mismatch |     - |     01 |
+--------------+-------+--------+----------+-------+--------+
| White space  |   001 |  00|01 |   Match  | 98|13 |   9813 |
+--------------+-------+--------+----------+-------+--------+
| White space  |  0001 | X00|01 | Mismatch |     - |   0001 |
+--------------+-------+--------+----------+-------+--------+

White space = added one leading space. Note the mismatch with ' 01'. 
This can be resolved by using Ignore Character (space). 

One preserve 3 char segment and two numeric two char segments, ignore '-' (P3N2N2_'-')

Uses the same basic Numeric configuration as above - with the ignore character "-" and preserving values starting at character 1 and of length 3.

  • Segment 1: Preserve from 1 length 3 (---)
  • Segment 2: Numeric length 2 (nn)
  • Segment 3: Numeric length 2 (mm)
  • Pattern: ---|nn|mm
+--------------+------------+-----------+----------+-----------+------------+
| Case         |      Input | ---|nn|mm |  Pattern | ---|xx|yy |     Output |
+--------------+------------+-----------+----------+-----------+------------+
| NULL         |       NULL |           | Mismatch |         - |       NULL |
+--------------+------------+-----------+----------+-----------+------------+
| Empty string |            |           | Mismatch |         - |            |
+--------------+------------+-----------+----------+-----------+------------+
| Normal       |    AB_0001 | AB_|00|01 |   Match  | AB_|34|10 |    AB_3410 |
+--------------+------------+-----------+----------+-----------+------------+
|              |   AB_000-1 | AB_|00|01 |   Match  | AB_|34|10 |   AB_341-0 |
+--------------+------------+-----------+----------+-----------+------------+
|              |   AB_00-01 | AB_|00|01 |   Match  | AB_|34|10 |   AB_34-10 |
+--------------+------------+-----------+----------+-----------+------------+
|              | AB_0-1-2-3 | AB_|01|23 |   Match  | AB_|88|25 | AB_8-8-2-5 |
+--------------+------------+-----------+----------+-----------+------------+
|              | AB-0-1-2-3 | AB-|01|23 |   Match  | AB_|88|25 | AB-8-8-2-5 |
+--------------+------------+-----------+----------+-----------+------------+
| Short        |          A |           | Mismatch |         - |          A |
+--------------+------------+-----------+----------+-----------+------------+
| Short        |         01 |           | Mismatch |         - |         01 |
+--------------+------------+-----------+----------+-----------+------------+
|              |        001 |           |   Match  |     001|| |        001 |
+--------------+------------+-----------+----------+-----------+------------+
|              |       0001 |           |   Match  |   000|01| |      00088 |
+--------------+------------+-----------+----------+-----------+------------+
|              |       AB_1 |   AB_|01| |   Match  |   AB_|88| |      AB_88 |
+--------------+------------+-----------+----------+-----------+------------+
|              |      AB_01 |   AB_|01| |   Match  |   AB_|88| |      AB_88 |
+--------------+------------+-----------+----------+-----------+------------+
|              |   A-B-0012 | A-B|00|12 |   Match  | A-B|47|74 |   A-B-4774 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |   A-B_0012 | A-B|XX|12 | Mismatch |         - |   A-B_0012 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |  A-B_01-23 | A-B|XX|23 | Mismatch |         - |  A-B_01-23 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |     A-B_01 | A-B|XX|01 | Mismatch |         - |     A-B_01 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |    AB_  01 | AB_|XX|01 | Mismatch |         - |    AB_  01 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |    AB_aa01 | AB_|XX|01 | Mismatch |         - |    AB_aa01 |
+--------------+------------+-----------+----------+-----------+------------+
| Too long     |  ABC123456 | ABC|12|XX | Mismatch |         - |  ABC123456 |
+--------------+------------+-----------+----------+-----------+------------+

Common errors

There are some common errors when using Segmented Mapping. These are easy to resolve.

Segments not aligned correctly with Preserve Original Value

Key when using Segmented Mapping is to ensure the segments are matching If the segments need to match exactly or the data is passed through. It is recommended to look at the data before and then define the pattern. 

Note that Ignore Characters (see below) might be needed and will affect the segments. 

Ignore Characters

If there are 'space' or control characters in the masked data these need to be removed. In order to retain the format, these are later re-added back to the masked data.

An example where Ignore Characters is needed is when the data has white spaces (see example above).

An alternative method for removing white spaces is to using Custom SQL in the Rule Set. Please see here: 

Algorithms - Casting Values before Masking

Real and Masked Values not correctly set

Real (input) and Masked (output) values can be specifically set. This in order to just mask a specific value or range. 

Most of the time these don't need to be set and should be left empty.

Input string too long and therefore not masked

See the Segments not aligned correctly above. If the input data does not match the patterns the data will pass through unmasked. 

Understanding how segments are split

The segments are filled from the left and padded (when using numeric) from the left. 

DataType char and varchar works differently. 

Char and varchar will be masked differently if not considering white spaces. All data will be padded with spaces when using char - it is therefore important to either cast the input using Custom SQL in the Rule Set or using Ignore Characters (space). 

Only US-ASCII 7 bit

Segment Mapping can only be used with US-ASCII 7 characters. 

Additional Information