Skip to main content
Delphix

Algorithm: Segment Mapping (KBA1775)

 

Overview

Segmented Mapping (SM) is one of the more common algorithms. It is lightweight, fast, and it creates a 1:1 unique mapping. 

The mapping in the segments are stored in the Masking Engine's internal repository and the values are not editable. From version 5.2 of the Masking Engine - this algorithm will be part of the Engine Synchronization feature. 

 At a Glance 

Available in versions

4.x, 5.0, 5.1. 5.2, and 5.3.

Characteristics
 
Type
Referential
Integrity1

1:1 Mapping2

Comment
Numeric Lookup Yes  Yes/No3  
Alpha-Numeric Lookup Yes Yes/No3 Slower than numeric.

1 Referential Integrity - The masked value will be the same between job executions as well as tables.
2 1:1 Mapping - The masked value will be mapped uniquely to the input value within masked column.

3 Yes, if  Min, Max, Range in 'Real Values' and 'Masked Values' are left empty, the algorithm produces 1:1 Mapping. 
Maybe, if Min, Max, Range are filled in it might generate duplicates.
No, if the "Real Values" range is larger than the 'Masked Values' range then, there will be some masked duplicates (N:1 Mapping). It is possible to get all 'Real Values' to be masked to one single value.

Character 
Encoding:
In: US-ASCII 7 bit Alphabetic characters (letters and numbers)
Out: US-ASCII 7 bit alphabetic characters.
Lookup Pool Size: Each segment (or sub-segment) can be Alpha-Numeric or Numeric. The pool sizes are:
 
Alpha-Numeric: <= 36
Numeric: 2 digits <= 100
3 digits <= 1,000
4 digits <= 10,000

When Min, Max and Range are selected the pool is smaller.

Segment Patterns: Each segment in the algorithm is a valid pattern and multiple data patterns can be masked with one algorithm. In Numeric, the data is padded with 0 from the left. In Alpha-Numeric each character is a segment.
Limitations: Does not work on any other character set than US-ASCII 7 bit.

Patterns need to match (see below) or data is unmasked.

A large number of segments and Segmented Mapping algorithms loaded at the same time can cause internal database connection issues. 

Configuration

The values in the mappings are generated when the algorithm is created and are NOT editable.  To change segments and mappings the algorithm needs to be recreated.

The following fields are editable: DescriptionIgnore Characters, and Preserve Original Values.

Failure Reporting: From version 5.3, the algorithm has a reporting mechanism for when the data is not matching the defined pattern. See below for log entries.

There is an Application Setting (contact support) that when enabled and the pattern is not matched will fail the job. 
Security Note: Data exceeding the full length of all patterns will pass through unmasked. There was also a bug in 5.2x (fixed in 5.2.6.1) where data not matching the full length was passed through unmasked.  Security Vulnerability - TB048.
Known (fixed) issue:  In version 5.3.0 -5.3.2 the 'Ignore Characters' needs to have a value or an error is reported. 

Creating and Modifying Algorithms

The algorithms are accessed from the tab Settings and under the submenu Algorithm. There are no built-in SM algorithms and the algorithm need to be created. Some fields can be modified after creation. 

  • To create click "Add Algorithm".
  • To modify click the green pen in the "Edit" column.

 

 
Note

Note

  • Mapping Segments cannot be changed. The algorithm needs to be recreated for any SM changes.
  • Recreating the algorithm will generate new mappings and masking result.

User Interface 

When creating (and modifying) the algorithm, the following popup will display:

UI_Segment Mapping Popup .png

  1. Algorithm Properties
    • Algorithm Name
    • Description *
  2. Masked Segments Details
    • See below
  3. Ignore Characters *
  4. Unmasked Segments *
    • Start position 
    • Length

* These fields are editable. 

Numeric Segment

The numeric segment has the following segment configuration. The number of mappings defined is dependent on the number of characters (digits) selected in the segment and less if Min, Max, and Range are used. 

Number of Characters in Numeric Segment (Digits) Number of Mappings
2
3
4
<= 100
<= 1,000
<= 10,000

UI_Segmented Mapping Numeric_Detailed.png

Configuration Examples

Using length 2 as an example, the following configurations will generate these values:

  1. Empty - leaving all fields empty will generate 100 values:
    • 00, 01, 02, 03... ,10,11,..., 97, 98, 99
  2. Min #, Max# - setting the min and max to 11 and 45 will generate 35 values: 
    • 11, 12, 13, 14, 15, ... , 43, 44, 45
  3. Range - defining specific values in the range will generate those values - for example, 21, 25, 33, 24:
    • 21, 25, 33, 24

Alpha-Numeric Segment

The Alpha-Numeric is defined per character in the segment according to settings below: 

UI_Segmented Mapping Alpha-Numeric_Detailed.png

Configuration Examples

Alpha-Numeric is configured per characters and the following configurations will generate these values:

  1. Empty - leaving all fields empty will generate 36 characters:     
    • 1,2,3,4,5,6,7,8,9,0,A,B,C,D, ... ,X,Y,Z
  2. Min #, Max# - setting the min and max to 1 and 4, and A and E will generate 9 values: 
    • 1,2,3,4,A,B,C,D,E
  3. Range - defining specific values in the range will generate those values - for example 2,1,5,3,A,F:
    • 2,1,5,3,A,F

Ignore Characters

The Segment Mapping algorithm has a feature called 'Ignore Characters'. In the Segment Mapping algorithm, this feature is used to remove specified characters before masking and then insert these back at the same location to ensure that all control and other characters are as the input.

Note

Note:

Ignore characters that are only applied to Mapped Segments. They are NOT applied to Preserved Values.

Mapping Segments

Some salient points:

  • There can be multiple patterns in one Algorithm.
  • Segments are filled in full, from the left to right.
  • First numeric segment is padded with 0 from the left.
  • Only Mapped Segments are applying Ignore Characters. 
  • Data that exceeds the length of all patterns will pass through unmasked. 

Below are some examples of masked source data using different configurations of the Segmented Mapping algorithm.

The examples illustrated are:

Ignore Characters 

The examples in the table below are showing Ignore Characters when using a simple segment configuration (P3N2N2).

Two cases are shown:

  1. No Ignore Character
  2. Ignore Character set to "-"

Segment configuration:

  • Preserver Values length 3
  • Numeric length 2
  • Numeric length 2
! Note! If the data is not fitting the pattern the data is note masked. 

+-------------+-------------+----------+-------------+-----------------------+
| Input Value | Ignore Char | Pattern  | Mask Result | Masked w Ignore Chars |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123     | [none]      | Match    | Masked      | AB_9048               |
| AB_0-1-2-3  | [none]      | Mismatch | Unmasked    | AB_0-1-2-3            |
| AB-0-1-2-3  | [none]      | Mismatch | Unmasked    | AB-0-1-2-3            |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123     | -           | Match    | Masked      | AB_9048               |
| AB_0-1-2-3  | -           | Match    | Masked      | AB_9-0-4-8            |
| AB-0-1-2-3  | -           | Match    | Masked      | AB-9-0-4-8            |
+-------------+-------------+----------+-------------+-----------------------+

Numeric and Alpha-Numeric Segments

Each segment is a valid pattern and for alpha-numeric, each character is a segment. If the length of data exceeds all segments the data will be passed through unmasked. 

+----+------------+------------+------------+
|    |            | Masked     | Masked     |
| ID | Input      | N4-N4      | A4-A4      |
+====+============+============+============+
| 1  | 1          | 6845       | 8          |  < Numeric match segment 1
| 2  | 11         | 2383       | 85         |   "
| 3  | 111        | 0912       | 850        |   "
| 4  | 1111       | 2024       | 8501       |   "
+----+------------+------------+------------+
| 5  | 11111      | 20248944   | 85013      |  < Numeric match segment 1 and 2
| 6  | 111111     | 20246102   | 850132     |   "
| 7  | 1111111    | 20240115   | 8501321    |   "
| 8  | 11111111   | 20243248   | 85013216   |   "
+----+------------+------------+------------+
| 9  | 111111111  | 111111111  | 111111111  |  < Exceeds segment length,
| 10 | 1111111111 | 1111111111 | 1111111111 |    data will pass thorugh
+----+------------+------------+------------+

Numeric 2 Char, Numeric 2 Char (N2N2)

Uses basic numeric configuration with all values empty (default). 

  • Segment 1: Numeric length 2 (nn)
  • Segment 2: Numeric length 2 (mm)
  • Pattern: nn|mm
+--------------+-------+--------+----------+-------+--------+
| Case         | Input |  nn|mm |  Pattern | xx|yy | Output |
+==============+=======+========+==========+=======+========+
| NULL         |  NULL |        | Mismatch |     - |   NULL |
| Empty string |       |        | Mismatch |     - |        |
| Short Alpha  |     A |        | Mismatch |     - |      A |
+--------------+-------+--------+----------+-------+--------+
| Normal       |  1234 |  12|34 |   Match  | 33|38 |   3338 |
+--------------+-------+--------+----------+-------+--------+
| Short        |   123 |  12|03 |   Match  | 33|48 |   3348 |
| Short        |    12 |    12| |   Match  |   33| |     33 |
| Short        |     1 |    01| |   Match  |   99| |     99 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |    01 |    01| |   Match  |   99| |     99 |
| Short padded |   001 |  00|01 |   Match  | 98|13 |   9813 |
| Short padded |  0001 |  00|01 |   Match  | 98|13 |   9813 |
| Short padded |   012 |  01|02 |   Match  | 99|91 |   9991 |
| Short padded |  0012 |  00|12 |   Match  | 98|44 |   9844 |
| Short padded |  0123 |  01|23 |   Match  | 99|60 |   9960 |
+--------------+-------+--------+----------+-------+--------+
| Too long     | 12345 | X23|45 | Mismatch |     - |  12345 |
| Too long     | 1234b | X23|XX | Mismatch |     - |  1234b |
+--------------+-------+--------+----------+-------+--------+
| Invalid Type |  123b |  12|XX | Mismatch |     - |   123b |
+--------------+-------+--------+----------+-------+--------+
| Invalid Char |  -123 |        | Mismatch |     - |   -123 |
+--------------+-------+--------+----------+-------+--------+
| White space  |     1 |     01||   Match  |   99| |     99 |
| White space  |    01 |  XX|01 | Mismatch |     - |     01 |
| White space  |   001 |  00|01 |   Match  | 98|13 |   9813 |
| White space  |  0001 | X00|01 | Mismatch |     - |   0001 |
+--------------+-------+--------+----------+-------+--------+

White space = added one leading space. Note the mismatch with ' 01'. 
This can be resolved by using Ignore Character (space). 

Preserve 3 Char Segment and Two Numeric Two Char Segments, ignore '-' (P3N2N2_'-')

Uses the same basic Numeric configuration as above - with the ignore character "-" and preserving values starting at character 1 and of length 3.

  • Segment 1: Preserve from 1 length 3 (---)
  • Segment 2: Numeric length 2 (nn)
  • Segment 3: Numeric length 2 (mm)
  • Pattern: ---|nn|mm
+--------------+------------+-----------+----------+-----------+------------+
| Case         |      Input | ---|nn|mm |  Pattern | ---|xx|yy |     Output |
+--------------+------------+-----------+----------+-----------+------------+
| NULL         |       NULL |           | Mismatch |         - |       NULL |
| Empty string |            |           | Mismatch |         - |            |
+--------------+------------+-----------+----------+-----------+------------+
| Normal       |    AB_0001 | AB_|00|01 |   Match  | AB_|34|10 |    AB_3410 |
|              |   AB_000-1 | AB_|00|01 |   Match  | AB_|34|10 |   AB_341-0 |
|              |   AB_00-01 | AB_|00|01 |   Match  | AB_|34|10 |   AB_34-10 |
|              | AB_0-1-2-3 | AB_|01|23 |   Match  | AB_|88|25 | AB_8-8-2-5 |
|              | AB-0-1-2-3 | AB-|01|23 |   Match  | AB_|88|25 | AB-8-8-2-5 |
+--------------+------------+-----------+----------+-----------+------------+
| Short        |          A |           | Mismatch |         - |          A |
| Short        |         01 |           | Mismatch |         - |         01 |
+--------------+------------+-----------+----------+-----------+------------+
|              |        001 |           |   Match  |     001|| |        001 |
|              |       0001 |           |   Match  |   000|01| |      00088 |
|              |       AB_1 |   AB_|01| |   Match  |   AB_|88| |      AB_88 |
|              |      AB_01 |   AB_|01| |   Match  |   AB_|88| |      AB_88 |
|              |   A-B-0012 | A-B|00|12 |   Match  | A-B|47|74 |   A-B-4774 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |   A-B_0012 | A-B|XX|12 | Mismatch |         - |   A-B_0012 |
| Not numeric  |  A-B_01-23 | A-B|XX|23 | Mismatch |         - |  A-B_01-23 |
| Not numeric  |     A-B_01 | A-B|XX|01 | Mismatch |         - |     A-B_01 |
| Not numeric  |    AB_  01 | AB_|XX|01 | Mismatch |         - |    AB_  01 |
| Not numeric  |    AB_aa01 | AB_|XX|01 | Mismatch |         - |    AB_aa01 |
+--------------+------------+-----------+----------+-----------+------------+
| Too long     |  ABC123456 | ABC|12|XX | Mismatch |         - |  ABC123456 |
+--------------+------------+-----------+----------+-----------+------------+

Common Errors

Some common errors when using Segmented Mapping:

Segments not aligned correctly with Preserve Original Value

Key when using Segmented Mapping is to ensure the segments are matching If the segments need to match exactly or the data is passed through. It is recommended to look at the data before and then define the pattern. 

Note that Ignore Characters (see below) might be needed and will affect the segments. 

Ignore characters

If there are 'space' or control characters in the masked data these need to be removed. In order to retain the format, these are later re-added back to the masked data.

An example where Ignore Characters is needed is when the data has white spaces (see example above).

An alternative method for removing white spaces is to using Custom SQL in the Rule Set. See  Algorithms - Casting Values before Masking

Real and masked values not correctly set

Real (input) and Masked (output) values can be specifically set. This in order to just mask a specific value or range. 

Most of the time these don't need to be set and should be left empty.

Input string too long and therefore not masked

See the Segments not aligned correctly above. If the input data does not match the patterns the data will pass through unmasked. 

Understanding how segments are split

The segments are filled from the left and padded (when using numeric) from the left. 

DataType char and varchar works differently. 

Char and varchar will be masked differently if not considering white spaces. All data will be padded with spaces when using char - it is therefore important to either cast the input using Custom SQL in the Rule Set or using Ignore Characters (space). 

Duplicated Masked Values

If Min, Max, or Range has been used when generating the segment values or if the data to be masked is shorter than the masked pattern the masked values might have duplicates. 

Only US-ASCII 7 bit

Segment Mapping can only be used with US-ASCII 7 characters. 

Segment Mapping Warnings

If the masked value is not matching the pattern, the following Warning is logged. The warning will be logged logarithmically every: 1, 10, 100, 1,000... occurrence.

There are two types:

  • Failed to segment value (locating)
  • Failed to create masked value (building)

Failed to segment value (locating)

This warning happens when the algorithm failed to identify segments - for example there is a letter in the incoming data into a numeric segment ('99I' instead of '991' - i instead of 1). 

[JOB_ID_7_57] 2018/06/04 11:42:36 - class com.masking.SegmentMapping.SegmentMapping - WARNING-UNMASKED-DATA: Segment mapping mask failure, pattern mismatch locating segment values at columnName=SSN (1+ occurrences)

Failed to create masked value (building)

This is when the masked data has trailing white spaces or similar and the ignore characters has not been specified - for example '999 ' instead of '999' and ignore space has not been specified. 

[JOB_ID_6_58] 2018/06/04 13:17:24 - class com.masking.SegmentMapping.SegmentMapping - WARNING-UNMASKED-DATA: Segment mapping mask failure, pattern mismatch building masked value at columnName=SSN (1+ occurrences)

Resolution 

When these two issues happens:

  • Check the data for inconsistencies.
  • If the inconsistency is understood:
    • see if ignore character can be used to resolve the issue.
    • see if the algorithm can be changed to expand the accepted patterns.