Skip to main content
Delphix

Algorithm: Segment Mapping (KBA1775)

 

KBA

KBA#1775

Overview

Segmented Mapping (SM) is one of the more common algorithms. It is lightweight and it creates a 1:1 unique mapping on most data types. 

From Masking Engine (ME) version 5.2 - this algorithm is a part of the Engine Synchronization feature. 
From ME version 5.3 - the algorithm will generate a Non-Conforming data warning if the data is not matching the algorithm pattern. 

 At a Glance 

Characteristics
 
Type
Referential
Integrity1

1:1 Mapping2

Comment
Numeric Lookup Yes  Yes/No3  
Alpha-Numeric Lookup Yes Yes/No3 Slower than numeric.

1 Referential Integrity - The masked value will be the same between job executions as well as tables.

2 1:1 Mapping - The masked value will be mapped uniquely to the input value within masked column.

3 Yes, if  Min, Max, Range in 'Real Values', and 'Masked Values' are left empty, the algorithm produces 1:1 Mapping. 
Maybe, if Min, Max, Range are filled in it might generate duplicates.
No, if the "Real Values" range is larger than the 'Masked Values' range then, there will be some masked duplicates (N:1 Mapping). It is possible to get all 'Real Values' to be masked to one single value.

Character 
Encoding:
In: US-ASCII 7 bit Alphabetic characters (letters and numbers)
Out: US-ASCII 7 bit alphabetic characters.
Lookup Pool Size: Each segment (or sub-segment) can be Alpha-Numeric or Numeric. The pool sizes are:
 
Alpha-Numeric: <= 36
Numeric: 2 digits <= 100
3 digits <= 1,000
4 digits <= 10,000

When Min, Max and Range are selected the pool is smaller.

Segment Patterns: Each segment in the algorithm is a valid pattern and multiple data patterns can be masked with one algorithm. In Numeric, the data is padded with 0 from the left. In Alpha-Numeric each character is a segment.
Limitations: The algorithm does not work on any other character set than US-ASCII 7-bit.

Patterns need to match (see below) or a Non-Conforming data warning is raised.

A large number of segments and Segmented Mapping algorithms loaded at the same time can cause internal database connection issues. 

Limited support for columns of numeric types.  The database will typically drop leading zeros.  Therefore, masked values with leading "0" are likely resulting in duplicates.

Configuration

All value mappings are pre-generated when the algorithm is created and are NOT editable.  To change segments and mappings the algorithm needs to be recreated.

The following fields are editable: DescriptionIgnore Characters, and Preserve Original Values.

Failure Reporting: From version 5.3, the algorithm has a reporting mechanism for when the data is not matching the defined pattern. See below for log entries.

There is an Application Setting (contact support) that when enabled and the pattern is not matched will fail the job. 
Security Note: In some 5.2 version, the data exceeding the full length of all patterns passed through unmasked.  Security Vulnerability - TB048.
Known (fixed) issue:  In version 5.3.0 - 5.3.2 the 'Ignore Characters' needs to have at least one character or an error is reported. 

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

4.3

4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0

4.2

4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1

4.1

4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0

Creating and Modifying Algorithms

The algorithms are accessed from the tab Settings and under the submenu Algorithm. There are no built-in SM algorithms and the algorithm need to be created. Some fields can be modified after creation. 

  • To create click "Add Algorithm".
  • To modify click the green pen in the "Edit" column.

 

 
Note

Note

  • Mapping Segments cannot be changed. The algorithm needs to be recreated for any SM changes.
  • Recreating the algorithm will generate new mappings and masking result.

User Interface 

When creating (and modifying) the algorithm, the following popup will display:

UI_Segment Mapping Popup .png

  1. Algorithm Properties
    • Algorithm Name
    • Description *
  2. Masked Segments Details
    • See below
  3. Ignore Characters *
  4. Unmasked Segments *
    • Start position 
    • Length

* These fields are editable. 

Numeric Segment

The numeric segment has the following segment configuration. The number of mappings defined is dependent on the number of characters (digits) selected in the segment and less if Min, Max, and Range are used. 

Number of Characters in Numeric Segment (Digits) Number of Mappings
2
3
4
<= 100
<= 1,000
<= 10,000

UI_Segmented Mapping Numeric_Detailed.png

Configuration Examples

Using length 2 as an example, the following configurations will generate these values:

  1. Empty - leaving all fields empty will generate 100 values:
    • 00, 01, 02, 03... ,10,11,..., 97, 98, 99
  2. Min #, Max# - setting the min and max to 11 and 45 will generate 35 values: 
    • 11, 12, 13, 14, 15, ... , 43, 44, 45
  3. Range - defining specific values in the range will generate those values - for example, 21, 25, 33, 24:
    • 21, 25, 33, 24

Alpha-Numeric Segment

The Alpha-Numeric is defined per character in the segment according to settings below: 

UI_Segmented Mapping Alpha-Numeric_Detailed.png

Configuration Examples

Alpha-Numeric is configured per characters and the following configurations will generate these values:

  1. Empty - leaving all fields empty will generate 36 characters:     
    • 1,2,3,4,5,6,7,8,9,0,A,B,C,D, ... ,X,Y,Z
  2. Min #, Max# - setting the min and max to 1 and 4, and A and E will generate 9 values: 
    • 1,2,3,4,A,B,C,D,E
  3. Range - defining specific values in the range will generate those values - for example 2,1,5,3,A,F:
    • 2,1,5,3,A,F

Preserve Original Values

The "Preserve Original Values" feature causes the masking process to skip the defined segments.  For example, if the first 4 characters have a special meaning and you want to make sure that part of the value is undisturbed, you would add a preserve segment that makes the masking operation skip those characters.  The first character in the input is considered position 1.  So, using our example, you would have a "Preserve Original Characters" of position 1 for length 4.

Ignore Characters

The Segment Mapping algorithm has a feature called 'Ignore Characters'. In the Segment Mapping algorithm, this feature is used to remove any instances of specified characters before masking and then insert these back at the same location to ensure that all control and other characters are as they appeared in the input.

Note

Note:

Ignore characters that are only applied to Mapped Segments. They are NOT applied to Preserved Values.

Mapping Segments

Some salient points:

  • There can be multiple patterns in one Algorithm.
  • Segments are filled in full, from the left to right.
  • First numeric segment is padded with 0 from the left.
  • Only Mapped Segments are applying Ignore Characters. 
  • Data that exceeds the length of all patterns will pass through unmasked. 

Below are some examples of masked source data using different configurations of the Segmented Mapping algorithm.

The examples illustrated are:

Examples

Ignore Characters 

The examples in the table below are showing Ignore Characters when using a simple segment configuration (P3N2N2).

Two cases are shown:

  1. No Ignore Character
  2. Ignore Character set to "-"

Segment configuration:

  • Preserver Values length 3
  • Numeric length 2
  • Numeric length 2
Note

Note:

If the data is not fitting the segment then a Non-Conforming warning is thrown.

+-------------+-------------+----------+-------------+-----------------------+
| Input Value | Ignore Char | Pattern  | Mask Result | Resulting Output      |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123     | [none]      | Match    | Masked      | AB_9048               |
| AB_0-1-2-3  | [none]      | Mismatch | Unmasked    | AB_0-1-2-3            |
| AB-0-1-2-3  | [none]      | Mismatch | Unmasked    | AB-0-1-2-3            |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123     | -           | Match    | Masked      | AB_9048               |
| AB_0-1-2-3  | -           | Match    | Masked      | AB_9-0-4-8            |
| AB-0-1-2-3  | -           | Match    | Masked      | AB-9-0-4-8            |
+-------------+-------------+----------+-------------+-----------------------+

Numeric and Alpha-Numeric on CHAR

Each segment is a valid pattern and for alpha-numeric, each character is a segment. If the length of data exceeds all segments the data will generate a Non-Conforming data warning.  

+----+------------+------------+------------+
|    |            | Masked     | Masked     |
| ID | Input      | N4-N4      | A4-A4      |
+====+============+============+============+
| 1  | 1          | 6845       | 8          |  < Numeric match segment 1
| 2  | 11         | 2383       | 85         |   "
| 3  | 111        | 0912       | 850        |   "
| 4  | 1111       | 2024       | 8501       |   "
+----+------------+------------+------------+
| 5  | 11111      | 20248944   | 85013      |  < Numeric match segment 1 and 2
| 6  | 111111     | 20246102   | 850132     |   "
| 7  | 1111111    | 20240115   | 8501321    |   "
| 8  | 11111111   | 20243248   | 85013216   |   "
+----+------------+------------+------------+
| 9  | 111111111  | 111111111  | 111111111  |  < Exceeds segment length,
| 10 | 1111111111 | 1111111111 | 1111111111 |    data will pass thorugh
+----+------------+------------+------------+

Numeric and Alpha-Numeric on INT 

When the data type is a numerical type (for example INT) there can be duplicates as shown in examples 9, 10, and 11. If the length of data exceeds all segments the data will generate a Non-Conforming data warning. 

+----+------------+------------+------------+
|    |            | Masked     | Masked     |
| ID | Input      | N3-N3      | A3-A3      |
+====+============+============+============+
| 1  | 1          | 520        | 8          |  < Numeric match segment 1.
| 2  | 11         | 420        | 81         |   "
| 3  | 111        | 284        | 819        |   "
+----+------------+------------+------------+
| 4  | 1111       | 284144     | 8194       |  < Numeric match segment 1 and 2.
| 5  | 11111      | 284997     | 81944      |   "
| 6  | 111111     | 284478     | 819446     |   "
+----+------------+------------+------------+
| 7  | 1111111    | 1111111    | 111111111  |  < Exceeds segment length,
| 8  | 11111111   | 11111111   | 1111111111 |    Non-Conforming data warning.
+----+------------+------------+------------+
| 9  | 62         | 753        | 8          |  < Alpha masked to "08" = 8.
+----+------------+------------+------------+
| 10 | 2491       | 957144     | 2754       |  < Numeric seg 2 masked both
| 11 | 24901      | 957144     | 27574      |    1 and 01 masked to 144.
+----+------------+------------+------------+

N2N2 on CHAR 

Uses basic numeric configuration with all values empty (default). 

  • Segment 1: Numeric length 2 (nn)
  • Segment 2: Numeric length 2 (mm)
  • Pattern: nn|mm
+--------------+-------+--------+----------+-------+--------+
| Case         | Input |  nn|mm |  Pattern | xx|yy | Output |
+==============+=======+========+==========+=======+========+
| NULL         |  NULL |        | Mismatch |     - |   NULL |
| Empty string |       |        | Mismatch |     - |        |
| Short Alpha  |     A |        | Mismatch |     - |      A |
+--------------+-------+--------+----------+-------+--------+
| Normal       |  1234 |  12|34 |   Match  | 33|38 |   3338 |
| Normal msk 0 |  4867 |  48|67 |   Match  | 00|00 |   0000 |
+--------------+-------+--------+----------+-------+--------+
| Short        |   123 |  12|03 |   Match  | 33|48 |   3348 |
| Short        |    12 |  12|-- |   Match  | 33|-- |     33 |
| Short        |     1 |  01|-- |   Match  | 99|-- |     99 |
+--------------+-------+--------+----------+-------+--------+
| Short padded |    01 |  01|-- |   Match  | 99|-- |     99 |
| Short padded |   001 |  00|01 |   Match  | 98|13 |   9813 |
| Short padded |  0001 |  00|01 |   Match  | 98|13 |   9813 |
| Short padded |   012 |  01|02 |   Match  | 99|91 |   9991 |
| Short padded |  0012 |  00|12 |   Match  | 98|44 |   9844 |
| Short padded |  0123 |  01|23 |   Match  | 99|60 |   9960 |
+--------------+-------+--------+----------+-------+--------+
| Too long     | 12345 | X23|45 | Mismatch |     - |  12345 |
| Too long     | 1234b | X23|XX | Mismatch |     - |  1234b |
+--------------+-------+--------+----------+-------+--------+
| Invalid Type |  123b |  12|XX | Mismatch |     - |   123b |
+--------------+-------+--------+----------+-------+--------+
| Invalid Char |  -123 |        | Mismatch |     - |   -123 |
+--------------+-------+--------+----------+-------+--------+
| White space  |     1 |  01|-- |   Match  | 99|-- |     99 |
| White space  |    01 |  XX|01 | Mismatch |     - |     01 |
| White space  |   001 |  00|01 |   Match  | 98|13 |   9813 |
| White space  |  0001 | X00|01 | Mismatch |     - |   0001 |
+--------------+-------+--------+----------+-------+--------+

White space = added one leading space. Note the mismatch with ' 01'. 
This can be resolved by using Ignore Character (space). 

Duplicates from NUMERIC data types 

This use-case illustrates the risk of duplicates when masking and the masked value is inserted into a NUMERIC data type. Both Alpha-Numeric and Numeric Segments can result in duplicates when inserted into any NUMBER data type.

Example: N2N2

  • Segment 1: Numeric length 2 (nn)
  • Segment 2: Numeric length 2 (mm)
  • Pattern: nn|mm
+--------------+-------+--------+----------+-------+--------+
| Case         | Input |  nn|mm |  Pattern | xx|yy | Output |
+==============+=======+========+==========+=======+========+
| NULL         |  NULL |        |    >>    |     - |   NULL |
| Empty string |       |        |    >>    |     - |        |
+--------------+-------+--------+----------+-------+--------+
| Normal       |  5468 |  54|67 |   Match  | 00|00 |      0 |
+--------------+-------+--------+----------+-------+--------+
| Mask 0 Dup   |  1302 |  13|02 |   Match  | 00|89 |     89 |
| Mask 0 Dup   |   132 |  13|02 |   Match  | 00|89 |     89 |
| Mask 0 Dup   |    96 |  96|-- |   Match  | 89|-- |     89 |
+--------------+-------+--------+----------+-------+--------+

Preserve characters on CHAR 

Uses the same basic Numeric configuration as above - with the ignore character "-" and preserving values starting at character 1 and of length 3.

  • Segment 1: Preserve from 1 length 3 (---)
  • Segment 2: Numeric length 2 (nn)
  • Segment 3: Numeric length 2 (mm)
  • Pattern: ---|nn|mm
+--------------+------------+-----------+----------+-----------+------------+
| Case         |      Input | ---|nn|mm |  Pattern | ---|xx|yy |     Output |
+--------------+------------+-----------+----------+-----------+------------+
| NULL         |       NULL |           | Mismatch |         - |       NULL |
| Empty string |            |           | Mismatch |         - |            |
+--------------+------------+-----------+----------+-----------+------------+
| Normal       |    AB_0001 | AB_|00|01 |   Match  | AB_|34|10 |    AB_3410 |
|              |   AB_000-1 | AB_|00|01 |   Match  | AB_|34|10 |   AB_341-0 |
|              |   AB_00-01 | AB_|00|01 |   Match  | AB_|34|10 |   AB_34-10 |
|              | AB_0-1-2-3 | AB_|01|23 |   Match  | AB_|88|25 | AB_8-8-2-5 |
|              | AB-0-1-2-3 | AB-|01|23 |   Match  | AB_|88|25 | AB-8-8-2-5 |
+--------------+------------+-----------+----------+-----------+------------+
| Short        |          A |           | Mismatch |         - |          A |
| Short        |         01 |           | Mismatch |         - |         01 |
+--------------+------------+-----------+----------+-----------+------------+
|              |        001 |           |   Match  | 001|--|-- |        001 |
|              |       0001 |           |   Match  | 000|01|-- |      00088 |
|              |       AB_1 | AB_|01|-- |   Match  | AB_|88|-- |      AB_88 |
|              |      AB_01 | AB_|01|-- |   Match  | AB_|88|-- |      AB_88 |
|              |   A-B-0012 | A-B|00|12 |   Match  | A-B|47|74 |   A-B-4774 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric  |   A-B_0012 | A-B|XX|12 | Mismatch |         - |   A-B_0012 |
| Not numeric  |  A-B_01-23 | A-B|XX|23 | Mismatch |         - |  A-B_01-23 |
| Not numeric  |     A-B_01 | A-B|XX|01 | Mismatch |         - |     A-B_01 |
| Not numeric  |    AB_  01 | AB_|XX|01 | Mismatch |         - |    AB_  01 |
| Not numeric  |    AB_aa01 | AB_|XX|01 | Mismatch |         - |    AB_aa01 |
+--------------+------------+-----------+----------+-----------+------------+
| Too long     |  ABC123456 | ABC|12|XX | Mismatch |         - |  ABC123456 |
+--------------+------------+-----------+----------+-----------+------------+

Common Errors

Some common errors when using Segmented Mapping:

Segments not aligned correctly with Preserve Original Value 

Key when using Segmented Mapping is to ensure the segments are matching If the segments need to match exactly or the data is passed through. It is recommended to look at the data before and then define the pattern. 

Note that Ignore Characters (see below) might be needed and will affect the segments. 

Ignore characters 

If there are 'space' or control characters in the masked data these need to be removed. In order to retain the format, these are later re-added back to the masked data.

An example where Ignore Characters is needed is when the data has white spaces (see example above).

An alternative method for removing white spaces is to using Custom SQL in the Rule Set. See  Algorithms - Casting Values before Masking

Real and masked values not correctly set 

Real (input) and Masked (output) values can be specifically set. This in order to just mask a specific value or range. 

Most of the time these don't need to be set and should be left empty.

Input string too long and therefore not masked 

See the Segments not aligned correctly above. If the input data does not match the patterns the data will pass through unmasked. 

Understanding how segments are split 

The segments are filled from the left and padded (when using numeric) from the left. 

DataType char and varchar works differently.  

CHAR and VARCHAR will be masked differently if not considering white spaces. All data will be padded with spaces when using CHAR - it is therefore important to either cast the input using Custom SQL in the Rule Set or using Ignore Characters (space). 

Duplicated Masked Values 

If Min, Max, or Range has been used when generating the segment values or if the data to be masked is shorter than the masked pattern the masked values might have duplicates. 

There might also be duplicated values if the data type is NUMBER. The reason for this is when a value is masked to leading '0', the '0' is removed by the database when the value is written back to the database column. It could then be a duplicate with a value masked to this value but without the leading '0'. Example '007' and '7' are the same when written to a NUMBER column.

Note

Note:

There is a way to configure the algorithm so this is not happening. Please reach out to Technical Services for assistance.

Non-Conforming Data Warnings  

When an input-value is not matching a segment pattern, the data will not be masked and the algorithm will report Non-Conforming Data error.

There is a case where the masking engine incorrectly reports 'Non-Confirming Data warning'. If the input-value is the same as 'Ignore Characters' this is incorrectly reported as not masked (DLPX-71666). 

Only US-ASCII 7-bit 

Segment Mapping can only be used with US-ASCII 7-bit characters. 

Segment Mapping Warnings

If the masked value is not matching the pattern, the following Warning is logged. The warning will be logged logarithmically every: 1, 10, 100, 1,000... occurrence.

There are two types:

  • Failed to segment value (locating)
  • Failed to create masked value (building)

Failed to segment value (locating)

This warning happens when the algorithm failed to identify segments - for example there is a letter in the incoming data into a numeric segment ('99I' instead of '991' - i instead of 1). 

[JOB_ID_7_57] 2018/06/04 11:42:36 - class com.masking.SegmentMapping.SegmentMapping - WARNING-UNMASKED-DATA: Segment mapping mask failure, pattern mismatch locating segment values at columnName=SSN (1+ occurrences)

Failed to create masked value (building)

This is when the masked data has trailing white spaces or similar and the ignore characters has not been specified - for example '999 ' instead of '999' and ignore space has not been specified. 

[JOB_ID_6_58] 2018/06/04 13:17:24 - class com.masking.SegmentMapping.SegmentMapping - WARNING-UNMASKED-DATA: Segment mapping mask failure, pattern mismatch building masked value at columnName=SSN (1+ occurrences)

Resolution 

When these two issues happens:

  • Check the data for inconsistencies.
  • If the inconsistency is understood:
    • see if ignore character can be used to resolve the issue.
    • see if the algorithm can be changed to expand the accepted patterns.