Algorithm: Segment Mapping (KBA1775)
KBA
KBA#1775Overview
Segmented Mapping (SM) is one of the more common algorithms. It is lightweight and it creates a 1:1 unique mapping on most data types.
From Masking Engine (ME) version 5.2 - this algorithm is a part of the Engine Synchronization feature.
From ME version 5.3 - the algorithm will generate a Non-Conforming data warning if the data is not matching the algorithm pattern.
At a Glance
Characteristics |
1 Referential Integrity - The masked value will be the same between job executions as well as tables. 3 Yes, if Min, Max, Range in 'Real Values', and 'Masked Values' are left empty, the algorithm produces 1:1 Mapping. |
|||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Character Encoding: |
In: US-ASCII 7 bit Alphabetic characters (letters and numbers) Out: US-ASCII 7 bit alphabetic characters. |
|||||||||||||||
Lookup Pool Size: | Each segment (or sub-segment) can be Alpha-Numeric or Numeric. The pool sizes are:
When Min, Max and Range are selected the pool is smaller. |
|||||||||||||||
Segment Patterns: | Each segment in the algorithm is a valid pattern and multiple data patterns can be masked with one algorithm. In Numeric, the data is padded with 0 from the left. In Alpha-Numeric each character is a segment. | |||||||||||||||
Limitations: | The algorithm does not work on any other character set than US-ASCII 7-bit.
Patterns need to match (see below) or a Non-Conforming data warning is raised. A large number of segments and Segmented Mapping algorithms loaded at the same time can cause internal database connection issues. |
|||||||||||||||
Configuration |
All value mappings are pre-generated when the algorithm is created and are NOT editable. To change segments and mappings the algorithm needs to be recreated. The following fields are editable: Description, Ignore Characters, and Preserve Original Values. |
|||||||||||||||
Failure Reporting: | From version 5.3, the algorithm has a reporting mechanism for when the data is not matching the defined pattern. See below for log entries. There is an Application Setting (contact support) that when enabled and the pattern is not matched will fail the job. |
|||||||||||||||
Security Note: | In some 5.2 version, the data exceeding the full length of all patterns passed through unmasked. Security Vulnerability - TB048. | |||||||||||||||
Known (fixed) issue: | In version 5.3.0 - 5.3.2 the 'Ignore Characters' needs to have at least one character or an error is reported. |
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Major Release All Sub Releases 5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
4.3
4.3.1.0, 4.3.2.0, 4.3.2.1, 4.3.3.0, 4.3.4.0, 4.3.4.1, 4.3.5.0
4.2
4.2.0.0, 4.2.0.3, 4.2.1.0, 4.2.1.1, 4.2.2.0, 4.2.2.1, 4.2.3.0, 4.2.4.0 , 4.2.5.0, 4.2.5.1
4.1
4.1.0.0, 4.1.2.0, 4.1.3.0, 4.1.3.1, 4.1.3.2, 4.1.4.0, 4.1.5.0, 4.1.6.0
Creating and Modifying Algorithms
The algorithms are accessed from the tab Settings and under the submenu Algorithm. There are no built-in SM algorithms and the algorithm need to be created. Some fields can be modified after creation.
- To create click "Add Algorithm".
- To modify click the green pen in the "Edit" column.
User Interface
When creating (and modifying) the algorithm, the following popup will display:
- Algorithm Properties
- Algorithm Name
- Description *
- Masked Segments Details
- See below
- Ignore Characters *
- Unmasked Segments *
- Start position
- Length
* These fields are editable.
Numeric Segment
The numeric segment has the following segment configuration. The number of mappings defined is dependent on the number of characters (digits) selected in the segment and less if Min, Max, and Range are used.
Number of Characters in Numeric Segment (Digits) | Number of Mappings |
2 3 4 |
<= 100 <= 1,000 <= 10,000 |
Configuration Examples
Using length 2 as an example, the following configurations will generate these values:
- Empty - leaving all fields empty will generate 100 values:
- 00, 01, 02, 03... ,10,11,..., 97, 98, 99
- Min #, Max# - setting the min and max to 11 and 45 will generate 35 values:
- 11, 12, 13, 14, 15, ... , 43, 44, 45
- Range - defining specific values in the range will generate those values - for example, 21, 25, 33, 24:
- 21, 25, 33, 24
Alpha-Numeric Segment
The Alpha-Numeric is defined per character in the segment according to settings below:
Configuration Examples
Alpha-Numeric is configured per characters and the following configurations will generate these values:
- Empty - leaving all fields empty will generate 36 characters:
- 1,2,3,4,5,6,7,8,9,0,A,B,C,D, ... ,X,Y,Z
- Min #, Max# - setting the min and max to 1 and 4, and A and E will generate 9 values:
- 1,2,3,4,A,B,C,D,E
- Range - defining specific values in the range will generate those values - for example 2,1,5,3,A,F:
- 2,1,5,3,A,F
Preserve Original Values
The "Preserve Original Values" feature causes the masking process to skip the defined segments. For example, if the first 4 characters have a special meaning and you want to make sure that part of the value is undisturbed, you would add a preserve segment that makes the masking operation skip those characters. The first character in the input is considered position 1. So, using our example, you would have a "Preserve Original Characters" of position 1 for length 4.
Ignore Characters
The Segment Mapping algorithm has a feature called 'Ignore Characters'. In the Segment Mapping algorithm, this feature is used to remove any instances of specified characters before masking and then insert these back at the same location to ensure that all control and other characters are as they appeared in the input.
Mapping Segments
Some salient points:
- There can be multiple patterns in one Algorithm.
- Segments are filled in full, from the left to right.
- First numeric segment is padded with 0 from the left.
- Only Mapped Segments are applying Ignore Characters.
- Data that exceeds the length of all patterns will pass through unmasked.
Below are some examples of masked source data using different configurations of the Segmented Mapping algorithm.
The examples illustrated are:
- Ignore Characters
- Numeric and Alpha-Numeric on CHAR
- Numeric and Alpha-Numeric on INT
- N2N2 on CHAR
- Duplicates from NUMERIC data types
- Preserve characters on CHAR
Examples
Ignore Characters
The examples in the table below are showing Ignore Characters when using a simple segment configuration (P3N2N2).
Two cases are shown:
- No Ignore Character
- Ignore Character set to "-"
Segment configuration:
- Preserver Values length 3
- Numeric length 2
- Numeric length 2
+-------------+-------------+----------+-------------+-----------------------+
| Input Value | Ignore Char | Pattern | Mask Result | Resulting Output |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123 | [none] | Match | Masked | AB_9048 |
| AB_0-1-2-3 | [none] | Mismatch | Unmasked | AB_0-1-2-3 |
| AB-0-1-2-3 | [none] | Mismatch | Unmasked | AB-0-1-2-3 |
+-------------+-------------+----------+-------------+-----------------------+
| AB_0123 | - | Match | Masked | AB_9048 |
| AB_0-1-2-3 | - | Match | Masked | AB_9-0-4-8 |
| AB-0-1-2-3 | - | Match | Masked | AB-9-0-4-8 |
+-------------+-------------+----------+-------------+-----------------------+
Numeric and Alpha-Numeric on CHAR
Each segment is a valid pattern and for alpha-numeric, each character is a segment. If the length of data exceeds all segments the data will generate a Non-Conforming data warning.
+----+------------+------------+------------+
| | | Masked | Masked |
| ID | Input | N4-N4 | A4-A4 |
+====+============+============+============+
| 1 | 1 | 6845 | 8 | < Numeric match segment 1
| 2 | 11 | 2383 | 85 | "
| 3 | 111 | 0912 | 850 | "
| 4 | 1111 | 2024 | 8501 | "
+----+------------+------------+------------+
| 5 | 11111 | 20248944 | 85013 | < Numeric match segment 1 and 2
| 6 | 111111 | 20246102 | 850132 | "
| 7 | 1111111 | 20240115 | 8501321 | "
| 8 | 11111111 | 20243248 | 85013216 | "
+----+------------+------------+------------+
| 9 | 111111111 | 111111111 | 111111111 | < Exceeds segment length,
| 10 | 1111111111 | 1111111111 | 1111111111 | data will pass thorugh
+----+------------+------------+------------+
Numeric and Alpha-Numeric on INT
When the data type is a numerical type (for example INT) there can be duplicates as shown in examples 9, 10, and 11. If the length of data exceeds all segments the data will generate a Non-Conforming data warning.
+----+------------+------------+------------+
| | | Masked | Masked |
| ID | Input | N3-N3 | A3-A3 |
+====+============+============+============+
| 1 | 1 | 520 | 8 | < Numeric match segment 1.
| 2 | 11 | 420 | 81 | "
| 3 | 111 | 284 | 819 | "
+----+------------+------------+------------+
| 4 | 1111 | 284144 | 8194 | < Numeric match segment 1 and 2.
| 5 | 11111 | 284997 | 81944 | "
| 6 | 111111 | 284478 | 819446 | "
+----+------------+------------+------------+
| 7 | 1111111 | 1111111 | 111111111 | < Exceeds segment length,
| 8 | 11111111 | 11111111 | 1111111111 | Non-Conforming data warning.
+----+------------+------------+------------+
| 9 | 62 | 753 | 8 | < Alpha masked to "08" = 8.
+----+------------+------------+------------+
| 10 | 2491 | 957144 | 2754 | < Numeric seg 2 masked both
| 11 | 24901 | 957144 | 27574 | 1 and 01 masked to 144.
+----+------------+------------+------------+
N2N2 on CHAR
Uses basic numeric configuration with all values empty (default).
- Segment 1: Numeric length 2 (nn)
- Segment 2: Numeric length 2 (mm)
- Pattern: nn|mm
+--------------+-------+--------+----------+-------+--------+ | Case | Input | nn|mm | Pattern | xx|yy | Output | +==============+=======+========+==========+=======+========+ | NULL | NULL | | Mismatch | - | NULL | | Empty string | | | Mismatch | - | | | Short Alpha | A | | Mismatch | - | A | +--------------+-------+--------+----------+-------+--------+ | Normal | 1234 | 12|34 | Match | 33|38 | 3338 | | Normal msk 0 | 4867 | 48|67 | Match | 00|00 | 0000 |
+--------------+-------+--------+----------+-------+--------+ | Short | 123 | 12|03 | Match | 33|48 | 3348 | | Short | 12 | 12|-- | Match | 33|-- | 33 | | Short | 1 | 01|-- | Match | 99|-- | 99 | +--------------+-------+--------+----------+-------+--------+ | Short padded | 01 | 01|-- | Match | 99|-- | 99 | | Short padded | 001 | 00|01 | Match | 98|13 | 9813 | | Short padded | 0001 | 00|01 | Match | 98|13 | 9813 | | Short padded | 012 | 01|02 | Match | 99|91 | 9991 | | Short padded | 0012 | 00|12 | Match | 98|44 | 9844 | | Short padded | 0123 | 01|23 | Match | 99|60 | 9960 | +--------------+-------+--------+----------+-------+--------+ | Too long | 12345 | X23|45 | Mismatch | - | 12345 | | Too long | 1234b | X23|XX | Mismatch | - | 1234b | +--------------+-------+--------+----------+-------+--------+ | Invalid Type | 123b | 12|XX | Mismatch | - | 123b | +--------------+-------+--------+----------+-------+--------+ | Invalid Char | -123 | | Mismatch | - | -123 | +--------------+-------+--------+----------+-------+--------+ | White space | 1 | 01|-- | Match | 99|-- | 99 | | White space | 01 | XX|01 | Mismatch | - | 01 | | White space | 001 | 00|01 | Match | 98|13 | 9813 | | White space | 0001 | X00|01 | Mismatch | - | 0001 | +--------------+-------+--------+----------+-------+--------+ White space = added one leading space. Note the mismatch with ' 01'. This can be resolved by using Ignore Character (space).
Duplicates from NUMERIC data types
This use-case illustrates the risk of duplicates when masking and the masked value is inserted into a NUMERIC data type. Both Alpha-Numeric and Numeric Segments can result in duplicates when inserted into any NUMBER data type.
Example: N2N2
- Segment 1: Numeric length 2 (nn)
- Segment 2: Numeric length 2 (mm)
- Pattern: nn|mm
+--------------+-------+--------+----------+-------+--------+
| Case | Input | nn|mm | Pattern | xx|yy | Output |
+==============+=======+========+==========+=======+========+
| NULL | NULL | | >> | - | NULL |
| Empty string | | | >> | - | |
+--------------+-------+--------+----------+-------+--------+
| Normal | 5468 | 54|67 | Match | 00|00 | 0 |
+--------------+-------+--------+----------+-------+--------+
| Mask 0 Dup | 1302 | 13|02 | Match | 00|89 | 89 |
| Mask 0 Dup | 132 | 13|02 | Match | 00|89 | 89 |
| Mask 0 Dup | 96 | 96|-- | Match | 89|-- | 89 |
+--------------+-------+--------+----------+-------+--------+
Preserve characters on CHAR
Uses the same basic Numeric configuration as above - with the ignore character "-" and preserving values starting at character 1 and of length 3.
- Segment 1: Preserve from 1 length 3 (---)
- Segment 2: Numeric length 2 (nn)
- Segment 3: Numeric length 2 (mm)
- Pattern: ---|nn|mm
+--------------+------------+-----------+----------+-----------+------------+
| Case | Input | ---|nn|mm | Pattern | ---|xx|yy | Output |
+--------------+------------+-----------+----------+-----------+------------+
| NULL | NULL | | Mismatch | - | NULL |
| Empty string | | | Mismatch | - | |
+--------------+------------+-----------+----------+-----------+------------+
| Normal | AB_0001 | AB_|00|01 | Match | AB_|34|10 | AB_3410 |
| | AB_000-1 | AB_|00|01 | Match | AB_|34|10 | AB_341-0 |
| | AB_00-01 | AB_|00|01 | Match | AB_|34|10 | AB_34-10 |
| | AB_0-1-2-3 | AB_|01|23 | Match | AB_|88|25 | AB_8-8-2-5 |
| | AB-0-1-2-3 | AB-|01|23 | Match | AB_|88|25 | AB-8-8-2-5 |
+--------------+------------+-----------+----------+-----------+------------+
| Short | A | | Mismatch | - | A |
| Short | 01 | | Mismatch | - | 01 |
+--------------+------------+-----------+----------+-----------+------------+
| | 001 | | Match | 001|--|-- | 001 |
| | 0001 | | Match | 000|01|-- | 00088 |
| | AB_1 | AB_|01|-- | Match | AB_|88|-- | AB_88 |
| | AB_01 | AB_|01|-- | Match | AB_|88|-- | AB_88 |
| | A-B-0012 | A-B|00|12 | Match | A-B|47|74 | A-B-4774 |
+--------------+------------+-----------+----------+-----------+------------+
| Not numeric | A-B_0012 | A-B|XX|12 | Mismatch | - | A-B_0012 |
| Not numeric | A-B_01-23 | A-B|XX|23 | Mismatch | - | A-B_01-23 |
| Not numeric | A-B_01 | A-B|XX|01 | Mismatch | - | A-B_01 |
| Not numeric | AB_ 01 | AB_|XX|01 | Mismatch | - | AB_ 01 |
| Not numeric | AB_aa01 | AB_|XX|01 | Mismatch | - | AB_aa01 |
+--------------+------------+-----------+----------+-----------+------------+
| Too long | ABC123456 | ABC|12|XX | Mismatch | - | ABC123456 |
+--------------+------------+-----------+----------+-----------+------------+
Common Errors
Some common errors when using Segmented Mapping:
Segments not aligned correctly with Preserve Original Value
Key when using Segmented Mapping is to ensure the segments are matching If the segments need to match exactly or the data is passed through. It is recommended to look at the data before and then define the pattern.
Note that Ignore Characters (see below) might be needed and will affect the segments.
Ignore characters
If there are 'space' or control characters in the masked data these need to be removed. In order to retain the format, these are later re-added back to the masked data.
An example where Ignore Characters is needed is when the data has white spaces (see example above).
An alternative method for removing white spaces is to using Custom SQL in the Rule Set. See Algorithms - Casting Values before Masking
Real and masked values not correctly set
Real (input) and Masked (output) values can be specifically set, in order to just mask a specific value or range.
Most of the time these do not need to be set and should be left empty.
Input string too long and therefore not masked
See the Segments not aligned correctly above. If the input data does not match the patterns the data will pass through unmasked.
Understanding how segments are split
The segments are filled from the left and padded (when using numeric) from the left.
DataType char and varchar works differently.
CHAR and VARCHAR will be masked differently if not considering white spaces. All data will be padded with spaces when using CHAR - it is therefore important to either cast the input using Custom SQL in the Rule Set or using Ignore Characters (space).
Duplicated Masked Values
If Min, Max, or Range has been used when generating the segment values or if the data to be masked is shorter than the masked pattern the masked values might have duplicates.
There might also be duplicated values if the data type is NUMBER. The reason for this is when a value is masked to leading '0', the '0' is removed by the database when the value is written back to the database column. It could then be a duplicate with a value masked to this value but without the leading '0'. Example '007' and '7' are the same when written to a NUMBER column.
Non-Conforming Data Warnings
When an input-value is not matching a segment pattern, the data will not be masked and the algorithm will report Non-Conforming Data error.
There is a case where the masking engine incorrectly reports 'Non-Confirming Data warning'. If the input-value is the same as 'Ignore Characters' this is incorrectly reported as not masked (DLPX-71666).
Only US-ASCII 7-bit
Segment Mapping can only be used with US-ASCII 7-bit characters.
Segment Mapping Warnings
If the masked value is not matching the pattern, the following Warning is logged. The warning will be logged logarithmically every: 1, 10, 100, 1,000... occurrence.
There are two types:
- Failed to segment value (locating)
- Failed to create masked value (building)
Failed to segment value (locating)
This warning happens when the algorithm failed to identify segments - for example there is a letter in the incoming data into a numeric segment ('99I' instead of '991' - i instead of 1).
[JOB_ID_7_57] 2018/06/04 11:42:36 - class com.masking.SegmentMapping.SegmentMapping - WARNING-UNMASKED-DATA: Segment mapping mask failure, pattern mismatch locating segment values at columnName=SSN (1+ occurrences)
Failed to create masked value (building)
This is when the masked data has trailing white spaces or similar and the ignore characters has not been specified - for example '999 ' instead of '999' and ignore space has not been specified.
[JOB_ID_6_58] 2018/06/04 13:17:24 - class com.masking.SegmentMapping.SegmentMapping - WARNING-UNMASKED-DATA: Segment mapping mask failure, pattern mismatch building masked value at columnName=SSN (1+ occurrences)
Resolution
When these two issues happens:
- Check the data for inconsistencies.
- If the inconsistency is understood:
- see if ignore character can be used to resolve the issue.
- see if the algorithm can be changed to expand the accepted patterns.
Related Articles
Knowledge Base Links: