Delphix

Analyze Non-Conforming Data in Masking (KBA5039)

 

KBA

KBA# 5039

Applicable Delphix Versions

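This article applies to the following Delphix engine versions:

+---------------+-------------------------------------+
| Major Release | All Sub Releases                    |
+===============+=====================================+
| 6.0           | 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0  |
+---------------+-------------------------------------+
| 5.3           | 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, |
|               | 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0  |
+---------------+-------------------------------------+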

At a Glance

Description: This article describes how to analyze Non-Conforming data warnings and errors. 
Non-Conforming Classification:   

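The Masking Engine uses the following classification: 

+----------------+------+------------------------------------+--------------+
| Classification | Code | US ASCII                           | Unicode      |
+================+======+====================================+==============+
| Letter         | 'L'  | A, B, C, D, ..., Z                 | Letters      |
| Number         | 'N'  | 0, 1, 2, 3, ..., 9                 | Numbers      |
| Marks          | 'M'  | Not applicable                     | Marks        |
| Separators     | 'Z'  | [Space]                            | Separators   |
| Punctuation    | 'P'  | (, {, [, ", ', ., !, -, *, /, etc. | Punctuation  |
| Symbols        | 'S'  | $, +, <, =, ©, etc.                | Symbols      |
| Other          | 'O'  | ESC, BEL, etc.                     |              |
+----------------+------+------------------------------------+--------------+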

For details about the Unicode character categories, see the following page (or the Classifications and Examples section below): 
https://www.compart.com/en/unicode/category/

Applies to: Non-Conforming warnings and errors apply to: 
  • Segment Mapping algorithms.
  • DateShift algorithms. 
  • PHONE SL algorithm.
  • ZIP+4 algorithm.
Reporting: Warnings and errors are reported on:
  • the Monitor page (table-level warnings/errors).
  • the Logs page on the Admin tab.
Configuration - Status Code: The job status can be set to Failed or Succeeded when a column contains Non-Conforming data. This can be set as:
  • a global setting on the Settings tab of the Algorithm page.
  • a per-algorithm setting, in each applicable algorithm popup.
Configuration - Job Execution: The job can be configured to stop masking a table that contains Non-Conforming data:
  • in the Job Configuration popup, by ticking "Stop job on first occurrence".

Issue

When the data to be masked does not match the format that the algorithm expects, the Masking Engine reports this and tries to give hints on what is wrong.

Because this is a critical issue, the Masking Job can also be configured to abort when non-conforming data is encountered. 

When an algorithm is defined on specific characters and the data does not match them, it is essential that this is reported, because that data will not be masked.

Non-conforming data is frequently caused by special characters or non-US letters, but it can also be caused by the data length.

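Special characters are usually listed as 'P', while foreign letters are listed as 'L'. Foreign letters are therefore harder to spot, because their pattern looks the same as the pattern of US letters. 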

Example

The example shows two records that were masked and three that failed due to different non-conforming issues. The example uses a Segment Mapping algorithm with four alphanumeric characters. 

+--------+--------+-------------+--------------------------------------------------+
| Input  | Masked | Non-Conform | Comment                                          |
+--------+--------+-------------+--------------------------------------------------+
| 1234   | 3424   |             | Masked ok                                        |
| ABCD   | KENB   |             | Masked ok                                        |
+--------+--------+-------------+--------------------------------------------------+
| !AB!   | !AB!   | PLLP        | Failed - punctuations.                           |
| ÀÄÅB   | ÀÄÅB   | LLLL        | Failed - tricky as it includes foreign letters.  |
+--------+--------+-------------+--------------------------------------------------+
| ABCD12 | ABCD12 | LLLLNN      | Failed - too long.                               |
+--------+--------+-------------+--------------------------------------------------+
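The 'ÀÄÅB' row fails even though its pattern (LLLL) looks like the pattern of a valid value. The following Python sketch reproduces the table to show why. It makes two assumptions, neither taken from the Masking Engine's implementation:

  • a hypothetical segment that accepts exactly four US ASCII uppercase letters or digits.
  • a pattern computed from Unicode general categories.

```python
import string
import unicodedata
from typing import Tuple

# Hypothetical segment definition for this example (an assumption):
# exactly 4 characters, each a US ASCII uppercase letter or a digit.
SEGMENT_LENGTH = 4
ALLOWED = set(string.ascii_uppercase + string.digits)


def pattern(value: str) -> str:
    """Classification pattern: first letter of each character's Unicode
    general category, with Unicode "C" (Other) shown as "O" (assumed mapping)."""
    letters = (unicodedata.category(ch)[0] for ch in value)
    return "".join("O" if c == "C" else c for c in letters)


def check(value: str) -> Tuple[bool, str]:
    """Return (conforms, reason) for one input value."""
    if len(value) != SEGMENT_LENGTH:
        return False, f"wrong length: {pattern(value)}"
    unsupported = [ch for ch in value if ch not in ALLOWED]
    if unsupported:
        return False, f"unsupported {unsupported}: {pattern(value)}"
    return True, "masked"


for value in ["1234", "ABCD", "!AB!", "ÀÄÅB", "ABCD12"]:
    ok, reason = check(value)
    print(f"{value:8} {'OK' if ok else 'FAILED':7} {reason}")
```

For 'ÀÄÅB', the pattern alone does not reveal the problem. Only the check against the allowed characters does.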

What is Non-Conforming data - from Masking Engine UI

The following is from the Masking Engine UI when defining Inventory and setting Actions. 

Nonconforming Data Information

It is possible that some data in a dataset does not conform to the structure of the chosen algorithm and masking will fail for this data.

For example, if you have a segment mapping algorithm that will mask SSNs with the format NNN-NN-NNNN, and an entry is encountered with format NNN-NN-NNNNN, masking of this data will fail. A warning will be displayed on the job monitor indicating Nonconforming data was present in the affected table.

You may control whether the presence of nonconforming data causes the masking job to fail using the "Nonconforming Data" selection on the Settings > Algorithms page. This setting may also be controlled individually for each Segment Mapping algorithm.

It is also possible to control whether failure is immediate, or reported after the job runs to completion, by checking the box under "If Nonconforming Data is encountered" in the Create Job screen.

The Job Monitor page (Success or Fail) will help you to troubleshoot which data was nonconforming. When representative nonconforming patterns of data are shown, the character pattern is illustrated as follows:

  • N for digits
  • L for letters
  • M for marks
  • P for punctuation
  • S for symbols
  • Z for separator
  • O for other
  • U for unknown
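The codes listed above line up with the major Unicode general categories, with Unicode's 'C' (Other) shown as 'O'. This article does not confirm that the engine follows this mapping. Assuming it does, a pattern can be reproduced with Python's standard unicodedata module. Treating unassigned code points as 'U' is a guess.

```python
import unicodedata


def classify_char(ch: str) -> str:
    """Return the one-letter classification code of a single character.

    Assumed mapping: first letter of the Unicode general category, with
    Unicode "C" (Other) shown as "O". Unassigned code points ("Cn") are
    returned as "U" (unknown); that part is a guess.
    """
    category = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Pd", "Cc"
    if category == "Cn":
        return "U"
    major = category[0]
    return "O" if major == "C" else major


def classify(value: str) -> str:
    """Return the pattern of a whole value, one code per character."""
    return "".join(classify_char(ch) for ch in value)


for value in ["1234", "!AB!", "ÀÄÅB", "$+-/", "123-45-67890", "A B\x1b"]:
    print(repr(value), classify(value))
# '1234' NNNN
# '!AB!' PLLP
# 'ÀÄÅB' LLLL
# '$+-/' SSPP
# '123-45-67890' NNNPNNPNNNNN
# 'A B\x1b' LZLO
```

The '$+-/' value shows that '-' and '/' are classified as punctuation ('P'), not symbols. The SSN with one digit too many gives NNNPNNPNNNNN, because the dashes are reported as 'P'.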

Job Execution and Status Configurations 

 

There are two functional configurations:

  • Marking the job status.
  • Stopping the job execution.

Algorithm Settings

There are two settings: a global setting for all algorithms, and a specific setting in each applicable algorithm.

These settings only set the job status; they do not stop the job. The default settings are marked with '*'.

+--------------------------+---------------------+---------------+
| Algorithm Global Setting | Specific Algorithm  | Job marked as |
+==========================+=====================+===============+
| Mark as Failed*          | Use global setting* | Failed        |
| Mark as Succeeded        |                     | Succeeded     |
+--------------------------+---------------------+---------------+
| -                        | Mark as Failed      | Failed        |
| -                        | Mark as Succeeded   | Succeeded     |
+--------------------------+---------------------+---------------+
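The table can be read as a simple override rule: a specific algorithm setting wins, otherwise the global setting applies. The Python sketch below models the table. The function and enum names are hypothetical and describe only the table, not the engine's code.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    FAILED = "Failed"
    SUCCEEDED = "Succeeded"


def job_marked_as(global_setting: Status = Status.FAILED,
                  algorithm_setting: Optional[Status] = None) -> Status:
    """Job status when an algorithm meets non-conforming data.

    algorithm_setting=None stands for "Use global setting" (the default);
    any other value overrides the global setting for that algorithm.
    """
    if algorithm_setting is None:
        return global_setting
    return algorithm_setting


print(job_marked_as().value)                                 # both defaults
print(job_marked_as(Status.FAILED, Status.SUCCEEDED).value)  # override
```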

Job Configurations

The job can be configured to stop masking the table that contains non-conforming data. This feature ensures that unmasked data is not written to the masked database.

 

Note:

There are situations where it is acceptable to leave some data unmasked; in this case, the job should succeed even if there is non-conforming data (for example, codes in date fields or phone numbers). However, it is better to make sure that the data and the algorithms conform. 

The masking job only stops masking the table with the non-conforming data, and only when the job is set to be marked as Failed.
When non-conforming data is identified, masking of that table terminates at the current location. No unmasked data is written.

 

The default setting is marked with '*'. For 'Job marked as', see the Algorithm Settings table above.

If Nonconforming Data is encountered 
+------------------------------+---------------+-----------------------------------------------------+
| Stop job on first occurrence | Job marked as | Job Status                                          |
+==============================+===============+=====================================================+
| Not ticked*                  | Failed        | All rows - job marked as Failed.                    |
|                              | Succeeded     | All rows - job marked as Successful (with warning). |
+------------------------------+---------------+-----------------------------------------------------+
| Ticked                       | Failed        | Table with non-conforming data terminated.          |
|                              | Succeeded     | All rows - job marked as Successful (with warning). |
+------------------------------+---------------+-----------------------------------------------------+
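The table combines the "Stop job on first occurrence" setting with the status resolved from the Algorithm Settings. Below is a hypothetical Python model of the table; the names are illustrative only.

```python
def on_nonconforming(stop_on_first: bool, marked_as: str) -> str:
    """Outcome from the 'If Nonconforming Data is encountered' table.

    stop_on_first: whether "Stop job on first occurrence" is ticked.
    marked_as: "Failed" or "Succeeded", as resolved from the algorithm settings.
    """
    if marked_as not in ("Failed", "Succeeded"):
        raise ValueError(f"unexpected status: {marked_as!r}")
    if marked_as == "Succeeded":
        # Ticking the box makes no difference here: all rows are processed.
        return "All rows - job marked as Successful (with warning)."
    if stop_on_first:
        return "Table with non-conforming data terminated."
    return "All rows - job marked as Failed."


for ticked in (False, True):
    for status in ("Failed", "Succeeded"):
        print(ticked, status, "->", on_nonconforming(ticked, status))
```

Ticking "Stop job on first occurrence" only changes the outcome when the job is set to be marked as Failed.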

Classifications and Examples

Below is a complete list of all classifications and sub-classifications used to categorize non-conforming data. 

Letters (L)

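+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Lower Case Letters    | a, b, c, d, ..., z          | µ, ß, à, æ, ...              |
| Upper Case Letters    | A, B, C, D, ..., Z          | À, Æ, Ç, Ň, ...              |
| Modifier Letters      | None                        | ᴬ, ᴭ, ʰ, ʶ, ...              |
| Titlecase Letters     | None                        | ǅ, ǈ, ᾈ, ...                 |
| Other Letters         | None                        | ª, º, ƻ, ج, ش, ഘ, オ, ポ, ...  |
+-----------------------+-----------------------------+------------------------------+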

Numbers (N)

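+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Decimal Numbers       | 0, 1, 2, 3, ..., 9          | ٠, ١, २, ४, ...              |
| Letter Numbers        | None                        | ᛮ, ᛯ, ᛰ, Ⅰ, Ⅱ, Ⅲ, ...        |
| Other Numbers         | None                        | ², ³, ¼, ½, ৴, ৵, ...        |
+-----------------------+-----------------------------+------------------------------+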

Marks (M)

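+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Enclosing Marks       | None                        | ҈, ҉, ᪾, ...                 |
| Nonspacing Marks      | None                        | ۖ, ۗ, ۘ, ...                 |
| Spacing Marks         | None                        | ः, ऻ, ा, ि, ...              |
+-----------------------+-----------------------------+------------------------------+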

Separators (Z)

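+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Space Separators      | [space]                     | [different size spaces]      |
| Line Separators       | None                        | U+2028 (not visible)         |
| Paragraph Separators  | None                        | U+2029 (not visible)         |
+-----------------------+-----------------------------+------------------------------+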

Punctuation (P)

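+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Close Punctuation     | ), ], }                     | ༻, ༽, ᚜, ⁆, ⟧, ...           |
| Connector Punctuation | _                           | ‿, ⁀, ⁔, ︳, ﹍                |
| Dash Punctuation      | -                           | ⸗, ⸚, 〜, ...                 |
| Final Punctuation     | »                           | ’, ”, ›, ⸃, ...              |
| Initial Punctuation   | «                           | ‘, ‛, “, ‟, ...              |
| Open Punctuation      | (, [, {                     | ༺, ༼, ᚛, ⟦, ...              |
| Other Punctuation     | !, ", #, %, &, *, /, :, ... | ՞, ։, ؊, ؟, ๏, ๛, ៘, ...     |
+-----------------------+-----------------------------+------------------------------+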

Symbols (S)

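+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Currency Symbol       | $                           | ¢, £, ¥, ֏, ...              |
| Math Symbol           | +, <, =, >, |, ~, ...       | ϶, ؆, ؇, ⅀, ⅁, ...           |
| Modifier Symbol       | ^, `, ¨, ¯, ...             | ꜈, ꜉, ꜊, ꜋, ꜎, ꜠, ...        |
| Other Symbol          | ¦, ©, ®, °                  | ҂, ؎, ؏, ۞, ...              |
+-----------------------+-----------------------------+------------------------------+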

Other (O)

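+-----------------------+-----------------------------+------------------------------+
| Sub-Classification    | US ASCII Example            | Unicode and Other Examples   |
+=======================+=============================+==============================+
| Control               | NULL, ACK, BEL, ESC, ...    | None                         |
| Format                | SHY                         | LRO, RLO, RLI, LRI, BOM, ... |
| Private Use           | None                        | None                         |
| Surrogate             | None                        | None                         |
+-----------------------+-----------------------------+------------------------------+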
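The sub-classification names above match the two-letter Unicode general categories (for example, Lu for Upper Case Letters and Pd for Dash Punctuation). Assuming the engine uses those categories, the following Python sketch looks up the classification and sub-classification of a single character. This can help when a pattern contains an unexpected code. Python's unicodedata may be based on a different Unicode version than the engine, so results for recently added characters can differ.

```python
import unicodedata

# Sub-classification names from the tables above, keyed by the two-letter
# Unicode general category each one appears to correspond to (assumption).
SUB_CLASSIFICATIONS = {
    "Ll": "Lower Case Letters", "Lu": "Upper Case Letters",
    "Lm": "Modifier Letters", "Lt": "Titlecase Letters", "Lo": "Other Letters",
    "Nd": "Decimal Numbers", "Nl": "Letter Numbers", "No": "Other Numbers",
    "Me": "Enclosing Marks", "Mn": "Nonspacing Marks", "Mc": "Spacing Marks",
    "Zs": "Space Separators", "Zl": "Line Separators",
    "Zp": "Paragraph Separators",
    "Pe": "Close Punctuation", "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation", "Pf": "Final Punctuation",
    "Pi": "Initial Punctuation", "Ps": "Open Punctuation",
    "Po": "Other Punctuation",
    "Sc": "Currency Symbol", "Sm": "Math Symbol", "Sk": "Modifier Symbol",
    "So": "Other Symbol",
    "Cc": "Control", "Cf": "Format", "Co": "Private Use", "Cs": "Surrogate",
}


def describe(ch: str) -> str:
    """Return 'code / sub-classification' for one character."""
    category = unicodedata.category(ch)
    if category not in SUB_CLASSIFICATIONS:
        # Unassigned code point ("Cn"); reporting it as "U" is a guess.
        return "U / Unknown"
    code = "O" if category[0] == "C" else category[0]
    return f"{code} / {SUB_CLASSIFICATIONS[category]}"


for ch in ["A", "\u00df", "\u2160", "-", "\u00ab", "\u00a9", "\x07", "\u00ad"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>'):28} {describe(ch)}")
```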

 

Troubleshooting

To troubleshoot non-conforming data, you have to look at the data itself. Neither the support bundle nor the logs provide any details other than the non-conforming classification pattern described above.
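Only the classification pattern is reported. One practical approach is therefore to compute the same pattern for each value in the affected column, then keep the values that match the pattern shown on the Monitor page. The following is a minimal Python sketch. It assumes that the values have already been fetched with a query. It also assumes that the pattern is the first letter of the Unicode general category (with 'C' shown as 'O'). This mapping is an assumption, not the engine's documented implementation.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def pattern(value: str) -> str:
    """Assumed mapping: first letter of the Unicode category, "C" shown as "O"."""
    letters = (unicodedata.category(ch)[0] for ch in value)
    return "".join("O" if c == "C" else c for c in letters)


def group_by_pattern(values: Iterable[Optional[str]],
                     only: Optional[str] = None) -> Dict[str, List[str]]:
    """Group distinct non-NULL values by pattern; optionally keep one pattern."""
    groups = defaultdict(set)
    for value in values:
        if value is None:  # NULLs are skipped in this sketch
            continue
        p = pattern(value)
        if only is None or p == only:
            groups[p].add(value)
    return {p: sorted(vals) for p, vals in groups.items()}


# Sample values standing in for the result of a query on the affected column.
column_values = ["1234", "ABCD", "!AB!", None, "ÀÄÅB", "ABCD12", "?XY?"]
print(group_by_pattern(column_values, only="PLLP"))
# {'PLLP': ['!AB!', '?XY?']}
```

Calling group_by_pattern without the only argument returns every pattern found in the column. This gives a quick overview of the distinct formats present.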

To understand what is causing the warnings or errors, the data has to be queried. Examples of queries that can assist the investigation can be found here: