Skip to main content
Delphix

File Masking and Enclosure and Escape Characters (KBA8555)

 

 

 KBA

KBA# 8555

At a Glance

Versions: Applicable Delphix Masking versions: 6.0.3 and onwards. 
Enclosure: If a value needs to be enclosed, the enclosure can be single or multiple characters. The Masking Engine can handle multiple enclosures per field. The normal enclosure is a double quote ("). The Masking Engine cannot use different characters for opening and closing.
  • No enclosure (default)
  • Custom character (or characters)
Escape Character: There are two options to escape the Enclosure:
  • Double Enclosed (default)
  • Custom Character
Number of Fields: The output will always have the number of fields defined in the File Format (or File Formats if different Record Types are used). 
Important: If values are enclosed incorrectly, it is not possible to detect the separators or End of Line (EOL) characters, as these are enclosed.

Therefore, it is important to enclose and escape correctly. 
Possible Errors Errors with enclosures and escape characters can only be seen in the masked data.

On some occasions, the engine can run out of memory trying to find the closing enclosure. 
 
java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0, 6.0.16.0

Using Enclosures and Escaping 

Data stored in a delimited plain text file requires special characters to be used as the delimiter and the end-of-record character. The delimiter is usually one character (commonly comma, TAB, pipe, tilde, etc) but can also be multiple characters. For the end-of-record character, this is usually LF for Unix and CR+LF for Windows but can also be custom. 

The special thing about these are:

  • If these characters appear inside a value in a field, the value needs to be enclosed.
  • If the enclosure appears inside a value being enclosed, that enclosure character needs to be escaped. 

Possible issues

The possible issues are that there is a delimiter or end-of-record character in the field. This will cause the data to not line up and the wrong data is masked. 

Important to Enclose and Escape Correctly

If the delimited plain text file is not using enclose and escape correctly - the following can happen:

  • The data will be misaligned which will cause: 
    • Data loss
    • PII leakage
  • The engine may crash the job due to:
    • Processing errors
    • Out of memory

None of the issues above can be fixed by more memory or tweaks. The only solution is the have the correct enclosure and escaping strategy - this strategy needs to be defined where the data is generated and it then needs to match in the masking job.

Enclosure Strategies

The key things here are: 

  • If enclosures are not used, then any characters that encloses a value will be ignored. 
  • If enclosures are used, then there needs to be an opening enclosure and a matching closing enclosure. 

Examples of Correct and Incorrect Enclosure

Correct Incorrect

The following enclosures are valid and are enclosing the value in the field.

Enclosures are not needed even if they are defined.

...,abcd,...

Opening and closing enclosure used around the whole value: 

...,"abcd",...

Inside the enclosure there can be a separator:

...,"ab,cd",...

or EOL character:

...,"ab
cd",...

There can be multiple enclosed values in a field: 

...,"a"bc"d",...

The enclosure can enclose part of a value:

...,"abc"d,...

If an enclosure has been defined, the enclosure needs to match with opening and a matching closing enclosure. 

This will be treated as an opening enclosure:

...,ab"cd,...

This will be treated as opening, closing, and opening:

...,"ab"cd",...

The EOL is outside the enclosure and seen as two records.

...,"ab"
"cd",...

The EOL is outside the enclosure and the second record has no closing enclosure.

...,"ab"
cd",...

Escape Strategies

There are two different Escape Strategies to ignore the enclosure character:

  • Double Enclosure (the enclosure character is duplicated if it is escaped).
  • Escape the enclosure character.

Examples of Correct and Incorrect Double Enclosure 

Correct Incorrect

The following enclosures are valid and are enclosing the value in the field.

Enclosures are not needed even if they are defined.

...,abcd,...

Opening and closing enclosure used around the whole value: 

...,"abcd",...

Inside the enclosure there can be a separator:

...,"ab,cd",...

or EOL character:

...,"ab
cd",...

There can be multiple enclosed values in a field: 

...,"a"bc"d",...

The enclosure can enclose part of a value:

...,"abc"d,...

The escape and the enclosure needs to match with opening and a matching closing enclosure. 

This will be treated as an opening enclosure:

...,ab"cd,...

This will be treated as opening, closing, and opening:

...,"ab"cd",...

The EOL is outside the enclosure. This will be seen as two records.

...,"ab"
cd",...

 

 

Examples of Correct and Incorrect Escape Characters

Correct Incorrect

The following enclosures are valid and are enclosing the value in the field.

Enclosures are not needed even if they are defined.

...,abcd,...

Opening and closing enclosure used around the whole value: 

...,ab\"cd,...

Escape the enclosure inside the enclosed value:

...,"ab\"cd",...
...,"abcd\"",...

or outside the enclosed value:

...,"ab"cd\",...

The escape character, if defined will be applied to all occurrences of the escaped enclosure. 

A single enclosure will be seen as an opening enclosure: 

...,ab"cd,...

The two first are escaped and the last is opening:

...,\"ab\"cd",...

The first and last are escaped and the middle is opening:

...,\"ab"cd\",...

 

 

 

Example of Memory Error

If the enclosure strategy is not correctly defined and aligned with the data, the data in the enclosure could contain so many characters that the job consumes all available memory -- crashing the job. 

 

 

Warning

Warning:

If no end enclosure is found this can also cause the data to be written and sensitive data escape unmasked. For that reason - if an Out Of Memory Error is seen in the File Masking job - it is essential to review that the file and job have been correctly configured.

DO NOT INCREASE THE MEMORY.

CHECK THE CONFIGURATION (Enclosure Strategy).

 

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Troubleshooting

The following sections provide troubleshooting guidance. 

Linux/Unix

Finding errors in INFO.log

This listing can return many matches. The following command offers a way to filter on a specific job and execution. 

grep -E "E=1|Masking Engine|Caused by|OutOfMemoryError|Job with execution.*submitted|Unable to load the job|WARNING|for job|Text File.*Finished" info.log
Finding errors in INFO.log for specific JOB 

This limits the search for errors to a specific job and execution (change xx and yy to the job and execution as required). 

grep -E "E=1|Masking Engine|Caused by|OutOfMemoryError|Job with execution.*submitted|Unable to load the job|WARNING|for job|Text File.*Finished" info.log | grep "JOB_ID_xx_yy"

Windows PowerShell

Finding errors in INFO.log 

This listing can be long. The following command offers a way to filter on a specific job and execution. 

Select-String "E=1|Masking Engine|Caused by|OutOfMemoryError|Job with execution.*submitted|WARNING|for job|Text File.*Finished" info.log
Finding errors in INFO.log for specific JOB 

This limits the search for errors to a specific job and execution (change xx and yy to the job and execution as required). 

Select-String "E=1|Masking Engine|Caused by|OutOfMemoryError|Job with execution.*submitted|Unable to load the job|WARNING|for job|Text File.*Finished" info.log | Select-String "JOB_ID_xx_yy"

Log Example

Below are some logs and descriptions (slightly modified for readability).

  • INFO
    • Indicates general activity such as Start Job (submitted) and Job Status.
       
  • ERROR
    • The Out Of Memory can be seen here.
       
  • 'Caused by'
    • 'Caused by' is always good to look for as it (usually) captures errors (in this case it is listed in the ERROR). 
       
  • 'Masking Engine Version
    • This is logged at the top of each info.log file and each time the engine was rebooted.
    • This is a good way to check the version and if there is a second entry, it indicates an engine restart. 
grep -E "E=1|Masking Engine|Caused by|OutOfMemoryError|Job with execution.*submitted|Unable to load the job|WARNING|for job|Text File.*Finished" info.log
2021-... Masking Engine Version: 6.0.x.x (...), ...
...
2021-... [...] INFO  c.dmsuite.threads.JobExecutorService - Job with execution ID: yy submitted to thread pool
2021-... [...] INFO  c.d.d.m.t.RunServiceImpl - [JOB_ID_xx_yy] 2021/... - Text File Input.0 - ERROR (...) : java.lang.OutOfMemoryError: Java heap space
2021-... [...] INFO  c.d.d.m.t.RunServiceImpl - [JOB_ID_xx_yy] 2021/... - Text File Output.0 - Finished processing (I=0, O=20544753, R=20544753, W=20544753, U=0, E=0)
2021-... [...] INFO  c.d.d.m.t.RunServiceImpl - [JOB_ID_xx_yy] 2021/... - Text File Input.0 - Finished processing (I=20544753, O=20544753, R=0, W=20544753, U=0, E=1)
2021-... [...] ERROR c.dmsuite.threads.JobExecutorService - [JOB_ID_xx_yy] - Execution yy for job xx failed
Caused by: com.dmsuite.exception.JobFailedException: Job has failed due to error in running Kettle
Caused by: com.dmsuite.exception.JobFailedException: Kettle returned a non zero exit value: 1

 

 


Related Articles

The following articles may provide more information or related information to this article: