Skip to main content
Delphix

Delimited File Masking and EOR Enclosures Escapes (KBA8928)

 

KBA

KBA# 8928

At a Glance  

Description: This KBA describes Delimited File Masking configurations. Especially characters (printable and unprintable) for End of Record and Delimiters and Enclosure and Escape strategies. 
End of Record: The End of Record configures the character(s) that defines the end of the record. 

This is usually the end of a row (defined as a new line (CR+LF or LF)) but it doesn't need to be.

 
Options:
  • CR+LF trem. (Windows, DOS)
  • LF term. (Unix)
  • Custom
Characters:
  • Unprintable: 0x0D + 0x0A 
  • Unprintable: 0x0A
  • Custom printable and unprintable (Ctrl).
Delimiter: The Delimiter configures the character(s) that defines the delimitation of values in a record.

The most common delimiter is Comma ( , ) but can be any character. 

 
Options:
  • Custom
Characters:
  • Custom printable and unprintable (Ctrl).
Text Enclosure: The Text Enclosure configures the character(s) that encloses a value.

The most common text enclosure is Double Quotes ( " ) but can be any character.

 
Options:
  • Custom
Characters:
  • Custom printable.
Enclosure  Escape: The Escape Enclosure strategy configures how to escape the text enclosure.
 
Options:
  • Double Enclosure
  • Custom
Enclosure Escape Strategy:
  • The same character twice.
  • Custom printable.
Escape EEC: The Escape EEC (Escape Enclosure Character) is a flag that enables escaping the escape character that escapes the enclosure.
 
Options:
  • Flag (on/off)
Escape the Escape Enclosure Character:
  • The same character twice.
More info: More information about this can be found in the Masking Docs:

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

Delimited File Configuration

This section describes the different configurations in Delimited File Masking, such as EOR, delimiter, enclosures, and escape characters (strategies).

End of Record (EOR)

This defines the character (or sequence of characters) used to define the End of Record.

Option Hex Code Comment
Select one of:
  • CR+LF terminated (Windows, DOS)
  • LF terminated (Unix)
  • Custom
Hex:
  • 0x0D+0x0A
  • 0x0A
  • -
This sets the EOR:
  • The format used on Windows files. 
  • The format used on Unix/Linux/Mac files.
  • Define any character(s) as the EOR.

Custom End of Record

The EOR can be custom and be any character or a sequence of characters (printable or unprintable).

For unprintable characters please see the table in the section below. 

 

An example of a printable EOR sequence is: #n 

This means that the two characters ('#' and 'n') will be the EOR.

This will not look like many expect - below is a comma separated example (3 columns, 3 rows):

R1C1_Val,R1C2_Val,R1C3_Val#nR2C1_Val,R2C2_Val,R2C3_Val#nR3C1_Val,R3C2_Val,R3C3_Val

Common error: '\n'

A common error is to define '\n' as the EOR. When '\n' is defined as the EOR  this means that the two characters '\' and 'n' are used as an EOR (not LF (0x0A) as commonly thought).

To define LF (0x0a) please use the 'LF terminated (Unix)' or a Custom Control Character as listed in the table below. 

Note

Note:

'\n' is frequently used in text to indicate newline (note that this is an escaped encoding, part of the text, and not the LF control character).

 

Unprintable Control Characters 

An unprintable control characters is a characters/code (defined in ASCII) that does not represent a printable symbol. For the EOR, this is usually CR+LF or LF. 

The table shows the Ctrl Code (also used in the UI), the Hex Value, a Description, and a Comment.

UI Symbol Hex Value Description  Comment
^@ [NUL]
^A [SOH]
^B [STX]
^C [ETX]
^D [EOT]
^E [ENQ]
^F [ACK]
^G [BEL]
^H [BS]
^I [HT]
^J [LF]
^K [VT]
^L [FF]
^M [CR]
^N [SO]
^O [SI]
^P [DLE]
^Q [DC1]
^R [DC2]
^S [DC3]
^T [DC4]
^U [NAK]
^V [SYN]
^W [ETB]
^X [CAN]
^Y [EM]
^Z [SUB]
^[ [ESC]
^\ [FS]
^] [GS]
^^ [RS]
^_ [US]
$[00]
$[01]
$[02]
$[03]
$[04]
$[05]
$[06]
$[07]
$[08]
$[09]
$[0A]
$[0B]
$[0C]
$[0D]
$[0E]
$[0F]
$[10]
$[11]
$[12]
$[13]
$[14]
$[15]
$[16]
$[17]
$[18]
$[19]
$[1A]
$[1B]
$[1C]
$[1D]
$[1E]
$[1F]
NULL (character)
START OF HEADING
START OF TEXT
END OF TEXT
END OF TRANSMISSION
ENQUIRY
ACKNOWLEDGE
BELL (beep)
BACKSPACE
HORIZONTAL TAB
LINE FEED
VERTICAL TAB
FORM FEED
CARRIAGE RETURN
SHIFT OUT
SHIFT IN
DATA LINK ESCAPE
DEVICE CONTROL 1 (XON)
DEVICE CONTROL 2
DEVICE CONTROL 3
DEVICE CONTROL 4 (XOFF)
NEGATIVE ACKNOWLEDGE
SYNCHRONOUS IDLE
END OF TRANSMISSION BLOCK
CANCEL
END OF MEDIUM
SUBSTITUTE
ESCAPE
FILE SEPARATOR
GROUP SEPARATOR
RECORD SEPARATOR
UNIT SEPARATOR










HT = TAB (used in tab-delimited file)
LF = Windows EOR (2 of 2). Unix EOR.


CR = Windows EOR (1 of 2).

 

Delimiter

This defines the character (or characters) used to define the delimiter. The delimiter can be any character or a sequence of characters (printable or unprintable).

File Type Delimiter Comment
Common delimited files are: 
  • Comma Separated Values (CSV)
  • Pipe Delimited File
  • Tab Delimited File
Use:
  •  , 
  •  | 
  •  ^I [HT] 
Notes:
  • Printable character: comma.
  • Printable character: pipe.
  • Unprintable: click 'Ctrl' and select ^I [HT]  Horizontal Tab.

Common error: '^t'

A common error is to define '^t' as the tab character. This will in fact define two characters ('^' and 't') as the delimiter. 

To define TAB, click the 'Ctrl' and select Horizontal Tab (as listed in the table above). 

Note

Note:

'^t' is frequently used in text data. This is an escaped encoding in the text (not a tab characters).

 

Unprintable Control Characters 

To define unprintable control characters as the delimiter, click the Ctrl button and select the character. These are the same control characters as in the EOR, see above for a list of Control Characters.  

Text Enclosure

If a value (string) in the delimited file contains a delimiter or EOR character, what should you do?

This is when the value (string) is enclosed with an enclosure. This enclosure is also configurable on the engine.

Example

The example shows a CSV file with a value ('Jane,Smith') that contains a comma. Note the 3rd example where the value contains a newline, CR+LF (Windows) or LF (Unix).

ID,Full_name
1,John Smith
2,Jane,Smith
3,Scott
Smith

Solution

The solution is the enclose the strings with an enclosure (here " ). Note that the enclosure is optional.

ID,Full_name
1,"John Smith"
2,"Jane,Smith"
3,"Scott
Smith"

Enclosure Escape Strategy

What if the enclosed value does include the enclosure character?

There is a solution for this, too. It is called enclosure escape strategy.

Escape Strategy Comment
Two options: 
  • Double Enclosure
  • Custom
To enable the engine to ignore an enclosure in a value:
  • Use the enclosure twice. 
  • Use a special escape character (printable only).

Example

Following the example above but using a single quote as the enclosure. The surname is now O'connell. 

ID,Full_name
1,John O'connell
2,Jane,O'connell
3,Scott
O'connell

Solution - Double Enclosure

The solution here is to escape the enclosure by doubling it (writing it twice).

ID,Full_name
1,'John O''connell'
2,'Jane,O''connell'
3,'Scott
O''connell'

Solution - Custom 

The solution used here is to escape with a character. The character used is / .

ID,Full_name
1,'John O/'connell'
2,'Jane,O/'connell'
3,'Scott
O/'connell'

Escape "Enclosure Escape Character"

What if we need to escape the escape character that is used to escape the enclosure? There is a solution for that too.

 

Escape EEC Comment
Only one option: 
  • 'Double Escape'
To enable the engine to ignore an enclosure in a value:
  • Use the escape ECC twice. 

Example

Following the example above but using using an added ' / ' in one of the values. 

ID,Full_name
1,John /O'connell
2,Jane,/O'connell
3,Scott /
O'connell

Solution - Escape EEC 

In the example below - we have now:

  1. Enclosed to value with '
  2. Escaped the enclosure with /
  3. Escaped the Escape Enclosure Character by doubling the /. 
ID,Full_name
1,'John //O/'connell'
2,'Jane,//O/'connell'
3,'Scott //
O/'connell'