Skip to main content
Delphix

Algorithm v1: Special Characters in Secure Lookup (KBA5285)

 

 

KBA

KBA# 5285

Applicable Delphix Versions

This document is relevant to version 1 algorithms. Deprecated in 6.0.12.

At a Glance 

Deprecated This KBA is only for historical information. Valid for all versions up to 6.0.12.0.
Characters The affected characters are BOM, TAB, a hyphen, and accents above some letters. 

Note: comma (,) tab, single quote ('), and hashtag (#) are used and could cause errors. 
Algorithms The list of algorithms and characters including special characters: 
Algorithm Code       | Algorithm Name   | Characters        | Other
---------------------+------------------+-------------------+---------
 ADDRESS LINE SL     | AddrLookup       | A-Za-z0-9 &'()*_` | BOM
 ADDRESS LINE 2 SL   | AddrLine2Lookup  | A-Za-z0-9 #.      | BOM
 BUSINESS LEGAL E... | BusinessLegal... | A-Za-z0-9 (),-./  |
 COMMENT SL          | CommentLookup    | A-Za-z0-9 ()-.    |
 DUMMY_HOSPITAL_N... | DummyHospital... | A-Za-z '-.        |
 EMAIL SL            | EmailLookup      | A-Za-z.@          |
 FULL_NM_SL          | FullNMLookup     | A-Za-z '          | TAB
 LAST_COMMA_FIRST_SL | LastCommaFirs... | A-Za-z ',         |
 LAST NAME SL        | LastNameLookup   | A-Za-z'           |
 SCHOOL NAME SL      | SchoolNameLookup | A-Za-z (),-       |
 USCITIES_SL         | USCitiesLookup   | A-Za-z ()-.       | �
 US_COUNTIES_SL      | USCountiesLookup | A-Za-z '-.        | BOM, UTF
 WEB_URLS_SL         | WebURLsLookup    | a-z./:            |
Terms used BOMByte Order Mark
TAB = tabulator
UTF = UTF-8 Special Character

Background

When these algorithms were created, some of the values in the source file had special characters in them (some are legitimate and part of the name, and others were created by tools or systems used). These characters have always been with these algorithms. 

Since algorithms cannot be modified, as this will result in a different masked result, these characters will remain in these algorithms. 

Most of the time, these characters do not cause any issues. 

This KBA documents the algorithms, and the values, and gives some examples of errors they can cause. 

Special Characters

There are some masked values that have characters outside the old ASCII standard. This can be an issue if the database's or file's characters restrict the use of these characters.

Could cause error

The characters in these algorithms could cause the following error - the example below is from Sybase. 

Error converting characters into server's character set. Some character(s) could not be converted.

Algorithms and values   

Below is a list of algorithms and values: 

USCITIES_SL
"Winston�Salem"

US_COUNTIES_SL
"Coös County"
"Doña Ana County"
"Prince of Wales – Hyder Census Area"
"Valdez–Cordova Census Area"
"Yukon–Koyukuk Census Area"

Byte Order Mark (BOM) 

The BOM character is a sequence of hidden characters (bytes) at the start of a file to indicate the encoding type of the file. The BOM character might appear in the first value of some algorithms.

It is caused by a BOM in the original imported lookup file.

Error example

Due to the character, when trimmed it will result in an extra character and if the column length is too short this can result in the value being too long for the column.

Example from Oracle: 

ORA-12899: value too large for column

SQL Server:

SQLState( 22001) ErrorCode(8152)

 

Algorithms and values   

Below is a list of algorithms and the column lengths causing an error: 

ADDRESS LINE SL
 - Possible error if column length shorter or equeal to 21 characters. 

ADDRESS LINE 2 SL
 - Possible error if column length shorter or equeal to 9 characters.

US_COUNTIES_SL
 - Possible error if column length shorter or equeal to 19 characters.

Comma character

Some algorithms have comma (,) characters between some of the values. If a CSV file is masked this will cause an issue if the delimiter is a comma and the values are not enclosed (for example, with double quotes).

Error example 

This can result in delimited file masking to have the field alignment out of order.

Algorithm

The below algorithm has a comma between some values:

BUSINESS LEGAL ENTITY SL
LAST_COMMA_FIRST_SL
SCHOOL NAME SL

TAB character 

One algorithm has a TAB character between some of the values. This does not usually cause an issue, though will cause an issue if a tab-delimited file is masked, and this algorithm is used. 

Error example 

If this algorithm is used on a tab-delimited file, the column alignment will be out of order.

Algorithm

The below algorithm has a tab between some values:

FULL_NM_SL

Resolution 

To resolve this issue create a new custom algorithm.

If the original lookup values are required, please create a case with Delphix support.