Masking and Tokenization and Character Sets (KBA8651)
KBA
KBA# 8651At a Glance
Description: | This KBA describes possible issues and errors when data is masked to a string with characters that are not valid due to the character set in the file or database column storing the masked value. This issue can be hard to detect and it can also be silent (as with Oracle and MS SQL Server). |
---|---|
Applicable Versions: | This KBA applies to all known Continuous Compliance Engine versions. The issue is not bound to any version but to how character sets are handled. |
Issue 1: Error 1 | Character conversion errors. |
Issue 2: Error 2 | Masked value too large for the column. |
Issue 3: Silent | Oracle and SQL Server will convert some characters automatically. This might cause issues with uniqueness when masking and with Re-Identify when using Tokenization. |
Algorithms: | All new algorithms (new Framework) are UTF-8 internally on the Continuous Compliance Engine. |
Resolution: | Ensure that the Characters defined in the Algorithm are valid for the Character Set on the masked database column or masked file. If not - remove unsupported characters from the algorithm. |
Recommendation: | It is recommended to create your own algorithm using the built-in framework. The algorithm will then be created based on your specific requirements. The out of the box (OOTB) algorithms are created to demonstrate capabilities. For example, the built-in Character Mapping (dlpx-core:CM Alpha-Numeric) has a large set of extended Unicode characters which might not work on all character sets. |
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Major Release All Sub Releases 6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0, 6.0.12.0, 6.0.12.1, 6.0.13.0, 6.0.13.1, 6.0.14.0, 6.0.15.0 5.3
5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0 5.2
5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1
5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0
5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4
Resolution
The solution to these issues is changing the algorithm so the algorithm masks to a supported set of characters.
Steps:
- Go to the database and get the Character Set.
- Go to the algorithm with the issue and:
- Remove any unsupported characters; or
- Recreate the Algorithm ensuring lookup values do not have unsupported characters; or
- For a Custom Algorithm, please contact the author.
Root cause
The root cause here is that the masked value contains a character that is not supported. This can come from Character Mappings that includes a character in the Character Group that is not supported. It can also be an unsupported character in a Secure Lookup or Mapping Algorithm.
Since different locals accept different characters the only solution is to create an algorithm that is correct for a specific local. Even US names are not immune from this issue where some names contain unsupported values. To see the extent of this - Google US town names that have diacritics.
Issues
There are three different types of issues identified related to the character set.
- Character conversion errors - the masked value cannot be converted into the database/file character set.
- Silent conversion - the unsupported character is converted into a supported character which causes referential integrity or re-identification errors.
- Value too long errors - the masked value gets an extra character and the value is too long for the field/column.
Below are some examples of errors. See above for Resolution.
Conversion errors
Sybase is known to throw an error if a character is not matching the character set.
SAP ASE (Sybase)
On Sybase a character conversion issue usually results in the following error:
[JOB_ID_xx_yy] - DelphixTableUpdate.0 - JZ0BE: BatchUpdateException: Error occurred while executing batch statement: Error converting characters into server's character set. Some character(s) could not be converted. ... [JOB_ID_xx_yy] - DelphixTableUpdate.0 - Next Exception: SQLState( ZZZZZ) ErrorCode(2402)
Silent conversion
Oracle and SQL Server are known to silently convert the unsupported character to a similar character, a '?', or similar.
This might mask a value in a way that causes duplicates, referential integrity errors, re-identification errors, or other. The main issue here is that it is silent and the error/issue might be identified long after the data is masked.
Oracle
For example (this depends on LNS settings), Oracle might convert the following:
Ū > U ₨ > ?
SQL Server
For SQL Server the following is an example (once again - this depends on code page settings):
Ū > U ₨ > ?
Truncation errors
In some instances, the new masked character can be converted and contains extra characters (up to 3 characters and 9 bytes have been noted on Oracle). If the data type is exactly defined for the value, this could cause the value to be too large for the column (as seen in the examples below).
Oracle
[JOB_ID_xx_yy] - DelphixTableUpdate.0 - ORA-12899: value too large for column "..." (actual: x, maximum: y)
DB2
[JOB_ID_xx_yy] - DelphixTableUpdate.0 - Next Exception: SQLState( 22001) ErrorCode(-404)
Troubleshooting
To troubleshoot this issue, begin with the following:
- The column Data Type.
- The Character Set.
- Check if there are any errors.
- Check all characters in the algorithm:
- See below for known characters in out of the box (OOTB) algorithms.
- Check characters in the character mapping algorithms.
- Check source file for Mapping and Secure Lookup algorithms.
- Contact the author of Custom Algorithms.
Notes
Oracle
The following statement is found on Oracle's docs site:
Because Java works internally in Unicode, the client character set is always set to Unicode.
Source: https://docs.oracle.com/en/database/...nnections.html
Continuous Compliance Engine
Alpha-Numeric Character Mapping OOTB
Alpha-Numeric Character Mapping has the following extended characters defined in the OOTB algorithm:
àáâãäåæèéêëçìíîïñðšòóôõöőùúûüűýÿžÀÁÂÃÄÅÆÈÉÊËÇÌÍÎÏÑЊÒÓÔÕÖŐÙÚÛÜŰÝŸŽ
Related Articles
The following articles may provide more information or related information to this article:
- KBA: Special Characters in Secure Lookups (KBA5285) (OOTB)
- Docs: Algorithm Framework / Character Mapping