Skip to main content
Delphix

KBA1764 Custom SQL - A Technical Overview

 

 

At a Glance 

Versions: Applicable Delphix Masking versions: 4.x, 5.0.x, 5.1.x, 5.2.x, 5.3.x
Description:

The Rule Set is the glue on the Masking Engine. From the Rule Set we have :

  • Connector Information 
  • The Inventory of Masking Rules
  • The Masking and Profiling Job
  • AND - tables and customization Rules 

Overview

This page describes in detail the Custom SQL which is a Rule Set customization feature. The page will describe how to use it, casting of values, pitfalls, and troubleshooting. Since there are some customization features that are related to the Custom SQL, these will be highlighted as well - such as Logical Key. 

Advanced Feature!

This is an advanced feature that requires a good knowledge of SQL statements, the structure of the data and the table to be masked and how masking is performed. It is recommended to first start creating a normal masking job and then if needed tweak the SQL in Custom SQL.

Why use it?

The Custom SQL is a great tool which enables you to change the data that goes into the Transformation Engine (or in other words 'what and how it is being masked'). With Custom SQL we can control and change:

  • The ID field(s) (or Logical Key).
  • The Masked Fields.
  • Casting values.
  • Reformatting of values.
  • Replacing Values.
  • Renaming Fields.
  • Filter with where clauses and in select statements.
  • Creating table joins.

Where do I find it?

The Custom SQL can be modified on the Rule Set page in the UI.

User Interface

The Custom SQL is accessed from the Rule Set Menu. The steps are:

  1. Click on the Rule Set menu.
  2. Click 'Edit' or the link on the Rule Set to edit (this step is not in the picture).
  3.  Click 'Edit' on the Table to edit.
  4. Select Custom SQL.

Masking UI - Environment - Rule Set - Edit Custom SQL.png

Custom SQL Indication

When a Custom SQL has been specified for a Table, this is indicated in the UI with the word "SQL" as shown above.

Editing Custom SQL 

When selecting Custom SQL in the popup, a SQL statement is presented. To assist with all column names, the SQL contains all field names in the table and some cast definition if the data type needs to be converted in order to be masked (for example bit for boolean which is cast to int). 

Pro Tip!

Edit the SQL in MS Access or any other database or database design tool.

How:
Create a connection to your database, paste in the SQL and edit and test it. When the data extracted is meeting your masking needs, paste it back into the Custom SQL box. 

Essential Fields

There are some essential fields which are required in order to get the masking job to work. This differs significantly between In-Place and On-The-Fly masking jobs.

 Let us look at what fields are needed: 

In-Place

With In-Place masking, only the Masked Columns and the Unique ID are needed to be extracted and ingested into the masking transformation process. 

In a normal masking job, the selection of fields is automatic and only required fields are included. 

select UniqueID, MaskedCol from MaskedTable

Masking - Custom SQL - In-Place ver 3.png

The following columns are required:

  1. The Unique Identifier for the table.
    • This should be an unmasked unique field in the table
    • or the Unique Identifier if added by the masking process. 
  2. The Masked Columns with the correct column names.
  3. Any Other Column specified will be ingested but not used.

Note - Remove Columns not used! 

It is highly recommended to remove all columns that are not needed to mask the data as all columns added are loaded into memory and will use job memory. It will also slightly slow down the masking job. There are many cases where large objects being part of the columns have generated Out of Memory errors. 

On-The-Fly

On-The-Fly masking is different as all data is extracted from one table and inserted into another. Therefore, all data to be inserted back is needed to be specified in the Custom SQL.

select MaskedCol, AllOtherCol from MaskedTable

Masking - Custom SQL - On-The-Fly ver 3.png
The required columns are: 

  1. The Masked Columns with the correct column names. 
  2. All Other Columns that are needed in the new table - this is usually all. 

Since all columns are generally needed in On-The-Fly masking, this can mean that these Custom SQL can be quite complex. It might also require more memory to hold all columns in memory.

Note, no ID column(s) are needed in On-The-Fly masking. This is because records are inserted instead of updated.

Unique ID for In-Place Masking

The Unique ID is Primary Key or Index

In this case, the Unique ID only needs to be specified in the Custom SQL statement. 

Specify the Unique ID column in the:

  1. Custom SQL statement. 
There is a Unique ID but it is Not a Primary Key or Index

In this case, the Masking Engine does not know which column is the index so we need to specify a Logical Key.

Specify the Unique ID column in the:

  1. Logical Field in the table Rule Set settings. 
  2. Custom SQL statement. 
If there is no Unique Identifier or if the Unique Identifier is masked

If a Unique Key is missing or masked then the following Unique Key is needed in the Custom SQL statement:

  1. Identify database type and determine which Row Id is used/created.
  2. Specify the Unique ID column in the Custom SQL statement. 

Oracle - ROWID

If the database is Oracle, use the ROWID key as identified. In fact, this can be used all the time for Oracle.

select ROWID, MaskedCol from MaskedTable
MS SQL Server - DMS_ROW_ID

On MS SQL Server, if there is no Unique Key a key will be temporarily added by the Masking Engine called 'DMS_ROW_ID'. If the Custom SQL is used this needs to be added in the Custom SQL statement even if the Masking Engine is adding it to the Table. 

select DMS_ROW_ID, MaskedCol from MaskedTable

Advanced Custom SQL constructs 

Some notes of caution here: 

  • Remember to name the column back to the masked column name - for example add "AS MaskedCol".
  • Column and Tables names can be case sensitive, especially when they are quoted (with ” or ’). Always test the SQL statement first.
  • Functions and syntax differ between databases - please check for correct syntax for your database.
  • It is best is to test the SQL on a database design tool. 

Select Values and Formats

Set the masked value or string to a constant

With Custom SQL it is possible to set the masked value to a constant value. This can be very useful if the masked column is a BLOB or a long text that needs to be masked to a specific string. As there is no need to send all data into the Masking Transformation - we can change the BLOB or the TEXT to a short string or value. 

All values in the column to a string constant 
... 'string' AS MaskedCol ...

Example: 

select UniqueID,'Redacted text object' AS LeagalOutcome from MaskedTable
select UniqueID, 'xxxx-xxxx-xxxx-xxxx' AS CreditCard from MaskedTable
All values in the column to a number
... value AS MaskedCol ...

Example:

select UniqueID, 1 AS Salary from MaskedTable

Set the masked value to a random value

It is possible to set each row to a different value - for example a 'random' value.

MS SQL Server Example - generating a random value between 20 and 85:

select ID, ABS(CHECKSUM(NewId())) % 65 + 20 AS Age FROM MaskedTable

Note - A better Algorithm is Secure Lookup!

It might be better to add the values 20 to 85 in a text file and use Secure Lookup as this will handle NULL and spaces as well. The example above is used to highlight what is possible and it is possible to use this in conjunction with an algorithm. 

Cast and Convert the masked value or string

Cast - number
  • Cast value to a 4 digit decimal number. 
  • For example 9.5 to 9.5000.
... CAST(MaskedCol AS decimal(6,4)) AS MaskedCol ...
Convert - number
  • Same as above but with CONVERT.
  • Converts a number to 4 digit decimal number.
... CONVERT(decimal(MaskedCol), 9.5) AS decimal AS MaskedCol ...
Parse a string to money

This example uses parse to convert a string to a value of type money - for example '($1,000.00)'. 

... PARSE(MaskedCol AS MONEY) AS MaskedCol ...
Convert - character Set

This is shown as an example. It will not resolve character conversion errors. It might though be able to handle some conversions prior to performing masking. 

... CONVERT(MaskedCol, 'US7ASCII', 'WE8ISO8859P1') AS MaskedCol ...
Some database reference documentation 

Case - IIF - Then - Else

This is also a very useful building block in Custom SQL. The implementation varies slightly between different databases. One specific case is to filter out NULL values or other specific values before masking. 

Case  - When - Then - Else

This can be nested and multiple WHEN clauses can be specified after each other. 

Syntax:

CASE
   WHEN when_expression THEN result_expression [ ...n ]         
   [ ELSE else_result_expression ]
END

Example:

select 
   UniqueID, 
   CASE WHEN MaskedCol is NULL THEN 1000 ELSE MaskedCol END as MaskedCol 
from 
   MaskedTable
IIF 

IIF is a shorter one-liner similar to the Excel function.

MS SQL Server -Syntax:

... IIF ( boolean_expression, true_value, false_value ) AS MaskedCol ...

Example:

select UniqueID, IIF(MaskedCol is NULL, '', MaskedCol) AS MaskedCol from MaskedTable  
COALESCE

A much better way for these examples though is to use COALESCE. It returns the 'first non-null value in a list of expressions'.  

Syntax:

... COALESCE (expr1, expr2) ...

Example:

select UniqueID, COALESCE (MaskedCol, 'Change Null to this') AS MaskedCol from MaskedTable
Some database reference documentation 

Most of these command exists on most database manufacturers. 

String manipulation

Change string to Upper/Lower

This is a useful function as the masking engine is case sensitive and Secure Lookup will mask 'Peter' and 'PETER' to different values. 

... UPPER(MaskedCol) AS MaskedCol ...
... LOWER(MaskedCol) AS MaskedCol ...

Example:

select UniqueID, UPPER(FirstName) as FirstName from MaskedTable
Concatenate Strings and using Right, Left, SubString

A specific use case for this is if the data in the column is not adhering to a pattern (for example values 0001 is stored as 1 in the database). It can also be that two values in the database need to be concatenated before masking. 

First the syntax - CONCAT:

... CONCAT ( string_value1, string_value2 [, string_valueN ] ) ...

Syntax - RIGHT, LEFT, SUBSTRING

... RIGHT ( character_expression , integer_expression ) ...
... LEFT ( character_expression , integer_expression ) ...
... SUBSTRING ( expression ,start , length )  ...

How it would be used with MaskedCol:

... CONCAT(MaskedCol, ' added string') AS MaskedCol ...

Examples:

select UniqueID, CONCAT(FirstName, " ", LastName) as FullName from MaskedTable
select 
   UniqueID, 
   RIGHT(CONCAT('0000', CAST( MaskedCol as VARCHAR(10))),4) as MaskedCol
from MaskedTable 
Trimming white spaces

The last and very useful functions are LTRIM and RTRIM. These will remove any leading or trailing white spaces - which can cause different masking results. 

Note - All columns of datatype CHAR has trailing white spaces!

A string stored in a column with CHAR and with VARCHAR is different - CHAR is of fixed length and will have trailing spaces while VARCHAR  is of variable length and will usually have no spaces after the string.

For example CHAR(10): "PETER     " and VARCHAR(10): "PETER" 

... LTRIM(MaskedCol) AS MaskedCol ... 
... RTRIM(MaskedCol) AS MaskedCol ...

Example:

These can be added together like in this example.

select ID, LTRIM(RTRIM( MaskedCol )) AS MaskedCol FROM MaskedTable

Where Predicates 

The Custom SQL can also include where clauses in order to:

  • Select out specific records - for example, Personal Accounts vs Company Accounts.
  • Select data per Partition Table.

Note: Any column can be used in the where predicate.

Warning - the negative Predicate is unmasked!

A warning - the unmatched rows in the where predicate will not be masked. Multiple jobs are needed to mask a complete dataset. 

... WHERE AnyCol = expr;

Example: 

select UniqueId, MaskedCol WHERE AnyCol = 'PERSONAL'

Additional Information

External Links

Database SQL Language Reference: