Skip to main content
Delphix

KBA1776 Masking Columns with PK and FK Constraints

 

How to mask columns with PK and FK Constraints?

This page describes how to mask columns with Primary and Foreign Key Constraints and the manual tasks required in order to successfully mask these and all relating tables as fast as possible. 

This page will use PK and FK to abbreviate Primary Key Constraint and Foreign Key Constraint respectively. 

What are a PK and FK?

First, what are PK and FK Constraints and how do they differ from database to database. 

Primary Keys (PK)
  • A PK is a column (or a group of columns) that uniquely identifies each record in a table.

  • Primary keys must contain UNIQUE values, and cannot contain NULL values.

  • A table can have only one primary key, which may consist of single or multiple columns.
     

PK is a special Constraint - I write special as it has special features and it also differs from database to database. Let's look how different vendors define PKs:

  • DB2 - Unique index has to be created when creating the PK. 
  • MS SQL Server - Unique Index is created. The Index will be a Clustered Index by default but can also be nonclustered.
  • MySQL - PKs are indexed with 'UNIQUE NOT NULL'.
  • Oracle - PKs are Indexed with a b-tree index. 
  • Sybase - Unique Index is created. The Index will be a Clustered Index by default but can also be nonclustered.
Foreign Keys (FK)
  • FK links a column (or group of columns) from one table to the PK of another table.
  • The FK constraint is used to prevent actions that would destroy links between tables.

  • The FK constraint also prevents invalid data from being inserted into the foreign key column, because it has to be one of the values contained in the table it points to.

  • All FKs have to be dropped/disabled before PKs. 

  • FK Constraints are frequently implicit - in other words, they are there but not enforced. 

An example - leading to How To

Consider the following diagram of an HRS (Human Resource System) for a moment.

Masking - Mask PK FK HRS.png

Masking Last Name, Department Name and similar fields are easy and straightforward. Just define the Rule Set and the Job and the masking job will use the PK as the Unique Row Identifier and we are in business.   

On the other hand, if we have to mask the columns DEPNO and EMPNO there are a few challenges. Some of the challenges are:  

  • There is no longer a URI (Unique Row Identifier) as we are masking the PK.
  • Due to the PK Uniqueness Constraint, the PK needs to be unique after masking.
  • Also, the FK needs to be masked to match the PK.
  • Then there is the cross reference where the two tables reference each other - which table fo you start with?

Now take a look at this example:

Masking - Mask PK FK HRS Payroll.png

In the example above we have more FK references and another system (Payroll System) accessing the data. We need to mask all these columns if we are to mask 'DEPTNO' and 'EMPNO' and if we don't drop/disable the PK and FK Constraints we will have errors. 

Let's look at what type of errors and issues we can have.

Examples of PK and FK Masking Issues

There are these issues/errors:

  • Constraint Violations
  • Slow Performance
  • Increased use of Transaction Logs and Storage Snapshots. 
Constraint Violations

For the PK and FK Constraints we can have a number of different types of database errors: 

  • Error: PK Constraint violation:
    • Case 1: The masked value is already in the column. 
      • For example: masking value x to y and y is already in the column. 
    • Case 2: The masked values are not unique.  
      • For example: masking value x to y and already masked z to y. 
  • Error: FK Constraint violation:
    • Case 1: PK change violates FK Constraint.
      • For example: masking x to y in the parent table but there is an FK that relies upon that there is x in PK. 
    • Case 2: FK change violates FK Constraint.
      • For example: masking x to y in the child table there is no y is in the PK. 
Slow Performance

When masking data generally all rows in the table will be masked.  As a result any enabled PK, Unique Key, and FK Constraints with their associated Indexes will mean the database has to maintain these while masking is occurring.  The database also has to generate additional information about the masking changes in the form of undo and redo (in Oracles case) and transaction logs (in MS SQL Server for example) as there will be changes occurring in the table and any indexes associated with the constraints.  This potentially makes the masking less performant and require more storage be made available to track masking engine generated changes to data.

Note: if the masked column is Indexed using a Clustered Index the performance will be even slower due to the table being sorted (in the case of MS SQL Server) based on the values in this column. 

Increased use of Transaction Logs and Undo TableSpace

All changes on the database will also result in increased number of entries in the Transaction Logs and the Undo TableSpace (or whatever is used for specific Database Type). 

These are further reasons to drop/disable indexes and PK Constraints prior to masking. 

MS SQL Server

MS SQL Server databases have a transaction log that records all transactions and database modifications made by each transaction. The transaction log is needed to bring your database back to a consistent state, for example in case of a rollback.

When masking a column with an index, the amount of transactions logged in the transaction log increases, and at times this can be significant, say when masking a column with a clustered index. 

The error here is unusual due to the storage space available for the transaction logs.

  • The error: 'The transaction log for database [DATABASE] is full... (Microsoft SQL Server, Error: 9002)'  
Oracle

Oracle has an automatic mechanism that uses undo tablespace to keep track of changes in order to be able to roll back, or undo, changes to the database, primarily before they are committed. The undo tablespace is used to revert a block to it original value in SELECT reads within a transaction. This means that if there are UPDATEs within a transactions and the SELECT needs to revert back to a value that is no longer covered by the undo tablespace, the database will raise an error - ORA-01555, the snapshot is too old to retrieve back to original value for the SELECT statement. 

The root causes for an ORA-01555 are many and complex, usually caused by long running transactions. Another cause is the amount of changes to blocks, and this can happens when masking a column with an index (PK usually uses a b-tree index with a lot of small changes, and these are stored in the undo tablespace) and in this case will see an increased chance to hit this error. 

  • The error: 'ORA-01555: snapshot too old'

PK Constraint violation example

This example will show two different types of Constraint Violations possible when masking a Primary Key.

  • Example 1: The masked value is already in the column.
  • Example 2: The algorithm isn't producing a 1:1 unique mapping (there are duplicates in the masked data).

The database error message for these two root causes is the same so let's start looking at what can cause these errors. 

Background details

To visual this, let's create an example. This example will show both types of violations by using an algorithm that is not producing a 1:1 unique mapping (Secure Lookup). The allocation of the masked value is random and the same masked value can appear more than once.

The configuration

The masking job is configured the following:

  • Table Columns:
    • 'ID' (logical key)
    • 'Mask PK' (masked)
  • Algorithm: Secure Lookup (with 25 values).
  • Commit Size: 4
Explanation

What happens here?

  • We Commit in groups of 4 rows.
  • In this example, the error happens in the second Commit.

Steps:

  • Commit 1: Success - the first 4 masked values are Committed ok.
    • Note that row with ID: '1' is masked to '2'. This will later violate a duplicated masked value.
    • Note also that the ID order has changed. More about that in an example further down.
  •  Commit 2: Error - the second lot of 4 values are masked and the Commit is failing. 
    • We are masking '6' to '9' and '4' to '2' .
      Both these violates the constraint for two different reasons.
      • UPDATE example 1.
        • The masked value '9' is already in the table and has not been masked yet.
        • We can never allocate '9' as this will violate the PK constraint. 
      • UPDATE exampe 2:
        • The masked value '2' is also already in the table.
        • Since the mapping is not 1:1 and unique. '1' is masked to '2" and so is '4' now as well. 

# Original Table                 # Masked Table
# before masking                 # after masking

SELECT * FROM Msk_Tbl;           SELECT * FROM Msk_Tbl;

+----+---------+                 +----+---------+
| ID | Mask PK |                 | ID | Mask PK |
+----+---------+    >> Mask >>   +----+---------+
| 0  | 0       |                 | 3  | 1       |
| 1  | 1       |                 | 1  | 2*      | < Example 2 violates this value (2)
| 2  | 2       |                 | 0  | 15      |
| 3  | 3       |      Commit 1 > | 2  | 21      |
+----+---------+                 +----+---------+
| 4  | 4       |         Error > | 4  | 9*      | < UPDATE example 1
| 5  | 5       |                 | 5  | 4       |   Value (9) already in the column.
| 6  | 6       |         Error > | 6  | 2*      | < UPDATE example 2
| 7  | 7       |      Commit 2 > | 7  | 16      |   Masked value (2) duplicate.
+----+---------+          Stop > +----+---------+
| 8  | 8       |                 | 8  | 8       |
| 9  | 9       |                 | 9  | 9*      | < Example 1 violates this value (9)
| .. | ..      |                 | .. | ..      |
+----+---------+                 +----+---------+

 

Error messages

Error messages from different database types. The two scenarios above have the same error message:

  • DB2
    • 'SQLSTATE: 23505. A violation of the constraint imposed by a unique index or a unique constraint occurred.'
       
    • 'SQL0803N One or more values in the INSERT statement, UPDATE statement, or foreign key update caused by a DELETE statement are not valid because the primary key, unique constraint or unique index identified by "[VALUE]" constrains table "[TABLE]" from having duplicate values for the index key. SQLSTATE=23505'
       
  • MS SQL Server
    • 'Violation of PRIMARY KEY constraint '[PK_CONSTRAINT]'. Cannot insert duplicate key in object '[TABLE]'. The duplicate key value is ([VALUE]).'
       
  • MySQL
    • '-- ERROR 1062 (23000): Duplicate entry '[NEW_VALUE]' for key [VALUE]'
       
  • Oracle
    • 'ORA-00001: unique constraint ([PK_CONSTRAINT]) violated'
       
  • Sybase
    • 'Check constraint violation occurred, dbname = [DATABASE], table name = [TABLE], constraint name = [PK_CONSTRAINT].'
Resolution

To solve this issue.

On all masked columns with PK:

  1. Use an algorithm that produces a 1:1 unique mapping - for example, Mapping Algorithm or Segment Mapping.
  2. Depending on the database, manually Drop or Disable the PK.

FK Constraint violation example 

This example shows two different Constraint Violations when masking in relation to Foreign Key Constraints.

  • Example 1: A value in the PK column changes and now violates the FK Constraint.
  • Example 2: A value in the FK column changes and now violates the FK Constraint.

The database error message for these two is different. Below are two examples which illustrate this. 

Example - PK value changes and violates FK Constraint

This example will show both types of violations by using an algorithm that is not producing a 1:1 unique mapping (Secure Lookup). The allocation of the masked value is random and the same masked value can appear more than once.

Background details

This is also best illustrated by an example. This example shows both types of violations by using an algorithm that is not producing a 1:1 unique mapping (Secure Lookup). The allocation of the masked value is random and the same masked value can appear more than once.

The configuration

The masking job is configured the following:

  • Algorithm: Secure Lookup (with 25 values).
  • Commit Size: 4
     
  • Example 1:
    • Two tables: 'Parent_Tbl' (masked) and 'Child_Tbl'.
    • 'Parent_Tbl':
      • 'ID' (logical key).
      • 'Mask PK' (masked - with Primary Key Constraint).
    • 'Child_Tbl':
      • 'ID' (logical key).
      • 'FK' (with Foreign Key Constraint).
         
  • Example 2:
    • Two tables: 'Parent_Tbl' and 'Child_Tbl' (masked).
    • 'Parent_Tbl':
      • 'ID' (logical key).
      • 'PK' (with Primary Key Constraint).
    • 'Child_Tbl':
      • 'ID' (logical key).
      • 'Mask FK' (with Foreign Key Constraint).
Explanation

What happens here?

  • We Commit in groups of 4 rows.
  • In this example, the error happens in the first Commit.

 

Example 1 - steps: 

  1. The values are masked, here to show the violation '0' has been masked to 'A' (all the other masked values are ok).
  2. At Commit, the constraint is checked and the FK Constraint is violated as '0' is referenced by the child table in the column with the FK. 
     

Example 2 - steps:  

  1. The values are masked, here to show the violation '0' has been masked to 'A' (all the other masked values are ok).
  2. At Commit, the constraint is checked and the FK Constraint is violated as 'A' is not in the referenced column with the PK. 

 

# Original Tables before masking                 # Masked Table after masking

SELECT * FROM Parent_Tbl;                        SELECT * FROM Parent_Tbl;
SELECT * FROM Child_Tbl;

       +- FK_CONST -----+
       ^                v
+----+----+      +----+---------+                +----+---------+
| ID | FK |      | ID | Mask PK |                | ID | Mask PK |
+----+----+      +----+---------+   >> Mask >>   +----+---------+
| 1  | 0  |      | 1  | 0       |        Error > | 1  | A       | < FK Violation. 
| 2  | 0  |      | 2  | 1       |                | 2  | 2       |
| 3  | 0  |      | 3  | 2       |                | 3  | 1       |
| 4  | 2  |      | 4  | 3       |       Commit > | 4  | 3       |
+----+----+      +----+---------+                +----+---------+
Child_Tbl        Parent_Tbl                      Parent_Tbl

 

SELECT * FROM Parent_Tbl;
SELECT * FROM Child_Tbl;                         SELECT * FROM Child_Tbl;

 

       +----- FK_CONST ----+
       v                   ^
+----+----+      +----+---------+                +----+---------+
| ID | PK |      | ID | Mask FK |                | ID | Mask FK |
+----+----+      +----+---------+   >> Mask >>   +----+---------+
| 1  | 0  |      | 1  | 0       |        Error > | 1  | A       | < FK Violation. 
| 2  | 1  |      | 2  | 0       |                | 2  | A       |
| 3  | 2  |      | 3  | 2       |                | 3  | 3       |
| 4  | 3  |      | 4  | 2       |       Commit > | 4  | 3       |
+----+----+      +----+---------+                +----+---------+
Parent_Tbl       Child_Tbl                       Child_Tbl

Error messages

Error messages from different database types. Each scenarios above have different error messages:

  • DB2
     
  • MS SQL Server
    • Example 1: 'The UPDATE statement conflicted with the REFERENCE constraint "FK_CONST".
      The conflict occurred in database "DATABASE", table "Child_Tbl", column 'Mask FK'.
      '
       
    • Example 2: 'The UPDATE statement conflicted with the FOREIGN KEY constraint "FK_CONST". The conflict occurred in database "DATABASE", table "Parent_Tbl", column 'Mask PK'.'
       
  • MySQL
    • Example 1: "HA_ERR_ROW_IS_REFERENCED: 
      Foreign key constraint violated: Referenced row exists
      "
    • Example 2: ": HA_ERR_NO_REFERENCED_ROW: 
      Foreign key constraint violated: No parent row found
      "
       
  • Oracle
    • Example 1: 'ORA-02292: integrity constraint (FK_CONST) violated - child record found'
    • Example 2: 'ORA-02291: integrity constraint (FK_CONST) violated - parent key not found'
       
  • Sybase
    • Example 1: "23000: Dependent foreign key constraint violation in a referential integrity constraint. dbname = DATABASE_name, table name = TABLE, constraint name = FK_CONST."
    • Example 2: "23000: Foreign key constraint violation occurred, dbname = DATABASE_name, table name = TABLE, constraint name = FK_CONST."
Resolution

To solve this issue.

On all columns with an FK reference to a masked column with a PK:

  1. The same algorithm used for the PK column needs to be used on the FK.
  2. Depending on the database, manually Drop or Disable the FK.

PK Clustered Index on MS SQL Server/Sybase causing sorting

This example shows how data in a masked column with a PK with Clustered Index is causing the table to be sorted. This will affect performance greatly. 

Background details

In normal database operations, using Clustered Index has normally great performance gains. On MS SQL Server and Sybase, a clustered index means the data is sorted and that the rows are stored physically on the disk in the same order. This means that when the data is updated, the sort order and the order on the disk will change. 

There are no errors in this example, just an example of why the performance is degraded.  

The configuration

The masking job is configured the following:

  • Table Columns:
    • 'ID' (logical key).
    • 'Mask PK' (masked) (with Primary Key Constraint).
  • Algorithm: 'Mapping Algorithm' (with 5+ values)
    • To show the changed order - 1-5 have been chosen for the original values and A-E as masked values.
  • Commit Size: 5
Explanation

What happens here?

  • We Commit in groups of 4 rows.
  • The order happens after each Commit.

Steps:

  • The first 4 values are masked:
    • 1 to C
    • 2 to B
    • 3 to D
    • 4 to A
  • Commit: 
    • The data is committed and ordered based on the PK column. 
    • The new order is: A, B, C, D, 5, 6, 7, 8

# Original table                # Masked table
# before masking                # after masking

SELECT * FROM Msk_Tbl;          SELECT * FROM Msk_Tbl;

+----+---------+                +----+---------+
| ID | Mask PK |                | ID | Mask PK |
+----+---------+   >> Mask >>   +----+---------+
| 0  | 1       |                | 3  | A       | ! Note the ID, the order has changed.
| 1  | 2       |                | 1  | B       | 
| 2  | 3       |                | 0  | C       |
| 3  | 4       |         Commit | 2  | D       |
+----+---------+                +----+---------+
| 4  | 5       |                | 4  | 5       | ! Note that here the data is as is.
| 5  | 6       |                | 5  | 6       |
| 6  | 7       |                | 6  | 7       |
| 7  | 8       |                | 7  | 8       |
+----+---------+                +----+---------+

Resolution

To solve this issue.

On all masked columns with a Clustered Index (PK or no PK):  

  1. On MS SQL Server and Sybase, manually Drop the Clustered Index (PK).

Resolution - How to mask columns with PK and FK Constraints

Since the masking engine doesn't know about all FK Constraints and the desired behavior depends or masking requirements, the drop/disable of PK/FK Constraints needs to be scripted and manually dropped/disabled. 

First steps are to:

  • Consider that it is a bad idea to mask a PK.
    If it needs to be done, ok it needs to be done but the best is to try to avoid it. 
  • Get a complete view of all columns with:
    • PK Constraints that are masked and all referencing FK Constraints.
    • FK Constraints that are masked and all referenced PK Constraints. 
  • Consider if it is needed to recreate/re-enable the PK and FK Constraints after masking
    • It might be desired to not create the PK and FK Constraints.
    • It might be desired to not create Indexes or create them differently (for example unclustered).
How to - steps

Before masking job - steps:

  • Define appropriate algorithm for the masked columns.
  • Get the list of all columns with PK and FK required to be masked.
    • All FK Constraints referencing a column that is masked needs to be masked.
    • All PK Constraints referenced by a column that is masked needs to be masked. 
  • Manually Drop/Disable these FK and PK Constraints before masking. 
    • FKs first and then PKs. 

After the masking job:

  • Based on the plan for recreating/re-enabling the constraints. Manually recreate/re-enable PK and FK Constraints. 
    • PKs first and then FKs. 
What actions can be taken to drop/disable PK?
  • DB2 - PK can be dropped and from v10.5 it can be set as 'not enforced'. 
  • MS SQL Server - PK can only be dropped. 
  • MySQL - PK can be dropped or disabled. 
  • Oracle - PK can be dropped or disabled. 
  • Sybase - PK can only be dropped. 
What actions can be taken to drop/disable FK?
  • DB2 - FK can be dropped or set as 'not enforced'. 
  • MS SQL Server - FK can only be dropped. 
  • MySQL - FK can be dropped or disabled. 
  • Oracle - FK can be dropped or disabled. 
  • Sybase - FK can only be dropped. 

 

For assistance to script this, please contact your Customer Success Manager and Professional Services.