Skip to main content
Delphix

Masking a Column With a Primary Key (PK) (KBA1776)

 

KBA

KBA#1776

 

At a Glance

Summary: This page describes how to mask columns with Primary Key (PK) Constraints.

Manual tasks are required in order to successfully mask these.

Note: If the masked column is an IDENTITY column, please see KBA7266 (see link below).
Applies to:  These procedures apply to all databases and verified on the following: 
  • Oracle
  • MS SQL Server
  • SAP ASE (Sybase)
  • MySQL
  • PostgreSQL
  • DB2
  • Maria DB
Algorithm: Referential Integrity is essential when masking a Primary Key column. The Algorithm will, therefore, need to be one of the following: 
  • Character Mapping Algorithm (replaces Segment Mapping)
  • Mapping Algorithm
  • Custom Algorithm developed specifically for a Key (Primary or Foreign) column
Constraints: Different databases have different options around PK Constraints. As an example:
  • DB2 - PK can be dropped. From v10.5, PK can be set as 'not enforced'
  • MS SQL Server - PK can only be dropped
  • MySQL - PK can be dropped or disabled
  • Oracle - PK can be dropped or disabled 
  • Sybase - PK can only be dropped
PK - Steps: The below steps are for Database Connectors that support the creation of the IDENTITY column (DMS_ROW_ID) or the use of ROWID. 

For other Databases Connectors including Extended Connectors, see additional steps below. 
  1. Drop any Index on the PK and drop/disable the PK Constraint.
  2. Refresh the Rule Set.
  3. [if needed] Assign the Masking Algorithm to the PK column (see above for algorithms).
  4. Run the Masking Job.
  5. Recreate the PK and its Index
Below are additional steps for Database Connectors not having the IDENTITY feature:
  • After 1. Create a temporary IDENTITY column and add an Index to this column.
  • Before 5. Drop the temporary IDENTITY column (starting with its Index).

PostgreSQL the column 'ctid' can be used instead - define it as a Logical Key. If the masked column is an IDENTITY column, please see KBA7266 (see link below).

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

Masking Columns with PK Constraints

Below are further details about PK and how to mask these columns. Since the PK will require additional actions, these cannot be Disabled using the Job Configuration checkbox. Primary Keys are also frequently referenced by Foreign Keys (see KBA7286 for more details about masking these). 

 

Warning

Warning:

Masking Primary Keys (PK) and Foreign Keys (FK) requires Manual steps and can be tricky. PK should only be masked if necessary.

 

From Delphix Technical Services:

Primary keys are often an internal identifier and may not require masking since the PII associated with them is masked.  Changing these values does little to add security as long as associated PII is masked.

Furthermore, masking Primary and Foreign Keys often cause issues for test cases as well as add complexity to masking applications. An application owner familiar with test cases in lower environments should be consulted prior to expending the effort to mask these.

 

General notes

  1. The Primary Key can be composite - all masked columns must adhere to the key constraint.
  2. All related Keys (PK and FKs) need to be masked.
  3. The Masking Algorithm must have Referential Integrity and 1:1 unique Mapping.
  4. PK Constraints need to be dropped before masking and recreated after. 
  5. Masking PKs do affect Use Cases, Test Cases, and Down Stream Applications
    • Talk to the Application Owner.

Details about Primary Keys (PK) 

  • A Primary Key is a column (or a group of columns) that uniquely identifies each record in a table.

  • Primary Keys must contain UNIQUE values, and cannot contain NULL values.

  • A table can have only one primary key, which may consist of single or multiple columns.

  • A Primary key is a special Constraint - I write special as it has special features and it also differs from database to database.

Resolution

Since the masking engine does not know about all PK Constraints and the desired behavior depends on masking requirements, the drop/disable of PK/FK Constraints needs to be scripted and manually dropped/disabled. 

Firstly,

  • Determine if the PK needs to be masked. 
    If it is not necessary, it might be best to leave this column unmasked.  
  • Get a complete view of all columns with:
    • PK Constraints that are masked and all referencing FK Constraints.
    • FK Constraints that are masked and all referenced PK Constraints. 
  • Consider if it is needed to recreate/re-enable the PK and FK Constraints after masking
    • It might be desired to not create the PK and FK Constraints.
    • It might be desired to not create Indexes or create them differently (for example unclustered).
Prior to Masking 
  • Define the appropriate algorithms for the masked columns.
  • Get the list of all columns with PK and FK required to be masked.
    • All FK Constraints referencing a column that is masked needs to be masked.
    • All PK Constraints referenced by a column that is masked needs to be masked. 
  • Manually Drop/Disable these FK and PK Constraints. 
  • Refresh the Rule Set before masking.
Post Masking
  • Based on the plan for recreating/re-enabling the constraints, manually recreate/re-enable PK and FK Constraints. 
Drop or disable PK? 

The drop or disable of PK needs to be performed manually (even if some databases allow disabling these):

  • DB2 - PK can be dropped and from v10.5 it can be set as 'not enforced'. 
  • MS SQL Server - PK can only be dropped. 
  • MySQL - PK can be dropped or disabled. 
  • Oracle - PK can be dropped or disabled. 
  • Sybase - PK can only be dropped. 

 

For assistance with scripting this, please contact your Customer Success Manager and Technical Services. 

Errors and Issues 

When masking columns you may encounter the following issues/errors:

  • Constraint Violations
  • Slow Performance and Log Size Growth

 

Common related errors: 

  • Oracle: ORA-00001ORA-02291ORA-02292.
  • MS SQL: Violation of PRIMARY KEY constraintForeign key constraint violated.

Constraint Violations 

For the PK and FK Constraints we can have a number of different types of database errors: 

  • PK Constraint violation:
    • The masked value is already in the column.
      For example: masking value x to y and y is already in the column. 
    • The masked values are not unique.  
      For example: masking value x to y and already masked z to y. 
  • FK Constraint violation:
    • PK change violates FK Constraint.
      For example: masking x to y in the parent table but there is an FK that relies upon that there is x in PK. 
    • FK change violates FK Constraint.
      For example: masking x to y in the child table there is no y is in the PK. 

Slow Performance and Log Size Growth

When masking data, generally all rows in the table will be masked.  As a result, for any enabled PK, Unique Key, and FK Constraints, with their associated Indexes, the database has to maintain these while masking. The database also has to generate additional information about the masking changes in the form of undo and redo (Oracle) and transaction logs (examples MS SQL Server, Sybase, DB2) as there will be changes on tables and any indexes associated with the constraints. This potentially makes the masking slower and the database will require more storage to be made available to track Masking Engine generated changes to data.

Note: If the masked column is Indexed using a Clustered Index the performance will be even slower due to the table being sorted. 

Resolution

Drop Indexes on Masked Columns. 

Database with PK and FK Example

Consider the following diagram of an HRS (Human Resource System) for a moment.

Masking - Mask PK FK HRS.png

Masking Last Name, Department Name and similar fields are easy and straightforward. Just define the Rule Set and the Job and the masking job will use the PK as the Unique Row Identifier and we are in business.   

To mask the columns DEPNO and EMPNO there are a few challenges: 

  • There is no longer a URI (Unique Row Identifier) as we are masking the PK.
  • Due to the PK Uniqueness Constraint, the PK needs to be unique after masking.
  • Also, the FK needs to be masked to match the PK.
  • Then there is the cross reference where the two tables reference each other - which table fo you start with?

Now take a look at this example:

Masking_-_Mask_PK_FK_HRS_Payroll.png

In the example above, we have more FK references and another system (Payroll System) accessing the data. We need to mask all these columns if we are to mask 'DEPTNO' and 'EMPNO' and if we do not drop/disable the PK and FK Constraints we will have errors. 

Review the following types of errors and issues that can occur:

Examples: PK Constraint Violation

This example will show two different types of Constraint Violations possible when masking a Primary Key:

  • The masked value is already in the column.
  • The algorithm is not producing a 1:1 unique mapping (there are duplicates in the masked data).

The database error message for these two root causes is the same, so let's start looking at what can cause these errors. 

Background details 

To visualize this, let's create an example. This example will show both types of violations by using an algorithm that is not producing a 1:1 unique mapping (Secure Lookup). The allocation of the masked value is random and the same masked value can appear more than once.

The configuration 

The masking job is configured as follows:

  • Table Columns:
    • 'ID' (logical key)
    • 'Mask PK' (masked)
  • Algorithm: Secure Lookup (with 25 values).
  • Commit Size: 4
Explanation 

What happens here?

  • We Commit in groups of 4 rows.
  • In this example, the error happens in the second Commit.

Steps:

  • Commit 1: Success - the first 4 masked values are Committed ok.
    • Note that row with ID: '1' is masked to '2'. This will later violate a duplicated masked value.
    • Note also that the ID order has changed. More about that in an example further down.
  •  Commit 2: Error - the second lot of 4 values are masked and the Commit is failing. 
    • We are masking '6' to '9' and '4' to '2'.
      Both these violate the constraint for two different reasons.
      • UPDATE example 1.
        • The masked value '9' is already in the table and has not been masked yet.
        • We can never allocate '9' as this will violate the PK constraint. 
      • UPDATE example 2:
        • The masked value '2' is also already in the table.
        • Since the mapping is not 1:1 and unique. '1' is masked to '2" and so is '4' now as well. 

Original Table                 # Masked Table
# before masking                 # after masking

SELECT * FROM Msk_Tbl;           SELECT * FROM Msk_Tbl;

+----+---------+                 +----+---------+
ID | Mask PK |                 | ID | Mask PK |
+----+---------+    >> Mask >>   +----+---------+
| 0  | 0       |                 | 3  | 1       |
| 1  | 1       |                 | 1  | 2*      | < Example 2 violates this value (2)
| 2  | 2       |                 | 0  | 15      |
| 3  | 3       |      Commit 1 > | 2  | 21      |
+----+---------+                 +----+---------+
| 4  | 4       |         Error > | 4  | 9*      | < UPDATE example 1
| 5  | 5       |                 | 5  | 4       |   Value (9) already in the column.
| 6  | 6       |         Error > | 6  | 2*      | < UPDATE example 2
| 7  | 7       |      Commit 2 > | 7  | 16      |   Masked value (2) duplicate.
+----+---------+          Stop > +----+---------+
| 8  | 8       |                 | 8  | 8       |
| 9  | 9       |                 | 9  | 9*      | < Example 1 violates this value (9)
| .. | ..      |                 | .. | ..      |
+----+---------+                 +----+---------+

Error messages 

Error messages from different database types. The two scenarios above have the same error message:

  • DB2
    • 'SQLSTATE: 23505. A violation of the constraint imposed by a unique index or a unique constraint occurred.'
       
    • 'SQL0803N One or more values in the INSERT statement, UPDATE statement, or foreign key update caused by a DELETE statement are not valid because the primary key, unique constraint or unique index identified by "[VALUE]" constrains table "[TABLE]" from having duplicate values for the index key. SQLSTATE=23505'
       
  • MS SQL Server
    • 'Violation of PRIMARY KEY constraint '[PK_CONSTRAINT]'. Cannot insert duplicate key in object '[TABLE]'. The duplicate key value is ([VALUE]).'
       
  • MySQL
    • 'ERROR 1062 (23000): Duplicate entry '[NEW_VALUE]' for key [VALUE]'
       
  • Oracle
    • 'ORA-00001: unique constraint ([PK_CONSTRAINT]) violated'
       
  • Sybase
    • 'Check constraint violation occurred, dbname = [DATABASE], table name = [TABLE], constraint name = [PK_CONSTRAINT].'
Resolution 

To solve this issue:

On all masked columns with PK:

  1. Use an algorithm that produces a 1:1 unique mapping - for example, Character MappingMapping Algorithm, or a Custom Algorithm.
  2. Manually Drop or Disable the PK.

 

 

Related Articles

The following articles may provide more information or related information to this article: