Delphix

How Profiling works

This page supplements the user documentation on Managing Profiler Settings.

When is it used?

Assigning Domains/Algorithms to Columns in a Rule Set can be time-consuming, and profiling is a way to automate this process. Profiling can look at both the Column name (e.g. First Name, Last Name, Address 1, Credit Card Number) and the data (values) in the database.

Different types of profiles (Profile Sets) can be defined and customized for specific requirements. A Profile Set is then selected in a Profile Job which, when executed, assigns Domains/Algorithms to Columns based on the search criteria in the Profile Set.

How it all hangs together

To use profiling, there are three different components that need to be defined:

  1. The Profile Set with search Expressions.
  2. A Rule Set that identifies the columns/data to profile.
  3. A Profile Job.

Profile Set

There are two different components in a Profile:

  1. Profile Set
    • The Profile Set is a set of expressions which are used in the Profile Job. You can add and modify the Profile Sets. 
  2. Expression 
    • These are Regular Expressions (RegEx) that are applied to either a Column name or the data in a Column. 

 

[Figure: Profile and Job setup]

Rule Set

A Rule Set contains the Connector to the data, the Tables in the Database to be masked and the Columns in the tables.

Profile Job

The Profile Job is configured around the Rule Set (see above) and when executed applies the Profile Expressions to the database. The matched Domains/Algorithms are assigned to the Columns in the Rule Set.

The two different types of Profiling

There are two different types of profiling:

  1. Column Level profiling 
    • Searches through the Column names in the Rule Set from the connected database, looking for specific patterns in the Column names.
  2. Data Level profiling 
    • Searches through the top x rows of the data in the Rule Set, looking for specific patterns in the values. By default, the top 100 rows are searched; this number can be changed via Support.

The fastest way to profile a Rule Set is to use Column Level profiling. There are special cases where Column Level profiling is not sufficient (for example, when the Column names are not specific enough) and Data Level profiling is the only way to assign the correct Domain/Algorithm to a Column.

Note that a Profile Set can contain both Column Level and Data Level expressions. Column Level profiling is executed first, and its result overrides Data Level profiling.
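The precedence rule can be pictured with a minimal sketch. Everything structural in it is hypothetical: the function name, the domain names, the dictionaries, and the choice to treat a Data Level expression as matched when any sampled value matches are assumptions for illustration, not the Masking Engine's implementation. The two expressions are the Address (Column Level) and PO Box (Data Level) examples from this page.

```python
import re

# Hypothetical Profile Set: domain name -> expression, grouped by type.
COLUMN_LEVEL = {
    "ADDRESS": (
        r"^(?:(?!postalcode|city|state|country|(l|ln|lin|line)?_?2{1}|ID).)*"
        r"addre?s?s?_?"
        r"(?:(?!city|state|country|(l|ln|lin|line)?_?2{1}|ID).)*$"
    ),
}
DATA_LEVEL = {"PO_BOX": r"po box|p\.o\."}

SAMPLE_ROWS = 100  # default number of top rows searched, per this page


def profile(columns):
    """columns: dict of column name -> list of values. Returns name -> domain."""
    assigned = {}
    # Pass 1: Column Level, looks at the names only.
    for name in columns:
        for domain, expr in COLUMN_LEVEL.items():
            if re.search(expr, name):
                assigned[name] = domain
                break
    # Pass 2: Data Level, only for columns that are still unassigned,
    # so a Column Level result is never overwritten.
    for name, values in columns.items():
        if name in assigned:
            continue
        sample = values[:SAMPLE_ROWS]
        for domain, expr in DATA_LEVEL.items():
            # Assumption: one matching value in the sample is enough.
            if any(re.search(expr, str(v)) for v in sample):
                assigned[name] = domain
                break
    return assigned


result = profile({
    "home_address": ["po box 12", "1 main street"],
    "notes": ["p.o. 99", "n/a"],
    "qty": [1, 2, 3],
})
print(result)
```

In this sketch home_address is assigned by its name even though its data would also match the PO Box expression, notes is assigned from its data, and qty is left unassigned.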

Default Profile Sets (Out of the Box)

New Profile Sets can be created, and existing ones edited, to suit specific profiling requirements. When the Masking Engine is installed, the following two Profile Sets and Profile Configurations are pre-installed:

  1. Financial
    • Only Column Level profiling is configured (the Expressions are listed at the end of this page).
  2. HIPAA
    • Only Column Level profiling is configured (the Expressions are listed at the end of this page).

Expression Examples

Column Level examples
Simple expression - First Name
(?>(fi?rst)_?(na?me?)|f_?name)(?!\w*ID)

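This RegEx is an atomic group (?>...) containing two alternatives, followed by a negative lookahead. It has no ^ or $ anchors.

  • Alternative 1: (fi?rst)_?(na?me?)
    • first or frst, then
    • optional _, then
    • name, nam, nme or nm
  • Alternative 2: f_?name
    • f_name or fname
  • Negative lookahead: (?!\w*ID)
    • Applies to both alternatives and rejects the match when it is followed by ID, optionally after other word characters (e.g. first_name_ID)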
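
The First Name expression can be tried against a few sample Column names. This is a minimal sketch using Python's re module, not the Masking Engine itself; the sample names are made up. Python's re is case-sensitive by default, so the samples are written in the same case as the expression, as this page does not state how the engine treats case.

```python
import re

FIRST_NAME = r"(?>(fi?rst)_?(na?me?)|f_?name)(?!\w*ID)"

try:
    pattern = re.compile(FIRST_NAME)
except re.error:
    # Python before 3.11 rejects atomic groups (?>...); use a plain
    # non-capturing group there. The samples below give the same results.
    pattern = re.compile(FIRST_NAME.replace("(?>", "(?:"))

# Hypothetical Column names.
samples = ["first_name", "frst_nm", "fname", "f_name",
           "cust_first_name", "last_name", "first_name_ID"]

# search() is used because the expression is not anchored.
matches = {name: bool(pattern.search(name)) for name in samples}
for name, hit in matches.items():
    print(f"{name:16} {hit}")
```

The first five names match. last_name contains neither alternative, and first_name_ID is rejected by the (?!\w*ID) lookahead.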
Complex expression - Address
^(?:(?!postalcode|city|state|country|(l|ln|lin|line)?_?2{1}|ID).)*addre?s?s?_?(?:(?!city|state|country|(l|ln|lin|line)?_?2{1}|ID).)*$

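This RegEx is anchored to the whole Column name (^ ... $). It matches names containing addr, optionally extended up to address and followed by an optional _, provided that no position before or after it starts one of the excluded tokens: postalcode (before only), city, state, country, ID, or a 2 (optionally preceded by l, ln, lin or line and _). This keeps Columns such as address_line_2 or address_city from being matched by this expression.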
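
The effect of the exclusions in the Address expression is easier to see on sample input. This is a minimal sketch in Python with made-up Column names; matching is case-sensitive here (Python's default), which may differ from how the Masking Engine evaluates the expression.

```python
import re

# The expression from this page, split over three lines for readability.
ADDRESS = (
    r"^(?:(?!postalcode|city|state|country|(l|ln|lin|line)?_?2{1}|ID).)*"
    r"addre?s?s?_?"
    r"(?:(?!city|state|country|(l|ln|lin|line)?_?2{1}|ID).)*$"
)
pattern = re.compile(ADDRESS)

# Hypothetical Column names.
samples = ["address", "home_address", "mailing_addr",
           "address_line_2", "address2", "address_city",
           "city_address", "address_ID"]

results = {name: bool(pattern.search(name)) for name in samples}
for name, hit in results.items():
    print(f"{name:16} {hit}")
```

Only address, home_address and mailing_addr match. The other names contain an excluded token (2, city or ID) somewhere before or after the addr part.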

Data Level examples
Simple expression - PO Box
po box|p\.o\.

This RegEx has two alternatives, separated by |, and matches:

  • Alternative 1: po box
    • po box and anything thereafter - e.g. po box 123
  • Alternative 2: p\.o\.
    • p.o. and anything thereafter - e.g. p.o. 1234
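
A quick way to check the PO Box expression is to run it over a handful of values. This is a minimal sketch in Python with invented sample data. It assumes the expression may match anywhere in the value, and it is case-sensitive because that is Python's default.

```python
import re

PO_BOX = re.compile(r"po box|p\.o\.")

# Invented sample values, as they might appear in an address column.
values = ["po box 123", "p.o. 1234", "send to p.o. 7",
          "12 main street", "po-box 5"]

hits = [v for v in values if PO_BOX.search(v)]
print(hits)
```

The first three values are reported. po-box 5 is not, because the expression requires a space between po and box.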
Complex expression - Address
(.*[\s]+b(ou)?l(e)?v(ar)?d[\s]*.*)|(.*[\s]+st[.]?(reet)?[\s]*.*)|(.*[\s]+ave[.]?(nue)?[\s]*.*)|(.*[\s]+r(oa)?d[\s]*.*)|(.*[\s]+l(a)?n(e)?[\s]*.*)|(.*[\s]+cir(cle)?[\s]*.*)

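This RegEx consists of six alternatives joined by |, one per street suffix: boulevard (e.g. blvd, boulevard), street (st, st., street), avenue (ave, ave., avenue), road (rd, road), lane (ln, lane) and circle (cir, circle). Each alternative requires at least one whitespace character immediately before the suffix and allows any text before and after it.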
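
The Data Level Address expression can be exercised the same way. This is a minimal sketch in Python; the sample values are invented and matching is case-sensitive (Python's default).

```python
import re

# The expression from this page, split over three lines for readability.
STREET = re.compile(
    r"(.*[\s]+b(ou)?l(e)?v(ar)?d[\s]*.*)|(.*[\s]+st[.]?(reet)?[\s]*.*)"
    r"|(.*[\s]+ave[.]?(nue)?[\s]*.*)|(.*[\s]+r(oa)?d[\s]*.*)"
    r"|(.*[\s]+l(a)?n(e)?[\s]*.*)|(.*[\s]+cir(cle)?[\s]*.*)"
)

# Invented sample values.
values = ["12 sunset blvd", "4 elm street", "9 oak ave.", "77 river rd",
          "3 pine lane", "8 maple cir", "street", "john stone", "hello world"]

found = {v: bool(STREET.search(v)) for v in values}
for value, hit in found.items():
    print(f"{value:16} {hit}")
```

The six address-like values match. street on its own does not, because no whitespace precedes the suffix. john stone does match, since any word starting with st after a space satisfies the second alternative, so this expression can also flag values that are not addresses.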

Out of the Box Expressions and Profile Sets

Listing Financial Profiling Expressions

The Profile Set "Financial" contains, Out of the Box, the following Expressions - all are Column Level expressions:

  •  Birth Date
  •  Address
  •  Address Line 2 - after
  •  Account Number
  •  Birth Date1
  •  Customer Number
  •  Credit Card Number
  •  Card Number
  •  Drivers License Number
  •  Drivers License Number1
  •  Email
  •  First Name
  •  Last Name
  •  Middle Name
  •  Security Code
  •  Social Security Number
  •  Tax ID Code or Number
  •  Telephone or Contact Number
  •  Zip or Postal Code
  •  School Name
  •  Tax ID Number
  •  Address Line2 - before
  •  Birth Date2
  •  PO Box
  •  Fax Number
  •  Street Address
  •  IP Address
  •  Web or URL Address

Listing HIPAA Profiling Expressions

The Profile Set "HIPAA" contains, Out of the Box, the following Expressions - all are Column Level expressions:

  •  Birth Date
  •  Address
  •  Address Line 2 - after
  •  Account Number
  •  Birth Date1
  •  Customer Number
  •  Credit Card Number
  •  Card Number
  •  Drivers License Number
  •  Drivers License Number1
  •  Email
  •  First Name
  •  Last Name
  •  Middle Name
  •  Security Code
  •  Social Security Number
  •  Tax ID Code or Number
  •  Telephone or Contact Number
  •  Zip or Postal Code
  •  School Name
  •  Tax ID Number
  •  Address Line2 - before
  •  Birth Date2
  •  Beneficiary Number
  •  Certificate Number
  •  City
  •  County
  •  License Plate
  •  PO Box
  •  Precinct
  •  Record Number
  •  Serial Number
  •  Signature
  •  VIN
  •  Vehicle
  •  Fax Number
  •  Admission Date
  •  Treatment Date
  •  Discharge Date
  •  Street Address
  •  IP Address
  •  Web or URL Address
  •  Biometric
  •  Certificate ID
  •  Beneficiary ID