Skip to main content

Data Level Profiling - Parameters and Calculations (KBA8131)




KBA# 8131

At a Glance


This KBA details how Data Level Profiling is calculated and working.

  • The new improved ASDD version is now available.
Percentage (Confidence): Data Level Percentage (Confidence) is calculated using (for exact formula see below): 
% = 100 * [Count rows with PII] / DataLevelRows

This means that the table profiled needs to have more than DataLevelRows rows in the table.
This parameter defines the TOP x number of rows selected for profiling.

It is also used in the Percentage calculation. 
Feature Flag:
Feature flag. If enabled it will count the number of records in the table at the end of the Profile Job.

This feature has issues and is incorrectly documented. Recommendation: Do not use. 
Improvements (ASDD): A new version of Profiling, called ASDD, is now available. This is a significant improvement to Profiling and it is recommended to upgrade and use ASDD.
  • Use version 14.0 or higher.




The KBA will use PII to indicate any type of Sensitive Information to be identified by the Profiler. It can be personal but does not need to be. 

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases





Key Parameters

There are five key parameters to consider. One is set in the Job Configuration and four are Application Settings.

Job Configuration

'Multiple Profiler Expression Check'

Job Configuration: 'Multiple Profiler Expression Check'

  • Default: Disabled.
  • This sets how the profiler will act when there is a match of Multiple Profile Expressions.
  • All Data Level Profile Expressions in the Profile Set are used to calculate the total rows with PII.
  • If more than one is identified, use DefaultMultiphiAlgorithm

Application Settings


Application Settings: DefaultMultiphiAlgorithm

  • Default: NULL SL.
  • This is the algorithm used when 'Multiple Profiler Expression Check' is selected (see above).


The Domain remains the one selected if the flag was not set.



Application Settings: DataLevelPercentage

  • Default: 80% 
  • This is the threshold for the percentage of the total number of PII rows detected.
  • See below for calculation.


Application Settings: DataLevelRows

  • Default: 100
  • This sets the number of rows selected from the profiled table. This ResultSet is used to calculate the number of PII rows. 
  • This number is also used as the Sample Size unless EnableDataLevelCount is enabled. 


Application Settings: EnableDataLevelCount

  • Default: False
  • If enabled, the number of rows in the table is calculated using a SELECT COUNT(1) statement for each table profiled.




This does not profile all rows in the table.




On MS SQL Server, this SELECT COUNT is missing the SCHEMA and is therefore using the default schema for the user (usually dbo). 

If the schema.table does not exist in the default schema for the user the Profile Job will fail.


Calculation and Allocation

The calculation is simple.


  1. Create and execute a Data Level Profile Job that Reads the TOP DataLevelRows from each table. 
    • The Profile Job will Count the number of PII rows matching each individual Profile Expressions in the ResultSet (from the query above - note the TOP).


  2. After the Profile Job execution:
    • If the EnableDataLevelCount is set, Count the number of rows in the table.
    • If this fails, all domains and algorithms will be set to null (as expected).


  3. Percentage Calculation: 
    • if (countedRows < DataLevelRows)
      • then Percentage = PIIrows / countedRows
      • else Percentage = PIIrows / DataLevelRows


  4. If Percentage >= DataLevelPercentage (this is the total - Percentage of all PII rows in the sample)
    • For each Expression having a PII count above, set the Domain in the Profile Result (this is what you see in the UI).
    • If one match only, set Domain and Algorithm to this one. 
    • If multiple matches and Multiple Profiler Expression Check is:
      • Disabled: Set the algorithm to the last one in the list (exact details here unknown).
      • Enabled: Set the algorithm to DefaultMultiphiAlgorithm.

Known issues

The issue happens when the profiled table has fewer rows than DataLevelRows and the SELECT Count is counted on the incorrect schema. Amended to this is that the Percentage is counted on the DataLevelRows (which can cause missing PII in small tables). 


There are some solutions:

  • Use a user that has the default schema set to the profiled tables.
  • If not possible, do not use EnableDataLevelCount and try to group tables in Profile Jobs based on number of rows in the tables.
    • This will minimize the effects and will calculate a close as possible Percentage.



Related Articles

The following articles may provide more information or related information to this article: