At a Glance
|Description:||This KBA details how Data Level Profiling is calculated and working.|
|Percentage:||Data Level Percentage is calculated using (for exact formula see below):
This means that the table profiled needs to have more than DataLevelRows rows in the table.
|This parameter defines the TOP x number of rows selected for profiling.
It is also used in the Percentage calculation.
|Feature flag. If enabled it will count the number of records in the table at the end of the Profile Job.
This feature has bugs and is incorrectly documented. Recommendation: Do not use.
|Improvements:||There is an ongoing project to update and improve Profiling. The feature will also get a new name.
Improvements will be incremental and these are slated for the next upcoming versions. ETA unknown.
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases 6.0 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206
220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11
18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52, 184.108.40.206, 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199
There are five key parameters to consider. One is set in the Job Configuration and four are Application Settings.
'Multiple Profiler Expression Check'
Job Configuration: 'Multiple Profiler Expression Check'
- Default: Disabled.
- This sets how the profiler will act when there is a match of Multiple Profile Expressions.
- All Data Level Profile Expressions in the Profile Set are used to calculate the total rows with PII.
- If more than one is identified, use DefaultMultiphiAlgorithm.
Application Settings: DefaultMultiphiAlgorithm
- Default: NULL SL.
- This is the algorithm used when 'Multiple Profiler Expression Check' is selected (see above).
Application Settings: DataLevelPercentage
- Default: 80%
- This is the threshold for the percentage of the total number of PII rows detected.
- See below for calculation.
Application Settings: DataLevelRows
- Default: 100
- This sets the number of rows selected from the profiled table. This ResultSet is used to calculate the number of PII rows.
- This number is also used as the Sample Size unless EnableDataLevelCount is enabled.
Application Settings: EnableDataLevelCount
- Default: False
- If enabled, the number of rows in the table is calculated using a SELECT COUNT(1) statement for each table profiled.
Calculation and Allocation
The calculation is simple.
- Create and execute a Data Level Profile Job that Reads the TOP DataLevelRows from each table.
- The Profile Job will Count the number of PII rows matching each individual Profile Expressions in the ResultSet (from the query above - note the TOP).
- After the Profile Job execution:
- If the EnableDataLevelCount is set, Count the number of rows in the table.
- If this fails, all domains and algorithms will be set to null (as expected).
- Percentage Calculation:
- if (countedRows < DataLevelRows)
- then Percentage = PIIrows / countedRows
- else Percentage = PIIrows / DataLevelRows
- if (countedRows < DataLevelRows)
- If Percentage >= DataLevelPercentage (this is the total - Percentage of all PII rows in the sample)
- For each Expression having a PII count above, set the Domain in the Profile Result (this is what you see in the UI).
- If one match only, set Domain and Algorithm to this one.
- If multiple matches and Multiple Profiler Expression Check is:
- Disabled: Set the algorithm to the last one in the list (exact details here unknown).
- Enabled: Set the algorithm to DefaultMultiphiAlgorithm.
The issue happens when the profiled table has fewer rows than DataLevelRows and the SELECT Count is counted on the incorrect schema. Amended to this is that the Percentage is counted on the DataLevelRows (which can cause missing PII in small tables).
There are some solutions:
- Use a user that has the default schema set to the profiled tables.
- If not possible, do not use EnableDataLevelCount and try to group tables in Profile Jobs based on number of rows in the tables.
- This will minimize the effects and will calculate a close as possible Percentage.