Skip to main content
Delphix

Profile Data in Database Column for SAP ASE (Sybase SQL Server) (KBA5266)

 

 

KBA

KBA# 5266

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0

5.3

5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0

5.2

5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1

5.1

5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0

5.0

5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1 ,5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

Overview

This page shows queries to use on SAP ASE (Sybase SQL Server) to profile data. This is especially useful to discover non-conforming data.

Description This page shows queries to use on SQL Server to profile data. This is especially useful to discover non-conforming data. 
Applies to SAP ASE (Sybase SQL Server) 
Non-Conforming Classification This is useful to identify Non-Conforming characters on the Masking Engine.
The Classification used on the engine is: 
  1. Letters (L)
  2. Numbers (N)
  3. Marks (M)
  4. Separators (Z)
  5. Punctuation (P)
  6. Symbols (S)
  7. Other (O)

For details about the Unicode characters please look here: 
https://www.compart.com/en/unicode/category/

Queries The following queries are listed below: 
  • How many rows in the table?
  • How many rows with a specific length?
  • How many duplicated column values?
  • How many rows with a specific profile?
  • How many rows with special characters in the cell?
  • What rows have Special Characters (how to find them)?
More info For more information about Non-Conforming data:

Profiling of Data in Masked Column  

These queries (and functions) are used to profile data in columns to be masked to understand what algorithm is best suited and how to configure the algorithm. Unicode character profile has been used to match the patterns reported in the masking engine Monitor Page for nonconforming data. 

The following codes have been used: 

  • N - Numbers (0-9)
  • L - Letters (A-Z)
  • Z - Space
  • All the other characters are shown.

Queries for SAP ASE (Sybase SQL Server)    

How many rows in the table?  

This is important to know to understand the statistical distribution of the profile result below. 

-- Replace [table]
--  
SELECT COUNT(*) cnt_rows FROM [table]
GO
Example  
 cnt_rows
 -----------
     4534824

(1 row affected)

How many rows with a specific length?  

This query shows the number of rows with a specific length (number of characters). The query has been limited to only count the first 1,000 rows. 

-- Replace [table]
-- Replace [mask_col]
-- Only displaying top 1000 rows. Change if more is needed. 
--
SELECT TOP 1000 LEN([mask_col]) as str_len, Count(*) as cnt
FROM [table]
GROUP BY LEN([mask_col])
ORDER BY cnt DESC
GO
Example  

Some algorithms are specifically created to fit a specific length. This example shows 'Home Phone Number' and there are clearly some additional data other than phone numbers here. 

 str_len     cnt
 ----------- -----------
          10         204 
          11          65 
          12          10 
...
          24           2 
          25           1 
          26           1 

(16 row affected)

How many duplicated column values?  

For uniqueness and for URIs (Unique Row Identifier), this query is important. This query shows if there are duplicated values in a column. The expected result is for all entries to have 'dups' = 1.

This might need to be executed on the column before as well as after masking. The columns to investigate are PKs, Identifiers, and masked columns, also on columns with unique constraints. There might be composite PKs.

Note

Note:

No Values from the columns are shown. The query only shows statistics, i.e., the number of duplicates.

 

-- Replace [table]
-- Replace [column]
--
-- dups = 1 indicate unique - if > 1 there are duplicates.
--
SELECT s1.DUPS, count(*) as CNT
FROM (
   SELECT [column], count(*) as DUPS
   FROM [table]
   GROUP BY [column]) s1
GROUP BY s1.DUPS
Example 

DUPS = 1 means unique entries. There are 4,534,822 unique entries in the example below and there is 1 duplicated value.

DUPS       CNT
---------- ----------
     1       4534822
     2             1

How many rows with a specific profile?  

This query shows the number of rows with a specific length (number of characters). The query has been limited to only count the 1,000 first rows. 

-- Create Function Below
-- In the function - change the size RETURNS VARCHAR(50) to fit the profiled data
-- Replace [table]
-- Replace [mask_col]
--
SELECT TOP 1000 dbo.profileASCII([mask_col]) as profile, Count(*) as cnt
FROM [table]
GROUP BY dbo.profileASCII([mask_col])
ORDER BY cnt DESC
GO
Function  

To use the query above, please create this function first. It will profile data based on: 

  • N - Numbers (0-9)
  • L - Letters (A-Z)
  • Z - Space
  • All the other characters are shown.
-- SAP ASE
CREATE FUNCTION [dbo].[profileASCII](@inputString VARCHAR(1000))
RETURNS VARCHAR(50)
AS
     BEGIN
         DECLARE @tmpChr VARCHAR(10)
         DECLARE @tmpStr VARCHAR(10)
         DECLARE @increment INT
         SET @increment = 1
         WHILE @increment <= DATALENGTH(@inputString)
             BEGIN
                 SET @tmpChr = SUBSTRING(@inputString, @increment, 1)
                 IF (@tmpChr LIKE '[0-9]') SET @tmpStr = 'N' ELSE 
                 IF (@tmpChr LIKE '[A-Z]') SET @tmpStr = 'L' ELSE
                 IF (@tmpChr LIKE '[ ]')   SET @tmpStr = 'Z' ELSE
                 SET @tmpStr = @tmpChr

                 SET @inputString = LEFT(@inputString, @increment -1) + @tmpStr + RIGHT(@inputString, LEN(@inputString) - @increment )
                 SET @increment = @increment + 1
             END
         RETURN @inputString
     END
GO
Example 

The example below shows an example from profiling 'Salutation' (for example 'Mr'). The example shows the length and that the data has some records with full stop and some without. 

 profile              cnt
 -------------------- -----------
 LL                       376,849
 LL.                      206,021
 LLL.                      34,005
 LLLL                      31,291
 LLL                          127
 LLLL.                          3

(6 rows affected)

How many rows with special characters in the cell?  

This query has specifically been created to extract out special characters in data that needs to be ignored in masking algorithms. The query shows the number of rows with special characters. The query has been limited to only count the 1,000 first rows. 

-- Create Function Below
-- In the function - change the size RETURNS VARCHAR(50) to fit the profiled data
-- Replace [table]
-- Replace [mask_col]
--
SELECT TOP 1000 dbo.profileSpecialASCII([mask_col]) as special, Count(*) as cnt
FROM [table]
GROUP BY dbo.profileSpecialASCII([mask_col])
ORDER BY cnt DESC
GO
Function   

To use the query above, please create this function first. 

  • All the other characters are only shown.
-- SAP ASE
CREATE FUNCTION [dbo].[profileSpecialASCII](@inputString VARCHAR(1000))
RETURNS VARCHAR(50)
  AS
    BEGIN
      DECLARE @tmpChr VARCHAR(10)
      DECLARE @tmpStr VARCHAR(10)
      DECLARE @increment INT
      SET @increment = 1
      WHILE @increment <= DATALENGTH(@inputString)
        BEGIN
          SET @tmpChr = SUBSTRING(@inputString, @increment, 1)
          IF (@tmpChr LIKE '[0-9]') SET @tmpStr = 'x' ELSE
          IF (@tmpChr LIKE '[A-Z]') SET @tmpStr = 'x' ELSE
          SET @tmpStr = @tmpChr
          SET @inputString = LEFT(@inputString, @increment -1) + @tmpStr + RIGHT(@inputString, LEN(@inputString) - @increment )
          SET @increment = @increment + 1
        END
      RETURN STR_REPLACE(@inputString, 'x', '')
   END
GO
Example  

Finding special characters in the data could be very useful and tells a lot about the data - in some cases, it needs to be taken care of. The sample below is taken from a 'Fullname' column. 

 profile              cnt
 -------------------- -----------
 _                        205,585 
 _&                        98,783 
 _-                        72,624 
 _++                       24,233 
 _'                        17,766 
 _()                       13,579 
 _/                        12,158 
 _&&                        8,431 
 _.                         8,053 
 _**                        7,714 
 _+                         7,425 
 _*                         6,294 
 _//                        4,992 
 _#                         4,213 
 _****                      4,162 
 _--                        3,677 
 _,                         1,365 
 _()*                       1,245 
 _(/)                       1,204 
 _---                       1,146 
 
(21 rows affected)

What rows have Special Characters (how to find them)?  

This query can be used to show the first records in the database with Special Characters. The number of rows selected might need to be changed.

-- Replace [table]
-- Replace [mask_col]
-- Limited to 50 rows.

SELECT TOP 50 [mask_col]
FROM [table] where [mask_col] LIKE '%[^0-9a-zA-Z]%';