This section allows the user to configure different masking functions for the imported data model. In addition, the user has the ability to manually add custom relations (FK’s).
The left side of the screen displays the imported schemas and tables. The right panel shows the details of the selected table and the data masking functions configured for the selected table. Either the Add button or a right click on the selected column(s) will allow the user to add a masking function on the selected column(s).
You can select multiple columns using the standard Windows <SHIFT> or <CTRL> options to select a range or selection of columns.
Description of the data masking functions
The following data masking functions are available in DATPROF Privacy.
Functions are datatype specific.
This function will NULL the selected column(s). This function can’t be used on columns which are specified as NOT NULL or columns upon which a Primary Key (PK) is defined.
This feature ensures that for numeric data types all characters are replaced with the number '1 ', and alphanumeric data types all characters are replaced by the symbol' x '.
Both the length of the original value and possible upper case letters in the original value are maintained.
The Function Editor of the Shuffle and Group Shuffle is slightly different compared to the other functions.
This feature ensures that all values in the selected column(s) are randomly mixed, like shuffling a deck of cards. The data distribution remains the same, which will ensure that the same values in the former situation, also obtain the same new values. A normal shuffle is a shuffle without selecting the group checkboxes.
Example of the Function Editor with a normal shuffle: Shuffle LAST_NAMES.
If the function is applied to more than one column at the same time, a multi column shuffle is applied. The values of the selected columns remain together and will be shuffled as a whole. In this way, for example, the gender and a first name are held together.
The option 'Exclude null values from shuffling' ensures that NULL values will not be considered for shuffling, retaining their place in their specific row(s).
Shuffle example on LAST_NAME.
The Group shuffle is a variation of the shuffle. In the application you add this function as a normal function and also check the Group checkboxes.
You may select one or more columns as a group. Within this group the unique values of the selected columns are shuffled.
NULL or Empty values in the Group column are considered as a group.
By using a group shuffle fewer values are used to shuffle because only unique values within in a group are used to shuffle. For this reason the "group" column(s) should result in a reasonable number of values to shuffle.
Example of the Function Editor with a group shuffle: Shuffle FIRST_NAMES grouped by GENDER.
Group shuffle example:
Above we see three Gender groups: M, F and Empty.
Kelly exists as Male and Female. Blaise and Maxima have no gender.
The group shuffle using FIRST_NAME as column and GENDER as Group results in shuffling three groups where the empty group, having only two records, does not change.
Fixed day in same month/year
This feature can be used for date fields only. Here the user has the ability to change the existing date to a fixed day in the same month. Or to a fixed day in the first month of the same year. With this change, in most cases the new values remain functionally viable.
Fixed day: 13
Fixed day in same month
Fixed day in first month of same year
The ‘Custom Expression’ is a generic function where any and all database platform functions can be used to manipulate the data in the selected column.
<column> / 4
<column> - 2
To verify the syntax of the expression it can be tested using the Test button which will send the defined expression to the database for verification.
The verification shows if the expression is valid or not with a message next to the Test button, per the panel below, if invalid. A valid expression will return 'Query validated'
To test an expression it is required to have a working database connection.
With this function the replacement value will be obtained from a lookup table or translation table. This translation table may well be the result of a previously executed Privacy function where, for instance, consistent masking was implemented using the translation table methodology.
The field Name can be used to enter a meaningful name.
The field Description allows you to enter a description of the rule which will be displayed in the comments report.
The field Lookup Table is a combo-box which allows the user to either select a table from a list or specify a table name manually. This lookup table contains both the columns to search for and the columns containing data to replace the original values with.
If another Privacy function uses the Translation Table feature the result is saved as a Translation table and will be shown in the drop-down list.
If the Lookup table is built outside a Privacy function it will not be present in the list and must be manually entered.
Using a view as a lookup table
A manually entered lookup table may well be a view created by you or your DBA.
Depending on the complexity of the view and size of the dataset you may encounter a significant performance penalty.
Consider creating a temporary table instead of using a view. Benchmarks show a performance boost up to 95% by using a table instead of a view.
The search criteria can be entered under Input mapping.
The Input column is a drop-down list where the search column can be selected.
The Lookup column is a combo-box where, if the Lookup table is generated by a Privacy function, the fields to search for are listed for selection. With other tables the field name should be entered manually.
Under Output mapping it is possible to select the columns to replace the original values with replacement data from the Lookup table.
The Original column is a drop-down list where the column name must be selected with the data to be replaced.
The Lookup column is a combo-box where, if the Lookup table is generated by a Privacy function, the column in the Lookup table can be selected from a list to replace the Original data with. With other tables the column name should be entered manually.
The If value not found then feature provides options to take should the Original column value not exist in the Lookup table. Options include retaining the original value, NULL'ing the item, specifying a Custom Expression or Scrambling (redacting) the existing value.
Using this function, which is a variation of the Value Lookup, a new value will be randomly obtained from a 'Lookup' or 'Translation' table.
When using a Random Lookup an equivalent value in two or more tables will not be consistently replaced.
As with the Value Lookup a Lookup table and a Lookup column can be chosen from a combo-box list or be entered manually.
Description of the Generate functions
Besides different masking functions that change existing data, DATPROF Privacy, as of version 4, comes with a list of generators that create replacement data. The generators will never process any data from the database itself, but instead create new data and integrate it within the database. It will generate exactly the right amount of data based on the content of the table and the optional user defined condition.
All generators require either the direct deployment scheme or the Runtime based deployment execution. The older (DATPROF Privacy 3.x) legacy direct deployment scheme does not support execution of generators so such rules will be skipped.
Each generator comes with their own parameters. Each generator contains a parameter to Replace the same original value with the same generated value.
When checked, DATPROF Privacy will replace, for example, the same first names within the existing data set with the same newly generated first name. For example: every occurrence of "John" will be replaced by the generated value of "Leonardo". Or multiple instances of the same IBAN must be replaced with the same generated IBAN.
Random versus Deterministic Masking
All generators that use seed lists (for example; First name, Last name, Company, Job, User-Defined), can be randomly or deterministic replace the original values. When replacing original values in Random mode, each value will be replaced with a random value from the seed list. When the Replace the same original values with the same generated value is checked, it will make sure that multiple occurrences of the same original value, will be randomly replaced with the same seed value.
In Deterministic Mode each occurrence of the original value will be replaced consistently with the same value. Each time that you execute the masking template the value will be masked consistently. When masking for example names in Oracle and SQL Server with different templates, you can achieve consistent results by using the same seed list(language), generator salt en global salt without the use of any translation tables.
If you change the global salt in the Deployment - Settings this will affect all generators that are configured as deterministic. For each individual generator you can also specify a separate salt.
The Generator options displayed in the drop-down list will reflect the datatype of the column in focus for masking. For instance, an Integer datatype will display the following available techniques:
Generates for all rows the same constant 'fixed' value.
Generates a random date/time between a minimum and maximum date/time. If the Random date/time generator is specified on a column with a character data type you can specify the format in which you want to generate the replacement values, for instance yyyy-MM-dd HH:mm:ss.
Random decimal number
Generates random decimals with a predefined precision between a minimum and maximum value.
Random whole number
Generates a random whole number between a minimum and maximum value. Use the Random decimal number should you require a precision value.
Generates random character strings which are a combination of UPPER and lower case letters of a user specified minimum and maximum length.
Generates a list of values beginning with a user specified start value and incremented (or decremented) by the "step" value.
Generates random global brand names.
Generates random company names based upon a select list of companies in the specified countries. You can select one or more of the available countries.
Generates random dinosaur names, like; Triceratops, Stegosaurus
First name (Female)
Generates random female first names based on a specified country or countries. You can select more than one country.
First name (Male)
Generates random male first names based on a specified country or countries. You can select more than one country.
Generates random Last names based on a specified country. You can select more than one country.
2 letter country code
Generates random two letter country codes such as NL, DE, US.
3 letter country code
Generates random three letter country codes such as NLD, DEU, USA.
Generates random city names. These can be country specific with a choice of any or all from United States, Netherlands and Germany.
Generates random country names. You can specify the country names in one or more languages - German, French, Dutch and English (US). For instance - Mongolei, Mongolie, Mongolië, Mongolia.
Generates random street names. These can be country specific with a choice of any or all from United States, Netherlands and Germany.
A-Number/GBA Number (Dutch township security number)
Generates unique valid A-numbers as replacements for Dutch GBA systems
BSN (Dutch Social Security Number)
Generates unique valid 11 proof replacements for a BSN (Burger Service Nummer / Dutch Social Security Number). No parameters are required with this option. These numbers can be existing customers.
Generates random English colors like; Apple Green, Baby Blue, Azure, Ball Blue, Beaver, etc
Generates hex color codes like; #ff7e00, #fdee00, #848482
Credit card account number
Generates valid unique credit card numbers . You can select one or more issuer's like American Express, Maestro, MasterCard, Visa, etc
Generates random abbreviated currency names such as DKK, EUR, USD, GBP.
Generates random currency symbols such as ¥, $, £, €.
Generate random English genres like Book, Catalogs, Medical, Lifestyle, Travel, Sports, Weather
IBAN (International Bank Account Number)
Generates valid International Bank Account Numbers (IBAN). A selection of more than 60 key trading nations is available for (multi-) selection.
Job / Profession
Generates random job/professions like; Academic Dean, Aerodynamicist, Animal Keeper, Mat Cutter (multiple language available)
Generates random English military ranks based on US military branches (Airforce, Army, Marine Corps, Navy)
SSN (US Social Security Number)
Generates valid and unique social security numbers
Generates browser user agents like; Mozilla/4.0 (compatible; MSIE 5.13; Mac_PowerPC), Opera/8.53 (Windows NT 5.2; U; en)
Value from seed file
Sometimes you may wish to provide your own replacement data to the masking process. This could be because of language, country/region or specific business reasons. You can generate your own seed (User Specified) file and specify it using this technique. The generator will randomly source a row from the file for each row in the table which is qualified by the Condition clause.
Notes on Seed files
- Seed files are generally saved as .txt files.
- The Encoding must be explicitly specified. The rule default is UTF-8.
- An empty row in the seed file will randomly populate the column in focus with an empty value.
Value from multi-column seed file
Generate data across multiple columns that should be consistent based upon a multi column seed file. For example; replace a city, postalcode and street based on correlated seed file containing addresses. You can map multiple columns of the table to the specified seed file. This file shouldn't contain headers and must be a comma separated (.CSV) file.
Combine different generators together with static text. Each generator can be referenced with the $ symbol. Entering the $ symbol in the expression editor will open a list of available generators. Pick one from the list; dependent upon the generator you will be presented with options to specifiy the generator paramaters.
Some examples are:
This will generate:
P.Sabella@gmail.com, O.Veach@hotmail.com, R.Levy@hotmail.com, etc
|Generator||Generator expression syntax|
|Random Date/Time||$datetime-generator(min="1970-01-01 00:00:01",max="2038-01-19 03:14:07",format="yyyy-MM-dd HH:mm:ss")|
Random decimal number
|Random whole number||$random-integer(min="0",max="1000")|
|Female first name||$random-first-name-female("IT","DA","DE")|
|Male first name||$random-first-name-male("ES")|
|2 letter country code||$country-code-2()|
|3 letter county code||$country-code-3()|
|BSN (Dutch Social Security Number)||$bsn-generator()|
|Credit Card account number||$credit-card-number(issuers="Maestro")|
|Job / Profession||$random-job("US")|
|SSN (US Social Security Number)||$ssn-generator()|
|Value from seed file||$random-seed-value(file="C:\file.csv",charset="UTF-8")|
Generates values based upon a regular expression. Some examples are:
Pick a random value from a list
Dutch Postal Code
3067ZG, 9372ED, 8423BE
US Zip Code
48357/4986 , 19414, 21237-6415
250.171.5.8, 188.8.131.52, 184.108.40.206
For a full specification of the regular expression possibilities, check https://www.brics.dk/automaton/doc/index.html
Generates values based on user specified, comma separated, values. The weights parameter is a comma separated list which gives each value a proportional weighting. For example:
Values: Male, Female
Weights: 60, 40
In this case 60% of the values that are generated are Male and 40% of the generated values are Female.
Adding data masking functions
To configure a new masking function, the user must first select the column (or multiple columns) upon which the function is to be executed. Functions can be added with either the Add Function button or a right click on the selected the column(s).
The subsequent dialogue box (called the Function Editor) allows the user to select various options for the function. These options are explained in the following sections. If the OK button is clicked, the function is added to the selected column(s). By unchecking the Enabled checkbox, this function will be skipped.
The user has the option to specify a condition which selects the rows against which the function will, or will not, be executed. A condition is defined in the Condition tab in the Function Editor.
The condition can be specified by defining a 'where clause' as in an SQL query. If the "Exclude null values from masking" option is selected this will be joined with your where clause. If a where caluse is not specified but the "Exclude null values from masking" is selected then only column content which is NOT NULL will be masked.
The Test button can be used to determine whether the defined condition is valid and what the result (qualified rows) of the condition will be.
Microsoft SQL Server:
To refer to the current table you can use the "table name as alias" concept. This is required whenever the “where clause” lists two identical column names. An example of a operation on table “Table2” with identical field name “name” is:
where exists (select 1 from [Table1] a where a.name = [Table2].name)
In this example "a" is the alias for [Table1].
Saving a translation table
You can choose to store the result of the data masking function in a translation table. This is particularly useful in implementing consistent data masking between tables. Translation tables store the old and the new value for each column value in the table.
You can also use a ValueLookup function against a Translation table when masking other tables or applications, effecting consistent masking across the Estate.
Saving a translation table is configured in the Translation table tab in the Function Editor. This dialog allows the user to indicate that the translation table must be created or used in subsequent runs if it already exists. The user must specify the database and schema in which the translation table is to be saved as well as a name for it.
The specified translation table will be listed as a selectable table in the definition of the ValueLookup Function.
Using Translation Tables
Because translation tables contain before and after images they can present a security vulnerability in that they can be used to reverse engineer the masking process. They are not truncated or dropped by default.
Care must be taken in securing these tables from unwanted access or a post-run script executed to drop them after the masking run completes.
Create the Translation Tables in another database which is protected by restricted/authorised access
Dependencies between functions
In certain cases it may be necessary to influence the sequence in which the configured functions are executed. This sequence can be specified in the Dependency Editor. The Dependency Editor can be opened using the Dependencies tab in the Function Editor. Choose the Edit button to invoke the dialogue:
With the Dependency Editor the predecessors and successors for a function can be specified. By specifying predecessors, functions or scripts can be selected to run before the function in focus. By specifying successors, functions or scripts can be selected that should run after the selected function. Predecessors and Successors are invoked by simply clicking on the function listed in either aspect.
Sample Use Case
You have a ValueLookup function which is based upon a Translation table created in another function so the ValueLookup must run after the Translation table has been populated. A dependency will enforce this.
When using the Shuffle, ValueLookup and Custom Expression functions the Function Editor shows an additional References tab. It shows all the available Foreign Keys for the current field. By checking the checkbox “Update referencing foreign keys” all instances of the updated field in the database are consistently modified by following the relationships.
This function is extremely powerful with complex and large data models.
The Foreign Keys tab shows the current Foreign Keys as gathered by the Import and Synch Metadata options. Sometimes, however, data relationships are implied at the application level and not imposed at the database level. When masking (or sub-setting) a database it's often useful to have these relationships noted and, potentially, acted upon. The Foreign Keys context allows you to create, edit or delete what is, effectively, a Virtual Foreign Key which expresses the application implied relationships.
As well as listing the name of the table and the related columns the source of the Foreign Key is also displayed. There are four possibilities:
ODBC: The Foreign Key is imported from an ODBC connection.
DME: The Foreign Key is imported using a DME file.
Manual: The Foreign Key is manually created in DATPROF Privacy.
Unknown: The Foreign Key shows no indication of how it has been imported. This is likely the case with Foreign Key’s created in an earlier version of DATPROF Privacy.
The distinction in the source of the Foreign Key is important to the synchronization process.
By clicking on the ‘Add foreign key’ button the Constraint Editor will be opened and a new virtual Foreign key can be defined.
In the Constraint Editor a parent schema and table can be selected. A new Foreign Key can be defined by selecting the current and related columns and by selecting the parent table. When saving a new Foreign Key it will be flagged as “manually" added.
About adding a virtual Foreign Key
- Adding a Foreign Key does NOT create a constraint in the database. It is exclusively used as a reference within Privacy.
The Foreign Key Condition option is currently ignored by Privacy.
A manually added Foreign Key has an impact on the synchronization wizard. The wizard considers the database as the primary source of truth so it flags a manually added Foreign key as to be removed from the metadata. Using a naming convention to distinguish virtual Foreign Keys from database Foreign Keys might help to avoid losing virtual Foreign Keys.
You can specify a Split Limit for every table. Selecting the tab Advanced settings for a table shows you this screen.
Specifying the SplitLimit has an effect on the final insert back into the table in focus. SplitLimit is synonymous with "Commit Interval" in that it groups and commits the transactions into blocks of rows specified by the SplitLimit. This will reduce the space required in the Oracle Temporary tablespace but does not apply to the DATPROF temporary tables which are created without logging.
The default value for SplitLimit = 0. All data is inserted and committed in one transaction.
Specifying the SplitLimit will impact performance.
The value used as SplitLimit is determined by your system resources, the row-count of your table and the size of the row.
Specifying the SplitLimit has an effect on the final insert back into the table in focus. SplitLimit is synonymous with "Commit Interval" in that it groups and commits the transactions into blocks of rows specified by the SplitLimit, thus preventing issues with the temporary tablespace.
In LUW the SplitLimit applies to both the temporary tables and the final insert into the original table.
By specifying a percentage > 0 Oracle statistics will be gathered prior to the merge of the masked data into the resulting table. This percentage is used as the "estimate_percent" in the Oracle query.
With large datasets this method gives a significant performance boost. For smaller datasets the time spent in gathering the statistics might exceed the time spent to merge the data, outweighing the benefits.
This setting defaults to 0, implying that no statistics will be gathered.
TempStatisticsPercentage (DB2 LUW)
By specifying a percentage > 0 DB2 statistics will be gathered prior to the merge of the masked data into the resulting table. This percentage is used as the percentage of the table rows using the TABLESAMPLE BERNOULLI method. With large datasets this method gives a significant performance boost. (See the IBM DB2 LUW documentation about this). This setting defaults to 0 and may be between 0 and 100. You can also use a precision of up to 2 decimal places. (ie: 0.05).
The value is used for the "method_opt" parameter of the DBMS_STATS.GATHER_TABLE_STATS procedure. When left blank (default) the parameter will be set to "FOR COLUMNS DPV_RID SIZE AUTO". Any other value will be passed as-is to the "method_opt" parameter.
For example, to gather statistics for all columns TempStatisticsMethodOpt could be set to "FOR ALL COLUMNS SIZE AUTO".
The first step in masking a table is to copy the table to a temporary “Snapshot table” where a DPV_RID column is added to the table. This column will hold a Unique numeric key value.
The parameter Custom_DPV_RID allows you to specify the method to fill this column. By default this setting is empty and the following expressions are used:
- For Oracle 12 this column is filled with the outcome of rownum.
- For all other Oracle versions this column is filled with the outcome of: ROW_NUMBER() OVER (order by 1).
The reason the parameters differ is to optimize performance.
On certain occasions faster solutions are available. You could use your own expression or you might specify an existing Primary key or a sequence. This is where the setting Custom_DPV_RID can be used.
An expression returning a numeric value is required, for example a sequence: dpv_rid_sequence.nextval.
Note: Normally you do not need to fill this setting.
TempTablesTableSpace (DB2 LUW)
When creating temporary tables the default tablespace is used. When specifying a value for this setting that tablespace will be used when temporary tables are created during the execution of the privacy function on the target table.
TempTableIndexesTableSpace (DB2 LUW)
When creating temporary tables the default tablespace is used for indexes on a temporary table. When specifying a value for this setting that tablespace will be used for indexes on these temporary tables.
Execution of PowerShell scripts
The OS Call scripts that can be created in DATPROF Privacy are either executed as "
cmd /c myscript" on Windows systems or as "
/bin/sh -c myscript" on Linux systems. To execute a PowerShell script it must be saved in a separate file which can then be used from the OS Call script:
For Windows systems:
Powershell.exe -executionpolicy bypass -File "C:/path/to/mypowershellscript.ps1"
For Linux systems:
Note that PowerShell's execution policy on a Windows system defaults to “Restricted” and this prevents PowerShell scripts from running. By using the “bypass” policy the script can be executed. In this cmd script a PowerShell script can be called in the following way:
Powershell.exe -executionpolicy bypass -File "C:/path/to/mypowershellscript.ps1"