Generation

For certain use-cases it’s not possible to mask existing data, and instead the user must generate new data. Generation makes this possible, giving the user the ability to generate any amount of rows for existing tables, while maintaining all of the existing constraints.

It does this by allowing the user to create and configure different generations sets that generate new data. A generation set is a specific set of tables and their corresponding functions that can be configured to generate data that covers specific test data requirements. For example, it’s possible to create a set that creates 10 rows of female customers between the age range 45 - 65, with specific contracts on specific products. This means that the user can generate as many sets of varying data for a table as is needed by creating multiple generation sets.

Creating a new generation set

To add a new generation set, click the Add generation set button in the toolbar, located in the top-left corner of the Privacy interface. This will create an empty generation set, which the user can then select and re-name to their chosen name. Additionally, the user can supply a description for the generation set.

After a generation set has been added, selecting it will show a list of all imported columns within the table. In order to include columns in the generation set, check the tick-box next to the column name. Because rows are generated based off of this configuration, any columns not included in this selection will generate NULL values.

Table settings

Row count

It’s possible to specify how many rows a specific generation set will generate by selecting the generation set and modifying the Rows value. By default, this is set to 100.

Truncate

Selecting this option will cause Privacy to execute a TRUNCATE TABLE statement on the table involved in the generation set. When using multiple generation sets on one table, this option should be left unchecked.

Cloning generators sets and copying generator specifications

Specifying different generators on a long list of columns and tables can be time consuming. Within generation it’s possible to copy generation sets, which maintains the generation set specification. Afterwards, the user can modify the new generation set. To do this, select the generation set and click Clone generation set. Each table and their generator properties are copied to the cloned generator set. Care should be taken that certain (primary) keys or unique columns should be modified to prevent duplicate key issues.

There is also an option to copy the generator specification of a table or column and paste it in a different generator set. Right click a table or column and select Copy generator specification, then right click a target table or column and click Paste generator specification. If you copy the complete generator specification of a table, the table on which you can paste it should contain the exact same columns. Otherwise the paste option is disabled, preventing the user from performing an illegitimate action.

Foreign Key

There are different methods to fill foreign key values! You can choose a constant if you want to refer to a specific value in the parent table. In that case you have to make sure that the referring key is present in that table. The Foreign Key generator however uses a lookup to actually get values from the parent table. There are two possible scenarios;

The parent table is part of the same generation set; in this case keys are generated that refer to the new parent records generated in the same generation set
The parent table is not part of the same generation set; in this case keys will be generated using current data content in the parent table

Random versus Deterministic Masking

All generators that use seed lists (for example; First name, Last name, Company, Job, User-Defined), can be randomly or deterministic replace the original values. When replacing original values in Random mode, each value will be replaced with a random value from the seed list. When the Replace the same original values with the same generated value is checked, it will make sure that multiple occurrences of the same original value, will be randomly replaced with the same seed value.

In Deterministic Mode each occurrence of the original value will be replaced consistently with the same value. Each time that you execute the masking template the value will be masked consistently. When masking for example names in Oracle and SQL Server with different templates, you can achieve consistent results by using the same seed list(language), generator salt and global salt without the use of any translation tables.

If you change the global salt in the Deployment - Settings this will affect all generators that are configured as deterministic. For each individual generator you can also specify a separate salt.

About Generators

The Generator options displayed in the drop-down list will reflect the datatype of the column in focus for masking. For instance, you cannot use the “First Name“ generator on an integer column as this would violate the definition of the column. Likewise, it is not possible to generate a sequential number for a date-time column. Therefore, these functions will be filtered depending on which type of column you’re working with.

Standard Generator Functions

The available generators as provided by Privacy differ slightly based on the data type of the underlying column. Because there are (database specific) variations on data types, we’ll be grouping these together in a few groups. Strings(all data types focusing around text, from varchar fields to BLOBs), numerical and date & time data types.

Function name	Supported Data Types	Description
Constant Value	String, Numerical, Date & Time	Creates a constant value in the generated dataset that is identical for every row. I.e. inputting MyFavoriteValue here will generate ‘MyFavoriteValue’ for every row in the resulting column’s dataset.
Random Date/Time	String, Date & Time	Generates a random Date/Time value per row between a specified minimum and maximum datetime that corresponds with the underlying column’s datatype.
Sequential Date/Time	String, Date & Time	Identical to the Random Date/Time generator except that this generator creates a sequential datetime for every row. Supplying a maximum datetime is optional. A number to increment the starting date is required, and can only accept whole numbers. Any unit of time to increment by can be chosen, from seconds to years.
Random Decimal Number	String, Numerical (except for integer variants)	Generates a random decimal number between a supplied minimum and maximum value for every row. Using the Scale setting the decimal accuracy can be defined. For instance, a generator with a minimum value of 0 and a maximum value of 1000 using a scale of 4 might generate 132.4202.
Random Whole Number	String, Numerical	Generates a random integer between a supplied minimum and maximum value for every row.
Random String	String	Generates a random string of lower- and uppercase letters for every row. The minimum and maximum length for strings can be defined.
Sequential Number	String, Numerical	Generates a sequential number for every row that starts at a specified value, and increments by the step value per row. Additionally, you can define a Padding for your generated integers. This is a set number that will be affixed to the generated integer. For example, using a padding of 3, and a start of 8 with a step of 2 will generate the following: 008 → 010 → 012
Names	String	The Names section contains multiple generators that generate data based on CSV files supplied by DATPROF. The available options are: Brand First Name First Name (Male) First Name (Female) Last Name Full Name Company Dinosaur Random Word For some options (such as First Name) there are localization options which determine which CSV file is used. These can be toggled on/off with the language(s) drop-down menu. Multiple languages can be used simultaneously.
2 Letter Country Code / 3 Letter Country Code	String	Generates a random country code per row either in a 2 letter country code, or a 3 letter country code format.
City	String	Generates a random city name per row. Here, you can specify for which countries you’d like to generate random names using the Countries drop-down menu.
Country	String	Generates a random country name per row. The Language(s) option specifies in which language the country will be written. Multiple options can be enabled simultaneously.
Street	String	Generates a random existing street name per row. The Countries option allows you to specify for which country street names will be generated.
A-Number / GBA number (Dutch township security number)	String, Numerical	Generates a 10 digit GBA number per row.
BSN	String, Numerical	Generates a valid Dutch social security number per row.
Color	String	Generates a random color per row. (Ex. Baby Blue, Sky Blue, Soft White)
Color Code	String	Generates a color hexcode (Both three-digit shorthand and six-digit full length hexcodes) per row.
Credit Card Account Number	String, Numerical	Generates a random credit card account number. Using Issuer(s), you must define one or multiple issuers to determine which syntax the generated account numbers adhere to.
Currency Code	String	Generates a three letter currency code for every row.
Currency Symbol	String	Generates a currency code for every row.
Genre	String	Generates a media genre for every row.
IBAN (International Bank Account Number)	String	Generates a valid IBAN number for every row. Using Country Code(s) you can specify which country codes you’d like your resulting IBAN codes to use.
Job / Profession	String	Generates a profession for every row. Using the Language(s) you can specify which language(s) you want your resulting job names generated in.
Military Rank	String	Generates a military rank name for every row. Using Department(s) you can specify which branches of the armed forces you’d like to include in your resulting dataset.
SSN (US Social Security Number)	String	Generates a SSN for every row. You can specify how you want to separate your resulting numbers. You can choose one of the following formats: None (Example: 101010101, 003122142) Dashed (Example: 101-01-0101, 003-12-2142) Spaced (Example: 101 01 0101, 003 12 2142)
User agent	String	Generates browser user agents like Mozilla/4.0 (compatible; MSIE 5.13; Mac_PowerPC), Opera/8.53 (Windows NT 5.2; U; en) per row.

Advanced Generator Functions

SQL Expression

The SQL expression generator is special in that it can use values from other columns. All other generators will generate it's values into a temporary table per generation set inside your database. The SQL expression on a column is used when the data in inserted back into the original table. That way a user can use the data that is generated in other columns. No dependency specification is required since SQL Expression rules will be the last to be applied to the table in focus.

Value from seed file

Sometimes you may wish to provide your own replacement data to the masking process. This could be because of language, country/region or specific business reasons. You can generate your own seed (User Specified) file and specify it using this technique. The generator will randomly source a row from the file for each row in the table which is qualified by the Condition clause.

Notes on Seed files

Seed files are generally saved as .txt files.
The Encoding must be explicitly specified. The rule default is UTF-8.
An empty row in the seed file will randomly populate the column in focus with an empty value.

Value from multi-column seed file

Generate data across multiple columns that should be consistent based upon a multi column seed file. For example; replace a city, postal code and street based on correlated seed file containing addresses. You can map multiple columns of the table to the specified seed file. This file shouldn't contain headers and must be a comma separated (.CSV) file.

Generator expression

Combine different generators together with static text. Each generator can be referenced with the $ symbol. Entering the $ symbol in the expression editor will open a list of available generators. Pick one from the list; dependent upon the generator you will be presented with options to specifiy the generator paramaters.

Examples

Email address

$random-regex(regex="[A-Z]{1}").$random-last-name("US")@$weighted-random-generator(values="gmail.com,http://hotmail.com ",weights="30,70")

This will generate:

P.Sabella@gmail.com, O.Veach@hotmail.com, R.Levy@hotmail.com, etc

Syntax overview

Generator	Generator expression syntax
Random Date/Time	$datetime-generator(min="1970-01-01 00:00:01",max="2038-01-19 03:14:07",format="yyyy-MM-dd HH:mm:ss")
Random decimal number	$random-decimal(min="0.0",max="1000",precision="0")
Random whole number	$random-integer(min="0",max="1000")
Random string	$random-string(min="5",max="20")
Sequential number	$sequential-number-generator(start="0",step="1")
Brand	$random-brand()
Company	$random-company("NL","DE")
Female first name	$random-first-name-female("IT","DA","DE")
Male first name	$random-first-name-male("ES")
Last name	$random-last-name("IT","DE","NL")
2 letter country code	$country-code-2()
3 letter county code	$country-code-3()
City	$random-city("NL","DE","US")
Country	$random-country("DE","FR","NL","US")
Street	$random-street("DE","NL","US")
A-Number/GBA Number	$gba-generator()
BSN (Dutch Social Security Number)	$bsn-generator()
Color	$random-color()
Color code	$random-color-code()
Credit Card account number	$credit-card-number(issuers="Maestro")
Currency code	$random-currency-code()
Currency symbol	$random-currency-symbol()
Genre	$random-genre()
IBAN	$iban-bic(countries="NL")
Job / Profession	$random-job("US")
SSN (US Social Security Number)	$ssn-generator()
User Agent	$random-user-agent()
Regular Expression	$random-regex(regex="[A-Z]{1}")
Value from seed file	$random-seed-value(file="C:/file.csv",charset="UTF-8") Or, alternatively: $random-seed-value(file="C:\\file.csv",charset="UTF-8")
Weighted list	$weighted-random-generator(values="F,M",weights="50,50")

Regular expression

Generates values based upon a regular expression. The syntax for the regular expression used in Privacy is standard C# regex.

Examples

Title	Regular Expression
Address Type Pick a random value from a list	Billing\|Delivery\|Home\|Office\|Primary\|Shipping
Dutch Postal Code 3067ZG, 9372ED, 8423BE	[1-9][0-9]{3}[A-Z]{2}
US Zip Code 48357/4986 , 19414, 21237-6415	[0-9]{5}([- /]?[0-9]{4})?
IP address 250.171.5.8, 4.8.5.180, 91.252.251.253	((25[0-5]\|(2[0-4]\|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]\|(2[0-4]\|1{0,1}[0-9]){0,1}[0-9])

For a full specification of the regular expression possibilities, check https://www.brics.dk/automaton/doc/index.html

Weighted list

Generates values based on user specified, comma separated, values. The weights parameter is a comma separated list which gives each value a proportional weighting.

Examples

Values: Male, Female

Weights: 60, 40

In this case 60% of the values that are generated are Male and 40% of the generated values are Female.

Generator settings

Post SQL Expressions

You can further modify data on a generator set by filling in the Post SQL text box on any generator function. This expression will trigger after a field has been generated, and can be used to further customize the generated data. You can also reference different fields in the table and use them in the expression here. In terms of functionality, the syntax is identical to regular SQL expressions.

*In this example we use the “Random Word” generator on our field named “word”. An expected result for this query could be “Magnanimous_04-10-2022”*

Generator Seed

If you generate data and need the data to be identical upon later execution, you can specify a seed value. Doing this will ensure that whenever you run this generator set again, your data will be identical. This can be useful if changing data can impact testing in a negative way. However, if you intend to execute a template multiple times on the same dataset, duplicate data will be generated. For more information on this general principle, please refer to the Deterministic Masking portion of the Privacy documentation.

Null Values %

Using this setting you can seed a percentage of NULL values into your resulting dataset.

Having NULL values can be an important part of testing, as missing data occurs regularly in production databases. However, if you specify a field as having a NOT NULL constraint in the database, attempting to seed NULL values in the resulting dataset will cause a run to fail.