Generation
Sometimes you cannot use existing data and you have to generate new data. The generation module now supports generating new rows inside existing tables.
The Generation section allows the user to create and configure different generations sets that generate new data. A generation set is a specific set of tables that can be configured to generate data that covers specific test data requirements. For example; 10 female customers between the age range 45 - 65, with specific contracts on specific products.
When executed, each generation set will generate its rows into a temporary table. After all temporary tables of the generators sets are filled, the data will be inserted into the original table.
To add a new generation set, click the Add generaton set button in the toolbar. This will create an empty generation set. You can change the name and add a functional description.
After adding a generation set, you can specify which tables you want to configure. You can expand the tree view to see which tables are part of the generation set.
Table settings
To specify how many rows you want to generate, select the specific table and change the Rows value. By default there will be 100 rows generated. You can also specify if the table should be truncated. Please note that when you generate data for the same table in different generation sets, you should uncheck the Truncate checkbox.
Cloning generators sets and copying generator specifications
Specifying different generators on a lot of columns and tables can be time consuming. Luckily you can easily clone generation sets that you can than change for a different purpose. Select the generation set and click the Clone generation set button. Each table and their generator properties are copied to the cloned generator set. Make sure to update the primary key columns to prevent duplicate keys.
There is also an option to copy the generator specification of a table or column and paste it in a different generator set. Right click a table or column and select Copy generator specification, then right click a target table or column and click Paste generator specification. If you copy the complete generator specification of a table, the table on which you can paste it should contain the exact same columns. Otherwise the paste option is greyed out.
Foreign Key
There are different methods to fill foreign key values! You can choose a constant if you want to refer to a specific value in the parent table. In that case you have to make sure that the referring key is present in that table. The Foreign Key generator however uses a lookup to actually get values from the parent table. There are two possible scenarios;
The parent table is part of the same generation set; in this case keys are generated that refer to the new parent records generated in the same generation set
The parent table is not part of the same generation set; in this case keys will be generated using current data content in the parent table
Random versus Deterministic Masking
All generators that use seed lists (for example; First name, Last name, Company, Job, User-Defined), can be randomly or deterministic replace the original values. When replacing original values in Random mode, each value will be replaced with a random value from the seed list. When the Replace the same original values with the same generated value is checked, it will make sure that multiple occurrences of the same original value, will be randomly replaced with the same seed value.
In Deterministic Mode each occurrence of the original value will be replaced consistently with the same value. Each time that you execute the masking template the value will be masked consistently. When masking for example names in Oracle and SQL Server with different templates, you can achieve consistent results by using the same seed list(language), generator salt and global salt without the use of any translation tables.
If you change the global salt in the Deployment - Settings this will affect all generators that are configured as deterministic. For each individual generator you can also specify a separate salt.
About Generators
The Generator options displayed in the drop-down list will reflect the datatype of the column in focus for masking. For instance, you cannot generate a “First Name“ value for an integer column, or generate a sequential number for a date-time column.
Standard Generator Functions
The available generators as provided by Privacy differ slightly based on the data type of the underlying column. Because there are (database specific) variations on data types, we’ll be grouping these together in a few groups. Strings(all data types focusing around text, from varchar fields to BLOBs), numerical and date & time data types.
Function name | Supported Data Types | Description |
---|---|---|
Constant Value | String, Numerical, Date & Time | Creates a constant value in the generated dataset that is identical for every row. I.e. inputting MyFavoriteValue here will generate ‘MyFavoriteValue’ for every row in the resulting column’s dataset. |
Random Date/Time | String, Date & Time | Generates a random Date/Time value per row between a specified minimum and maximum datetime that corresponds with the underlying column’s datatype. |
Sequential Date/Time | String, Date & Time | Identical to the Random Date/Time generator except that this generator creates a sequential datetime for every row. Supplying a maximum datetime is optional. A number to increment the starting date is required, and can only accept whole numbers. Any unit of time to increment by can be chosen, from seconds to years. |
Random Decimal Number | String, Numerical (except for integer variants) | Generates a random decimal number between a supplied minimum and maximum value for every row. Using the Scale setting the decimal accuracy can be defined. For instance, a generator with a minimum value of 0 and a maximum value of 1000 using a scale of 4 might generate 132.4202. |
Random Whole Number | String, Numerical | Generates a random integer between a supplied minimum and maximum value for every row. |
Random String | String | Generates a random string of lower- and uppercase letters for every row. The minimum and maximum length for strings can be defined. |
Sequential Number | String, Numerical | Generates a sequential number for every row that starts at a specified value, and increments by the step value per row. Additionally, you can define a Padding for your generated integers. This is a set number that will be affixed to the generated integer. For example, using a padding of 3, and a start of 8 with a step of 2 will generate the following: 008 → 010 → 012 |
Names | String | The Names section contains multiple generators that generate data based on CSV files supplied by DATPROF. The available options are:
For some options (such as First Name) there are localization options which determine which CSV file is used. These can be toggled on/off with the language(s) drop-down menu. Multiple languages can be used simultaneously. |
2 Letter Country Code / 3 Letter Country Code | String | Generates a random country code per row either in a 2 letter country code, or a 3 letter country code format. |
City | String | Generates a random city name per row. Here, you can specify for which countries you’d like to generate random names using the Countries drop-down menu. |
Country | String | Generates a random country name per row. The Language(s) option specifies in which language the country will be written. Multiple options can be enabled simultaneously. |
Street | String | Generates a random existing street name per row. The Countries option allows you to specify for which country street names will be generated. |
A-Number / GBA number (Dutch township security number) | String, Numerical | Generates a 10 digit GBA number per row. |
BSN | String, Numerical | Generates a valid Dutch social security number per row. |
Color | String | Generates a random color per row. (Ex. Baby Blue, Sky Blue, Soft White) |
Color Code | String | Generates a color hexcode (Both three-digit shorthand and six-digit full length hexcodes) per row. |
Credit Card Account Number | String, Numerical | Generates a random credit card account number. Using Issuer(s), you must define one or multiple issuers to determine which syntax the generated account numbers adhere to. |
Currency Code | String | Generates a three letter currency code for every row. |
Currency Symbol | String | Generates a currency code for every row. |
Genre | String | Generates a media genre for every row. |
IBAN (International Bank Account Number) | String | Generates a valid IBAN number for every row. Using Country Code(s) you can specify which country codes you’d like your resulting IBAN codes to use. |
Job / Profession | String | Generates a profession for every row. Using the Language(s) you can specify which language(s) you want your resulting job names generated in. |
Military Rank | String | Generates a military rank name for every row. Using Department(s) you can specify which branches of the armed forces you’d like to include in your resulting dataset. |
SSN (US Social Security Number) | String | Generates a SSN for every row. You can specify how you want to separate your resulting numbers. You can choose one of the following formats: None (Example: 101010101, 003122142) Dashed (Example: 101-01-0101, 003-12-2142) Spaced (Example: 101 01 0101, 003 12 2142) |
User agent | String | Generates browser user agents like Mozilla/4.0 (compatible; MSIE 5.13; Mac_PowerPC), Opera/8.53 (Windows NT 5.2; U; en) per row. |
Advanced Generator Functions
SQL Expression
The SQL expression generator is special in that it can use values from other columns. All other generators will generate it's values into a temporary table per generation set inside your database. The SQL expression on a column is used when the data in inserted back into the original table. That way a user can use the data that is generated in other columns. No dependency specification is required since SQL Expression rules will be the last to be applied to the table in focus.
Value from seed file
Sometimes you may wish to provide your own replacement data to the masking process. This could be because of language, country/region or specific business reasons. You can generate your own seed (User Specified) file and specify it using this technique. The generator will randomly source a row from the file for each row in the table which is qualified by the Condition clause.
Notes on Seed files
Seed files are generally saved as .txt files.
The Encoding must be explicitly specified. The rule default is UTF-8.
An empty row in the seed file will randomly populate the column in focus with an empty value.
Value from multi-column seed file
Generate data across multiple columns that should be consistent based upon a multi column seed file. For example; replace a city, postalcode and street based on correlated seed file containing addresses. You can map multiple columns of the table to the specified seed file. This file shouldn't contain headers and must be a comma separated (.CSV) file.
Generator expression
Combine different generators together with static text. Each generator can be referenced with the $ symbol. Entering the $ symbol in the expression editor will open a list of available generators. Pick one from the list; dependent upon the generator you will be presented with options to specifiy the generator paramaters.
Some examples are:
Email address | $random-regex(regex="[A-Z]{1}").$random-last-name("US")@$weighted-random-generator(values="gmail.com,http://hotmail.com ",weights="30,70") This will generate: P.Sabella@gmail.com, O.Veach@hotmail.com, R.Levy@hotmail.com, etc |
Generator | Generator expression syntax |
---|---|
Random Date/Time | $datetime-generator(min="1970-01-01 00:00:01",max="2038-01-19 03:14:07",format="yyyy-MM-dd HH:mm:ss") |
Random decimal number | $random-decimal(min="0.0",max="1000",precision="0") |
Random whole number | $random-integer(min="0",max="1000") |
Random string | $random-string(min="5",max="20") |
Sequential number | $sequential-number-generator(start="0",step="1") |
Brand | $random-brand() |
Company | $random-company("NL","DE") |
Female first name | $random-first-name-female("IT","DA","DE") |
Male first name | $random-first-name-male("ES") |
Last name | $random-last-name("IT","DE","NL") |
2 letter country code | $country-code-2() |
3 letter county code | $country-code-3() |
City | $random-city("NL","DE","US") |
Country | $random-country("DE","FR","NL","US") |
Street | $random-street("DE","NL","US") |
A-Number/GBA Number | $gba-generator() |
BSN (Dutch Social Security Number) | $bsn-generator() |
Color | $random-color() |
Color code | $random-color-code() |
Credit Card account number | $credit-card-number(issuers="Maestro") |
Currency code | $random-currency-code() |
Currency symbol | $random-currency-symbol() |
Genre | $random-genre() |
IBAN | $iban-bic(countries="NL") |
Job / Profession | $random-job("US") |
SSN (US Social Security Number) | $ssn-generator() |
User Agent | $random-user-agent() |
Regular Expression | $random-regex(regex="[A-Z]{1}") |
Value from seed file | $random-seed-value(file="C:\file.csv",charset="UTF-8") |
Weighted list | $weighted-random-generator(values="F,M",weights="50,50") |
Regular expression
Generates values based upon a regular expression. Some examples are:
Title | Regular Expression |
---|---|
Address Type Pick a random value from a list | Billing|Delivery|Home|Office|Primary|Shipping |
Dutch Postal Code 3067ZG, 9372ED, 8423BE | [1-9][0-9]{3}[A-Z]{2} |
US Zip Code 48357/4986 , 19414, 21237-6415 | [0-9]{5}([- /]?[0-9]{4})? |
IP address 250.171.5.8, 4.8.5.180, 91.252.251.253 | ((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) |
For a full specification of the regular expression possibilities, check https://www.brics.dk/automaton/doc/index.html
Weighted list
Generates values based on user specified, comma separated, values. The weights parameter is a comma separated list which gives each value a proportional weighting. For example:
Values: Male, Female
Weights: 60, 40
In this case 60% of the values that are generated are Male and 40% of the generated values are Female.
Generator settings
Post SQL Expressions
You can further modify data on a generator set by filling in the Post SQL text box on any generator function. This expression will trigger after a field has been generated, and can be used to further customize the generated data. You can also reference different fields in the table and use them in the expression here. In terms of functionality, the syntax is identical to regular SQL Expressions.
Generator Seed
If you generate data, and need the data to be identical upon later execution, you can specify a seed value. Doing this will ensure that whenever you run this generator set again, your data will be identical. This can be useful if changing data can impact testing in a negative way. However, if you intend to execute a template multiple times on the same dataset, duplicate data will be generated.
Null Values %
Using this setting you can seed a percentage of NULL values into your resulting dataset.
Having NULL values can be an important part of testing, as missing data occurs regularly in production databases. However, if you specify a field as having a NOT NULL constraint in the database, attempting to seed NULL values in the resulting dataset will cause a run to fail.