The Gasware Database contains a lot of data that can be linked directly to actual persons, such as name fields, email addresses and bank account numbers. We will focus on the first names first. In 99% of cases, a first name is directly related to the gender of a person. We will add a function that shuffles all the first names of the customers, but each first name must be kept together with its gender.
We can solve this in two slightly different ways. Let's look at the following example.
This is the original production data. The next examples show the different results of shuffling over multiple columns or within a specified group column. There are two male and two female customers in the original data.
Result when shuffling over multiple columns
When you shuffle the FNAME and GENDER columns together, customers can end up with a different gender. This can make related data inconsistent. In this example, customers 1 and 4 were originally Male customers and are now Female customers.
Result when shuffling with a group column
When you shuffle the FNAME column and select GENDER as the group column, first names are only exchanged between customers of the same gender, so every customer keeps the same gender. Customers 1 and 4 are still Male, but with different first names.
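The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the idea, not the way the tool implements it: the sample rows, the names in them and both helper functions (`shuffle_columns_together`, `shuffle_within_group`) are assumptions made for this example.

```python
import random

# Hypothetical sample rows; the names are illustrative, not taken from Gasware.
customers = [
    {"id": 1, "fname": "John", "gender": "M"},
    {"id": 2, "fname": "Mary", "gender": "F"},
    {"id": 3, "fname": "Anna", "gender": "F"},
    {"id": 4, "fname": "Peter", "gender": "M"},
]


def shuffle_columns_together(rows, columns, rng):
    """Shuffle the given columns as one unit across all rows."""
    values = [tuple(row[c] for c in columns) for row in rows]
    rng.shuffle(values)
    return [{**row, **dict(zip(columns, vals))} for row, vals in zip(rows, values)]


def shuffle_within_group(rows, column, group, rng):
    """Shuffle one column, but only among rows that share the same group value."""
    by_group = {}
    for i, row in enumerate(rows):
        by_group.setdefault(row[group], []).append(i)
    result = [dict(row) for row in rows]
    for indices in by_group.values():
        values = [rows[i][column] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            result[i][column] = value
    return result


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# FNAME and GENDER move together: a customer may receive another gender.
together = shuffle_columns_together(customers, ["fname", "gender"], rng)

# Only FNAME moves, and only within the same GENDER: gender never changes.
grouped = shuffle_within_group(customers, "fname", "gender", rng)
```

In `together`, every name still belongs to its original gender, but the pair may land on a different customer. In `grouped`, the gender of each customer is untouched and only the names rotate within each gender.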
Now let's configure the shuffle for the first names of the CUSTOMER table.
- Select the CUSTOMER table from the tables list
- Right-click the FIRST_NAME column and choose Add function... → Shuffle...
The function editor opens, where you can configure some extra properties. By default, the Exclude null values from shuffling checkbox is checked. As a result, all null values stay where they are after shuffling. If you also want to shuffle the null values, uncheck the checkbox.
- We will define the GENDER column as Group. Find the GENDER column in the function editor and check the Group checkbox.
- Also add a description to the function: Shuffle all the first names within the same gender
- Click OK to add the function and close the function editor
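To make the effect of these settings concrete, here is a minimal Python sketch of what a shuffle with a group column and the null-exclusion option does to the data. It is not the tool's actual implementation: the function `shuffle_column`, its parameters and the sample rows are hypothetical and only mirror the options described in the steps above.

```python
import random


def shuffle_column(rows, column, group=None, exclude_nulls=True, rng=None):
    """Shuffle `column`, optionally within `group`, optionally skipping nulls."""
    if rng is None:
        rng = random.Random()
    result = [dict(row) for row in rows]
    buckets = {}
    for i, row in enumerate(rows):
        if exclude_nulls and row[column] is None:
            continue  # excluded nulls keep their original position
        key = row[group] if group is not None else None
        buckets.setdefault(key, []).append(i)
    for indices in buckets.values():
        values = [rows[i][column] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            result[i][column] = value
    return result


# Hypothetical rows; the names are illustrative only.
rows = [
    {"ID": 1, "FIRST_NAME": "John", "GENDER": "M"},
    {"ID": 2, "FIRST_NAME": None, "GENDER": "M"},
    {"ID": 3, "FIRST_NAME": "Peter", "GENDER": "M"},
    {"ID": 4, "FIRST_NAME": "Mary", "GENDER": "F"},
    {"ID": 5, "FIRST_NAME": "Anna", "GENDER": "F"},
]

masked = shuffle_column(
    rows, "FIRST_NAME", group="GENDER", exclude_nulls=True, rng=random.Random(7)
)
```

With `exclude_nulls=True`, customer 2 keeps a null first name. With `exclude_nulls=False`, the null takes part in the shuffle and can end up on another male customer.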
When shuffling small sets of data, there is a chance that some values end up in their original position and are therefore not changed. For small data sets, choose a different function or replace the existing data with synthetic data.
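The following Python sketch shows why small sets are a problem for any random shuffle. It is an illustration under simple assumptions (a plain random permutation, hypothetical names), and the helper `count_unchanged` is made up for this example.

```python
import random


def count_unchanged(values, rng):
    """Shuffle a copy of `values` and count positions that kept their value."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return sum(1 for before, after in zip(values, shuffled) if before == after)


# A group with a single row can never change.
single = count_unchanged(["John"], random.Random(1))

# A group of two rows either swaps (0 unchanged) or stays as it was (2 unchanged).
pair_results = {
    count_unchanged(["John", "Peter"], random.Random(seed)) for seed in range(200)
}

# A group where every row has the same value looks identical after shuffling.
duplicates = count_unchanged(["Anna", "Anna", "Anna"], random.Random(1))
```

The smaller the group, or the more duplicate values it contains, the more likely it is that a row still shows its original value after the shuffle.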
If you get stuck, you can watch this short video, which shows how it is done. Make the video fullscreen to get a better view.