
File Masking

Screenshot 2025-07-02 083741.png

DATPROF Runtime not only enables data masking within databases, but also supports the masking of external data files. This functionality allows you to apply the same masking rules used in database environments to (un)structured files, ensuring consistent and secure handling of sensitive information across your entire data landscape.

Using a masking template, Runtime interprets the structure of the file and applies the configured masking rules to the relevant data fields.

File masking is typically used in scenarios where:

  • Data files are exchanged between systems or teams.

  • Sensitive content is exported for testing, development, or analysis.

  • Compliance requires anonymization of personal or confidential information outside the database.

This functionality can be fully automated and integrated into your Runtime workflows, providing a streamlined approach to secure data handling across both databases and files.

DATPFOF Runtime File Masking Demo Application overview.png

File Masking with Spark SQL

To mask fields in your files, Runtime uses Spark SQL expressions to locate, filter, and transform specific columns within your data. Spark SQL provides a flexible and powerful way to target fields at any level of your data structure.

Apache Spark SQL is a powerful, distributed SQL engine that can process large-scale data efficiently. Runtime uses Spark SQL for file masking:

  • Masking rules use Spark SQL expressions to apply conditional logic on columns.

  • Spark SQL allows complex conditions and transformations for flexible masking.

Building a File Masking Application

Before you can begin building a File Masking application, you'll need to set up a group and an environment. For step-by-step instructions on creating groups and environments, refer to the Groups and Environments section of the Runtime documentation.

Runtime requires Input, Translation, and Output directories for File Masking. You can verify the directory paths by clicking the “Validate” button. Currently, only local or network files can be masked. Support for Azure and AWS storage will be added in future releases.

Once the group and environment are in place, you can proceed with defining the file structure, selecting masking rules, and configuring the masking template for your application.

DATPROF Runtime File Masking Environment.png

Application Editor

The interface of the Application editor is quite intuitive, but we’ll walk you through the key features to ensure you get the most out of it. Start by entering the following information:

Name: Enter a unique and descriptive name for your application. This name will be used to identify your application within the DATPROF Runtime interface.

Version: Specify the version of the application you’re creating. Versioning helps you manage updates and track changes to your application over time.

File Type: Select the file type that you need to process. Currently, only Parquet files are supported. Support for additional file formats will be introduced in future versions of DATPROF Runtime.

Once you’ve entered the application details, you’re ready to add file patterns and masking functions!

DATPROF Runtime File Masking Application.png

File Pattern

Once you’ve entered the basic application details, the next step is to define the file pattern and configure the appropriate masking functions. This allows DATPROF Runtime to identify which files to process and how to handle sensitive data within them.

The file pattern is used to specify which files should be masked by the application. This pattern tells Runtime how to locate the target files in the file system or data source.

You can use the following approaches when defining your pattern:

  • Exact file names: Use the full file name if it’s always the same.
    Example: CUSTOMERS_10k.parquet

DATPROF Runtime File Masking File Pattern Full.png

  • Wildcard patterns: Use wildcards to match files with dynamic elements, such as timestamps, sequence numbers, or environment identifiers.
    Example: CUSTOMERS_*.parquet

DATPROF Runtime File Masking File Pattern Wildcard.png

Wildcards allow flexibility in identifying files without needing to update the application for every filename change.

Tip: Ensure your pattern is specific enough to avoid unintentionally matching unrelated files, especially in directories containing multiple file types or data sources.
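
The difference between an exact file name and a wildcard pattern can be illustrated with a small Python sketch. It assumes that `*` behaves like a shell-style glob (matching any run of characters); how Runtime itself evaluates wildcards is not specified here. The helper name `matching_files` and the directory listing are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_files(pattern, file_names):
    """Return the file names that match a glob-style pattern (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical directory listing, for illustration only.
files = [
    "CUSTOMERS_10k.parquet",
    "CUSTOMERS_2025-07-02.parquet",
    "ORDERS_10k.parquet",
    "CUSTOMERS_10k.csv",
]

exact = matching_files("CUSTOMERS_10k.parquet", files)   # one file only
wildcard = matching_files("CUSTOMERS_*.parquet", files)  # both CUSTOMERS parquet files
too_broad = matching_files("*", files)                   # matches every file
```

Under this assumption, a pattern as loose as `*` would pick up the unrelated ORDERS and CSV files as well, which is the situation the tip above warns about.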

Once the file pattern is defined, you can proceed to map the file structure and assign masking functions to individual fields. These functions determine how each piece of sensitive data is transformed, ensuring that it remains protected while retaining a usable format for development, testing, or analytics.

To mask files, use patterns: for exact matches, specify the full name (e.g., filename.dat); for variable parts like timestamps, use wildcards (e.g., filename_*.dat).

Masking Functions

Currently, File Masking in Runtime supports 39 masking functions. These functions allow you to apply a wide range of masking techniques, such as character replacement, data encryption, or randomization, to ensure that sensitive information is properly protected while maintaining the integrity and usability of the dataset. Whether you’re masking personal identifiers (PII), financial data, or other confidential information, DATPROF Runtime’s masking functions enable fine-grained control over how data is transformed.

This section will introduce you to the available masking functions for File Masking.

Masking

Masking Functions

Description

Constant value

Creates a constant value that is identical for every field. For example, entering MyFavoriteValue here will generate ‘MyFavoriteValue’ for every field in the resulting dataset.

Date/time modifier

This function changes the existing date to a fixed day in the same month, or to a fixed day in the first month of the same year. With this change, the new values remain functionally viable in most cases.

Value lookup

With this function the replacement value will be obtained from a lookup file or translation file.

Blank

This function will NULL the selected field(s).

Custom expression

The Spark SQL custom expression should resolve to a value or contain a function that returns a value.

Sequential number

Generates a sequential number for every field that starts at a specified value and increments by the step value per field.

Additionally, you can define a Padding for your generated integers. This is the fixed number of digits each generated integer is written with; shorter values are prefixed with zeros. For example, a padding of 3 with a start of 8 and a step of 2 will generate the following:

008 → 010 → 012

Sequential date/time

Identical to the Random Date/Time generator except that this generator creates a sequential datetime for every field. Supplying a maximum datetime is optional.

An increment for the starting date is required and must be a whole number. The unit of the increment can be anything from seconds to years.
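
The two sequential functions in the table above can be sketched in Python as follows. This is only an illustration of the described behavior, not Runtime's implementation: it assumes that Padding means zero-filling to a fixed width (consistent with the 008 → 010 → 012 example), that generation stops once the optional maximum is passed, and it only covers increment units that `timedelta` supports (seconds up to weeks, not months or years). The function names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import islice


def sequential_numbers(start, step, padding=0):
    """Yield start, start + step, ... as strings left-padded with zeros."""
    value = start
    while True:
        yield str(value).zfill(padding)
        value += step


def sequential_datetimes(start, increment, maximum=None):
    """Yield start, start + increment, ... until `maximum` would be exceeded.

    Stopping at the maximum is an assumption made for this sketch.
    """
    current = start
    while maximum is None or current <= maximum:
        yield current
        current += increment


# Padding 3, start 8, step 2, as in the example above.
first_numbers = list(islice(sequential_numbers(8, 2, padding=3), 3))

# One value every 12 hours, capped by an optional maximum.
half_days = list(
    sequential_datetimes(
        datetime(2025, 1, 1),
        timedelta(hours=12),
        maximum=datetime(2025, 1, 2),
    )
)
```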

DATPROF Runtime File Masking Blank Function.png

DATPROF Runtime File Masking Date time modifier.png

DATPROF Runtime File Masking Custom Expression 1.png

Basic Generator

Basic Generators

Description

Random date/time

Generates a random Date/Time value per field between a specified minimum and maximum datetime that corresponds with the underlying field’s datatype.

Random decimal number

Generates a random decimal number between a supplied minimum and maximum value for every field.

Using the Scale setting the decimal accuracy can be defined. For instance, a generator with a minimum value of 0 and a maximum value of 1000 using a scale of 4 might generate 132.4202.

Random whole number

Generates a random integer between a supplied minimum and maximum value for every field.

Random string

Generates a random string of lower- and uppercase letters for every field. The minimum and maximum length for strings can be defined.
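
To make the Scale setting of the Random decimal number generator concrete, here is a minimal Python sketch. It assumes that Scale is the number of digits after the decimal point and that both bounds are inclusive; the function name `random_decimal` and the `rng` parameter are hypothetical and not part of Runtime.

```python
import math
import random
from decimal import Decimal


def random_decimal(minimum, maximum, scale, rng=random):
    """Return a random Decimal in [minimum, maximum] with exactly `scale` decimals."""
    factor = 10 ** scale
    # Work in whole "scaled units" so the result never leaves the range.
    low = math.ceil(Decimal(str(minimum)) * factor)
    high = math.floor(Decimal(str(maximum)) * factor)
    return Decimal(rng.randint(low, high)).scaleb(-scale)


# Minimum 0, maximum 1000, scale 4, as in the example above.
value = random_decimal(0, 1000, 4, rng=random.Random(42))
```

Drawing an integer in scaled units and shifting the decimal point afterwards avoids rounding a float past the configured bounds.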

DATPROF Runtime File Masking Random Whole Number.png

DATPROF Runtime File Masking Random String.png

Business Generator

Business Generators

Description

Credit card account number

Generates a random credit card account number. Using Issuer(s), you must define one or multiple issuers to determine which syntax the generated account numbers adhere to.

IBAN (International Bank Account Number)

Generates a valid IBAN for every field. Using Country Code(s) you can specify which country codes you’d like your resulting IBANs to use.

Currency code

Generates a three-letter currency code for every field.

Currency symbol

Generates a currency symbol for every field.

User agent

Generates user agents per field, for example: Mozilla/4.0 (compatible; MSIE 5.13; Mac_PowerPC), Opera/8.53 (Windows NT 5.2; U; en).

A-Number/GBA Number (Dutch township security number)

Generates a 10-digit GBA number per field.

Genre

Generates a media genre for every field.

BSN Number (Dutch citizen service number)

Generates a valid BSN (Dutch citizen service number) per field.

SSN (US Social Security Number)

Generates an SSN for every field. You can specify how you want to separate your resulting numbers.

You can choose one of the following formats:

  • None (Example: 101010101, 003122142)

  • Dashed (Example: 101-01-0101, 003-12-2142)

  • Spaced (Example: 101 01 0101, 003 12 2142)

Job

Generates a profession for every field. Using the Language(s) you can specify which language(s) you want your resulting job names generated in.

Military rank

Generates a military rank name for every field. Using Department(s) you can specify which branches of the armed forces you’d like to include in your resulting dataset.
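
The table above describes the generated IBAN and BSN values as valid. The source does not state which checks Runtime applies, so the following Python sketch shows the commonly documented checksum rules as an assumption: the ISO 13616 mod-97 check for IBANs and the Dutch "11-test" for BSNs. The function names are hypothetical, and country-specific IBAN lengths are not checked.

```python
def is_valid_iban(iban):
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35) and require a remainder of 1."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def is_valid_bsn(bsn):
    """11-test: weights 9..2 for the first eight digits and -1 for the last;
    the weighted sum must be divisible by 11."""
    if len(bsn) != 9 or not (bsn.isascii() and bsn.isdigit()):
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(weight * int(digit) for weight, digit in zip(weights, bsn))
    return total % 11 == 0


# Widely published example values, not output from Runtime.
iban_ok = is_valid_iban("GB82 WEST 1234 5698 7654 32")
iban_bad = is_valid_iban("GB83 WEST 1234 5698 7654 32")
bsn_ok = is_valid_bsn("111222333")
bsn_bad = is_valid_bsn("123456789")
```

A check like this can be useful after masking to confirm that generated values still pass the format validation of downstream test systems.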

DATPROF Runtime File Masking Military Rank.png

DATPROF Runtime File Masking Credit Card Number Generator.png

Advanced Generator

Advanced Generators

Description

Regular expression

Generates values based upon a regular expression. The syntax for the regular expression used in Runtime/Privacy is specific to the package we use, so please refer to the specifications here.

Name Generator

Name Generators

Description

Brand

The Brand generator creates a unique brand name for each field in your dataset. It is particularly useful when working with data that requires distinct and realistic brand names.

Color code

Generates a color hex code per field, either as a three-digit shorthand or as a six-digit full-length code.

Color

Generates a random color per field (for example: Baby Blue, Sky Blue, Soft White).

Dinosaur

The Dinosaur generator generates a Dinosaur name for each field in your dataset.

First name

The First name generator creates a unique first name for each field in your dataset. It is particularly useful when working with data that requires distinct and realistic first names.

Full name

The Full name generator creates a unique full name for each field in your dataset.

First name (female)

The First name (female) generator creates a unique female first name for each field in your dataset.

First name (male)

The First name (male) generator creates a unique male first name for each field in your dataset.

Last name

The Last name generator creates a unique last name for each field in your dataset.

Random word

The Random word generator generates a random word for each field in your dataset.

DATPROF Runtime File Masking First Name Generator.png

DATPROF Runtime File Masking Last Name Generator.png

Location Generator

Location Generators

Description

City

Generates a random city name per field. Here, you can specify for which countries you’d like to generate random names using the Countries drop-down menu.

Company

The Company generator generates a random company name for each field in your dataset.

Country

Generates a random country name per field. The Language(s) option specifies in which language the country will be written. Multiple options can be enabled simultaneously.

Street

Generates a random existing street name per field. The Countries option allows you to specify for which country street names will be generated.

Two letter country code

Generates a random two-letter country code per field.

Three letter country code

Generates a random three-letter country code per field.

DATPROF Runtime File Masking Company Generator.png

DATPROF Runtime File Masking City Generator.png

DATPROF Runtime File Masking Country Generator.png

Condition

The Condition tab enables you to apply filters to the selected fields based on specific values. This allows you to define precise masking rules that apply only when certain conditions are met.

For example, in our demonstration, we’ll use the First Name (Female) generator to mask the selected field only when the gender field contains the value "F". This ensures that the masking function is applied selectively, preserving the integrity of other records while maintaining realistic and consistent data.
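
The effect of a condition can be sketched in plain Python. This is an illustration of the selective behavior described above, not Runtime's engine: in Runtime the condition would presumably be written as a Spark SQL predicate along the lines of `gender = 'F'`, whereas here it is a Python function. The helper `mask_where`, the name list and the sample records are all hypothetical.

```python
import random

# Hypothetical replacement values standing in for the First name (female) generator.
FEMALE_FIRST_NAMES = ["Alice", "Maria", "Sophie"]


def mask_where(records, field, condition, generator):
    """Return copies of the records; `field` is replaced only where
    condition(record) is true, all other records pass through unchanged."""
    masked = []
    for record in records:
        row = dict(record)  # copy so the input is not modified
        if condition(row):
            row[field] = generator()
        masked.append(row)
    return masked


rng = random.Random(0)
records = [
    {"first_name": "Emma", "gender": "F"},
    {"first_name": "Liam", "gender": "M"},
]

result = mask_where(
    records,
    field="first_name",
    condition=lambda row: row["gender"] == "F",
    generator=lambda: rng.choice(FEMALE_FIRST_NAMES),
)
```

Only the first record receives a generated name; the record with gender "M" is left as it was.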

Screenshot 2025-07-02 at 09-40-53 DATPROF Runtime.png

DATPROF Runtime File Masking Condition.png


Translation

You can choose to store the result of the data masking function in a translation file. This is particularly useful in implementing consistent data masking between files. Translation files store the old and the new value for each field value in the file.

The Translation tab inside a masking function lets you create and name translation files:

DATPROF Runtime File Masking Translation file.png

Note: The key columns are entered as a comma-separated list of the columns that will be used as keys in the translation file (e.g. id, customerId).
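
How a translation file keeps masking consistent between files can be sketched in Python. The on-disk format of Runtime's translation files is not described here, so an in-memory dictionary stands in for it; the helper `mask_with_translation`, the placeholder generator and the sample rows are hypothetical.

```python
from itertools import count


def mask_with_translation(rows, key_columns, field, generator, translation):
    """Mask `field` in every row, reusing the stored replacement whenever the
    same key was seen before. `translation` maps key -> (old value, new value)."""
    masked = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in translation:
            translation[key] = (row[field], generator())
        new_row = dict(row)
        new_row[field] = translation[key][1]
        masked.append(new_row)
    return masked


counter = count(1)
make_name = lambda: f"MASKED_{next(counter)}"  # placeholder generator

file_a = [{"id": 1, "last_name": "Jansen"}, {"id": 2, "last_name": "Smith"}]
file_b = [{"id": 2, "last_name": "Smith"}, {"id": 1, "last_name": "Jansen"}]

translation = {}  # shared between both files, like a translation file
masked_a = mask_with_translation(file_a, ["id"], "last_name", make_name, translation)
masked_b = mask_with_translation(file_b, ["id"], "last_name", make_name, translation)
```

Because both calls share the same translation mapping, the customer with id 1 receives the same replacement value in both files, regardless of row order.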

Comment

The Comment tab allows you to add internal notes or documentation related to your File Masking functions. This section is not used during execution but serves as a helpful space to provide context, instructions, or other information for yourself or team members.

Typical uses for the Comment tab include:

  • Describing the purpose or scope of the masking function.

  • Documenting important changes or version history.

  • Adding notes for future maintenance or handover.

  • Listing assumptions, known limitations, or special handling rules.

Providing clear and concise comments can improve collaboration and make it easier to manage and troubleshoot the application over time, especially in environments where multiple team members are involved.
