About DATPROF Subset

Concepts

Development, testing, QA, and training databases are essential for supporting business applications. Typically, organizations replicate full production databases for these environments. However, this approach can lead to challenges due to the large size of production databases, resulting in:

Longer testing runtimes
High disk space requirements
Complex management and backup processes

These issues can reduce efficiency and increase costs. Specifically:

Lengthy backup processes: Creating a backup can take considerable time and may not be readily available when needed.
Time-consuming restoration: Restoring a production copy to a testing environment can also be a lengthy process.
High infrastructure costs: The testing infrastructure must closely match the production environment to ensure similar performance characteristics, which can be expensive to maintain.

These challenges highlight the need for more efficient solutions in managing test data.

DATPROF Subset selects data from a full-sized production database, known as the Source database, and transfers it to a copy used for non-production purposes, referred to as the Target database. By applying filters during the transfer, the size of the Target database is reduced compared to the Source, resulting in faster testing and cost savings.

The core method DATPROF Subset uses to access data is through a central table in the Source database, called the Start table. Data from other tables is extracted based on Foreign Key relationships with the Start table, ensuring consistency across the Relational Integrity model.

This setup improves performance by reducing network traffic at the endpoints and provides a critical security benefit by keeping traffic confined to the network "spine."

A small example.

The example below shows three tables. There are 100 Customers, 1000 Orders and 10000 Order Lines in the Source database.

Every customer has 10 orders and every order has 10 related order_lines.

Only 10 customers have a name starting with ‘A’.

Source database containing all customers, orders and order lines

If the Customers table is the Start table and we only select Customers having a name starting with an ‘A’, this will result in a Target database with 10 Customers, 100 orders and a total of 1000 order_lines.

Target database with only customer names starting with 'A' and their related orders and order lines.

The result is a Target database that is 90% smaller than (or 10% of the size of) the Source database, retaining data consistency and referential integrity.

DATPROF Subset enables the user to save these filters and functions so that they can be deployed repeatedly. This way a controlled Subset process emerges.

After subsetting, the data in the Target database can be used for testing purposes, but can also be the base for anonymization using DATPROF Privacy.

Using DATPROF Subset

To create a Subset using DATPROF Subset for the first time, follow these steps:

Create a New Project and Template
Begin by setting up a new project and defining the template that will be used for the subset.
Set Up Connections to the Source and Target Databases
Establish connections to both the Source database (the full-size production database) and the Target database (the copy for testing or development).
Import the Data Model of the Source Database
Import the data model of the Source database into DATPROF Subset's metadata to map the structure and relationships between tables.
Develop and Configure the Project
Configure the project by defining filters, selecting tables, and setting any additional parameters or rules that determine which data will be included in the subset.
Deploy the Subset Project
Once the configuration is complete, deploy the Subset project to generate the reduced-size Target database, which will support testing, QA, or development needs.