Introduction
Concepts
Development, testing, QA, and training databases are commonly deployed to support business applications. Typically, organizations use full copies of production databases for these environments. However, production databases can be very large, which creates challenges related to testing runtimes, disk space requirements, and the management and backup of these environments.
Lengthy backup processes: Creating a backup can take considerable time and may not be readily available when needed.
Time-consuming restoration: Restoring a production copy to a testing environment can also be a lengthy process.
High infrastructure costs: The testing infrastructure must closely match the production environment to ensure similar performance characteristics, which can be expensive to maintain.
DATPROF Subset selects data from a full-sized production database, referred to as the Source database. This data is then transferred to a copy of the database, deployed for non-production purposes, known as the Target database. By applying filters during the population of the Target database, its size is reduced compared to the Source database. This results in faster testing, saving both time and money.
The primary method DATPROF Subset uses to access data is through a central table in the Source database, known as the Start table. Data from other tables is extracted based on Foreign Key relationships with the Start table and, by extension, across the Relational Integrity model.
During the deployment of the Subset project, only a connection between the Source and Target databases is required, no data is transferred through DATPROF Subset itself. This setup enhances performance by reducing network traffic on End-Points, keeping it confined to the network "spine," which is also a critical security benefit.
A small example.
The example below shows three tables. There are 100 Customers, 1000 Orders and 10000 Order Lines in the Source database.
Every customer has 10 orders and every order has 10 related order_lines.
Only 10 customers have a name starting with ‘A’.
Source database containing all customers, orders and order lines
If the Customers table is the Start table and we only select Customers having a name starting with an ‘A’, this will result in a Target database with 10 Customers, 100 orders and a total of 1000 order_lines.
Target database with only customer names starting with 'A' and their related orders and order lines.
The result is a Target database that is 90% smaller than (or 10% of the size of) the Source database, retaining data consistency and referential integrity.
DATPROF Subset enables the user to save these filters and functions so that they can be deployed repeatedly. This way a controlled Subset process emerges.
After subsetting, the data in the Target database can be used for testing purposes, but can also be the base for anonymization using DATPROF Privacy.
Using DATPROF Subset
To create a Subset using DATPROF Subset for the first time, follow these steps:
Create a New Project and Template
Begin by setting up a new project and defining the template that will be used for the subset.Set Up Connections to the Source and Target Databases
Establish connections to both the Source database (the full-size production database) and the Target database (the copy for testing or development).Import the Data Model of the Source Database
Import the data model of the Source database into DATPROF Subset's metadata to map the structure and relationships between tables.Develop and Configure the Project
Configure the project by defining filters, selecting tables, and setting any additional parameters or rules that determine which data will be included in the subset.Deploy the Subset Project
Once the configuration is complete, deploy the Subset project to generate the reduced-size Target database, which will support testing, QA, or development needs.