Every test data environment needs data. Conventionally, a user creates a separate database using a backup of a (production) environment or restores an existing test database with a backup. This process reserves storage space on the machine where the database is located, even though the result is essentially a copy of data that already exists somewhere else. Creating multiple copies of the same environment therefore leads to linearly increasing storage costs.
Btrfs (“Butter FS” or “Better FS”) is a file system that solves this problem: it can copy existing data immediately without assigning that copy its own logical space, creating a reference to the original data instead. Only when data in the copy is edited is a new “chunk” of data created; from then on, the copy reads that specific data from the edited chunk rather than from the original. For unedited data, the reference to the original data is still used.
This means that a copy of an original environment only uses storage on the file system for data that has been edited, so less storage is used overall. Furthermore, because copying existing data initially only creates a reference to that data, cloning or restoring data from a snapshot is (nearly) instantaneous.
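The copy-on-write idea can be illustrated with a small model. The following is a conceptual sketch only and does not reflect how btrfs is implemented internally; the `Volume` class, its chunk table, and the `shared_chunks` helper are hypothetical names chosen for illustration.

```python
class Volume:
    """Toy model of a copy-on-write volume (illustrative, not btrfs internals)."""

    def __init__(self, table=None):
        # Maps chunk number -> chunk data. Several volumes may reference
        # the very same chunk object.
        self.table = dict(table or {})

    def clone(self):
        # Copy only the table of references; no chunk data is duplicated.
        return Volume(self.table)

    def write(self, index, data):
        # An edit points this volume at a new chunk; other volumes keep the old one.
        self.table[index] = data

    def read(self, index):
        return self.table[index]


def shared_chunks(a, b):
    """Count chunk numbers for which both volumes reference the same chunk object."""
    return sum(1 for i, chunk in a.table.items() if b.table.get(i) is chunk)


original = Volume({0: b"customers", 1: b"orders", 2: b"invoices"})
copy = original.clone()
copy.write(1, b"orders-masked")

print(original.read(1))               # the original is unchanged
print(copy.read(1))                   # the copy reads its edited chunk
print(shared_chunks(original, copy))  # chunks still shared by reference
```

In this model the copy only adds storage for the one chunk it edited; every other chunk is still the original's.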
Now that we understand how the btrfs file system can facilitate efficient copying of data, we need to understand how this ties into the Runtime interface, and how the end-user can use this functionality. In one sentence:
Virtualize operates by using a pre-existing virtual machine, which contains Docker containers, which, in turn, contain a database image and the desired data.
Concretely, this means that the end-user only interacts with this virtual machine through the Runtime interface after the initial set-up is completed. From the Runtime interface, all of the day-to-day tasks carried out on the virtual machine can be done. These tasks include, for example:
Resetting an existing environment’s data to a previous snapshot
Copying an existing environment, and creating a new container with the same image
Reading container logging, or logging returned from the virtual machine
Working with the currently offered DATPROF tools, such as executing runs and designing templates, is largely unaffected.
Because the configuration of a VM can be complex, DATPROF distributes a pre-existing VM which you can use as a starting template, with all the necessary settings already configured.
Should you want to configure your own, follow these specifications:
A virtual machine which can be connected to externally, running a Linux distribution such as Ubuntu. On it, the following must at the very least be configured:
Docker, with SSL configured
OpenSSH, for configuring the connection to the VM from a client
BTRFS – a BTRFS file-system must be available.
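When configuring your own VM, a quick check of these prerequisites can save time. The sketch below is meant to be run on the VM itself and rests on assumptions: the binary names in `REQUIRED_TOOLS` are the usual ones for Docker, the OpenSSH server, and the btrfs tools but may differ per distribution, and the helper names are hypothetical. It does not verify that Docker's SSL configuration is correct.

```python
import shutil

# Assumed binary names; adjust these for your distribution.
REQUIRED_TOOLS = ("docker", "sshd", "btrfs")


def missing_tools(which=shutil.which):
    """Return the required tools that cannot be found on the PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def btrfs_mount_points(mounts_text):
    """Return mount points whose file system type is btrfs.

    Expects text in the /proc/mounts format:
    device, mount point, file system type, options, ...
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "btrfs":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    print("Missing tools:", missing_tools())
    try:
        with open("/proc/mounts") as handle:
            print("btrfs mount points:", btrfs_mount_points(handle.read()))
    except FileNotFoundError:
        print("/proc/mounts is not available on this system")
```

An empty list of missing tools and at least one btrfs mount point suggest that the basic requirements are in place.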
Virtualize relies on several underlying technologies (virtual machines, Docker, btrfs). As a result, using this tool can quickly become complex, with multiple containers and interfaces producing logging. The user is therefore explicitly responsible for maintaining both the virtual machine and the underlying Docker containers/images, and our standard support does not cover this topic.
Because Virtualize relies heavily on containerization, only database types for which a containerized version and an image are available are supported. Examples include SQL Server and PostgreSQL. If no container image is distributed for a database type, you can assume that Virtualize cannot be used with that database type.
The Virtualize Menu
In order to access Virtualize, first click on Virtualize in the bottom left corner of the Runtime menu, under the administration header.
Adding a container host
To add a container host, press Connect container host at the top-right corner of the container host overview.
A name for the new container host. This name will also be displayed in the container host overview screen.
The IP-address or host-name of the machine the virtual machine is hosted on.
The port over which to connect to the virtual machine.
The password used to connect to the virtual machine. When setting up the DATPROF-provided VM for the first time, the user is prompted to provide a password for the VM.
Saves the current configuration, and navigates the user back to the container host overview screen.
Discards any modifications, and navigates the user back to the container host overview screen.
Editing existing container hosts
To edit an existing container host, click on the Options drop-down box on any given row in the container host overview. After this, select Edit. The resulting overview is mostly identical to that shown under Adding a container host; the only differences are that the data fields contain the previously configured values and that you cannot modify the password of the VM.
Removing existing container hosts
To remove an existing container host, click on the Options drop-down box on any given row in the container host overview. After this, select Remove.
Removing an existing container host does not impact the actual virtual machine at all. This merely removes the stored connection profile from Runtime.
Creating a virtualized environment
The process of creating an environment within your container host is largely identical to the regular process of creating an environment. However, because we are using Docker to create a container, we need to supply some additional information.
To get started with creating a virtualized environment, select or create a Group in the main menu of Runtime, and press Add Environment in the top-right corner of the group overview screen.
Enter the name of the environment you’re creating. This name will be displayed within the group overview screen and is used to identify the environment.
The type of SQL database that’s being added. This is a drop-down menu that currently contains the following options:
IBM DB2 for iSeries
IBM DB2 for LUW (Linux, Unix & Windows)
IBM DB2 for z/OS (Mainframe)
SQL Server (Which also serves as the Azure SQL connection type)
Other.. (default connector)
The database types in this list are shared between virtualized and “regular” database connections, and as such the list may include database types which are not supported for Virtualize.
For regular database connections, the selected database type determines which connectivity information we request from the user. For Virtualize, however, all connection types use a standardized connection editor, and this option merely functions as a flag.
For more information, refer to the Supported Databases segment of this documentation.
In order to create a Virtualize environment, the Create container database option must be selected. For the default environment type (indicated by the Existing database option), please refer to the Environments chapter of the documentation.
This is a drop-down menu that lists all configured container hosts from the Virtualize page. The selected container host’s connection data is used for any further operations within this environment.
This is a free text field where the user can specify the name of any Docker image to be used. The name of this image must be a direct match, and any image used must be available within the Docker environment on the virtual machine beforehand.
The volume mapping dictates where all data used is stored, in case the container is brought offline. The Host volume name refers to a location on the Virtual Machine where a volume can be created, and the Container volume refers to the location inside the container where any database information is stored. Generally, the latter uses a standard location, defined within a given database image.
In order to connect to the container, we need to map which ports connect to each other. For this, we need two ports: the Host Port and the Container Port. The container port is the port over which the database communicates, and this is usually static. For SQL Server this is generally port 1433, and for PostgreSQL this is generally 5432. It is, of course, possible to configure a different port on the database.
The host port must be a unique port on the VM which is not already used by another Runtime environment. This can generally be any free port.
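One way to find out whether a candidate host port is free is to try binding to it on the VM. The following is a minimal sketch, assuming it is run on the container host itself; the function name `host_port_is_free` is hypothetical. Note that it only detects ports in use at the moment of the check, so a port reserved by a Runtime environment whose container is stopped would not be detected.

```python
import socket


def host_port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or insufficient privileges.
            return False
        return True


if __name__ == "__main__":
    for candidate in (5432, 5433, 5434):
        state = "free" if host_port_is_free(candidate) else "in use"
        print(candidate, state)
```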
Here, the user can (and in some cases must) pass environment variables to the container. For certain images/databases, an environment variable is needed to successfully create a database.
Because sometimes passwords need to be supplied as an environment variable, every key has a Password checkbox next to it. Ticking this ensures that the configured value is not legible from the UI after configuration.
An example of a database image which cannot start without an environment variable is the PostgreSQL image. If the user does not supply a POSTGRES_PASSWORD and POSTGRES_USER key, deployment will fail.
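To see how the image name, volume mapping, port mapping, and environment variables fit together, it can help to look at the equivalent `docker run` invocation. The sketch below only builds such a command and is not necessarily the command Runtime issues internally. The function names, the environment name `demo-env`, the host volume path, and the example credentials are all hypothetical; `/var/lib/postgresql/data` is the data location commonly used by the PostgreSQL image.

```python
import shlex


def build_docker_run(name, image, host_volume, container_volume,
                     host_port, container_port, env):
    """Return a `docker run` argument list for the given environment settings."""
    args = [
        "docker", "run", "--detach",
        "--name", name,
        # Volume mapping: host location first, container location second.
        "--volume", f"{host_volume}:{container_volume}",
        # Port mapping: host port first, container port second.
        "--publish", f"{host_port}:{container_port}",
    ]
    for key, value in env.items():
        args += ["--env", f"{key}={value}"]
    args.append(image)
    return args


def mask_passwords(env, password_keys):
    """Hide the values of keys flagged as passwords, for display purposes."""
    return {key: ("********" if key in password_keys else value)
            for key, value in env.items()}


env = {"POSTGRES_USER": "demo", "POSTGRES_PASSWORD": "change-me"}
command = build_docker_run(
    name="demo-env",
    image="postgres:14",
    host_volume="/data/volumes/demo-env",  # hypothetical location on the VM
    container_volume="/var/lib/postgresql/data",
    host_port=5433,
    container_port=5432,
    env=env,
)
print(shlex.join(command))
print(mask_passwords(env, {"POSTGRES_PASSWORD"}))
```

The `mask_passwords` helper mirrors the effect of the Password checkbox: the value is still passed to the container, but is no longer legible when displayed.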
After all the above steps have been completed, the user can press Add environment to save the current configuration.
Once a container database environment has been created, its settings can be reviewed by selecting the settings page from the group overview. In general, once an environment has been configured, its usage is fairly similar to that of an ordinary environment. Two major differences are the addition of Container and Snapshots sub-menus, which allow the user to configure and control the Virtualize specific aspects of this environment.
Using the container actions, a user can give instructions to the container. Among these, the user has the option to:
Start – start an exited or paused container.
Pause – pause a running container.
Stop – exit a running or paused container.
Restart – exit a running or paused container, and then start it again.
Clone as new environment – creates a new environment using the same environment settings and the same image. This takes the user to a new settings menu where the user must enter a new environment name and a new port on the container host to be used for communication.
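The Start, Pause, Stop, and Restart actions can be read as transitions between container statuses. The sketch below models only what the list above states; the table layout and the `apply_action` function are hypothetical, and Docker itself may pass through intermediate statuses that are not modelled here.

```python
# action -> (statuses in which the action applies, resulting status)
TRANSITIONS = {
    "start": ({"exited", "paused"}, "running"),
    "pause": ({"running"}, "paused"),
    "stop": ({"running", "paused"}, "exited"),
    "restart": ({"running", "paused"}, "running"),
}


def apply_action(status, action):
    """Return the status a container ends up in after the given action."""
    allowed, result = TRANSITIONS[action]
    if status not in allowed:
        raise ValueError(f"cannot {action} a container that is {status}")
    return result


status = "running"
for action in ("pause", "start", "stop", "start"):
    status = apply_action(status, action)
    print(action, "->", status)
```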
This section contains information about the container as returned by Docker.
Shows the current status of the container. This is a one-to-one representation of Docker’s container statuses. Available statuses are Running, Paused, Exited, Restarting, Dead, and Created.
The name of the used database image. This is configured by the user whilst creating the environment, and should match the name of the image as returned when querying Docker.
To illustrate functionality, we’ll be using the Postgres 14 image. A valid name for the image in Runtime would be postgres:14, as this is the name Docker returns, plus the tag given to this image.
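Because the image name must match exactly, it can be useful to split a name into its repository and tag before comparing it with what Docker reports. The following is a minimal sketch; the function name `split_image_name` is hypothetical, it does not handle image digests, and it assumes Docker's convention that a name without a tag refers to the `latest` tag.

```python
def split_image_name(name):
    """Split an image name such as 'postgres:14' into (repository, tag)."""
    name = name.strip()  # stray spaces would break an exact match
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon before the last slash belongs to a registry port, not to a tag.
    if last_colon > last_slash:
        return name[:last_colon], name[last_colon + 1:]
    return name, "latest"


print(split_image_name("postgres:14"))
print(split_image_name("postgres"))
print(split_image_name("localhost:5000/postgres"))
```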
The name of the container used.
A timestamp of when this container was initially created, including the date and timezone.
The environment section displays all used environment variables, including those automatically generated by the chosen image.
Displays the configured storage locations for the docker volumes on the host (virtual machine) and the container.
Displays the configured port configuration between the database container and the host (virtual machine).
This must be an open port on the container host.
This must be an open port over which the database communicates. For certain databases, this is likely to be the default port for the database (5432 for PostgreSQL, 1433/1434 for MS SQL Server, etc.)
Displays the logging for all commands executed on the host (virtual machine) regarding the environment. For instance, when first creating the environment, the command log logs the creation of the data volume directory on the host.
Displays the logging for any activity inside of the container, such as the logging that the database image generates.
Cloning existing containers to create new environments
The user is able to quickly clone an existing Virtualize environment.
Cloning creates a new environment using the same environment settings and the same image. It takes the user to a new settings menu where the user must enter a new environment name and a new port on the container host to be used for communication.
The snapshots menu allows a user to create and restore snapshots of a container database. A snapshot is a stored copy of a container at a given timestamp. Concretely, this means that the user can save a database, and quickly roll back to a previous snapshot at any time. This can be desirable when, for instance, developing templates where an incorrect template configuration can damage aspects of your dataset.
When first opening the snapshots sub-menu in a new environment, this is the only option available to the user. Creating a snapshot creates a point-in-time reference to the data in the database at that moment, and works as a differential back-up: a snapshot saves only the changes made since the last snapshot, not the entire state of all the data. This significantly reduces the amount of space needed to make a back-up, and makes creating one significantly faster.
If the environment contains one or more snapshots already, this button will be displayed in the top-right corner of the snapshot overview menu.
After creating an initial snapshot, all existing snapshots will be listed in the snapshot overview.
In the snapshot overview, all existing snapshots are listed, along with information pertaining to the snapshot.
Lists the username of the creator of a snapshot. This username refers to the one assigned to a given user in the User management portion of Runtime.
The timestamp of creation for a snapshot.
Restoring from a snapshot
The user can restore from a snapshot by clicking Restore on any given entry in the snapshot overview. This leads to the loss of any data in the container that was changed after the snapshot was taken, and rolls the data in the container back to its state at the time of the snapshot.
Due to how btrfs handles restoring snapshots, this process is (nearly) instantaneous.
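Why creating and restoring a snapshot is fast can be shown with a small model in which a snapshot stores only references to existing chunks. This is a conceptual sketch, not a description of how btrfs or Runtime implement snapshots; the `SnapshotStore` class and its methods are hypothetical.

```python
import time


class SnapshotStore:
    """Toy model: snapshots hold references to chunks, never copies of the data."""

    def __init__(self, table):
        self.live = dict(table)  # chunk number -> chunk data
        self.snapshots = []      # entries of (created_by, created_at, table)

    def create_snapshot(self, user):
        # Only the table of references is stored, so this is cheap.
        self.snapshots.append((user, time.time(), dict(self.live)))
        return len(self.snapshots) - 1

    def restore(self, index):
        # Point the live data back at the chunks referenced by the snapshot.
        _, _, table = self.snapshots[index]
        self.live = dict(table)

    def delete(self, index):
        # Irreversible: the reference table of this snapshot is dropped.
        del self.snapshots[index]


store = SnapshotStore({0: b"customers", 1: b"orders"})
first = store.create_snapshot("admin")
store.live[1] = b"orders-damaged"  # e.g. an incorrect template run
store.restore(first)
print(store.live[1])
```

Restoring replaces references rather than copying data, which is why the amount of stored data barely affects how long it takes.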
Deleting a snapshot
A snapshot can be deleted by pressing Delete on any given entry in the snapshot overview. This process is not reversible.