Introduction: Data Virtualization
Every test environment needs data. Conventionally, a user either creates a separate database from a backup of a (production) environment or restores an existing test database from a backup. Either way, the result is a full copy of data that already exists somewhere else, and it reserves its own storage space on the machine where the database is located. Each additional copy of the same environment takes up that space again, so storage costs increase linearly with the number of copies.
Btrfs (“Butter FS” or “Better FS”) provides a solution to this problem: it is a file system that can copy existing data immediately without assigning the copy its own storage space, creating a reference to the original data instead. Only when data in the copy is edited is a new “chunk” of data written; from then on, the copy reads that specific data from the edited chunk rather than from the original. For unedited data, the copy still uses the reference to the original data. This technique is known as copy-on-write.
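The mechanism described above, often called copy-on-write, can be illustrated with a small model. The sketch below is not how Btrfs is implemented; it is a toy in-memory model, and the names BlockStore and Volume as well as the sample data are hypothetical. It only shows that a clone starts out as a list of references and that an edit allocates new storage for the edited part alone.

```python
class BlockStore:
    """Toy model of the shared storage: maps a block id to its bytes."""

    def __init__(self):
        self.blocks = {}
        self._next_id = 0

    def allocate(self, data: bytes) -> int:
        # Store new data and return the id that volumes use to refer to it.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id


class Volume:
    """A volume is only an ordered list of references into the store."""

    def __init__(self, store: BlockStore, block_ids):
        self.store = store
        self.block_ids = list(block_ids)

    @classmethod
    def from_data(cls, store: BlockStore, chunks):
        return cls(store, [store.allocate(chunk) for chunk in chunks])

    def clone(self) -> "Volume":
        # Copies the references only; no data is duplicated in the store.
        return Volume(self.store, self.block_ids)

    def read(self, index: int) -> bytes:
        return self.store.blocks[self.block_ids[index]]

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: the edit goes to a new block and only this
        # volume's reference is redirected. This toy never frees blocks.
        self.block_ids[index] = self.store.allocate(data)


store = BlockStore()
original = Volume.from_data(store, [b"customers", b"orders", b"invoices"])

copy = original.clone()
blocks_after_clone = len(store.blocks)   # still 3: the clone added nothing

copy.write(1, b"orders-masked")
blocks_after_edit = len(store.blocks)    # 4: one new block for the edit
```

After the edit, copy.read(1) returns the edited data while original.read(1) still returns the original; reads of the unedited positions go to the same stored blocks for both volumes.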
As a result, a copy of an original environment uses storage on the file system only for the data that has been edited, so less storage is used overall. Furthermore, because copying existing data initially creates only a reference to that data, cloning or restoring data from a snapshot is (nearly) instantaneous.
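For a single file, one way to request such a reference-only copy on Btrfs is the --reflink option of GNU cp. The sketch below assumes GNU coreutils is installed and that source and target are on the same Btrfs file system; the function names are hypothetical and not part of any existing tool.

```python
import subprocess


def reflink_copy_command(source: str, target: str) -> list[str]:
    # --reflink=always asks cp for a copy that shares data with the source
    # and makes cp fail instead of silently falling back to a full copy.
    # "--" ends option parsing so paths starting with "-" are handled safely.
    return ["cp", "--reflink=always", "--", source, target]


def clone_file(source: str, target: str) -> None:
    # Runs the command; raises CalledProcessError if cp reports a failure,
    # for example when the file system cannot share data between files.
    subprocess.run(reflink_copy_command(source, target), check=True)
```

A call such as clone_file("/data/prod.db", "/data/test.db") (hypothetical paths) would then create the clone. Whether a database file can safely be copied while the database is running depends on the database and is outside the scope of this sketch.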