Deduplication: Stop repeating yourself

New techniques can save disk space and speed backups.

Data deduplication, data reduction, commonality factoring, capacity optimized storage – whatever you call it — is a process designed to make network backups to disk faster and more economical.

Proponents also say it enables you to make more data available online longer in the same amount of disk.

In deduplication, as data is backed up to a disk-based virtual tape library (VTL) appliance, a catalog of the data is built. This catalog or repository indexes individual bits of data in a file or block of information, assigns a metadata reference to it that is used to rebuild the file if it needs to be recovered and stores it on disk. The catalog also is used on subsequent backups to identify which data elements are unique. Nonunique data elements are not backed up; unique ones are committed to disk.

For instance, a 20-slide PowerPoint file is initially backed up. The user then changes a single slide in the files, saves the file and e-mails it to 10 counterparts. When a traditional backup occurs, the entire PowerPoint file and its 10 e-mailed copies are backed up. In deduplication, after the PowerPoint file is modified, only the unique elements of data — the single changed slide – is backed up, requiring significantly less disk capacity.

StorageSpace

Tuesday, September 19, 2006

Deduplication: Stop repeating yourself

New techniques can save disk space and speed backups.

0 Comments:

About Me

Previous Posts