The inefficiencies of duplicated data can drag down business performance and budgets.
Clutter is a very human phenomenon. Some animals hoard things, but none compare to us Homo sapiens. We gather and stockpile, collecting objects that take up space in our homes without always adding much to our lives. Still, better to have it and not need it than need it and not have it, right?
This habit quickly gets out of hand in the digital world, where copying information takes just a few clicks. No wonder the average company accumulates an enormous amount of duplicated data.
Estimates of how widespread data duplication has become vary; some experts put the average at around 20 percent duplicate data, while others go beyond 50 percent. A 2016 Veritas survey pinned the figure at 33 percent redundant data, with a further 55 percent classed as dark data of unknown value. Much of that dark data is also duplicated.
“Data duplication is a big issue for companies, even if they aren’t aware of it,” says Tristan Davies, Solutions Architect of Business Development. “It’s just so easy to create copies of data and then lose track of those files, and it happens in backup systems as well as on employee devices. Many people copy files because it makes their work easier. Remote working and virtual collaboration have made this even more prevalent.”
The Cost of Data Duplication
It’s difficult to put an exact figure on the value lost to duplicated data. To make that determination, one must weigh the different ways duplication drives up costs.
Deduplicate Your Data
Fortunately, fixing duplicate data is almost as easy as creating it. Automated deduplication (dedup) services can run background scans on data, hunting down duplicates.
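As a rough illustration of what such a background scan does, the sketch below walks a directory tree and groups files by content hash. This is a minimal example of the general technique, not any particular product's API; the function name and the 1 MB read size are illustrative choices.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(root):
    """Group files under `root` by content hash; return only groups with duplicates."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            # Read in 1 MB pieces so large files don't need to fit in memory.
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            groups[digest.hexdigest()].append(path)
    # Keep only hashes shared by two or more files: those are the duplicates.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

A real dedup service layers policy on top of a scan like this: whether to delete, link, or merely report the extra copies.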
“You can scan data in several ways to look for duplicates,” says Davies. “You can scan for files or sub-files, and you can choose to remove the extra data, or mark and compress it for later interventions. Backups are often the best place to start.”
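File-level scanning catches whole-file copies, while sub-file scanning works on chunks, so partially overlapping files can share storage. Here is a minimal sketch of the chunk-level idea, assuming fixed-size chunks for simplicity (production systems typically use content-defined, variable-size chunking); all names are illustrative.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks keep the example simple

def chunk_and_dedup(blobs):
    """Store each unique chunk once; return the chunk store and per-blob recipes."""
    store = {}    # chunk hash -> chunk bytes, stored exactly once
    recipes = []  # per blob: ordered list of chunk hashes needed to rebuild it
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), CHUNK_SIZE):
            chunk = blob[i:i + CHUNK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            store.setdefault(h, chunk)  # a repeated chunk is not stored again
            recipe.append(h)
        recipes.append(recipe)
    return store, recipes

def rebuild(store, recipe):
    """Reassemble a blob from its chunk recipe."""
    return b"".join(store[h] for h in recipe)
```

With two files that share a leading chunk, the store holds that chunk once while each file's recipe still reconstructs it byte for byte, which is why backup systems gain so much from sub-file dedup.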
Deduplication systems integrate with data loss prevention and data management environments. Some enterprise operating systems include dedup features, and there are solutions that help manage duplicate files in virtual collaboration workspaces.
Yet while deduplication solutions are easy to find, companies should avoid trying to move the mountain all at once.
“Deduplication is not challenging from a technical standpoint,” says Davies, “but it is very important to get the planning and integration right. If you try to tackle everything at once, you can slow down machine performance and increase downtime from backups. The first step is to analyse your duplication issues and identify where they affect your operations or costs the most. Duplicate data sitting in five-year-old cold backups is not as big a deal as duplicating operational data. Or maybe it is, depending on your priorities, and that’s where many companies go wrong when they tackle duplication.”
Too many enterprises opt to ignore the problem. But growing data volumes, and the growing business importance of data, demand better management of duplicate and redundant data. Organisations that want to use their data effectively shouldn’t skimp on data management and visibility, including managing duplication.
Consider how easy it is to copy a single file, then amplify that across your business: just how much value are you leaving on the table because of duplicate data? Deduplication is low-hanging fruit of data management that can deliver real benefits, in particular lower costs and better performance.