Can data deduplication save you money?

The inefficiencies of duplicated data can drag down business performance and budgets.

Clutter is a very human phenomenon. Some animals hoard things, but none compare to Homo sapiens. We gather and stockpile, collecting objects that take up space in our homes without always adding much to our lives. Still, better to have it and not need it than to need it and not have it, right?

This habit quickly gets out of hand in the digital world, where it takes just a few clicks to copy information. No wonder the average company holds an enormous amount of duplicated data.

Estimates of how widespread data duplication has become vary: some experts put the average at around 20 percent of all data, while others go beyond 50 percent. A 2016 survey from Veritas classed 33 percent of data as redundant and a further 55 percent as dark data with unknown value. Much of that dark data is also duplicated.

“Data duplication is a big issue for companies, even if they aren’t aware of it,” says Tristan Davies, Solutions Architect of Business Development. “It’s just so easy to create copies of data and then lose track of those files, and it happens in backup systems as well as on employee devices. Many people copy files because it makes their work easier. Remote working and virtual collaboration have made this even more prevalent.”

The Cost of Data Duplication

It’s difficult to put an exact figure on the value lost to duplicated data. To make that determination, one must consider the different ways it affects costs:

Storage: Duplicated files take up additional storage space, creating performance problems on user devices and an exponentially growing storage burden for production servers and backup systems.

Staff time: Managing unnecessary data consumes valuable staff time. Some case studies report that companies with 20 percent data duplication need five times as many full-time-equivalent hours.

Data management: Duplicated data creates bloat that reduces the effectiveness of data management and data loss prevention systems.

Security: Too much data duplication encourages poor security habits and data leakage, and the consequences of either can be expensive.

Compliance: Duplicate data can be a significant hindrance when meeting compliance requirements for customer and employee data.

Modernisation: Companies aiming to use modern technologies such as artificial intelligence or data analytics will struggle if duplication prevents them from maintaining coherent and reliable master data.

Deduplicate Your Data

Fortunately, fixing duplicate data is almost as easy as creating it. Automated deduplication (dedup) services can run background scans on data, hunting down duplicates.

“You can scan data in several ways to look for duplicates,” says Tristan Davies. “You can scan for files or sub-files, and you can select to remove the extra data, or mark and compress it for later interventions. Backups are often the best place to start.”
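To make the file and sub-file distinction concrete, here is a minimal Python sketch of both kinds of scan. It is an illustration under stated assumptions, not how any particular dedup product works: the function names, the choice of SHA-256, and the fixed 4 KB chunk size are all picked for the example. It only reports duplicates and never deletes or compresses anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in blocks to limit memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        while block := handle.read(block_size):
            hasher.update(block)
    return hasher.hexdigest()


def find_duplicate_files(root: Path) -> list[list[Path]]:
    """File-level scan: group files under root that have identical content."""
    # Files of different sizes cannot be identical, so group by size first
    # and hash only the files that share a size with at least one other file.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    # Keep only groups that really contain more than one copy.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def chunk_dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Sub-file scan: fraction of fixed-size chunks that repeat another chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return 0.0
    unique = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return 1 - len(unique) / len(chunks)
```

Calling find_duplicate_files(Path("some/folder")) on a hypothetical folder returns one list per set of identical files, which a reviewer could then choose to remove or mark. The chunk-level function uses fixed-size chunks for simplicity; other chunking strategies exist and are not shown here.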

Deduplication systems integrate with data loss prevention and data management environments. Some enterprise operating systems include dedup features, and there are solutions that help manage duplicate files in virtual collaboration workspaces.

Yet while deduplication solutions are easy to find, companies should avoid trying to move the mountain all at once.

“Deduplication is not challenging from a technical standpoint, but it is very important to get the planning and integration to fit. If you try to tackle everything at once, you can slow down machine performance and increase downtime from backups. The first step is to analyse your duplication issues and identify where they affect your operations or costs the most. Duplicate data sitting in five-year-old cold backups is not as big a deal as duplicated operational data. Or maybe it is. That depends on your priorities, and that’s where many companies go wrong when they tackle duplication.”
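The analysis step described above can be sketched in a few lines of Python. This is a hypothetical example, assuming a scan has already produced groups of identical files with their sizes; the function name, the input format, and the rule of keeping the first copy in each group are assumptions made for illustration. It totals the space held by redundant copies in each top-level location, so the biggest offenders surface first. Whether the biggest location is also the most important one remains a business decision.

```python
from collections import Counter
from pathlib import PurePosixPath


def reclaimable_by_area(duplicate_groups, depth=1):
    """Sum the bytes held by redundant copies, keyed by top-level location.

    duplicate_groups: iterable of groups, where each group is a list of
    (relative_path, size_in_bytes) tuples for files with identical content.
    The first entry in each group is treated as the copy to keep (an
    arbitrary choice for this sketch); the rest count as reclaimable.
    """
    totals = Counter()
    for group in duplicate_groups:
        for path, size in group[1:]:
            # Use the first `depth` path components as the location label.
            area = "/".join(PurePosixPath(path).parts[:depth])
            totals[area] += size
    # Largest reclaimable total first.
    return totals.most_common()


# Hypothetical scan results, invented purely for illustration.
example_groups = [
    [
        ("finance/report.pdf", 1000),
        ("backups/2019/report.pdf", 1000),
        ("backups/2020/report.pdf", 1000),
    ],
    [
        ("shared/logo.png", 200),
        ("finance/logo.png", 200),
    ],
]

for area, wasted_bytes in reclaimable_by_area(example_groups):
    print(f"{area}: {wasted_bytes} bytes in redundant copies")
```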

Too many enterprises opt to ignore the problem. But the volume and importance of today’s data demand better management of duplicate and redundant data. Organisations that want to use their data effectively shouldn’t skimp on data management and visibility, including managing duplication.

Consider how easy it is to copy one file. Now multiply that across your business. Just how much are you leaving on the table due to duplicate data? Data deduplication is low-hanging fruit in data management, and it can deliver great benefits, in particular lower costs and better performance.