Better business intelligence is one of the digital era’s opportunities worth harnessing. Historical data can reveal trends, machine data can help improve planning and costs, and customer data can generate leads and reduce churn.
Executives and managers can access digital analytics through dashboards, even if they don’t have any data skills. Digital analytics has democratised business analysis. No wonder over ninety percent of enterprises say data and analytics are essential to their business growth and digital transformation.
Yet many analytics projects underperform. Reasons include poor design, unclear objectives, and problems retrieving correct data. There is also a less-considered problem: infrastructure issues. Specifically, many analytics projects don’t scrutinise storage.
“Storage is often an afterthought for analytics,” says Malcolm Tiley, Senior Enterprise Consultant at Sithabile Technology Services. “Projects tend to prioritise computing power and network speeds to look at small batches of data. Yet, as the analytics scales up to include more data and deliver information to more stakeholders, the wrong storage types quickly become bottlenecks.”
How storage affects data analytics
Storage significantly impacts digital performance. Phones and laptops become slow when their storage space runs low. Yet the issue for enterprise analytics is usually not a lack of storage but the wrong storage.
We can segment data analytics into two categories: synchronous and asynchronous. Synchronous analytics, or real-time analytics, typically uses NoSQL database systems that can scale quickly. Asynchronous analytics is not real-time: it relies on a process called ETL (Extract, Transform, Load), which collects data and transforms it into a usable form before loading it into a data warehouse.
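To make the ETL idea concrete, here is a minimal sketch in Python: it extracts rows from a CSV export, transforms them into a usable shape, and loads them into a small SQLite "warehouse". The file name, column names, and table are illustrative assumptions, not references to any particular product or system.

```python
# Minimal ETL sketch: extract rows from a CSV export, transform them,
# and load them into a SQLite "warehouse" table. File, column, and table
# names are hypothetical placeholders for illustration only.
import csv
import sqlite3

def extract(path):
    """Read raw sales records from a CSV export."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Normalise types and drop incomplete records before loading."""
    cleaned = []
    for row in rows:
        if not row.get("amount"):
            continue  # skip records with missing values
        cleaned.append((row["customer_id"], float(row["amount"]), row["date"]))
    return cleaned

def load(records, db_path="warehouse.db"):
    """Append the cleaned records to the warehouse table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, amount REAL, sale_date TEXT)"
    )
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("daily_sales.csv")))
```

Production pipelines use dedicated ETL tooling and proper data warehouses, but the extract-transform-load structure is the same, and every stage reads from and writes to storage.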
These two use cases change storage considerations, says Tiley:
“If you want to run real-time analytics, the best bet is flash storage drives because they are very fast. If the drives are too slow, you’ll get a lot of latency. But if you are working with asynchronous data flows, flash drives might be too expensive for the value you need. Magnetic disk drives make more sense, and in the case of cold and archived data, tape drive systems could be the better solution.”
The type of storage matters, and so does the surrounding infrastructure. Flash drives are fast, but flash arrays with high-speed controllers are faster still (and more expensive). Hard drives in redundant configurations reduce the risk of data loss. And the right infrastructure scales well as demands grow.
Analytics environments realise the most value by balancing storage needs, performance and cost, says Tiley: “You can spend a lot of money and buy only the fastest systems. That will avoid some headaches, but it won’t be the best use of your budget, especially when the system requires maintenance and upgrades.”
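As a rough illustration of the tiering logic Tiley describes, the short sketch below maps workload characteristics to a storage tier. The categories and the mapping are simplified assumptions for illustration, not a sizing method.

```python
# Rough sketch of the tiering logic described above: choose a storage tier
# from workload characteristics. The categories and mapping are illustrative
# assumptions, not a formal recommendation.
def suggest_tier(real_time: bool, access_frequency: str) -> str:
    """access_frequency: 'hot', 'warm', or 'cold'."""
    if real_time:
        return "flash (SSD / all-flash array)"  # low latency for synchronous analytics
    if access_frequency == "cold":
        return "tape / archive storage"         # rarely accessed, lowest cost per TB
    return "magnetic disk (HDD)"                # batch / asynchronous workloads

print(suggest_tier(real_time=True, access_frequency="hot"))    # flash
print(suggest_tier(real_time=False, access_frequency="cold"))  # tape / archive
```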
Spotting storage bottlenecks
It’s tricky to pinpoint storage bottlenecks. Numerous components affect analytics performance, including computing power, memory, database type, and the storage’s speed, configuration, and controllers.
Yet, while storage is one of the simplest and most affordable upgrades you can make for analytics, it’s often the last to be considered. How can you check whether storage is causing the bottleneck? Experienced storage experts such as Sithabile follow a series of steps to isolate problem areas:
- Look for symptoms: Slow and erratic response times, frequent errors, inconsistent data quality, and high disk utilisation are common signs of storage bottlenecks.
- Analyse logs: Other components can cause the above issues, so check storage and event logs. Experts use indicators such as IOPS (input/output operations per second), throughput (data transferred per second), latency (how long each input/output operation takes), queue depth (pending requests) and utilisation (how busy or full the storage is). A minimal scripted check of these indicators is sketched after this list.
- Pinpoint and benchmark problems: Use synthetic benchmarks to test and compare the performance of problem areas.
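As a minimal illustration of the "analyse logs" step, the sketch below flags metric snapshots that exceed warning thresholds. The sample values and thresholds are assumptions; in practice the numbers come from your monitoring or storage logs, and acceptable limits depend on the workload and hardware.

```python
# Minimal sketch of the "analyse logs" step: flag storage metrics that
# suggest a bottleneck. The sample data and thresholds are illustrative
# assumptions; real values come from monitoring tools or storage logs.
samples = [
    # {metric: value} snapshots exported from a monitoring tool (hypothetical)
    {"iops": 4200, "throughput_mbps": 310, "latency_ms": 2.1,  "queue_depth": 4,  "utilisation_pct": 55},
    {"iops": 4800, "throughput_mbps": 120, "latency_ms": 18.7, "queue_depth": 31, "utilisation_pct": 97},
]

# Example warning thresholds (assumptions; tune per environment and workload)
THRESHOLDS = {"latency_ms": 10.0, "queue_depth": 16, "utilisation_pct": 90}

def flag_bottlenecks(sample):
    """Return the indicators in one snapshot that exceed their thresholds."""
    return {k: v for k, v in sample.items() if k in THRESHOLDS and v > THRESHOLDS[k]}

for i, snapshot in enumerate(samples):
    flags = flag_bottlenecks(snapshot)
    if flags:
        print(f"Snapshot {i}: possible storage bottleneck -> {flags}")
    else:
        print(f"Snapshot {i}: within expected limits")
```

A check like this only narrows the search; the synthetic benchmarks in the next step confirm whether the flagged component is actually the limiting factor.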
This information helps determine your options, which range from a few software updates to replacing legacy or poorly aligned infrastructure.
The good news is that you don’t have to do this often: a well-planned storage environment can deliver value for many years, and maintenance and upgrades become predictable. Improving storage is the low-hanging fruit of successful analytics systems.
If poor performance is holding your analytics hostage, look at your storage. If it has issues, fixing them delivers quick wins. And if it already works well, you have a reliable foundation for the future of your business intelligence.