The BI technology battle has already been fought… and won.
In recent history, we have witnessed many technology battles such as VHS Vs. Betamax and Apple Vs. Android. The field of Business Intelligence (BI) is no different.
In the BI world, the battle between Cubes Vs. Associative data structures (Associative) has already been fought and won. Both these solutions have attempted to deal with data bloat which is a natural occurrence in row-based databases. The initial front runner, Cubes, held the upper hand in terms of market penetration, however, Associative data structures took the lead after the following key events took place:
- Gartner’s complete redefinition of its BI Magic Quadrant report to focus on the user-driven Associative revolution led by Qlik; and
- SAP & Microsoft’s recent headlong rush to create their own “Associative-ish” copy-cat data engines.
What Is Data Bloat?
Most databases for everyday use are row-based because they are very good at writing data. However, row-based databases have major performance issues when extracting data. Firstly, the extraction process reads every record in a table in full – even if the values from only one field are required. This dramatically slows down extraction. Secondly, all data points are stored in full – even if they are not unique. If we assume that every character requires one byte of storage, every repeated record adds extra, unneeded storage requirements for no discernible benefit.
For example, if we decided to take 10 million records detailing the names of the days of the week, and we assumed that the day names were spread evenly across the data, the storage requirement would be roughly 71 million bytes or 71 meg. However, 9,999,993 records are repeats of the essential data. They add no additional data point value, yet they take up space. This is the very definition of data bloat.
What Are Cubes?
Data Cubes are pre-aggregated hierarchies of data. They attempt to deal with data bloat by stripping out low-level transactional data, summarising by level.
For example, in a sales Cube, the lowest level represents sales by product by salesperson. The next level is sales by salesperson by store. Then sales by store by city etc.
This sounds like a great idea unless you want to see sales by product by store. While the data is technically already in the Cube, the change in hierarchy means that you must build another Cube. Every time you pivot the hierarchy, you need to build another Cube, and the more Cubes you build, the more storage is required. This “Cube Bloat” is a far worse challenge than “Data Bloat” because now you have information redundantly repeated across multiple different data entities with the added administration needed to document which entity is used for which report, and how the data is pivoted.
What initially looked like a smart way to reduce Data Bloat resulted in Cube Bloat with an additional administration overhead to simply manage the rapidly replicating cubes. Not an ideal outcome.
What Is Associative?
Associative is essentially column-based databases. They deal with Data Bloat by storing each field on its own page with an index. Each unique value is stored once, and a table of the indices governs how the data relates to each other.
In the days of the week example discussed above, the Associative model stores the day names once, and a table of indices for the other 9,999,993 records. In this example alone, Associative creates a 7 times compression ratio in comparison to a row-based database. However, across multiple fields with normal data densities, a compression of 10x is more likely.
One of the best upshots of the Associative process is that the low-level transactional data that would normally be stripped out in the cube summary process is now available to provide a richness of context that often leads to unexpected insights.
In recent years most players in the BI industry moved to Cubes except for a Swedish company called Qlik. In the 1990s, they adopted the Associative model and became its sole champion for many years.
As discussed above, at a macro level, the impact on the business intelligence industry has been profound. But how does choosing Associative over Cubes actually impact the daily workload of Business Intelligence professionals?
Well, I’m glad you asked. I will be releasing a followup blog in the coming weeks titled If you want better analytics, here are 7 reasons to ditch data cubes. You can read it now.
If you would like to speak to me or any of our consultants about the advantages of choosing Associative for your organisation, please get in touch.