Why most enterprise data democratization projects fail (but not at Google)
- Koosha Totonchi
- Nov 23, 2024
- 3 min read
Organizations are filled with dead, abandoned projects that were once praised by leadership sponsors as the holy grail of data democratization. If you’ve been working with enterprises as a data professional for even just a few years, you’ve seen this cycle at least once. It doesn’t matter what type of organization it is, either - tech, finance, or government; they all struggle with this. The story typically goes as follows:
Leaders notice business departments are entering a mind-boggling amount of operational information into various systems day to day. Think of typical CRM or support tools like HubSpot, Salesforce, or Zendesk.
Executives in charge of line departments are asked to provide updates - whether it’s a summary of what’s happening in the form of a report, a drill-down into an ongoing critical scenario, a proposal such as a planned experiment to improve the organizational function over the next x years, and so on.
They all descend on the data engineering team and beg for tables to be created, with access granted, so they can slice and dice the data whichever way they need.
Of course, that last step is where the wheels come off the bus. Notice that these individuals aren’t doing anything wrong - everyone is just trying to do their job. But let’s see what happens when we fast-forward. Suppose you get the green light to create a data engineering team (usually the way this plays out), and after some unspecified (read: infinite) amount of time, you build a pipeline or two (or three), set up some dashboarding tools, and onboard users. Except:
Your analysts and engineers never have time to build pipelines and enrich your datasets because they're working on ad-hoc queries 24/7. Did you think the VP of Sales was going to write SQL themselves? Usually, the first question executives ask is for some kind of key performance metric to be computed for them. Except the questions that almost always follow immediately are “can I drill down to see which components led to this decreasing or increasing?” and “what can I do to make the number go up?!” And you’re not done yet - cue the long ideation sessions and the A/B testing splits/flags your pipeline has to support… in perpetuity.
Your IT team is constantly managing on-premises resources instead of leveraging cloud capabilities. Remember all those ad-hoc queries? They aren’t free. Oftentimes, business users don’t even know what such queries cost in resources. They could be cheap and make a huge impact. Or they could be massive in resource cost and do very little for your business. No one knows. Until the bill arrives, of course. If you’re on-premises, you now have to schedule and monitor the systems to ensure uptime, queue jobs when compute resources are limited, and think about security, and patching, and privacy… the list goes on.
Your analysts struggle to maintain a single source of truth - everyone has different filters, join conditions, and so on. In big enough firms, leaders may even hire analysts onto their own teams to build their own queries and reports. It’s no fun reconciling all of those when the sales analysts missed a flag or two (or three - a running theme here).
Your OLTP databases just aren't built for reporting - and compute there is simply too expensive to support dynamic filters or live connections to visualization tools. This means users really won’t enjoy running queries on their own.
So what’s the solution? One of the only places I’ve seen get this right was Google. On GCP, using Looker with LookML Explores as a semantic layer - with filters and joins built in for key metrics - was a game changer for users who aren’t fluent in SQL. Users could simply click specific measures or KPIs in the UI alongside dimensions, and these would be automatically grouped and aggregated in dynamically generated queries. This meant everyone had a single source of truth, built on a governed dataset, to work with.
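To make the idea concrete, here is a deliberately minimal sketch of what a semantic layer does conceptually: named dimensions and measures map to governed SQL fragments, and the tool generates the grouped, aggregated query for the user. The table and column names (orders.region, orders.amount) are hypothetical, and this is not LookML’s actual implementation - just the shape of the idea.

```python
# Toy semantic layer: users pick names, the layer generates governed SQL.
# (Hypothetical table/column names; not how LookML works internally.)

DIMENSIONS = {
    "region": "orders.region",
    "order_month": "DATE_TRUNC(orders.created_at, MONTH)",
}

MEASURES = {
    "total_revenue": "SUM(orders.amount)",
    "order_count": "COUNT(*)",
}

def build_query(dimensions, measures, table="`project.sales.orders` AS orders"):
    """Generate the grouped, aggregated SQL for the clicked dimensions/measures."""
    select_parts = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select_parts += [f"{MEASURES[m]} AS {m}" for m in measures]
    sql = "SELECT " + ", ".join(select_parts) + f"\nFROM {table}"
    if dimensions:
        # Group by the positional index of every selected dimension.
        sql += "\nGROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql

print(build_query(["region", "order_month"], ["total_revenue"]))
```

For the dimensions region and order_month plus the measure total_revenue, this prints a SELECT … GROUP BY 1, 2 statement - the same kind of query Looker writes on the user’s behalf when they click a dimension and a measure in an Explore, which is exactly why nobody has to hand-write joins and filters differently on every team.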
Looker was cloud-based, so we avoided complex server setup and maintenance from the start. But what really turbocharged it was being hooked up to BigQuery under the hood. We instantly had autoscaling compute on a live OLAP data warehouse, which changed the game for the user query experience. The speed of the answers people got from our Looker Explores led to millions of queries against our tables per month.
If you’re curious to see what this looks like or how to implement it, check out our first course on Udacity about BigQuery and Looker!
A final word of advice: you’d better make sure the business user or department requesting the data is paying the bill for their queries. You’d be surprised how much more cognizant people are of limited resources when you charge them for information.
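If you’re on BigQuery, one way to make that chargeback concrete is to dry-run queries so their cost is visible up front, and to label jobs so billing exports can be broken down by department. This is a minimal sketch assuming the google-cloud-bigquery Python client, a hypothetical project.sales.orders table, and an assumed on-demand rate of $6.25/TiB (verify your region’s current pricing).

```python
# Minimal sketch: estimate a query's cost before running it, and label the
# job so the bill can be attributed to the requesting department.
# Assumes google-cloud-bigquery and a hypothetical table name.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT region, SUM(amount) AS total_revenue
FROM `project.sales.orders`  -- hypothetical table
GROUP BY region
"""

# Dry run: BigQuery reports the bytes the query would scan without executing it.
dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
dry_job = client.query(sql, job_config=dry_cfg)
tib = dry_job.total_bytes_processed / 1024**4
print(f"Would scan ~{tib:.4f} TiB; at an assumed $6.25/TiB on-demand rate, "
      f"that's roughly ${tib * 6.25:.2f}.")

# Real run: job labels propagate to the billing export, so the requesting
# department can be charged back for the queries it generates.
run_cfg = bigquery.QueryJobConfig(labels={"department": "sales"})
results = client.query(sql, job_config=run_cfg).result()
```

Dry runs don’t incur query charges, and job labels flow through to the Cloud Billing export, which is usually enough to build a per-department view of query spend - and to make the VP of Sales think twice before asking for yet another slice of the data.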
