Community Spotlight: Augmented analytics on business metrics by Cuebook with Apache Druid®

by Peter Marshall · August 18, 2021

Cuebook is putting you, the decision-maker, back in the driving seat, powered by Apache Druid®.  In this interview with their founder and CEO, we learn about their reason for being, their open source Cuelake tooling, and their reason for using Druid.

Sachin Bansal is an established visionary in tech. His latest venture, Cuebook, is based in Bengaluru, India, and he and his team chose Apache Druid as a core part of their solution architecture from day zero.

“We’re building the world’s best augmented analytics product, aiming to build an application and services that ease and automate analytics.”

With machine learning, Cuebook promises to push the envelope on analytics, proactively assisting with anomaly detection and root cause analysis, surfaced through a friendly user interface.

But achieving this goal is no easy feat.

Cuebook had to engineer a business capability to bring data out of customers’ relational databases, to make it ready for democratised analytics, to push that into a database that was perfect for easy analysis, and then build a non-technical interface that enabled creativity, exploration, and investigation.

“We wanted to make a product where insight just works: that is Cuebook.

“People shouldn’t have to know how to write SQL – they don’t even need to be interested in writing SQL.  They don’t need to understand schemas and relationship diagrams and all those things.  They might not even need all the dimensions and measures to start – just the important ones – we don’t want people to be overwhelmed – but we still want people to be able to search for things that they need.

“Multiple measures and dimensions in multiple data sources, where anyone can slice and dice, build dashboards, create all sorts of analytical content.

“Once we saw that Druid enabled us to craft queries programmatically and experienced its speed, we knew Apache Druid was the best database for analytics.”
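To give a feel for what "crafting queries programmatically" can look like, here is a minimal sketch that assembles a Druid native timeseries query as a plain Python dictionary. It is not Cuebook's code: the helper name `build_timeseries_query`, the `sales` data source, and the `revenue` and `country` columns are all made up for illustration. Only the JSON layout (`queryType`, `dataSource`, `granularity`, `intervals`, `aggregations`, `filter`) follows Druid's native query format.

```python
import json


def build_timeseries_query(data_source, measure, interval, granularity="day", filters=None):
    """Assemble a Druid native timeseries query as a plain dict (hypothetical helper)."""
    query = {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": granularity,
        "intervals": [interval],  # ISO-8601 interval, e.g. "2021-08-01/2021-08-08"
        "aggregations": [
            {"type": "doubleSum", "name": measure, "fieldName": measure}
        ],
    }
    if filters:
        # One selector filter per dimension the user has sliced on.
        clauses = [
            {"type": "selector", "dimension": dim, "value": value}
            for dim, value in sorted(filters.items())
        ]
        # A single clause is used as-is; several are combined with a logical AND.
        query["filter"] = clauses[0] if len(clauses) == 1 else {"type": "and", "fields": clauses}
    return query


# Illustrative values only: "sales", "revenue" and "country" are assumed names.
query = build_timeseries_query(
    "sales", "revenue", "2021-08-01/2021-08-08", filters={"country": "IN"}
)
print(json.dumps(query, indent=2))
```

Because the query is just data, a user interface can add a measure or a filter by changing a dictionary rather than by generating SQL text, which is one plausible reading of why this style suits a no-SQL front end.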

It was reassuring to hear Sachin echo the message of our own “data temperature” talks – that you have to choose the right database for the right “temperature” of the user interface you’re trying to build.

“If they're using, let's say, five different metrics, and if they have to wait, let's say, five seconds, it's a waste of their time.  And when they have to do it every day, it becomes difficult to justify these latencies and compensate through ease of use.  Remember we are here to reduce the load on data analysts, manually creating dashboards and writing code every day.

“That's a very tedious, cumbersome process!

“Every day, business users come in and say ‘I want this’, ‘I want that’ – and when the analyst comes to the end of their day, they think, ‘Okay, so from dawn to dusk all I'm doing is building out dashboards.’

“We are here to stop that from happening.  And that means the database we use must have hyper-fast query response speeds in order to create that high quality experience.

“For our customers, analysts can focus on much higher-value stuff. Not dashboards.”

One magic ingredient is Cuebook’s expertise in loading, transforming, and pushing data into Druid in the right way for these super-cool “hot” user experiences.  It’s critical to success.  Throughout the community, data engineers know the importance of this task when working with Druid: getting the right data model and then maintaining the right data layout.

“For people coming from a relational database world, it's a big surprise when you say ‘oh, you cannot join two large data sources’.  But you see, you can't have the best of both worlds: a relational database that acts like Druid. You have to make choices. Your entire company or infrastructure cannot run on just a single database.  You must pick a database that suits the purpose you have in mind for it.

“After we had chosen Druid, we went on to use Apache Spark for transformation. Apache Spark reads data from the data sources and transforms it.  We use Apache Zeppelin notebooks to design the transformations, then we load the output as Druid data sources.”

Cuebook have put a lot of effort into this part of their business – what you might have heard me and others describe as the process of “heating up” cold data and putting it into Druid.  And what’s seriously cool is that Sachin and the team have open sourced the tooling that they use.

“It's called Cuelake: you can find it on GitHub.

“We want others to learn by looking at what we have built.”

When I meet people like Sachin, hearing this is what makes my heart sing!  The principles of community are alive and continue to grow.

“What we do is, essentially, we use parts of Apache Zeppelin and Apache Iceberg.  We built a data lake, we read data from relational databases, then move it to S3. Then we do transformations over S3 using Spark, and then we push it to Druid.”

There’s even GitHub integration for notebook change control, and Slack integration when pipelines fail.

“Cuelake frees us to build multiple pipelines, and to feed into multiple Druid clusters, with one cluster for each customer to provide maximum security.”
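The last hop of that pipeline, from transformed files on S3 into a Druid data source, can be pictured as a native batch ingestion spec. The sketch below is an assumption-laden illustration, not how Cuelake itself does it: the helper `build_s3_ingestion_spec`, the bucket path, and the column names are invented, and it assumes Spark has written Parquet files and that the Druid cluster has the S3 and Parquet extensions loaded.

```python
import json


def build_s3_ingestion_spec(data_source, s3_prefix, timestamp_column, dimensions, sum_metrics):
    """Build a Druid native batch (index_parallel) spec reading Parquet from S3.

    Hypothetical helper for illustration; it does not submit anything to Druid.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "metricsSpec": [
                    {"type": "doubleSum", "name": m, "fieldName": m} for m in sum_metrics
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",  # one set of segments per day of data
                    "queryGranularity": "hour",   # timestamps truncated to the hour
                    "rollup": True,               # pre-aggregate rows at ingestion time
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "s3", "prefixes": [s3_prefix]},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Assumed names: the bucket, data source and columns are placeholders.
spec = build_s3_ingestion_spec(
    data_source="sales",
    s3_prefix="s3://example-bucket/transformed/sales/",
    timestamp_column="order_time",
    dimensions=["country", "channel"],
    sum_metrics=["revenue"],
)
print(json.dumps(spec, indent=2))
```

A spec like this would typically be posted to the Druid task API; with one cluster per customer, the same builder could be pointed at a different cluster and prefix for each pipeline.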

Jeremy Woelfel from Target said “if you think about it, everything in business is about time.”  And on that point, Sachin and I fully agree.  Time and again it’s put forward as a reason why Druid represents the perfect fit for end-user facing analytics.

“We started our search for an analytical database that was time-series.  The data that describes business is time series. When you store data in a time series format, analysis becomes easy.”

“Now, data scientists might need non-time series data.  But operational decisions – these happen on a daily or a weekly or monthly basis, and business people compare this week versus last week, this month versus last month, this year versus last year.  So why would you not just store data in a time-series format?

“Our next questions were: which was the fastest time-series database, which was easy to query flexibly from our user interface, and which was easy to ingest into, not just in batch but streaming?  That’s why we selected Druid.

“More and more people should use Druid from an analytical standpoint because we believe business decisions are time-based and they need to be taken through exploration and conversation with data – Druid fits that bill.”
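Sachin's point about comparing this week with last week is easy to see once every row carries a timestamp. As a rough sketch in plain Python (a stand-in for what a time-series database does at far larger scale, not Druid itself), the rows below are bucketed by week and the latest week is compared with the one before. The function names and sample figures are made up for illustration.

```python
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_totals(rows):
    """Sum (date, value) rows into one total per week, keyed by that week's Monday."""
    totals = {}
    for day, value in rows:
        key = week_start(day)
        totals[key] = totals.get(key, 0) + value
    return totals


def week_over_week(rows, this_week):
    """Compare the week starting on `this_week` (a Monday) with the week before it."""
    totals = weekly_totals(rows)
    current = totals.get(this_week, 0)
    previous = totals.get(this_week - timedelta(days=7), 0)
    # The relative change is undefined when there is nothing to compare against.
    change = None if previous == 0 else (current - previous) / previous
    return current, previous, change


# Invented sample data: daily revenue across two consecutive weeks.
rows = [
    (date(2021, 8, 2), 100),
    (date(2021, 8, 4), 50),
    (date(2021, 8, 9), 120),
    (date(2021, 8, 12), 60),
]
current, previous, change = week_over_week(rows, date(2021, 8, 9))
print(current, previous, change)
```

Swapping the bucketing function for a month or year boundary gives the month-versus-month and year-versus-year comparisons with no other changes, which is the convenience of keeping time as the primary axis.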

*  *  *

The community would love to hear your story!  Email community@imply.io to sign up for a 5-minute interview for your own Community Spotlight, and to discuss opportunities for blog posts and speaking slots, as well as to get the latest information about community activities across the world.  And we’re also here to help you get your name in lights on Apache Druid’s Powered By page.
