When a Data Warehouse Can’t Keep it Real-Time

May 04, 2020
Rick Bilodeau

Every day we talk to companies about using Apache Druid for real-time analytics, and inevitably each will ask why they can’t get the job done using their data warehouse, whether it’s Teradata, Snowflake, Amazon Redshift, Google BigQuery or something else.

This short post is my attempt to describe how Druid compares against enterprise data warehouses. If it’s not obvious by now, Druid is not a data warehouse and isn’t designed to replace every use case for which a data warehouse can be used.

Druid is a cloud-native real-time database that’s a great fit if you are powering a user-facing analytics application where low latency and high concurrency are important. Druid is really good at ingesting data extremely fast (millions of events per second) while simultaneously answering ad-hoc analytic queries with low latency, even when there are many concurrent users.

This kind of workload is not what data warehouses are designed for. Most data warehouses are built to answer large, complex SQL queries from professional analysts. These queries may take minutes to hours to complete, and that’s fine because they aren’t driven by a real-time requirement.

In contrast, Druid can complete most queries against very large data sets in under a second. The tradeoff between using Druid versus a data warehouse for a particular workload comes down to this: do you need the full flexibility of a data warehouse to answer every arbitrary query an analyst can devise, or do you need a real-time, responsive end-user experience where users can creatively explore data through iterative ad-hoc queries and get sub-second results? For the latter, a data warehouse just isn’t fast enough.

Use cases where ad-hoc analysis matters usually fall on the operational side of analytics. They include quickly understanding anomalies and patterns in clickstreams for digital marketing, product interactions, and user behavior in online games, as well as detecting and diagnosing network traffic issues. This type of analysis has a different flavor from the advanced analytics performed by BI groups or data science teams; the queries are simple and build on each other in an unplanned fashion as users explore the data, applying their creativity and business intuition.
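To make the iterative, slice-and-dice pattern concrete, here is a minimal Python sketch of issuing ad-hoc queries through Druid’s SQL-over-HTTP endpoint. The endpoint path (`/druid/v2/sql/`) is real Druid API; the host, the `clickstream` datasource, and its columns (`channel`, `page`) are hypothetical examples, not anything from this post.

```python
import json
from urllib import request

# Hypothetical router address; a real deployment would supply its own.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"

def druid_sql_payload(sql: str) -> dict:
    """Build the JSON body Druid's SQL HTTP endpoint expects."""
    return {"query": sql, "resultFormat": "object"}

def run_query(sql: str) -> list:
    """POST a SQL query to a Druid router (requires a running cluster)."""
    body = json.dumps(druid_sql_payload(sql)).encode("utf-8")
    req = request.Request(
        DRUID_SQL_URL, data=body,
        headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# An iterative drill-down: the second query narrows the first,
# the way an exploring user would refine a hunch.
broad = ("SELECT channel, COUNT(*) AS events FROM clickstream "
         "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR "
         "GROUP BY channel ORDER BY events DESC LIMIT 10")
narrow = ("SELECT page, COUNT(*) AS events FROM clickstream "
          "WHERE channel = 'mobile' "
          "AND __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR "
          "GROUP BY page ORDER BY events DESC LIMIT 10")
```

Each follow-up query is small and cheap; the point is that Druid returns them fast enough to keep the exploration interactive.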

Although Druid incorporates architectural concepts from data warehouses such as column-oriented storage, it also includes designs from search systems and time series databases, which makes it a great fit for analyzing various types of event-driven data.

In a nutshell, Druid’s architecture offers the following advantages over traditional data warehouses:

  • Low latency streaming ingest
  • Integration with Apache Kafka, Storm, Spark Streaming, Kinesis and other big data stream processors
  • Time-based partitioning, which enables performant time-based queries
  • Fast search and filter, for fast ad-hoc slice and dice
  • Minimal schema design and native support for semi-structured and nested data
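As a sketch of the streaming-ingest point above, the following Python snippet builds a minimal Kafka ingestion supervisor spec and posts it to Druid’s supervisor API (`/druid/indexer/v1/supervisor`, a real endpoint). The topic and datasource names, broker address, and the pared-down spec fields are illustrative assumptions; a production spec would carry more configuration.

```python
import json
from urllib import request

# Hypothetical host; real clusters submit via their own Overlord/router.
SUPERVISOR_URL = "http://localhost:8888/druid/indexer/v1/supervisor"

def kafka_supervisor_spec(topic: str, datasource: str) -> dict:
    """Build a minimal Kafka supervisor spec (a sketch, not exhaustive)."""
    return {
        "type": "kafka",
        "spec": {
            "ioConfig": {
                "type": "kafka",
                "consumerProperties": {
                    "bootstrap.servers": "localhost:9092"},
                "topic": topic,
            },
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Empty dimension list lets Druid discover dimensions,
                # matching the "minimal schema design" point above.
                "dimensionsSpec": {"dimensions": []},
                # Time-based partitioning: one segment per hour of data.
                "granularitySpec": {"segmentGranularity": "hour"},
            },
            "tuningConfig": {"type": "kafka"},
        },
    }

def submit(spec: dict) -> None:
    """POST the spec to the supervisor API (requires a running cluster)."""
    body = json.dumps(spec).encode("utf-8")
    req = request.Request(
        SUPERVISOR_URL, data=body,
        headers={"Content-Type": "application/json"})
    request.urlopen(req)
```

Once submitted, Druid consumes the topic continuously, so events become queryable within seconds of arriving rather than waiting on a batch load.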

You should consider using Druid to augment your data warehouse if your use case has one or more of the following requirements:

  • involves streaming data
  • requires low-latency ingest at scale
  • expects low-latency query response with high concurrency
  • needs ad-hoc analytics

Druid is great for OLAP-style slice and dice and drill downs in these situations.

To summarize, data warehouse technology is better for use cases where the end user is a technical analyst and query flexibility takes precedence over performance. Druid shines when the use case involves real-time data and the end user (technical or not) wants to apply numerous simple queries through an application. In those cases, query response and data freshness take precedence over coding complex queries.

A great way to get hands-on with Druid is through a Free Imply Download or Imply Cloud Trial.
