When a Data Warehouse Can’t Keep It Real-Time

May 04, 2020
Rick Bilodeau

Every day we talk to companies about using Apache Druid for real-time analytics, and inevitably each will ask why they can’t get the job done with their data warehouse, whether it’s Teradata, Snowflake, Amazon Redshift, Google BigQuery or something else.

This short post is my attempt to describe how Druid compares with enterprise data warehouses. If it’s not obvious by now, Druid is not a data warehouse and isn’t designed to cover every use case a data warehouse serves.

Druid is a cloud-native real-time database that’s a great fit if you are powering a user-facing analytics application where low latency and high concurrency are important. Druid is really good at ingesting data extremely fast (millions of events per second) while simultaneously answering ad-hoc analytic queries with low latency, even with many concurrent users.

This kind of workload is not what data warehouses are designed for. Most data warehouses are built to answer large, complex SQL queries from professional analysts. These queries may take minutes to hours to complete, and that’s fine because they aren’t driven by a real-time requirement.

In contrast, Druid can complete most queries against very large data sets in under a second. The tradeoff between Druid and a data warehouse for a particular workload comes down to whether you need the full flexibility of a data warehouse to answer every arbitrary query an analyst can devise, or a responsive end-user experience where users can creatively explore data through iterative ad-hoc queries and get sub-second results. For the latter case, a data warehouse just isn’t fast enough.

Use cases where ad-hoc analysis matters typically sit on the operational side of analytics. They include quickly understanding anomalies and patterns in clickstreams for digital marketing, product interactions, and user behavior in online games, as well as detecting and diagnosing network traffic issues. This type of analysis has a different flavor than the advanced analytics performed by BI groups or data science teams: the queries are simple and build on each other in an unplanned fashion as users explore the data, applying their creativity and business intuition.

Although Druid incorporates architectural concepts from data warehouses such as column-oriented storage, it also includes designs from search systems and time series databases, which makes it a great fit for analyzing various types of event-driven data.

In a nutshell, Druid’s architecture offers the following advantages over traditional data warehouses:

  • Low latency streaming ingest
  • Integration with Apache Kafka, Storm, Spark Streaming, Kinesis and other big data stream processors
  • Time-based partitioning, which enables performant time-based queries
  • Fast search and filter, for fast ad-hoc slice and dice
  • Minimal schema design and native support for semi-structured and nested data
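To make the streaming-ingest and time-partitioning points above concrete, here is a minimal sketch of the kind of Kafka supervisor spec that gets submitted to Druid. The datasource name, topic, column names, and broker address are all hypothetical placeholders; a real spec would be tuned for the cluster and data at hand.

```python
import json

# Illustrative sketch (placeholder names throughout): a minimal Kafka
# supervisor spec of the kind POSTed to Druid's Overlord at
# /druid/indexer/v1/supervisor to start streaming ingestion.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "clickstream",          # hypothetical datasource
            # Druid requires a primary timestamp; segments are
            # partitioned by time based on this column.
            "timestampSpec": {"column": "ts", "format": "iso"},
            # Schema discovery reflects the "minimal schema design"
            # point: dimensions are inferred from the incoming JSON.
            "dimensionsSpec": {"useSchemaDiscovery": True},
            "granularitySpec": {
                # Time-based partitioning: one segment per hour of data.
                "segmentGranularity": "hour",
                "queryGranularity": "none",
            },
        },
        "ioConfig": {
            "topic": "clickstream",               # hypothetical topic
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
        },
    },
}

# Serialize the spec as it would be sent in the HTTP request body.
payload = json.dumps(supervisor_spec)
print(payload[:60])
```

Once a supervisor like this is running, Druid consumes the topic continuously and rows become queryable within seconds of arriving, which is the "low latency streaming ingest" advantage in practice.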

You should consider using Druid to augment your data warehouse if your use case has one or more of the following requirements:

  • involves streaming data
  • requires low-latency ingest at scale
  • expects low-latency query response with high concurrency
  • needs ad-hoc analytics

Druid is great for OLAP-style slice and dice and drill downs in these situations.
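A slice-and-dice query of this kind is usually issued as SQL against Druid's HTTP API. The sketch below only builds the JSON payload for a hypothetical `clicks` datasource with made-up columns; it does not actually send the request, and a real application would handle authentication and errors.

```python
import json

# Hypothetical datasource ("clicks") and columns. Druid accepts SQL as a
# JSON payload POSTed to its /druid/v2/sql endpoint; __time is Druid's
# built-in primary timestamp column.
sql = """
SELECT country, COUNT(*) AS views
FROM clicks
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY country
ORDER BY views DESC
LIMIT 10
"""

payload = json.dumps({"query": sql, "resultFormat": "objectLines"})
# In a real app this payload would be POSTed to something like
#   http://<router-host>:8888/druid/v2/sql
# with Content-Type: application/json.
print(payload[:40])
```

An interactive application typically fires many small queries like this one in quick succession as the user drills down, which is exactly the high-concurrency, low-latency pattern described above.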

To summarize, data warehouse technology is better for use cases where the end user is a technical analyst and query flexibility takes precedence over performance. Druid shines when the use case involves real-time data and the end user (technical or not) applies numerous simple queries through an application. In those cases, query response and data freshness take precedence over the ability to code complex queries.

A great way to get hands-on with Druid is through a Free Imply Download or Imply Cloud Trial.
