Apache Druid or BigQuery

Google BigQuery, like other cloud data warehouses, can be a cost-effective solution for lightly used business intelligence applications. Modern analytics applications, however, need high concurrency, sub-second interactivity, and real-time data, all of which are beyond the reach of BigQuery. For this, consider Apache Druid.

Request a Demo

CONSIDERATION 1:

High concurrency

BigQuery’s design is optimized for infrequent use. A benchmark study shows that BigQuery can use 12X more infrastructure than Druid.

BigQuery

BigQuery’s value proposition is built on a pay-as-you-go model that saves money when your system is not in use. This makes it ideal for relatively infrequent queries from a small number of users, and it is why the number of concurrent interactive queries is limited to 100. If each user issues more than one query at a time, a common occurrence in dashboards, then the number of concurrent users is far lower. If your plan and quota allow, BigQuery can be automatically scaled to accommodate a large number of concurrent users, but this quickly becomes economically unsustainable, as shown in this test.
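To make that arithmetic concrete, here is a minimal sketch. The limit of 100 concurrent queries comes from the paragraph above; the queries-per-user figures are hypothetical dashboard sizes chosen only for illustration.

```python
# Rough estimate of how many dashboard users fit under a concurrent-query cap.
# The cap of 100 is taken from the text above; the queries_per_user values
# are hypothetical and depend on how many panels a dashboard refreshes at once.

def max_concurrent_users(query_limit: int, queries_per_user: int) -> int:
    """Return how many users fit if each one issues queries_per_user at once."""
    if queries_per_user < 1:
        raise ValueError("queries_per_user must be at least 1")
    return query_limit // queries_per_user


if __name__ == "__main__":
    for panels in (1, 5, 10, 20):
        users = max_concurrent_users(100, panels)
        print(f"{panels} queries per user -> {users} concurrent users")
```

Under these assumptions, a dashboard that fires 10 queries per refresh leaves room for only 10 simultaneous users.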

Druid

Druid’s unique architecture handles high concurrency with ease, and it is not unusual for systems to support hundreds or even thousands of concurrent users. Druid is designed to accommodate high queries per second with an efficient use of infrastructure. With Druid, scaling out is the default, not a special feature of a more expensive enterprise offering, and there is no limit on how far you can grow. Learn more here.

CONSIDERATION 2:

Low Latency Response Times

When query response times are critical, Apache Druid delivers consistent sub-second responses at scale and under load.

BigQuery

Because BigQuery is designed to save money with pay-as-you-go pricing, it implements an architecture of separate storage and compute. While cost-efficient for infrequent queries, this architecture incurs latency and network bottlenecks from moving data out of deep storage. BigQuery implements a caching layer to compensate, but even with this, initial queries can take minutes. Repeated queries will run faster once cached, but this does not help ad-hoc, interactive queries or systems where new data is added constantly. Plus, administrators must monitor the cache to keep result sets under a set maximum.

Unlike Druid, BigQuery also does not implement secondary indexes for structured data. BigQuery does allow the creation of search indexes for unstructured text, but this must be done manually and counts against your processing quota, adding to costs.

Druid

Druid also implements a separate storage-compute architecture for flexibility and cost-saving measures like data tiering. Crucially, however, for performance-sensitive queries, Druid pre-fetches data to the compute layer. This means that nearly every query will be sub-second, even as data volumes grow, since queries are never waiting on a caching algorithm to catch up. With a very efficient storage design that integrates automatic indexing (including inverted indexes to reduce scans) with highly compressed, columnar data, this architecture provides leading price-performance. Learn more here.
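To illustrate why an inverted index reduces scanning, here is a toy sketch of the general technique. It is not Druid’s actual implementation; the rows, column names, and helper functions are hypothetical. Each distinct column value maps to the set of row ids that contain it, so a filter becomes a set intersection instead of a pass over every row.

```python
from collections import defaultdict

# Hypothetical event rows, used only for illustration.
rows = [
    {"country": "US", "device": "mobile"},
    {"country": "DE", "device": "desktop"},
    {"country": "US", "device": "desktop"},
    {"country": "US", "device": "mobile"},
]


def build_inverted_index(rows, column):
    """Map each distinct value in `column` to the set of row ids holding it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


def filter_rows(indexes, **predicates):
    """Return sorted row ids matching every column=value predicate.

    Expects at least one predicate; with none, an empty list is returned.
    """
    matches = None
    for column, value in predicates.items():
        ids = indexes[column].get(value, set())
        # Intersect per-column matches; no row data is read at this stage.
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


indexes = {col: build_inverted_index(rows, col) for col in ("country", "device")}
print(filter_rows(indexes, country="US", device="mobile"))
```

Only the rows whose ids survive the intersection would then need to be read from the columnar data.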

CONSIDERATION 3:

Real-Time Streaming Data Ingestion

BigQuery can connect to streaming data, but the implementation is not as elegant as Druid’s.

BigQuery

While BigQuery has connectors to streaming data (such as Kafka), significant coding effort is required (using the InsertAllRequest Java class) to get one-at-a-time inserts. While this is better than competing products from Snowflake and Amazon that can do only batch inserts, the latency is still measured in minutes. Further, developers and admins must make sure that streams do not exceed processing quotas and that duplicates are removed (to achieve exactly-once ingestion).

Druid

With native support for both Kafka and Kinesis, you do not need to install and maintain a connector in order to ingest real-time data. Druid can query streaming data the moment it arrives at the cluster, even at millions of events per second. There’s no need to wait as it makes its way to storage. Further, because Druid ingests streaming data in an event-by-event manner, it automatically ensures exactly-once ingestion. Learn more here.
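As a rough illustration of what native Kafka ingestion looks like in practice, the sketch below builds a minimal Kafka supervisor spec as a Python dictionary and prints it as JSON. The datasource name, topic, broker address, and dimension names are hypothetical placeholders, and a real deployment would likely need additional tuning settings.

```python
import json


def kafka_supervisor_spec(datasource, topic, bootstrap_servers, dimensions):
    """Build a minimal Druid Kafka supervisor spec (illustrative values only)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical names; replace with your own datasource, topic, and brokers.
spec = kafka_supervisor_spec(
    datasource="clickstream",
    topic="clickstream-events",
    bootstrap_servers="localhost:9092",
    dimensions=["user_id", "page", "country"],
)
print(json.dumps(spec, indent=2))
```

A spec like this is typically submitted to the cluster with an HTTP POST to the supervisor endpoint (`/druid/indexer/v1/supervisor`); the sketch stops short of making that call so it can run without a cluster.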

CONSIDERATION 4:

Deployment Options

If you need deployment flexibility, Druid provides a more open deployment model than BigQuery.

BigQuery

For many companies, a proprietary, fully-managed cloud is a good choice. But this can be problematic for some regulated industries that require more control.

Druid

Druid is open source, so you are not locked in to a particular vendor or cloud. Imply offers flexible, enterprise cloud deployments for Druid, including a fully managed DBaaS with Imply Polaris. Imply’s Enterprise Hybrid is co-managed by you and Imply on your cloud, with you in control. You determine when updates happen, giving you time to fully test your application. Additionally, Imply’s Enterprise solution is ready for organizations that still need to deploy and completely control their own systems.

Take a Closer Look

Examine the details of Druid’s price-performance advantage over BigQuery in this benchmark test.

Druid’s Architecture Advantage

With Druid, you get the performance advantage of a shared-nothing cluster, combined with the flexibility of separate compute and storage, thanks to our unique combination of pre-fetch, data segments, and multi-level indexing.

Developers love Druid because it gives their analytics applications the interactivity, concurrency, and resilience they need.


Leading companies leveraging Apache Druid and Imply

Reddit

“By using Apache Druid and Imply, we can ingest multiple events straight from Kafka and our data lake, ensuring advertisers have the information they need for successful campaigns in real-time.”

Cisco ThousandEyes

“To build our industry-leading solutions, we leverage the most advanced technologies, including Imply and Druid, which provides an interactive, highly scalable, and real-time analytics engine, helping us create differentiated offerings.”

GameAnalytics

“We wanted to build a customer-facing analytics application that combined the performance of pre-computed queries with the ability to issue arbitrary ad-hoc queries without restrictions.  We selected Imply and Druid as the engine for our analytics application, as they are built from the ground up for interactive analytics at scale.”

Sift

“Imply and Druid offer a unique set of benefits to Sift as the analytics engine behind Watchtower, our automated monitoring tool. Imply provides us with real-time data ingestion, the ability to aggregate data by a variety of dimensions from thousands of servers, and the capacity to query across a moving time window with on-demand analysis and visualization.”

Strivr

“We chose Imply and Druid as our analytics database due to its scalable and cost-effective analytics capabilities, as well as its flexibility to analyze data across multiple dimensions. It is key to powering the analytics engine behind our interactive, customer-facing dashboards surfacing insights derived over telemetry data from immersive experiences.”

Plaid

“Four things are crucial for observability analytics; interactive queries, scale, real-time ingest, and price/performance. That is why we chose Imply and Druid.”


© 2023 Imply. All rights reserved. Imply and the Imply logo are trademarks of Imply Data, Inc. in the U.S. and/or other countries. Apache Druid, Druid and the Druid logo are either registered trademarks or trademarks of the Apache Software Foundation in the USA and/or other countries. All other marks and logos are the property of their respective owners.
