Apache Druid vs ClickHouse

Why are developers at Confluent, Twitter, Salesforce, Reddit, and more than 1,000 other organizations choosing Druid?

Druid is the database of choice for developers building analytics applications at thousands of companies. It’s a no-compromise, high-performance, real-time analytics database that powers those applications at any scale, for any number of users, across both streaming and batch data.

Druid has a more flexible architecture

Druid’s architecture combines the best of both worlds: the query performance of a shared-nothing design and the flexibility of separate storage and compute. You get the speed of co-locating storage with compute and the ease of adding or removing nodes without downtime.

Druid scales out much more easily

With ClickHouse, scaling out is a difficult, manual effort. If you have a lot of data, adding a node can take days and require downtime. With Druid, you simply add a node and Druid automatically rebalances the cluster.
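To make "automatically balances the cluster" concrete, here is a toy sketch of what rebalancing means when a node joins. It is not Druid's actual balancing strategy. The function name `rebalance`, the node names, and the segment IDs are all hypothetical, and the greedy rule (move one segment at a time from the fullest node to the emptiest) is an assumption chosen for simplicity.

```python
def rebalance(assignment, new_node):
    """Toy greedy rebalancer (illustrative only, not Druid's algorithm).

    assignment maps node name -> list of segment IDs. A new, empty node is
    added, then segments move from the fullest node to the emptiest node
    until node sizes differ by at most one segment.
    """
    # Copy the input so the caller's assignment is left untouched.
    nodes = {node: list(segments) for node, segments in assignment.items()}
    nodes[new_node] = []

    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return nodes
        # Move a single segment; the data itself would be pulled from
        # shared storage rather than copied between nodes.
        nodes[emptiest].append(nodes[fullest].pop())


cluster = {
    "data-1": ["seg-1", "seg-2", "seg-3"],
    "data-2": ["seg-4", "seg-5", "seg-6"],
}
balanced = rebalance(cluster, "data-3")
for node, segments in balanced.items():
    print(node, len(segments))
```

In this sketch no segment is lost or duplicated by the move, which is the property that lets a cluster keep serving queries while it rebalances.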

Druid supports true stream ingestion

With native support for both Kafka and Kinesis, Druid ingests true event-by-event streams with exactly-once semantics. Streaming data can be queried the moment it arrives at the cluster, even at millions of events per second. This effortless connection and performance are one reason why Confluent, the leading distributor of Kafka, uses Druid for its own analytics applications.
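As a rough illustration of what connecting Druid to Kafka involves, the sketch below builds a minimal Kafka supervisor spec as a Python dictionary. The overall shape (`dataSchema`, `ioConfig`, `tuningConfig`) follows the Kafka ingestion format in the Druid documentation, but the helper name `kafka_supervisor_spec`, the datasource, topic, broker address, and column names are all made-up placeholders. Check the field list against the documentation for your Druid version before relying on it.

```python
import json


def kafka_supervisor_spec(datasource, topic, bootstrap_servers):
    """Build a minimal Kafka supervisor spec (hypothetical helper).

    Column names below ("timestamp", "user_id", "page") are assumptions
    about the shape of the events, not anything Druid requires.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["user_id", "page"]},
                "granularitySpec": {
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = kafka_supervisor_spec("clicks", "clickstream", "localhost:9092")
print(json.dumps(spec, indent=2))
```

The resulting JSON is the kind of document that is submitted to Druid's supervisor API (`/druid/indexer/v1/supervisor`); once a supervisor is running, Druid manages the Kafka offsets itself, which is what underpins the exactly-once behavior described above.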

Druid is automatically protected against data loss

Druid will not lose data, even if multiple nodes or the entire cluster fail. This is because Druid has something ClickHouse does not: deep storage, a durable store that holds a copy of the data apart from the data nodes, made possible by the separation of storage and compute. Further, because Druid automatically tracks stream ingestion, automatic recovery covers data in both the table and the stream, including data that arrives after the failure.

Druid automatically indexes data

Druid automatically indexes data optimally for each column’s data type. Indexes are stored alongside the data in segments (instead of shards). Automatic indexing, combined with this segmented data architecture and independent data nodes, means you never need workarounds or manual effort for any query.

Druid provides native tiering capabilities

Druid’s architecture is based on independent, scalable components for coordination, query, data, and deep storage. As a result, older data can be placed on slower (but cheaper) nodes, saving money while queries for newer data are served from faster hardware.
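Tiering of this kind is typically expressed through Druid's retention (load) rules. The sketch below builds one possible rule set in Python: recent data is loaded onto a fast tier, and everything older falls through to the default tier. The rule types `loadByPeriod` and `loadForever` and the `tieredReplicants` field come from Druid's rule format, while the helper name `tiering_rules`, the tier name "hot", the 30-day period, and the replica counts are assumptions chosen for illustration.

```python
import json


def tiering_rules(hot_period="P30D", hot_tier="hot", cold_tier="_default_tier"):
    """Build an ordered list of load rules (hypothetical helper).

    Druid evaluates rules top to bottom and applies the first match, so
    the period-based rule must come before the catch-all rule.
    """
    return [
        {
            # Data from the most recent period: two replicas on the fast tier.
            "type": "loadByPeriod",
            "period": hot_period,
            "includeFuture": True,
            "tieredReplicants": {hot_tier: 2},
        },
        {
            # Everything older: one replica on the cheaper default tier.
            "type": "loadForever",
            "tieredReplicants": {cold_tier: 1},
        },
    ]


rules = tiering_rules()
print(json.dumps(rules, indent=2))
```

A rule list like this is set per datasource through the Coordinator (for example via `/druid/coordinator/v1/rules/{dataSource}`), and data nodes are assigned to a tier with the `druid.server.tier` property, so the "hot" tier here would need to exist in the cluster for the first rule to take effect.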
