Considerations for choosing the best real-time analytics database
Developers and architects must look beyond query performance to understand the operational realities of growing and managing a high-performance database, and whether it will consume their valuable time.
Developers building a modern analytics application (for either internal or external customers) have two major open source choices for real-time analytics databases: Apache Druid and ClickHouse. Developers from Netflix, Twitter, Confluent, Salesforce, and many others choose Druid, and this article explains why.
Query Performance or Bust?
The primary consideration for any database is query performance, and rightly so. After all, this is the goal of any analytics system: interact with the data and get immediate answers. ClickHouse positions itself as “the fastest OLAP database on earth.” It is much easier for a marketer to say this than to prove it. ClickHouse tries to back up this audacious claim with various unaudited benchmarks of its own.
Benchmarks are interesting, but they should be recognized for what they are: a test of an entire database system under specific requirements and operating conditions. The reality is that any database can be made to appear faster than another with the right benchmark. For example, Imply’s own benchmark shows Druid to be twice as fast as ClickHouse, and cheaper too. That conclusion is based on the Star Schema Benchmark, a dataset designed for benchmarking that looks little like the typical workloads found on Druid, ClickHouse, or other similar databases.
While this is an interesting data point for a developer, it is ultimately not enough to make a decision, since benchmarks almost never align with real-world use cases. For queries of typical batch data, developers should expect great query performance from either Druid or ClickHouse. This does not mean, however, that the decision is a toss-up. Developers must also consider what it takes to live with these databases at scale and for years to come. The database should not consume valuable creative and development time, especially as an application grows.
Developers should not compromise between performance and ease of management. They should expect both, and only Druid delivers both. To understand why, it is necessary to review the basic design of each system.
How ClickHouse is Designed
ClickHouse derives its performance from a shared-nothing architecture, a concept from the mid-1980s in which each node of a cluster has its own storage and compute resources, eliminating contention among nodes. In this arrangement there is no data movement during query processing, and queries are easily distributed in parallel across the nodes.
It is generally recognized that this architecture is a good choice for query processing. But it has a major drawback: the cluster is not elastic. This is because of the way data is distributed. A ClickHouse cluster, like any shared-nothing cluster, will distribute data in independent shards, each on its own node with its own resources (think of a shard as a fragment of data, like a shard of glass).
Now imagine a three-node cluster, and a fourth node is added to respond to increased demand from the application and its users. There would be no data on that node, making it useless. To take advantage of the new node, the cluster must be rebalanced. Unfortunately for ClickHouse, this is a manual and complicated process:
- Create a placeholder table on new nodes
- Manually copy data files to new nodes
- Attach data files to the placeholder tables
- Monitor for uneven storage utilization (“shard weight”) across the nodes and manually move data around to compensate
It is not an exaggeration to say that, in a large cluster, this process can take days and involve many choices and workarounds, as ClickHouse customer Contentsquare discovered. The same issue arises if the cluster is too large (and expensive) for actual needs and nodes need to be removed.
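To make the manual effort concrete, here is a minimal sketch of the kinds of statements involved, assuming a MergeTree table named events, the clickhouse_driver Python client, and hosts named clickhouse-node-1 and clickhouse-node-4. The table, hosts, and partition are illustrative, not a prescription for any particular deployment.

```python
# Minimal sketch of manually expanding a ClickHouse cluster.
# The table schema, host names, and partition below are illustrative.
from clickhouse_driver import Client

old_node = Client(host="clickhouse-node-1")
new_node = Client(host="clickhouse-node-4")

# 1. Create a placeholder table on the new node with the same schema.
new_node.execute("""
    CREATE TABLE default.events
    (
        ts      DateTime,
        user_id String,
        value   Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(ts)
    ORDER BY (user_id, ts)
""")

# 2. Detach the partition being moved on the existing node, then copy its
#    part directories to the new node's detached/ directory out of band
#    (rsync, scp, or similar).
old_node.execute("ALTER TABLE default.events DETACH PARTITION 202201")

# 3. Attach the copied data files to the placeholder table on the new node.
new_node.execute("ALTER TABLE default.events ATTACH PARTITION 202201")

# 4. Repeat for every shard being moved, then keep watching "shard weight"
#    and re-run this dance whenever the distribution drifts out of balance.
```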
It is useful to think of a shared-nothing cluster like ClickHouse as a “stateful” design. Each server has essential state and is tasked with the “care and feeding” of the shards on it. The loss of a node is an immediate and fatal problem. Because of this, a ClickHouse customer will usually replicate each node at least once. Of course, this quickly becomes expensive. If the replica also fails, the cluster is completely down.
There is no shortage of ways a node and its replicas can fail, both technical and human-caused. Therefore, backups of the entire cluster must be taken. This is also a manual process in ClickHouse and will be entirely arbitrary without the creation and enforcement of policies. Even then, data added or changed since the last backup will be lost, which is especially likely in a real-time (streaming) environment. This is why search giant Yandex, the creator of ClickHouse, notes in its own documentation that even with a backup, complete recovery cannot be guaranteed.
The Cloud and Response to Shared-Nothing Shortcomings
The inelastic and fragile design of shared-nothing systems like ClickHouse is clearly disadvantageous in rapidly changing and growing environments. For this reason, organizations that prioritized flexibility over query performance turned to cloud data warehouses like Snowflake.
Cloud data warehouses made an impact on the analytics world by separating the relatively expensive compute resources for query processing from the cheaper storage resources. Storage could then be shared among the nodes of a cluster, or even across many clusters, each with its own use case.
This design allows administrators to add nodes to the cluster without worrying about re-balancing the data. Scale-out becomes simple. If a node fails, performance will suffer, but there is no worry about data loss as ultimately the data is in a “deep storage” layer that is highly durable.
The implementation of separate storage and compute layers does have a drawback: query performance. It introduces contention over storage, and time is lost either moving data up to the compute layer or continuously trying to optimize a cache to minimize this movement. This contention, combined with a lack of indexing, makes sub-second query response and high concurrency impossible.
This leaves customers who want both performance and flexibility looking for a solution that combines the query performance of a shared-nothing cluster with the flexibility and resilience of separate storage and compute.
How Druid is Designed
Apache Druid combines the best of both designs. This is due to a pre-fetch system paired with a segment-based, columnar storage engine that indexes automatically. Pre-fetch means that queries do not wait for data to travel from storage to the compute layer, and compute nodes do not waste cycles optimizing a cache.
In contrast to ClickHouse, Druid can be thought of as a “stateless” design where nodes are fungible assets that are essentially interchangeable.
Nodes are added (or removed) and Druid automatically rebalances the cluster. No data copying is required, nor is time spent trying to optimize “shard weights” for efficient query distribution. In contrast to the 1,300-word article by a ClickHouse customer referenced above, a Druid customer’s description of scaling out would simply be, “We added a node.” Further, Druid continually optimizes and rebalances the cluster even if no nodes are added.
When a node fails on Druid, the cluster continues to operate and no data is lost. All data always exists either in deep storage, a durable object store such as Amazon S3, or in the durable streaming queue it is being ingested from. Even if all nodes fail, no data is lost, and the cluster can resume operations as soon as nodes are added back to the cluster.
At this point, the classic analogy of “pets vs. cattle” might come to mind for some. ClickHouse nodes are like pets: each of them individual and cared for separately. Druid nodes are more like cattle, managed as an interchangeable herd by a coordinator. Developers and architects should consider whether herding cats or cattle will be a better use of their time.
Beyond the essentials of flexibility and resilience, developers should consider other advantages of Druid’s unique design.
Druid’s Streaming Data Advantage
ClickHouse lacks true streaming data ingestion. Even with a native connector to Kafka, the data must be loaded in batches (ClickHouse recommends 1,000 rows at a time). Not only does this limit performance, but it also means manual workarounds must be implemented to get “exactly once” ingestion (no duplicates).
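As a rough illustration of the pattern the ClickHouse documentation describes, the sketch below wires a Kafka-engine table to a MergeTree table through a materialized view, again using the clickhouse_driver Python client; the topic, broker, and table names are placeholders, and batching and deduplication remain the operator’s responsibility.

```python
# Sketch of the ClickHouse Kafka pattern: a Kafka-engine consumer table,
# a MergeTree destination, and a materialized view between them.
# Topic, broker, and table names are illustrative.
from clickhouse_driver import Client

client = Client(host="clickhouse-node-1")

# The Kafka-engine table is only a consumer; it stores no data itself.
client.execute("""
    CREATE TABLE default.events_queue
    (
        ts      DateTime,
        user_id String,
        value   Float64
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'broker:9092',
             kafka_topic_list  = 'events',
             kafka_group_name  = 'clickhouse-events',
             kafka_format      = 'JSONEachRow'
""")

# The MergeTree table is where the data actually lands, in batches.
client.execute("""
    CREATE TABLE default.events
    (
        ts      DateTime,
        user_id String,
        value   Float64
    )
    ENGINE = MergeTree
    ORDER BY (user_id, ts)
""")

# The materialized view moves rows from the consumer to the destination in
# background batches; "exactly once" is not guaranteed, so deduplication
# has to be handled separately.
client.execute("""
    CREATE MATERIALIZED VIEW default.events_mv TO default.events AS
    SELECT ts, user_id, value FROM default.events_queue
""")
```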
Simply put, developers who need real-time data should be prepared to implement workarounds in ClickHouse as illustrated by ClickHouse customer StreamThoughts and others. By contrast, using streaming data in Druid is effortless.
Druid ingests event-by-event streams with “exactly once” semantics and supports millions of events per second. Further, events can be queried the moment they hit the cluster: no waiting for them to persist in storage. This effortless, high-performance connection from Kafka to Druid is one reason Confluent, the world’s foremost experts on Kafka, uses Druid for their own analytics applications (article, video).
“We deploy one of the largest Druid clusters in the world and performance has been great. Because of the native integration between Kafka and Druid, we don’t even need a connector. It just works out of the box.” — Harini Rajendran, Senior Software Engineer, Confluent
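For comparison, here is a hedged sketch of what pointing Druid at the same topic involves: a single Kafka supervisor spec submitted to the Overlord’s /druid/indexer/v1/supervisor API. The host, port, topic, datasource, and column names are assumptions for illustration only.

```python
# Sketch of connecting Druid to a Kafka topic by POSTing a supervisor spec.
# Host, port, topic, datasource, and columns are illustrative.
import json
import requests

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "events",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user_id"]},
            "granularitySpec": {
                "segmentGranularity": "hour",
                "queryGranularity": "none",
            },
        },
        "ioConfig": {
            "topic": "events",
            "consumerProperties": {"bootstrap.servers": "broker:9092"},
            "inputFormat": {"type": "json"},
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Once the supervisor is running, Druid manages the Kafka consumers itself:
# ingestion is event by event, offsets give exactly-once semantics, and new
# events are queryable as soon as they arrive.
response = requests.post(
    "http://druid-overlord:8090/druid/indexer/v1/supervisor",
    data=json.dumps(supervisor_spec),
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()
```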
Druid’s High Cardinality Advantage
Sub-second query performance with high cardinality data requires optimized indexing. While ClickHouse can do secondary indexes (it calls them “data skipping indexes”), it is a manual process to design, deploy, and maintain them. Druid automatically indexes every string column with an index appropriate to the data type. Since the indexes are stored with the data segments, they are very efficient.
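To make the contrast concrete, the sketch below shows the manual steps a data skipping index involves in ClickHouse, reusing the illustrative events table and clickhouse_driver client from earlier; the index name, type, and granularity are choices an operator has to make and revisit by hand. Druid builds its string-column indexes automatically at ingestion time, so there is no equivalent step.

```python
# Sketch of adding and building a data skipping index in ClickHouse.
# The column, index type, and granularity are illustrative choices.
from clickhouse_driver import Client

client = Client(host="clickhouse-node-1")

# Define the index: which column, which index type (bloom_filter, minmax,
# set, ...), and how many granules each index entry covers.
client.execute("""
    ALTER TABLE default.events
    ADD INDEX user_id_idx user_id TYPE bloom_filter(0.01) GRANULARITY 4
""")

# The index only applies to newly written parts until it is materialized
# for existing data, which is another manual (and potentially heavy) step.
client.execute("ALTER TABLE default.events MATERIALIZE INDEX user_id_idx")
```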
Beyond indexing, querying for a small amount of data from a large data set needs special attention in ClickHouse. Recall that in a shared-nothing cluster, data is evenly distributed. If a table is distributed across three nodes, a query will need to contact all three nodes, even for a small amount of data. The same query in Druid will almost certainly contact only one or two nodes.
As a ClickHouse cluster grows, so does this problem of query amplification, meaning that small queries can impact the entire cluster. To mitigate this, ClickHouse recommends bi-level sharding (essentially a sub-cluster). Yandex, the creator of ClickHouse, uses this approach in its Metrica product. Druid needs no such workarounds, since query nodes prune segments based on partition keys and the coordinator assigns and manages tiers automatically.
Druid’s Tiering Advantage
As pointed out several times, shared-nothing architectures are highly stateful. This has a further, unexpected impact: every shard of data must be treated equally. An infrequent query on very old data is therefore very expensive, and there is no ability to move old data to slower but cheaper hardware.
Since Druid’s unique architecture uses smaller segments instead of shards, and more importantly has coordinating resources, developers can place data in tiers that match its importance or frequency of use, saving expensive resources for the most important workloads. This is effectively an automated sub-cluster, something that must be continuously managed by hand in ClickHouse.
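As a sketch of how tiering looks in practice, the load rules below ask the Druid Coordinator to keep the most recent month of a datasource on a “hot” tier and everything older on a cheaper “cold” tier, assuming historicals were started with druid.server.tier set to those names; the tier names, datasource, period, replica counts, and Coordinator address are all illustrative.

```python
# Sketch of Druid data tiering via Coordinator load rules.
# Tier names, datasource, period, and the Coordinator address are illustrative.
import json
import requests

rules = [
    # Keep the most recent month on fast "hot" historicals, two replicas.
    {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"hot": 2}},
    # Everything older drops to cheaper "cold" hardware with one replica.
    {"type": "loadForever", "tieredReplicants": {"cold": 1}},
]

response = requests.post(
    "http://druid-coordinator:8081/druid/coordinator/v1/rules/events",
    data=json.dumps(rules),
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()

# From here the Coordinator enforces the rules continuously, moving segments
# between tiers as data ages, with no manual sub-cluster management.
```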
Further Investigation
All things considered, Druid provides a host of advantages over ClickHouse that give the developer time to focus on creative tasks and development:
- Flexibility to grow without manual effort or downtime
- Durable, automatic deep storage inherent in the design
- True streaming data support that is effortless and guarantees exactly once ingestion
- Better handling of high cardinality queries
- Support for data tiering
There are other considerations and more to know. For further investigation, see the links listed below. Organizations wanting a personal discussion or demonstration can contact Imply. To try Druid for free, Imply provides Polaris, a database-as-a-service built from Druid. It can be set up in minutes and includes a complete development environment, from streaming ingestion with Confluent Cloud to a slice-and-dice visualizer for users.
© 2022 Imply. All rights reserved. Imply and the Imply logo, are trademarks of Imply Data, Inc. in the U.S. and/or other countries. Apache Druid, Druid and the Druid logo are either registered trademarks or trademarks of the Apache Software Foundation in the USA and/or other countries. All other marks are the property of their respective owners.