We recently launched Polaris, our fully managed database-as-a-service built on top of Apache Druid. With Polaris, we wanted to make it easier for anyone to get started building modern analytics applications. More importantly, one of our core philosophies with Polaris is that every capability we make available is rock solid and holds no surprises for users. To that end, even though Apache Druid provides the ability for users to replace data atomically, we didn’t offer it in Polaris because we considered its behavior “surprising” to users. Today we’re announcing that Imply has made it possible for both open-source Apache Druid users and Polaris users to do atomic replacements of data intervals without worrying about surprising quirks.
Apache Druid’s replacement functionality offered users the ability to atomically replace data… with a twist. As readers familiar with Apache Druid may know, data in Druid is partitioned by time, and many of the data management operations that Druid offers work on a time-partition by time-partition basis. When asked to replace an interval of data, Druid will replace whole partitions within that interval with new data, but, and here’s the twist, only for partitions that actually have replacement data. Partitions within the replacement interval for which there’s no replacement data are not touched.
As an example, consider the setup below for an interval of data the user wants to replace, containing four time partitions (also known as time chunks by Druid developers). Say the data originally had [a0, b0, c0, d0], one value for each partition. The replacement data has [a1, <nothing>, c1, d1], meaning that, for the second partition, we didn’t have any data to replace the old data with. Users would generally expect that the resulting data available in Druid after a replacement would look like [a1, null, c1, d1]. Unfortunately, the replacement result is [a1, b0, c1, d1]; the data in the old partition continues to be available.
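| Partition time range | [t0, t1) | [t1, t2) | [t2, t3) | [t3, t4) |
| --- | --- | --- | --- | --- |
| Existing data | a0 | b0 | c0 | d0 |
| Replacement data | a1 | (nothing) | c1 | d1 |
| Expected output | a1 | null | c1 | d1 |
| Old behavior output | a1 | b0 | c1 | d1 |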
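The difference between the two behaviors can be sketched in a few lines of Python. This is a toy model only, not Druid code: partitions are represented as dictionary entries, and the function names `replace_old` and `replace_expected` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional

# Toy model: one value per time partition; None means "no data".
Partitions = Dict[str, Optional[str]]


def replace_old(existing: Partitions, replacement: Partitions,
                interval: List[str]) -> Partitions:
    """Old behavior: only partitions that have replacement data are swapped."""
    result = dict(existing)
    for chunk in interval:
        if chunk in replacement:
            result[chunk] = replacement[chunk]
    return result


def replace_expected(existing: Partitions, replacement: Partitions,
                     interval: List[str]) -> Partitions:
    """Expected behavior: every partition in the interval is replaced,
    so partitions without replacement data end up empty."""
    result = dict(existing)
    for chunk in interval:
        result[chunk] = replacement.get(chunk)  # None when nothing replaces it
    return result


existing = {"[t0, t1)": "a0", "[t1, t2)": "b0", "[t2, t3)": "c0", "[t3, t4)": "d0"}
replacement = {"[t0, t1)": "a1", "[t2, t3)": "c1", "[t3, t4)": "d1"}
interval = list(existing)  # replace the whole interval [t0, t4)

old_result = replace_old(existing, replacement, interval)
expected_result = replace_expected(existing, replacement, interval)
print(list(old_result.values()))       # ['a1', 'b0', 'c1', 'd1']
print(list(expected_result.values()))  # ['a1', None, 'c1', 'd1']
```

In both versions, partitions outside the replacement interval are left untouched; the only difference is what happens to partitions inside the interval that have no replacement data.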
There are many reasons for the old behavior. Most revolve around the trade-offs that Druid makes to put more control into the hands of experts so they can get peak performance at scale, and to simplify data management by operating on a time-partition basis. However, we wanted the expected behavior to be the default behavior to make Druid more accessible to newer users. This work required some architectural surgery to introduce the concept of “tombstones” into Druid. A more technical blog post about tombstones will follow soon, but this new capability is exciting because it opens the door to many new data management capabilities. For example, it will help us make data management easier by reducing the strict dependence of data management operations on time partitioning.
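To give a rough intuition for the idea ahead of that post, here is a minimal sketch of how a tombstone can make an empty partition hide older data. It is an illustrative assumption, not a description of Druid's internals: the `Segment` class, the integer `version` field, and the `visible` function are all hypothetical, and the model simply lets the newest entry for a partition win.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    chunk: str            # time partition the segment belongs to
    version: int          # higher version shadows lower versions (assumption)
    data: Optional[str]   # None marks a tombstone: "this partition is empty"


def visible(segments: List[Segment], chunk: str) -> Optional[str]:
    """Return the data a query would see for one partition in this toy model."""
    candidates = [s for s in segments if s.chunk == chunk]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s.version)
    return newest.data  # a tombstone yields None and hides older data


chunks = ["t0", "t1", "t2", "t3"]
old = [Segment(c, 0, d) for c, d in zip(chunks, ["a0", "b0", "c0", "d0"])]
new = [Segment("t0", 1, "a1"), Segment("t2", 1, "c1"), Segment("t3", 1, "d1")]
tombstone = Segment("t1", 1, None)  # written because t1 has no replacement data

without_tombstone = [visible(old + new, c) for c in chunks]
with_tombstone = [visible(old + new + [tombstone], c) for c in chunks]
print(without_tombstone)  # ['a1', 'b0', 'c1', 'd1']
print(with_tombstone)     # ['a1', None, 'c1', 'd1']
```

The point of the sketch is that the replacement no longer needs to delete anything in place: writing a marker for the empty partition is enough for the old data to stop being visible.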
This is just one of the many ways Imply is continuously working to make both open-source Druid and our Polaris offering easier to adopt and more approachable to users. On top of this improvement, we’ve also added the ability for Polaris users to upload and ingest CSV files, making it easier to load data from sources that do not export data in JSON. To learn more about Polaris, sign up for a free trial at https://imply.io/polaris-signup and get started building scalable modern analytics applications.