Atomic Replace in Polaris

Mar 23, 2022
Jad Naous

We recently launched Polaris, our fully managed database-as-a-service offering built on top of Apache Druid. With Polaris, we wanted to make it easier for anyone to get started building modern analytics applications. More importantly, one of our core philosophies with Polaris is that every capability we make available is rock solid and holds no surprises for users. To that end, even though Apache Druid provides the ability for users to replace data atomically, we didn’t offer it in Polaris because we considered its behavior “surprising” to users. Today we’re announcing that Imply has made it possible for both open-source Apache Druid users and Polaris users to atomically replace data intervals without worrying about surprising quirks.

Apache Druid’s replacement functionality offered users the ability to atomically replace data… with a twist. As readers familiar with Apache Druid may know, data in Druid is partitioned by time, and many of the data management operations that Druid offers work on a time-partition by time-partition basis. When asked to replace an interval of data, Druid will replace whole partitions within that interval with new data, but, and here’s the twist, only for partitions that actually have replacement data. Partitions within the replacement interval for which there’s no replacement data are not touched.

As an example, consider the setup below for an interval of data the user wants to replace, containing four time partitions (also known as time chunks by Druid developers). Say the data originally had [a0, b0, c0, d0], one value per partition. The replacement data has [a1, <nothing>, c1, d1], meaning that we didn’t have any data to replace the old data in the second partition. Users would generally expect the data available in Druid after a replacement to look like [a1, null, c1, d1]. Unfortunately, the replacement result is [a1, b0, c1, d1]; the data in the old partition continues to be available.

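| Partition time range | [t0, t1) | [t1, t2) | [t2, t3) | [t3, t4) |
|----------------------|----------|----------|----------|----------|
| Existing data        | a0       | b0       | c0       | d0       |
| Replacement data     | a1       | (none)   | c1       | d1       |
| Expected output      | a1       | null     | c1       | d1       |
| Old behavior output  | a1       | b0       | c1       | d1       |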
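To make the difference concrete, here is a minimal Python sketch of the two behaviors using the example values above. It is only an illustration: the dictionary model and the function names `replace_old` and `replace_expected` are hypothetical and are not part of Druid or Polaris.

```python
from typing import Dict, List, Optional

# Toy model (an assumption for illustration): one value per time partition.
Partitions = Dict[str, Optional[str]]


def replace_old(existing: Partitions, replacement: Partitions,
                interval: List[str]) -> Partitions:
    """Old behavior: only partitions that have replacement data are swapped."""
    result = dict(existing)
    for chunk in interval:
        if chunk in replacement:
            result[chunk] = replacement[chunk]
    return result


def replace_expected(existing: Partitions, replacement: Partitions,
                     interval: List[str]) -> Partitions:
    """Expected behavior: every partition in the interval is replaced,
    and partitions without replacement data end up empty (None)."""
    result = dict(existing)
    for chunk in interval:
        result[chunk] = replacement.get(chunk)
    return result


interval = ["[t0, t1)", "[t1, t2)", "[t2, t3)", "[t3, t4)"]
existing = dict(zip(interval, ["a0", "b0", "c0", "d0"]))
# No entry for [t1, t2): there is nothing to replace b0 with.
replacement = {"[t0, t1)": "a1", "[t2, t3)": "c1", "[t3, t4)": "d1"}

old_result = list(replace_old(existing, replacement, interval).values())
expected_result = list(replace_expected(existing, replacement, interval).values())
```

In this toy model, `old_result` keeps `b0` in the second partition, while `expected_result` leaves that partition empty.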

There are many reasons for this behavior. Most revolve around the trade-offs that Druid makes to put more control into the hands of experts so they can get peak performance at scale, and to simplify data management by handling it one time partition at a time. However, we wanted the expected behavior to be the default behavior to make Druid more accessible to newer users. This work required some architectural surgery to introduce the concept of “tombstones” into Druid. A more technical blog post about tombstones will follow soon, but this new capability is exciting because it opens the door to many new data management capabilities. For example, it will help us make data management easier by reducing the strict dependence of data management operations on time partitioning.
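One way to picture the tombstone idea is as an empty marker written for each partition in the replacement interval that has no new data, so that the old data in that partition is hidden. The Python sketch below rests on that assumption and is not Druid's actual implementation: the `Segment` class, the integer `version` field, and the `plan_replace` and `visible` helpers are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    chunk: str              # time partition the segment covers
    version: int            # newer versions hide older ones (assumed model)
    data: Optional[str]     # payload; None for a tombstone
    tombstone: bool = False


def plan_replace(interval: List[str], replacement: Dict[str, str],
                 version: int) -> List[Segment]:
    """Emit one new segment per partition in the interval: real data where
    a replacement exists, a tombstone marker where it does not."""
    planned = []
    for chunk in interval:
        if chunk in replacement:
            planned.append(Segment(chunk, version, replacement[chunk]))
        else:
            planned.append(Segment(chunk, version, None, tombstone=True))
    return planned


def visible(segments: List[Segment]) -> Dict[str, str]:
    """Return queryable data: the newest segment per partition wins,
    and partitions whose newest segment is a tombstone show nothing."""
    newest: Dict[str, Segment] = {}
    for seg in segments:
        current = newest.get(seg.chunk)
        if current is None or seg.version > current.version:
            newest[seg.chunk] = seg
    return {c: s.data for c, s in newest.items() if not s.tombstone}


interval = ["t0", "t1", "t2", "t3"]
old_segments = [Segment(c, 1, d) for c, d in zip(interval, ["a0", "b0", "c0", "d0"])]
new_segments = plan_replace(interval, {"t0": "a1", "t2": "c1", "t3": "d1"}, version=2)
after_replace = visible(old_segments + new_segments)
```

In this sketch, `after_replace` contains no entry for `t1`: the tombstone hides `b0` without requiring any replacement data for that partition.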

This is just one of the many ways Imply is continuously working to make both open-source Druid and our Polaris offering easier to adopt and more approachable to users. On top of this improvement, we’ve also added the ability for Polaris users to upload and ingest CSV files, making it easier to load data from sources that do not export data in JSON. To learn more about Polaris, sign up for a free trial at https://imply.io/polaris-signup and get started building scalable modern analytics applications.
