ACID and Apache Druid

What is ACID?

ACID refers to a set of properties necessary for a transaction system.

Atomicity – This refers to transactions completing fully or failing without any partial changes. If a bank account is debited then another account needs to be credited. Atomicity means that either both operations succeed or both fail. There should be no scenario in which one succeeds and the other fails.

Consistency – This means that the database should permit only those transactions that comply with the rules. For instance if the rule is that account balance should not be negative and a transaction tries to debit an amount greater than the available balance then it should be rejected. The database must be in a consistent state before and after the transaction.

Isolation – This refers to one transaction not affecting another. If multiple transactions occur concurrently then the end result would be as if they occurred in sequence one after another. If a bank account has a balance of 500 Dollars and two transactions execute concurrently – one debiting 100 Dollars and another crediting 300 Dollars then the result would be a balance of 700 dollars (as if the transactions occurred one after another).

Durability – This means that all committed transactions must be retained even if the system faces a hardware or software failure.

ACID and Druid

Druid is a database for building a modern Analytics application. Druid cannot be used as a system of record and does not support transactions. Given the focus on analytics,ACID guarantees are not fully relevant to Druid. However, it is interesting to consider some of the capabilities of Druid in this light

Atomicity – There are two scenarios in which everything is committed or nothing is committed

Batch ingestion – In batch ingestion, the ingestion succeeds and commits data to deep storage only if all rows in the ingestion succeed. Otherwise the entire ingestion fails and no rows are committed.
Real time ingestion – Real time ingestion proceeds from a checkpoint for one hour (configurable) and creates another checkpoint and commits data up to that check point. However if any of the rows fail then the ingestion fails in its entirety and begins from the previous checkpoint again.

The above provides sufficient Atomicity guarantee to ensure that Druid does not provide an incorrect picture and that users don’t have to track which data went through and which did not. Taking the example of debit and credit into bank accounts, these records will have same time stamp in the transaction system and when they are ingested into Druid then either both will be ingested or both will not be (with the caveat that both records are part of the same batch ingestion or the same kafka topic). This will ensure that the analytics results are correct.

Consistency – Druid does not allow a lot of rules on the data. There are no primary keys and no foreign keys. If the data types for columns are specified in the ingestion spec then any data that does not conform to the spec will be rejected. When an entity is updated Druid will hold multiple records for that entity as each update is an event in time. Druid provides LATEST sql command to get a consistent picture of the latest state of the entity.

Isolation – Isolation can be seen in Druid’s ability to ingest millions of rows per second while handling 100’s of queries per second at the same time – Historicals handle queries on the data that is already in the system and the indexers respond to queries using the data that is being indexed. Indexers publish the indexed data to deep storage as segments and the historicals download the segments from deep storage and the indexing task is complete only when a Historical has downloaded the segment. Any queries on the data being indexed is handled by the indexer until the Historical has downloaded the segment. This ensures that there are no read-write conflicts (consider the alternate approach of indexers writing the data directly to the historicals. This would have made it very difficult to avoid read-write conflicts).

Durability – The primary persistent store is the deep storage which is usually a blob storage. So any data that has been committed to deep storage is available even if the entire cluster goes down. A new cluster can be brought using the data in deep storage. Batch ingestion succeeds only if all the data in the batch is committed to deep storage and real time ingestion succeeds only when all data from the previous checkpoint is committed to deep storage.

Conclusion

Druid provides sufficient ACID guarantees from an analytics perspective. Full transactional ACID is not relevant to Druid. The guarantees Druid provides enable building a modern analytics application with large volume data, high throughput real time ingestion and high query concurrency

Links:

Apache Druid – druid.apache.org

Other blogs you might find interesting

No records found...

Jul 24, 2026

Why You Shouldn’t Have to Delete Your VPC Flow Logs

When a security incident happens, investigators almost always start with the same questions: Which systems communicated? Where did the traffic originate? What changed before the incident? Was data exfiltrated?...

Learn More

Jun 16, 2026

Splunk Smartstore vs Lumi Loglake: Two Very Different Ways to Search Logs in Object Storage

One copies data back before it can be searched. The other queries it where it lives. Lumi Loglake lets Splunk teams query logs directly in object storage, including AWS S3, Delta Lake, Apache Iceberg, using...

Learn More

Jun 11, 2026

Supercharging Schema-On-Read: Logs in Object Storage Don’t Need a Data Catalog

Machine data architectures are rapidly changing. As telemetry volumes continue to grow and as costs rise, organizations are increasingly moving logs and other machine data into object stores such as AWS S3....

Learn More

Log lake

Real Time Analytics Database

OBSERVABILITY CASE STUDIES

Content

Support

Apache Druid

Other blogs you might find interesting

Ready to decouple your observability stack?
No workflow changes. No migrations. More data, less spend.

Log lake

Real Time Analytics Database

OBSERVABILITY CASE STUDIES

Content

Support

Apache Druid

ACID and Apache Druid

What is ACID?

ACID and Druid

Conclusion

Other blogs you might find interesting

Ready to decouple your observability stack? No workflow changes. No migrations. More data, less spend.

Ready to decouple your observability stack?
No workflow changes. No migrations. More data, less spend.