Druid Summit Lakehouse Panel: A Deep Dive into Data Lakehouses and Apache Druid

At the inaugural in-person Druid Summit this past October, industry leaders gathered to explore the future of data, streaming analytics, and more. Across a series of panels, industry experts answered questions about streaming analytics, operations, optimization, real-time user interfaces, and data lakehouses, all in the context of Apache Druid.

I had the honor of moderating the Lakehouse panel, a topic near and dear to my heart after I spent the better part of a year working with the original creators of Apache Iceberg at Tabular. The panel featured Atul Mohan, Gian Merlino, Will Xu, and Abishek Balaji Radhakrishnan. It was a perfect cap to two earlier sessions: Abishek and Venki Korukanti presented on ingesting Delta Lake tables into Druid, and Atul presented on Apache Iceberg integration with Druid.

If you missed the session, the full recording is available on the Druid Summit site, but here are some key highlights worth noting:

Metadata and Unified Schemas

Atul kicked things off by discussing the use of a metadata platform like DataHub. This enables a unified schema catalog that spans Druid and data lake tables, making it easier for organizations to manage and query their data effectively.

Query Engine Enhancements

Gian and Will shared insights into the latest query engine enhancements for Druid. Druid is rapidly evolving to become an exceptional tool for querying not only Druid-native data but also data within lakehouse environments. It’s exciting to see how Druid is bridging the gap between traditional databases and modern data lakes.

The Philosophical and Technical Challenges of Deletes

The panel took an intriguing turn when the topic of deletes in Druid and lakehouse tables came up, sparking a discussion that was as much philosophical as technical. Will Xu shared a technique used by Twitch to handle deletes efficiently. I won’t spoil it here; watch the video for the full details.

Querying Lakehouse Tables Directly with Druid

A particularly exciting bit of news from the panel was the discussion around querying lakehouse tables directly with Druid, bypassing the need to pull data into Druid segment files. This feature almost made it into Druid 31, but there are still a few bugs to iron out. It looks like it will be part of an upcoming release.

Roadmap Insights and Future Enhancements

The panel also offered a sneak peek into future enhancements to the Druid storage layer, along with other roadmap updates. While the details were still under wraps, these insights gave the audience a sense of the direction Druid is heading in. The knowledge shared by these experts was invaluable, sparking plenty of inspiration and discussion.

The Blurring of Lines Between Streaming and Batch

One question raised during the panel really got me thinking: Could Druid be the next great lakehouse? Traditionally, data lakes, warehouses, and lakehouses were designed for batch processing, while Druid has always excelled in the streaming space. But recent advancements in the data space are blurring the lines between batch and streaming workloads.

Apache Iceberg, a table format designed for cloud object stores, has significantly reduced the latency of querying data.
Stream-processing giant Confluent has adopted Iceberg and also acquired WarpStream, a Kafka-compatible streaming engine built directly on object storage such as Amazon S3.
Users with batch-driven use cases are increasingly demanding faster query times, and Druid is rising to the occasion with powerful enhancements such as Dart.

It’s an exciting time to be working in the data space, and Druid’s role in this ecosystem is only going to become more important.

A Must-Watch for Data Enthusiasts

I highly encourage you to watch the full recording of the Lakehouse Panel and all the other excellent sessions from Druid Summit 2024. Whether you’re deep in the world of data lakes, looking to optimize your streaming analytics, or simply curious about the latest advancements in Druid, there’s something for everyone.

I’m already looking forward to the next Druid Summit—hope to see you there!
