Druid Summit Lakehouse Panel: A Deep Dive into Data Lakehouses and Apache Druid
Jan 30, 2025
Matt Morrissey
At the inaugural in-person Druid Summit this past October, industry leaders gathered to explore the future of data, streaming analytics, and more. In a series of panels, industry experts answered questions about streaming analytics, operations, optimization, real-time user interfaces, and data lakehouses, all in the context of Apache Druid.
If you missed the Lakehouse Panel, the full recording is available on the Druid Summit site. Here are some key highlights:
Metadata and Unified Schemas
Atul kicked things off by discussing the use of a metadata platform like DataHub. This enables a unified schema catalog that spans Druid and data lake tables, making it easier for organizations to manage and query their data effectively.
Query Engine Enhancements
Gian and Will shared insights into the latest query engine enhancements for Druid. Druid is rapidly evolving to become an exceptional tool for querying not only Druid-native data but also data within lakehouse environments. It’s exciting to see how Druid is bridging the gap between traditional databases and modern data lakes.
The Philosophical and Technical Challenges of Deletes
The panel took an interesting turn when the topic of deletes in Druid and lakehouse tables came up, sparking a discussion that was as philosophical as it was technical. Will Xu shared a technique Twitch uses to handle deletes efficiently. I won’t spoil the details here; watch the video for the full explanation.
Querying Lakehouse Tables Directly with Druid
A particularly exciting bit of news from the panel was the discussion around querying lakehouse tables directly with Druid, bypassing the need to pull data into Druid segment files. While this feature almost made it into Druid 31, there are still a few bugs to iron out. But don’t worry—this looks like it will be part of an upcoming release.
Roadmap Insights and Future Enhancements
The panel also offered a sneak peek into future enhancements to the Druid storage layer, along with other roadmap updates. While the details were still under wraps, these insights gave the audience a sense of the exciting direction Druid is heading in. The knowledge shared by these experts was invaluable, sparking plenty of inspiration and discussion.
The Blurring of Lines Between Streaming and Batch
One question raised during the panel really got me thinking: Could Druid be the next great lakehouse? Traditionally, data lakes, warehouses, and lakehouses were designed for batch processing, while Druid has always excelled in the streaming space. But recent advancements in the data space are blurring the lines between batch and streaming workloads.
Apache Iceberg, a table format designed for cloud object stores, has significantly reduced the latency of querying data.
Stream-processing giant Confluent has adopted Iceberg and also acquired WarpStream, a Kafka engine that’s directly built on Amazon’s S3.
Users with batch-driven use cases are increasingly demanding faster query times, and Druid is rising to the occasion with powerful enhancements such as Dart.
It’s an exciting time to be working in the data space, and Druid’s role in this ecosystem is only going to become more important.
A Must-Watch for Data Enthusiasts
I highly encourage you to watch the full recording of the Lakehouse Panel and all the other excellent sessions from Druid Summit 2024. Whether you’re deep in the world of data lakes, looking to optimize your streaming analytics, or simply curious about the latest advancements in Druid, there’s something for everyone.
I’m already looking forward to the next Druid Summit—hope to see you there!