2024 Product Innovation Recap

Jan 30, 2025
Larissa Klitzke

Apache Druid® is an open-source distributed database designed for real-time analytics at scale. Since our founding ten years ago, Imply has been committed to growing the Apache Druid project by making it as easy as possible for people to use Druid and build awesome data applications on top of it. 

In 2022, we launched Imply Polaris: a hassle-free, fully managed cloud environment that serves as the “easy button” for Apache Druid. This database-as-a-service brings you all the performance advantages of Druid, plus added auto-scaling capabilities for seamless data ingestion and an easy interface for building analytics applications that visualize your data in real time. Within minutes, you can derive valuable insights from all your data without the complexities of procuring and managing infrastructure.

We’ve made a lot of progress over the past decade. As we reflect upon the past year, we’re proud to share a summary of the top 2024 product updates across both Druid and Imply. 

2024 in Numbers

2024 was a banner year, with over 1,348 commits by 106 contributors across Apache Druid 29, 30, and 31. On top of this, we released over 54 unique product updates, 28 other improvements, and 18 fixes to Imply Polaris.

Here’s a snapshot of 2024 growth across data usage and community activity overall:

2024 Product Innovation Strategy

While there are many ways to categorize our progress across features and functionalities, there were four main strategic pillars that guided our product innovation in 2024:

(1) ECOSYSTEM: Power Partner Connections & Seamless Data Ingestion

In his Druid Summit keynote, our co-founder Eric Tschetter introduced a new perspective on data solutions and infrastructure. The future lies in thinking about data in terms of entities (“things” like videos on Netflix or communities on Reddit) and events (“actions” like views or plays). Druid is focused on an events-first mindset. At its core, Druid provides a high-performance, ultra-low latency query engine that ingests events from streams and batch storage, transforms them by aggregating events and reconstructing entities, and then queries those events and entities for real-time and historical analysis.
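To make the entity and event distinction concrete, here is a minimal Python sketch of aggregating events to reconstruct entities. It is only an illustration of the idea, not how Druid does it internally; the event fields (`video_id`, `action`, `seconds`) and the `reconstruct_entities` helper are hypothetical names invented for this example.

```python
from collections import defaultdict

# Hypothetical event records: each one is an "action" performed on an entity (a video).
events = [
    {"video_id": "v1", "action": "view", "seconds": 30},
    {"video_id": "v1", "action": "play", "seconds": 120},
    {"video_id": "v2", "action": "view", "seconds": 10},
    {"video_id": "v1", "action": "view", "seconds": 45},
]


def reconstruct_entities(events):
    """Aggregate a list of events into one summary per entity (keyed by video_id)."""
    summaries = defaultdict(
        lambda: {"events": 0, "seconds": 0, "actions": defaultdict(int)}
    )
    for event in events:
        summary = summaries[event["video_id"]]
        summary["events"] += 1
        summary["seconds"] += event["seconds"]
        summary["actions"][event["action"]] += 1
    # Convert nested defaultdicts to plain dicts so the result is easy to inspect.
    return {
        video_id: {
            "events": s["events"],
            "seconds": s["seconds"],
            "actions": dict(s["actions"]),
        }
        for video_id, s in summaries.items()
    }


video_stats = reconstruct_entities(events)
```

Each entry in `video_stats` is an entity-level view (one row per video) derived purely from the underlying events, which is the sense in which events come first.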

To build upon this vision, expanding connections to more partners and data types is just one piece of the puzzle (e.g. expanded support for data lakes and Azure in 2024). Adding a data type is one thing, but making it so the data can be queried efficiently is a game changer. Our goal is to make ingestion as easy as possible, minimizing the workarounds to prepare your data for transformation and analysis (e.g. async query, dimension tables). 

(2) ANALYTICS: Simplify Complex Queries with SQL Completeness

SQL is the standardized language most developers use when building applications, so we strive to improve ANSI SQL compatibility across all analytics powered by our query engine. 

In addition to making significant enhancements to our SQL planner engine in 2024 (e.g. new window functions, larger joins), we also explored ways to simplify the steps to do complex data transformations and analysis, from reducing subqueries to enabling easier visualization (e.g. Imply Pivot enhancements and time series analysis for Imply Polaris).

(3) PERFORMANCE: Push Scale & Operational Efficiency Boundaries

Druid is optimized to run on clusters with thousands of nodes, so you don’t have to worry about the complexity of data synchronization, metadata management and data movement. Our goal is to push scale and operational efficiency boundaries to new limits.

While the aforementioned SQL planner improvements impact analytical efficiency, there are three main areas that we explored in 2024 to improve query engine performance while compacting storage from a system automation perspective: a new query engine called Dart, continuous defragmentation, and enhanced auto-scaling for Imply Polaris. 

(4) PROTECTION: Enhance Resiliency, Security & Governance

The three strategic pillars above focus on the scope, functionality and efficiency of Druid’s query engine, with the majority of updates available in both Druid and Imply. That said, protecting the database itself is of equal importance. OSS users not only have to deploy Druid manually without the support of Imply’s managed service, they also have to construct entire systems to protect the resiliency, security and governance of the database.

Our fourth strategic pillar focuses exclusively on Imply Polaris, our fully managed database-as-a-service. With Polaris, we took advantage of all of the capabilities above, included all the Druid features, and added a whole layer of enterprise-grade security (private networking, role-based access control, audit logs, compliance), resiliency (fault tolerance, cluster self-healing), elasticity (scaling, no downtime, high availability) and operational efficiency (ZK monitoring UI, data deletion controls, no downtime or patching), deployed in multiple clouds and regions globally.

There are many technical details to explore within these updates, so we encourage you to dive deeper in our 2024 Product Updates: Technical Summary. This expansive Developer Center article fully defines the purpose and value of the top 2024 product updates shown in the image above, plus a few extra updates that didn’t make the top of the list. 

Learn More: Full Update Summary

Conclusion: 2024 Summary & 2025 Vision Preview

Throughout 2024, we extended our commitment to provide better experiences for Imply and Druid users alike by supporting additional data types, making it easier and more cost-effective to conduct complex analysis, improving performance for operational efficiency at scale, and enhancing data protection for better security, resilience, and governance. 

While most of these updates are available in Druid OSS, Imply customers benefit from earlier access to updates, 24/7 professional support, enhanced auto-scaling, more robust security and resilience, and many more features summarized here. Many of our customers have found that the cost of Imply pays for itself, reducing the total cost of ownership (TCO) by 50% or more through time savings and reduced data storage.

So what’s next for 2025? As we continue to innovate across ecosystem, analytics, performance, and data protection functionalities, there are three standout projects to look out for in the coming year:

  • Dart engine (GA), the new high-parallelism engine complementing Druid’s native engine, will continue to be tested and refined. As initial tests show a potential performance improvement of up to 2000% for demanding queries, we’re excited to see how Dart’s continued development will make waves across the industry.
  • A Virtual Storage Fabric will enable on-demand segment loading to unify storage across data lake, cloud, local, and in-memory storage. Decoupling the compute and storage layers allows each to scale independently, which simplifies data management and enables smoother, more efficient data querying directly within Druid.
  • Projections embed materialized views directly within Druid’s storage layer, allowing for up to 10x faster query speeds on pre-aggregated data without affecting existing workflows.
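For readers new to materialized views, the general idea behind pre-aggregation can be sketched in a few lines of Python. This is a conceptual illustration only and makes no claim about how Druid implements projections; the row layout and the function names (`build_projection`, `bytes_by_country_raw`, `bytes_by_country_projected`) are hypothetical.

```python
from collections import Counter

# Hypothetical raw rows: (hour, country, bytes transferred).
rows = [
    ("2024-01-01T00", "US", 100),
    ("2024-01-01T00", "US", 250),
    ("2024-01-01T00", "DE", 75),
    ("2024-01-01T01", "US", 40),
]


def build_projection(rows):
    """Pre-aggregate once, grouping by (hour, country) and summing bytes."""
    projection = Counter()
    for hour, country, nbytes in rows:
        projection[(hour, country)] += nbytes
    return projection


def bytes_by_country_raw(rows):
    """Answer the query by scanning every raw row."""
    totals = Counter()
    for _, country, nbytes in rows:
        totals[country] += nbytes
    return dict(totals)


def bytes_by_country_projected(projection):
    """Answer the same query by scanning only the smaller pre-aggregated table."""
    totals = Counter()
    for (_, country), nbytes in projection.items():
        totals[country] += nbytes
    return dict(totals)


projection = build_projection(rows)
```

Both query functions return the same totals, but the projected version reads three pre-aggregated rows instead of four raw ones. On real datasets the gap between raw and pre-aggregated row counts is typically far larger, which is where the speedup from this kind of technique comes from.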

Whether you’re using Druid, Imply, or other analytics tools today, we hope you’ve found some inspiration from what we’ve been working on to reshape the real-time data space. 
