Apache Druid hits 10,000 GitHub star milestone

by Matt Sarrel · September 9, 2020

We’ve always believed that community growth and collaboration are critical to the success of Apache Druid. For this reason, we’re excited to announce that last week, the Druid GitHub repository passed 10,000 stars! In case you don’t know, stars are used on GitHub as a way to bookmark or show appreciation for a repository. While they aren’t the only measure of popularity, stars have been shown to correlate with developer activity and interest in software packages on GitHub. We initially developed Druid to help people solve difficult data problems at speed and scale, so we’re really excited to see so many people contribute to, extend and leverage the power of Druid.

Thank you, fellow stargazers, for joining us on this journey. Here’s a little treat: a fun Gource video that shows the journey that Druid code has taken on GitHub. The root directory of the project is at the center of the animation, and directories appear as branches with files as leaves. Developers can be seen working on the tree at the times they contributed to the project. Can you spot your own name?

Druid has grown far beyond its initial roots in ad tech, expanding to address a broad array of real-time analytics use cases, from network monitoring and clickstream analysis to fraud detection and supply chain analytics. Over 100 companies have added themselves to our Powered by Apache Druid page, including global leaders like Airbnb, Alibaba, BT, Cisco, Netflix, NTT, Tencent and Walmart.

Druid’s real-time ingestion and split-second query capabilities offer a high-speed complement to data warehouses, and we’re continuing to enhance Druid’s query engine (particularly with query vectorization) to make sure that Druid continues to be the fastest analytics solution around.

The year 2020 has been challenging for everyone, so we’re happy that our community continues to thrive.

In March, we added a long-requested capability, support for SQL joins, in the Druid 0.18 release. The addition of SQL join functionality vastly extends the analytics use cases for which Druid is applicable. Druid can now run the complex SQL queries required for both dashboards and exploratory slice-and-dice analytics over data warehouse-type data, and run them extremely quickly and efficiently. Data warehouses frequently rely on a star schema of fact and dimension tables, and with SQL joins Druid can update dimensions as frequently as needed without re-ingestion. We can’t wait to see the innovative applications our community creates around this functionality.
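To make the star-schema idea concrete, here is a minimal sketch of what a fact-to-dimension join might look like when submitted to Druid's SQL API. It assumes a hypothetical fact datasource named `sales` (with `store_id` and `revenue` columns) and a hypothetical lookup named `store_names`; none of these names come from a real deployment. It also relies on Druid SQL exposing lookups under the `lookup` schema with `k` and `v` columns, and on the SQL endpoint accepting a JSON body with a `query` field. The sketch only builds the request body and does not contact a cluster.

```python
import json


def build_join_query(fact_table: str, lookup_name: str) -> str:
    """Return a Druid SQL string joining a fact table to a lookup.

    The column names (store_id, revenue) are hypothetical placeholders.
    Lookups are queried through the `lookup` schema, keyed by `k` with
    the mapped value in `v`.
    """
    return (
        "SELECT dim.v AS store_name, SUM(f.revenue) AS total_revenue\n"
        f'FROM "{fact_table}" AS f\n'
        f'LEFT JOIN lookup."{lookup_name}" AS dim ON f.store_id = dim.k\n'
        "GROUP BY dim.v"
    )


def build_request_body(sql: str) -> str:
    """Wrap a SQL string in the JSON shape expected by the SQL endpoint."""
    return json.dumps({"query": sql})


if __name__ == "__main__":
    # "sales" and "store_names" are illustrative names only.
    sql = build_join_query("sales", "store_names")
    print(sql)
    print(build_request_body(sql))
```

Because the dimension values live in the lookup rather than in the ingested fact rows, changing a store's name would only mean updating the lookup; the query above would pick up the new value without the `sales` data being touched.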

In April, our first Virtual Druid Summit featured a full day of talks by companies from a variety of industries that are using Druid to solve real-life data analytics problems. The event was capped off with an Apache Druid Fireside Chat during which Imply founders and Druid authors/contributors Fangjin Yang, Gian Merlino, and Vadim Ogievetsky fielded questions from around the globe.

Just last week we hosted our second Virtual Druid Summit, featuring Netflix, Twitch and other practitioners. You can find recordings of all of the Virtual Druid Summit talks from both events here. Together, they drew over 1,000 attendees, and more virtual events are on the way.

Druid gives users the ability to create interactive data applications that leverage the power of a real-time, scalable analytics-focused data store. Why don’t you JOIN us (get it)? There are many ways to participate in the Druid community. Come learn and contribute. Help us build the future with Druid.
