Apr 26, 2023

Introducing KUDRAS – Kubernetes Druid Autoscaler for Maximum Resource Utilization and Speed

In this session, I would like to talk about the huge amount of data we ingest into Druid (9 TB of raw data per day) using EMR, all orchestrated by Airflow. As the data grew, we started experiencing many problems. After trying many scaling options, we decided to change our approach and came up with KUDRAS, the Kubernetes Druid Autoscaler.

This project is written in Python and is used in our Apache Druid production environment. KUDRAS is a FastAPI service that scales MiddleManager nodes up and down as efficiently as possible, keeping ingestion task costs to a minimum while maximizing ingestion speed.
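To make the idea of "scaling MiddleManagers up and down" concrete, here is a minimal sketch of one possible sizing rule. It is not KUDRAS's actual algorithm, which the abstract does not describe. It assumes each MiddleManager pod offers a fixed number of task slots (Druid's `druid.worker.capacity` setting) and that the autoscaler can observe how many ingestion tasks are running and pending. `ScalingPolicy` and `desired_middlemanagers` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical bounds for a MiddleManager deployment (not KUDRAS's real config)."""
    slots_per_middlemanager: int  # task slots per pod, i.e. druid.worker.capacity
    min_replicas: int = 0
    max_replicas: int = 50


def desired_middlemanagers(running_tasks: int, pending_tasks: int, policy: ScalingPolicy) -> int:
    """Return how many MiddleManager pods are needed so every task gets a slot.

    Scales up when tasks are waiting for slots and down when slots sit idle,
    clamped to [min_replicas, max_replicas]. This is an assumed heuristic.
    """
    if policy.slots_per_middlemanager <= 0:
        raise ValueError("slots_per_middlemanager must be positive")
    total_tasks = running_tasks + pending_tasks
    # Round up: a partially used pod still has to exist.
    needed = math.ceil(total_tasks / policy.slots_per_middlemanager)
    return max(policy.min_replicas, min(policy.max_replicas, needed))


if __name__ == "__main__":
    policy = ScalingPolicy(slots_per_middlemanager=4, min_replicas=1, max_replicas=10)
    print(desired_middlemanagers(running_tasks=6, pending_tasks=5, policy=policy))  # 3
```

With 4 slots per pod, 11 running or pending tasks need 3 pods. With no tasks the rule falls back to `min_replicas`, and a burst of 160 tasks is capped at `max_replicas`. A production autoscaler would also need to drain a MiddleManager before removing it so that scaling down never kills a running ingestion task. This sketch leaves that step out.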
