Oct 22, 2024

Optimizing Druid Configurations at Netflix through Parallel Testing and Metrics Analysis

As a data-driven company, Netflix continually works to improve the performance and reliability of its data infrastructure. This talk covers our approach to optimizing Apache Druid configurations through parallel runs and A/B testing. We will explore how Netflix tests different Druid setups by running them concurrently on dual systems, which allows a direct comparison of the key performance metrics each cluster reports. Attendees will gain insights into the following areas:

1. Cluster Management and Deployment: An overview of Netflix’s strategies for managing and deploying Druid clusters, emphasizing automation and scalability.
2. Centralized Logging and Metrics: Techniques for aggregating and analyzing logs and metrics to facilitate real-time monitoring and post-mortem analysis.
3. Cluster Architecture Patterns: Best practices and patterns employed by Netflix to architect Druid clusters for optimal performance and reliability.
4. Parallel Testing Framework: Detailed methodologies for executing parallel runs and conducting A/B testing to evaluate different Druid configurations, including the tools and frameworks used.

This session will provide practical knowledge and actionable insights, empowering attendees to apply similar strategies within their own organizations to optimize Druid deployments. Join us to learn how Netflix leverages advanced testing and analytical techniques to push the boundaries of what is possible with Apache Druid.

Speaker:
Ben Sykes, Software Engineer, Netflix

Table of Contents:
[0:00] Introduction
[1:54] Cluster Architecture Patterns
[4:00] Cluster Management and Deployment
[15:48] Centralized Logging and Metrics
[17:15] Config Testing Framework
[22:15] Parallel Testing Framework
