Because we care deeply about the health of the community and want to keep delivering the most interesting Apache Druid stories, we’re hosting Virtual Druid Summit II online on September 2, 2020. The first Virtual Druid Summit was a grand success thanks to all of our attendees and speakers, and we hope the next installment will prove just as compelling!
Once again, we’ll be opening up the summit with a talk from Gian Merlino (Apache Druid PMC Chair) followed by lightning talks from esteemed practitioners at TrafficGuard, Target, Twitch, and Netflix. Each 30-minute talk will cover real-world applications of Druid, followed by 15 minutes of Q&A.
Virtual Druid Summit is free and the online format allows members of the Druid community both near and far to attend. Visit the Virtual Druid Summit II registration page to sign up and select your talks!
Virtual Druid Summit II September 2, 2020 8 AM - 12:45 PM Pacific Time
Maximizing Apache Druid Performance: Beyond the Basics Gian Merlino, Apache Druid PMC Chair 8:00am - 8:45am PT
Druid is a powerful real-time database, and part of that power is the level of control you get over cluster configuration, allowing you to get maximum performance for your specific data and query types.
In this talk, Gian Merlino, one of the original authors of Druid and CTO and co-founder of Imply, will walk you through some advanced techniques that can provide a multiplier to your Druid performance. Afterwards, he’ll take your questions about performance, or anything else Druid-related.
How TrafficGuard uses Druid to Fight Ad Fraud and Bots Raigon Jolly, Head of Analytics and Data Science, TrafficGuard 9:00am - 9:45am PT
In this session, Raigon Jolly, TrafficGuard’s Head of Analytics and Data Science, will discuss how TrafficGuard uses Druid and its partnership with Imply to:
- Provide granular reporting to clients in near-real time
- Monitor rules and concept drift
- Stay ahead of the moving target that is ad fraud
- Facilitate performance tuning and right-sizing infrastructure so our team can focus on innovation of our core product
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the Capabilities of Druid Jeremy Woelfel, Principal Engineer, Target 10:00am - 10:45am PT
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
Self-Service Analytics at Twitch Nicholas Ngorok, Engineering Manager - Data Infrastructure, Twitch 11:00am - 11:45am PT
As Twitch grew, both the amount of data we received and the number of employees interested in the data grew rapidly. In order to continue empowering decision making as we scaled, we turned to Druid and Imply to provide self-service analytics to both our technical and non-technical staff, allowing them to drill into high-level metrics in lieu of reading generated reports.
In this talk, learn how Twitch implemented a common analytics platform for the needs of many different teams supporting hundreds of users, thousands of queries, and ~5 billion events each day. This session will explain our Druid architecture in detail, including:
- The end-to-end architecture deployed on Amazon that includes Kinesis, RDS, S3, Druid, Pivot and Tableau
- How the data is brought together to deliver a unified view of live customer engagement and historical trends
- Operational best practices we learnt scaling Druid
- An example walkthrough using the platform
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experience Ben Sykes, Sr. Software Engineer, Netflix 12:00pm - 12:45pm PT
Ensuring a consistently great Netflix experience while continuously pushing innovative technology updates is no easy feat.
We’ll look at how Netflix turns log streams into real-time metrics to provide visibility into how devices are performing in the field (including sharing some of the lessons learned around optimizing Druid to handle our load).