Dec 06, 2023

Joins in Druid? Yes, of course!

In this session you will learn what it takes to process joins in distributed databases, and then we’ll take a deep dive into how Apache Druid does joins in each of its processing engines. Druid’s native query engine is designed for fast queries and can do joins; we’ll talk about how to design queries that take advantage of how it works. You will also learn its limits and when a use case is better served by the Multi-Stage Query engine (MSQ). We’ll review the current state of MSQ-based joins and when to use the broadcast vs. sortMerge join algorithms, based on how each works and how it uses resources.

See similar videos

Jan 29, 2024

Physical Hardware, Digital Analytics: IoT Challenges, Best Practices, and Solutions

Electric vehicle maker Rivian and German startup Thing-it were kind enough to talk us through how real-time data and analytics play a key part in the evolving landscape of IoT (Internet of Things). The wealth...

Dec 11, 2023

Analyzing streaming data with Apache Druid

Streaming data is not only data in motion—it’s a potential source of valuable insights, ready to be harvested and utilized. The challenge is to analyze streaming data at scale and extract these insights—before...

Dec 06, 2023

Real-Time Analytics in the Real World

Engineering teams increasingly have to deliver insights in real-time. But as they aim to reduce latency from event-to-insight, they also face the challenge of dealing with larger and more complex data and concurrent...

Dec 06, 2023

Wave Money Transaction Analytics Journey

How Wave Money transformed its transaction history and analytics framework to support the needs of its millions of users with an event-driven architecture.

Dec 06, 2023

Strivr: Using inline datasources with Druid queries

At Strivr, we have Business Intelligence information which we want to append to analytics in Druid. With the help of inline datasources we are enriching our analytics with these BI tags in real time. In this...

Dec 06, 2023

Streaming Ingestion – A Look Under The Hood

I've always thought that an intricate understanding of how something works helps you use it more effectively. In this session, I'll try to convey what I've learned from some of Apache Druid's creators about...

Dec 06, 2023

Splunk’s Journey to Imply: Data Compaction At Scale

The audience will get to learn about one of the largest migrations that Splunk's observability group did by moving their 27 OSS Druid clusters to Imply's Enterprise Hybrid deployment. The session will dive...

Dec 06, 2023

Revenue at scale: billing for millions of events a second

Learn how Orb uses Apache Druid to keep up with the world's fastest growing infrastructure and AI companies. In this session, we'll explore how to architect Druid to ensure correctness and maintain customer...

Dec 06, 2023

Real-time telemetry at IoT scale

Rivian makes adventurous electric vehicles with a mission of having a sustainable planet and keeping the world adventurous forever. Rivian's vehicles are born in the cloud and embody the tenets of a software...

Dec 06, 2023

Learn Druid with learn-druid

In this session you will learn about what it takes to process joins in distributed databases and then we'll take a deep dive into how Apache Druid does joins in each of its processing engines. Druid's native...

Dec 06, 2023

Enhancing Druid’s Analytics with Apache Arrow and Flight SQL

Currently Druid has two result formats: a JSON format and a protobuf-based format using Apache Calcite Avatica. Both of these formats are row-oriented while all of Druid's internal data representations are...

Dec 06, 2023

Druid Operator: Bridging Kubernetes and Apache Druid

Apache Druid is a real-time distributed data store designed for low-latency queries. It can ingest data in real-time and make it available for querying as soon as an event occurs. The standard service level...

Dec 06, 2023

Druid + Kubernetes: Cheaper and more responsive auto-scaling ingestion

Imply Polaris offers Druid-as-a-Service to users with the ability to pay for ingestion only when it is used. To achieve this and keep costs down, Druid needs to scale its ingestion resources in response to...

Dec 06, 2023

Demystifying Druid Myths

With any open source technology that’s been around for a while, misconceptions and myths about what the tech can or can’t do inevitably pop up. Sometimes there’s some truth but often, it’s either an outdated...

Dec 06, 2023

Druid 28 and Beyond

Gian Merlino, Druid Committer and chair of the Apache Druid PMC, presents a tour of what’s new in recent releases, including Druid 28.0, and where Druid is heading in the future.

Dec 06, 2023

SpectatorHistogram: Efficient Percentile Approximations

Building on the content of our session from an earlier Druid Summit where we compared different solutions for computing percentiles, we'll do a deeper dive into how to use the SpectatorHistogram extension....

Dec 06, 2023

Query From Deep Storage

Druid can run queries on data in deep storage directly, without loading it onto Historicals. The talk will take a deep dive into the changes that were required on the Druid side to make that happen and also a bit about...

Dec 06, 2023

Moving ingestion from 3 hours to 5 minutes – Challenges and Mitigations

This is a real-world account from a Druid cluster in production. A story of 48 hours of debugging, learning and understanding Druid better, filing a couple of issues on the Druid GitHub and finally a stable production...

Dec 06, 2023

Load, Load, Don’t Drop, Drop & then Kill

As a follow-on to last year's Load, Drop & Kill session, we'll recap how retention rules are defined and how they are used by the Coordinator to oversee segment caching in a multi-tiered Apache Druid...

Dec 06, 2023

From Reaction to Action: Atlassian’s Proactive Scaling Journey

Scaling should never be a surprise. Atlassian’s proactive approach to scalability is a testament to our commitment to performance and reliability. In this talk, we’ll unveil the blueprint of our new load...

Dec 06, 2023

Druid on K8S at NAVER

In this presentation, we will explain how to easily set up a Druid cluster in a Kubernetes environment. You can install all the necessary components for a Druid cluster with just one Helm chart. By using the...

Dec 06, 2023

A Truly Technical Introduction to Apache Druid

Are you brand new to Apache Druid? Want to know about the nuts and bolts? In this talk, Peter Marshall goes into deep detail on how Druid's architecture works to optimize data and carry out massively-parallelised...

Dec 06, 2023

When I Decide To Use Druid Instead Of A Data Warehouse

Speaker: Ben Rogojan, Data, Automation and Analytics Consultant, Seattle Data Guy

Oct 25, 2023

Data Products at Scale: How Atlassian’s Big Data Platform Team Delivers Insights and More with Apache Druid

Analytics are not used solely to gain insights into your customers – they also provide insights for your customers, often in real time. Watch this video to learn about customer-facing analytics applications...

Jul 11, 2023

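An Apache Druid Use Case for Real-Time Ad Analytics in AdTech

This video covers a case study of building a real-time ad performance analytics platform with Apache Druid / Imply, why many AdTech companies have chosen Druid / Imply, and real-time ad...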

Jun 28, 2023

druid.NEXT

Watch the recording from our druid.NEXT event to see what’s new in Apache Druid 26.0 with technical demos from Apache Druid Committee (PMC) Chair and Imply CTO Gian Merlino along with other Druid committers...

Apr 27, 2023

Build Real-Time Analytics for your Kafka Data

Watch this recorded webinar to learn about building real-time analytics for streaming data. Darin Briskman, Imply’s Director of Technology and all-around tinkerer, will demonstrate how to use Apache Kafka...

Apr 27, 2023

Apache Druid Explained | Core Concepts

This video provides a technical overview of Apache Druid, a popular real-time analytics database used for sub-second queries on streaming and batch data at scale and under load. The intent of this video is...

Apr 26, 2023

Apache Druid in 5 minutes

Apache Druid is a real-time analytics database used by 1000s of companies like Netflix, Confluent, Salesforce, and Target. But what's the big deal? Why use Druid instead of a data warehouse - like Snowflake,...

Apr 26, 2023

Embedding Polaris Visualizations Demo

Set up an embedded visualization solution in under 5 minutes. This short demo highlights the steps needed to build a highly customized application.

Apr 26, 2023

Speeding Up Clickstream Data Analytics at Danggeun Market Using Druid

Danggeun Market is an online marketplace in Korea focused on neighbourhood sales of used goods. This session will detail how Daangn uses Druid to speed up clickstream data analytics. Realtime data...

Apr 26, 2023

Easy onboard with Polaris

A case study sharing my experience of how easy it is to set up a working analytics database using Imply Polaris, leveraging the batch ingestion APIs from Python with the Apache Airflow scheduler. How we are leveraging Theta...

Apr 26, 2023

Percentiles at Scale

At Netflix, our engineers often need to see metrics as distributions to get a full picture of how our users are experiencing playback.

Apr 26, 2023

Powering Clickstream Analytics at TrueCar with Druid

Come learn about the way that TrueCar uses Druid as the engine to optimize the buying experience through clickstream analytics.

Apr 26, 2023

Keynote: Analytics Meets Applications

How do developers at Netflix, Confluent, and Reddit do more with analytics? They joined a community of like-minded people who bring analytics to applications to power a new generation of operational workflows....

Apr 26, 2023

Druid SQL for Queries – from an Overview to Tips and Tricks

SQL is a powerful and elegant query language for databases. Druid is increasingly supporting and moving forward with SQL, which is arguably easier to learn and write than the original native (JSON) query language....

Apr 26, 2023

How VMware uses Druid for Network & Security Analytics

VMware NSX Intelligence is a distributed analytics platform powered by Druid that can process and visualize network traffic flows, detect anomalous network behaviors, and recommend security policies. We will...

Apr 26, 2023

Kafka Ingestion logs

Apache Druid can handle streaming ingestion jobs at any scale. This occasionally poses challenges. Apache Kafka is a popular distributed event streaming platform, so it’s a natural choice for this presentation....

Apr 26, 2023

Generating Time-centric Data for Druid

Sometimes we want to understand how Druid will perform using various types of data, but the actual data may not be available. The developer relations team at Imply has created a data generator program that...

Apr 26, 2023

Leverage Autonomous Intelligence Driven Decisions Without Using Dashboards

Customer Intelligence facilitates informed decisions to increase business efficiency and success. Out of The Blue™ (OTB) provides SaaS solutions that autonomously and continuously monitor KPIs to create actionable...

Apr 26, 2023

Apache Druid 24.0 New Feature! – Nested JSON

Apache Druid 24.0 has some amazing new capabilities. One of the most exciting ones is the ability to load complex nested JSON structures automatically. In this session we review the motivation for this feature;...

Apr 26, 2023

User Activity Analytics in Digital Publishing Domain: Success Story by Indian Express

Indian Express, one of the leading media publishing companies based in Mumbai, is using Imply for user activity analysis. In this talk, we share how Indian Express migrated their use case from Google Analytics...

Apr 26, 2023

The present and future of compaction in Druid

Compaction in the Druid database has been evolving into use cases beyond improving query performance and reducing storage. This talk will first go over what compaction is, current use cases for compaction,...

Apr 26, 2023

Moving Away from Multi-level Aggregates: Efficiently Scaling Data with Druid

What if I told you moving away from multi-level aggregates could actually speed up your reporting system? In this session, we look at how Druid can be used to retrieve data for reporting quickly and efficiently....

Apr 26, 2023

Analyzing Ads Across Medialab’s Global Brands with Druid

In this talk, we will learn more about Medialab's analytics strategy to optimize ad campaigns across several global consumer internet brands using Druid. Juan will highlight the company's journey from Open...

Apr 26, 2023

Load, Drop & Kill – The Life of a Segment

In Apache Druid, after ingesting some data and publishing a segment to Deep Storage, it is the Load and Drop rules and the Kill process behavior that govern the caching of segments on Historicals across multiple tiers....

Apr 26, 2023

Why & How We Built An Open-Source Spark Druid Connector – Spark Druid Segment Reader

Spark Druid Segment Reader is a Spark connector that extends Spark DataFrame reader and supports directly reading Druid data from PySpark. By using the DataSource option you can easily define the input path...

Apr 26, 2023

Pushing the Limits of Druid Real-time with Data Sharding

Most Druid applications aim for sub-second analytics queries - but this becomes difficult when dealing with large volumes of real-time data.

Apr 26, 2023

Real-Time Upserts of Data in Druid

Druid has enabled ThousandEyes to reduce dashboard latency by 10x and provide our customers with a real-time, interactive view of their terabytes of data. In this talk we will learn how the Data Visualization...

Apr 26, 2023

Realtime Product Observability with Druid

Statsig enables companies to run experiments so that they can understand the impact of the changes they are making to their product, and do in-depth analysis so that they can understand how people use their...

Apr 26, 2023

Building Data Pipelines using Druid at Lyft

In this talk, we'll learn more about how Lyft builds data pipelines using Apache Druid, which is useful for several use cases including metrics tracking, model forecasting, and internal tools. We'll also talk...

Apr 26, 2023

Principles of Data Modeling in Apache Druid

Apache Druid is often used to power applications that are fun, fluid, and fast. It’s designed for the last mile, sitting behind a UI where customers, suppliers, and employees are waiting for fast responses...

Apr 26, 2023

Using Druid at High Throughput for Fast Queries at Low Cost

ironSource exposed a Druid cluster for its external users, called Real Time Pivot, which runs at a scale of 2-3M events/sec, with tens of TBs added per day, while serving parallel queries within 1-2 seconds....

Apr 26, 2023

Speeding up ML Model Training Using Druid

Machine learning models require thousands of data points to train. This session will look at a number of mechanisms Druid offers to speed up model training. The tuple sketch can be used to train models...

Apr 26, 2023

Keynote: Real time Analytics in Modern SaaS Applications

Real-time analytics has gone from cutting-edge to competitive advantage and is now becoming table stakes. In 8 years and 3 companies of working in the space, Gwen talked to hundreds of companies building real-time...

Apr 26, 2023

Production ML Model Quality Monitoring Using Druid

Machine learning and AI have progressed out of the initial technology development phase and into full production. In this new domain, the concept of monitoring has been redefined. Monitoring now fully encompasses...

Apr 26, 2023

Apache Druid 24.0 New Feature! – SQL Based Ingestion

Apache Druid 24.0 has some amazing new capabilities. The Multi-stage Query Architecture (MSQA) that is now a part of the database will enable significant new functionality. In this initial release it enables...

Apr 26, 2023

Introducing KUDRAS – Kubernetes Druid Autoscaler for Maximum Resource Utilization and Speed

In this session I would like to talk about the huge amount of data we ingested into Druid (9 TB of raw data per day) using EMR, all orchestrated by Airflow. As the data grew we started experiencing...

Apr 26, 2023

Atlassian: Delivering insights to Confluence Users at Enterprise Scale

Druid is a powerful tool that has paved the way for a whole new wave of data-focused tech. In the Confluence Analytics team, we ingest millions of events every day and use this tool to power a plethora of features...

Apr 26, 2023

New Druid Challenges and Solutions in Pinterest

Pinterest has been using Druid with large ingestion scales and low query latency for a few years. In this talk, we will share recent optimizations we made on querying skewed data, how we deal with large scale...

Apr 26, 2023

Low Latency Real Time Ads Pacing Queries

Advertising backend systems need low latency for the efficient delivery and availability of budget data to the ad serving infrastructure. For the Reddit team, we helped automate user behavior analysis to serve...

Apr 26, 2023

Keynote: Building Next-Gen Observability at Confluent

In this Q&A, Matt Armstrong, Head of Engineering - Observability & Data Platform at Confluent will discuss the use cases and architecture for Confluent's observability platform. Confluent leverages both Druid...

Apr 24, 2023

How is Druid so Fast?

A look at how Apache Druid is able to achieve high-concurrency, subsecond queries at scale.

Apr 24, 2023

Apache Druid’s Fit in the Modern Data Stack

Apache Druid is a real-time analytics database for analytics applications. It’s used by 1000s of companies like Netflix, Confluent, Target, and Salesforce. In a crowded database market, it can be hard to...

Apr 24, 2023

Handling row-level updates in Druid using a Kafka lookup

This video shows how to configure and populate a Kafka lookup and then use it in a query to handle row-level updates in Druid.

May 25, 2022

Modern Analytics Applications in the Financial Industry

Join Eric Tschetter, Field CTO of Imply and co-creator of Apache Druid, along with Ravi Maurya and Shubham Gupta from Paytm, in this inspiring virtual event.

May 10, 2022

The New Database Battle: Apache Druid vs. ClickHouse

They both say they’re fast; they both say they scale. Should you flip a coin or are there real technical differences that matter? Watch this webinar and get the facts that lay out the differences.

Mar 16, 2022

How-to Half-hour: Using Multi-value Dimensions in Apache Druid

If you’ve tried, you probably did tons of JOIN operations to build an array. Did you know that with multi-value dimensions you can filter and group complex data quickly and easily?

Feb 17, 2022

Keynote – Building Modern Analytics Applications with Apache Druid

Fangjin Yang, Co-Founder and Chief Executive Officer of Imply, presents the opening keynote at the Druid Summit 2021 virtual conference.

Dec 10, 2021

Technical deep dive into how Outbrain scales its real-time analytics

Because Outbrain processes billions of impressions and events a day, they risk running into scaling problems.

Dec 03, 2021

Imply x Kafka: Capture, interact and scale streaming data

Imply plus Kafka is the perfect architecture for capturing and surfacing streaming data through interactive queries at unlimited scale.

Jul 22, 2021

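Real-Time Data Analytics and Visualization with Imply, Integrated with Confluent Cloud

Live demo: how to build an architecture that captures and surfaces streaming data through interactive queries and unlimited scale.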

Jun 09, 2021

Making Real-Time Data a Reality for your Business

Technaura takes you through an hour of practical, real-time, data-driven outcomes.

Jun 03, 2021

Apache Druid Engine Roadmap

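If you are still waiting every day for your data to refresh, it's time to try Apache Druid, the sub-second database. This open-source database is widely deployed at Twitter, Pinterest, and Snapchat. In this webinar, Imply product manager Will...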

May 10, 2021

Comparing drill-down workflows between Tableau & Snowflake with Imply

In this video, Jad Naous demonstrates drill-down workflows between Tableau & Snowflake with Imply.

May 07, 2021

How Spideo turbo-charged data analytics by using Imply

In this video, Spideo, a humanized recommendation provider, shares with us its data analytics journey.

Apr 30, 2021

Pivot 2.0 – The Next Gen Visualization Tool

In this webinar, we will walk you through the exciting new features that are coming soon to Pivot.

Apr 22, 2021

Introducing… Druid’s Components

In this talk, we look at the three families of components that create “Apache Druid.”

Apr 21, 2021

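Small Steps Toward a Thousand Miles: The Business Value of Streaming Data Explained

Business users increasingly recognize that the data generated by company activity can give them timely insight into how day-to-day operations affect their existing business model and whether those operations support the company's long-term decisions.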

Apr 14, 2021

Analyzing Massive Data Streams with Imply

NOVAGEN will present an IMPLY case, inspired by its projects in the large-scale retail sector, that successfully met these new challenges:

Apr 09, 2021

Using Imply to prevent fraud

Learn a new methodology for anomaly detection and analysis that can be applied to everything from fraud detection to factory accident prevention.

Mar 02, 2021

Inside Apache Druid’s storage and query engine

We’ll cover how Druid stores data, what kinds of compression it uses, how it indexes data, and how the storage engine is linked with the query processing engine.

Mar 01, 2021

First look: Imply CrossTab (beta)

In this short video, Imply Chief Product Officer Vadim Ogievetsky demonstrates a cool new beta feature, Imply CrossTab.

Feb 24, 2021

Imply CrossTab: PivotTables to infinity and beyond

In this webinar we will introduce Imply CrossTab, a new visualization that makes the pivot table user experience feel instantaneous.

Feb 22, 2021

Elasticsearch: pros and cons for real-time analytics

In this video, solution architect Itai Yaffe shares what Elasticsearch provides for analytic use cases compared to Imply.

Feb 22, 2021

Where do Imply and Elasticsearch fit in my big data stack?

Itai Yaffe shares how Imply and Elasticsearch fit in a modern architecture for big data.

Feb 22, 2021

How Imply fulfills the requirements of real-time analytics

Data wizard Itai Yaffe explains how Imply's platform fills the requirements of general-purpose operational analytics.

Feb 17, 2021

Near term Apache Druid roadmap for 2021

Gian Merlino, Apache Druid PMC Chair, lays out the near-term Druid roadmap.

Feb 17, 2021

What’s new in Apache Druid?

Looking for a recap of the key improvements to Apache Druid in its most recent releases? Here’s the video for you.

Jan 29, 2021

5 minute explanation of Imply’s advantages for self service analytics

We describe the Imply full-stack, multi-cloud data platform, and Itai breaks down the unique combination of features Imply brings to self-service analytics.

Jan 28, 2021

The Superstars of Apache Druid meetup

The Superstars of Apache Druid meetup was recorded live on 1-21-2021. Talks from Dan Prince from Target, Gian Merlino, and Vadim Ogievetsky

Jan 21, 2021

Comparing Elasticsearch and Imply for operational analytics

If you’re considering using Elasticsearch for interactive analytics, be advised that it’s quite common to see companies struggling with Elastic-based analytic solutions.

Jan 19, 2021

3 different approaches to multi-tenant applications

The three main challenges with multi-tenant applications are cost, performance, and data management.

Jan 19, 2021

How Apache Druid ensures quality of service

Gian Merlino covers Druid’s use of tiers, lanes and priorities to address QoS and deliver consistent performance as you scale demand.

Jan 14, 2021

Building interactive data applications for event streams w/Confluent

Data apps let business users explore and investigate all of a company's event data and come to insights that impact day-to-day and long-range decision making.

Jan 06, 2021

Outbrain’s real-time analytics architecture

Outbrain’s real-time analytics architecture consists of modern big data technologies like Kafka, Spark, Druid and Imply Pivot for query and visualization.

Jan 05, 2021

Lyft’s modern data architecture feat. Apache Druid, Kafka and Flink

Take a look at Lyft’s modern data architecture. They have an app implemented in Flink that reads real-time event data from Kinesis and transforms the data.

Jan 05, 2021

Technical reasons why Lyft chose Apache Druid for real time analytics

In this video, Sharanya Santhanam from Lyft explains the key technical reasons why they chose Druid as the engine for their real-time data pipeline.

Jan 05, 2021

Lyft’s Apache Druid use cases

Lyft uses Apache Druid for three core use cases: geo-spatial data lookups for rideshare customer service, analytics on AWS cloud infrastructure spend.

Jan 05, 2021

Achieve the event-driven Nirvana with Apache Druid

We will explain how Apache Druid enables self-service BI on event data and allows business users to ask their own questions leading to real-time insights.

