The Imply Blog

We write about our product, technology, and company

Tag: apache-druid

Community Spotlight: SuperAwesome kid-safe internet advertising powered by Apache Druid

by Peter Marshall · in User stories · January 24, 2021

SuperAwesome’s mission is to “make the internet safer for kids”: a safe, effective, and entertaining place that is 100% “Kid Safe”, including an advertising system that protects the personal data of children.

Community Spotlight: Amobee leapfrogs advertising competitors with Apache Druid

by Peter Marshall · in User stories · December 2, 2020

Amobee provides end-to-end advertising campaign and portfolio management across TV, digital and social media for some of the world’s largest brands, from Pringles to Spotify.

Join Lyft, Outbrain, Innowatts and FullContact for the next Virtual Druid Summit November 18

by Jelena Zanko · in User stories · October 30, 2020

We’re thrilled to announce that the fourth edition of Virtual Druid Summit will be taking place on November 18, 2020!

Community Spotlight: LiquidM powers real-time highly-targeted adtech with Apache Druid

by Matt Sarrel · in User stories · October 26, 2020

LiquidM provides modular cloud-based software that allows agencies and trading desks to run their adtech activities and campaigns on a customizable, standardized, open platform. LiquidM provides real time efficiency, control, and insights into media planning and buying.

Introducing Apache Druid 0.20.0

by Will Xu · in Apache Druid · October 19, 2020

Apache Druid 0.20.0 contains over 140 updates from 36 contributors, including new features, major performance enhancements (6x-11x on some queries!), bug fixes, and major documentation improvements.

Community Spotlight: Pollfish survey insights, powered by Apache Druid

by Peter Marshall · in User stories · October 14, 2020

Pollfish, the “easiest and most affordable way to get real-time insights from real consumers”, delivers democratic, real-time insights with an innovative Apache Druid®-powered pipeline that includes microservices leveraging an open-source Scala library for Apache Druid: Scruid. Anastasios Skarlatidis, Director of Data Engineering and Science at Pollfish, tells more.

Community Spotlight: Innowatts provides AI-driven analytics for the power industry

by Matt Sarrel · in User stories · September 30, 2020

Innowatts provides an AI-driven data analytics SaaS platform for power utilities and retailers worldwide. Customers rely on Innowatts and the 40 million plus meters they are managing for the data needed to be more predictive, proactive and connected to their customers and ratepayers, helping them better manage risk, improve profitability, maintain grid reliability and anticipate sustainability trends.

Continuing the Virtual Druid Summit Conversation: Netflix Closes the Loop

by Jelena Zanko · in User stories · September 30, 2020

We promised you closure, and we’re offering exactly that in a neatly-packaged blog post. Ben Sykes, Sr. Software Engineer at Netflix, has answered the questions that he wasn’t able to address during his Virtual Druid Summit II session: How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experience.

Virtual Druid Summit Returns with Talks from Pinterest, Splunk, GameAnalytics, Nielsen, and Zeotap

by Jelena Zanko · in User stories · September 18, 2020

Though you might have heard that Virtual Druid Summit was returning, did you guess that it would be so soon? We’re stoked, too! The third installment of Virtual Druid Summit will be taking place on October 7, 2020.

Community Spotlight: Mindhouse Achieves Monitoring Nirvana with Apache Druid

by Matt Sarrel · in User stories · September 10, 2020

We recently discussed Mindhouse’s use of Apache Druid for clickstream data analysis and user behavior funnel analysis with Ankur Gupta, the company’s engineering technical lead. Ankur’s team relies on Druid to “segment users and understand how they are using our app”, and finds that “it’s especially helpful when we launch a new feature because we can understand the acceptance of the feature based on current user activity”.

Apache Druid hits 10,000 GitHub star milestone

by Matt Sarrel · in User stories · September 9, 2020

We’ve always believed that community growth and collaboration are critical to the success of Apache Druid. For this reason, we’re excited to announce that last week, the Druid GitHub repository passed 10,000 stars!

Join Target, Twitch, Netflix and TrafficGuard on the Sep 2 Edition of Virtual Druid Summit!

by Jelena Zanko · in User stories · August 19, 2020

Because we care deeply about the health of the community while continuing to deliver the most interesting Apache Druid stories, we’re hosting Virtual Druid Summit II on September 2, 2020.

Introducing Apache Druid 0.19

by Will Xu · in Apache Druid · July 16, 2020

The Apache Druid community released Druid 0.19 on July 21st, 2020. This release contains over 200 new features, performance enhancements, bug fixes, and major documentation improvements from 47 contributors.

The Secret to Apache Druid Success

by Rob Meyer · in Apache Druid · June 10, 2020

There is no doubt that Apache Druid is a success, or that the benefits of implementing Druid can be huge. It is the leading real-time analytics database on the market. Thousands of companies use it to equip their employees and customers with self-service analytics to make better, faster decisions.

Introduction to JOINs in Apache Druid

by Gian Merlino · in Apache Druid · June 4, 2020

In Apache Druid 0.18/Imply 3.3, we added support for SQL Joins in Druid. This capability, which has long been demanded from Druid by the community, opens the door to a large number of possibilities in the future. In this blog I want to highlight some of the motivations behind us undertaking the effort and give you, the reader, an understanding of how it can be useful and where we’re going with it.

Continuing the Virtual Druid Summit Conversation with Athena Health

by Karthik Urs, Athena Health · in User stories · May 11, 2020

When an engaged technical audience asks great questions, it’s easy to run out of time during Q&A. And that’s exactly what happened at Virtual Druid Summit! Because our speakers weren’t able to address all of your questions during the live sessions, we’re following up with the answers you deserve in a series of blog posts.

When a Data Warehouse Can’t Keep it Real-Time

by Rick Bilodeau · in Industry · May 4, 2020

This short post describes how Druid compares against enterprise data warehouses. Druid is not a data warehouse. It's a real-time database for user-facing analytics applications that need sub-second query response at high concurrency.

Continuing the Virtual Druid Summit Conversation: Twitter has Answers

by Swapnesh Gandhi, Twitter · in User stories · May 1, 2020

Thanks again to everyone who attended Virtual Druid Summit, and for being so engaged – as we previously mentioned, our speakers received more than 150 questions across their collective sessions! Unfortunately, there wasn’t enough time to answer all of your very good questions during the live sessions. In an effort to bring you some closure, we’ve invited our esteemed speakers to address the remaining questions in a series of blog posts.

What Went Down at the Virtual Druid Summit

by Rachel Pedreschi · in User stories · April 24, 2020

I want first to thank everyone involved with making our first, albeit a tad non-standard, Druid Summit a smashing success. We, and the hundreds of folks who had purchased tickets, were extremely disappointed to postpone our first physical Druid Summit to November, but we were fortunate to have 10 great speakers agree to do their presentations virtually.

Announcing Virtual Druid Summit - April 15, 2020

by Rick Bilodeau · in User stories · March 24, 2020

I am happy to tell you that 10 days after we officially postponed Druid Summit, we have now launched Virtual Druid Summit, which will take place as a series of online talks on April 15. Each talk will be a spicy 30 minutes of real-world information, followed by Q&A. There are 5 talks from Druid practitioners from a variety of industries and spanning 3 continents. The summit will open with an Apache Druid Roadmap and Vision talk from Apache Druid PMC Chair Gian Merlino, and will close with a 2-way voice-interactive “ask us anything” session featuring Druid authors and contributors.

Imply Releases A Reference Architecture for Apache Druid on Microsoft Azure

by Matt Sarrel · in Apache Druid · March 17, 2020

Tijo Thomas, a Solutions Architect at Imply, recently wrote a reference architecture for Apache Druid on Microsoft Azure that includes some best practices for running on services such as Azure VM, Azure Blob Storage, Azure Database Service and HDInsight.

A Reference Architecture for Real-Time IoT Analytics feat. Apache Druid

by Eric Graham · in Apache Druid · February 28, 2020

Analyzing the potential petabytes or more of data from all these devices goes way beyond existing data warehouses or data lakes. Fortunately, companies have already implemented IoT analytics using Imply, the real-time intelligence platform built on Apache Druid, the leading open source real-time analytics database.

Introducing Druid 0.17.0

by Gian Merlino · in Apache Druid · January 30, 2020

Earlier this week, the Apache Druid community released Druid 0.17.0. This is the project’s first release since graduating from the Apache Incubator, and it therefore represents an important milestone.

Apache Druid on Google Cloud Platform (GCP) Reference Architecture

by Matt Sarrel · in Apache Druid · January 29, 2020

Muthu Lalapet, a Solutions Architect at Imply, recently wrote a reference architecture for Apache Druid on Google Cloud Platform (GCP) that includes some best practices for leveraging GCP services such as Compute Engine, Cloud Storage and Cloud SQL. The document describes example cluster architectures and their accompanying machine types and configurations. As such, it’s a helpful resource for planning and implementing Druid on GCP.

Why Vertica Customers Adopt Apache Druid for Real-Time Analytics

by Rob Meyer · in Industry · January 13, 2020

If you are a Vertica customer, you probably already know this. Vertica is not built for real-time operational analytics at scale. If you do not know Vertica very well, you might be surprised. This statement may seem controversial. It’s not. Nearly ¼ of Imply customers were existing Vertica customers who purchased Imply, a commercially supported version of Apache Druid, because they were trying to implement operational analytics and hit limitations with Vertica. Other Vertica customers also use open source Druid and self-support.

Apache Druid and Imply Best Practices for GDPR and CCPA Compliance

by Rob Meyer · in Apache Druid · January 8, 2020

If you are using Apache Druid to analyze customer-oriented data, you are probably familiar with the General Data Protection Regulation (GDPR), which went into effect May 25, 2018. However, you may be less familiar with a new law, the California Consumer Privacy Act (CCPA), which went into effect January 1, 2020 and is likely to become a *de facto* standard in the US.

How Nielsen Marketing Cloud Uses Druid for Audience and Marketing Performance Analysis

by Itai Yaffe · in User stories · November 21, 2019

Nielsen Marketing Cloud uses Druid to profile the various audiences that marketers and publishers would like to target on digital media, activate via various ad networks, and then gain insights on that activation after the fact.

Clickstream Funnel Analysis with Apache Druid

by Mike McLaughlin · in Solutions · September 11, 2019

Apache Druid is commonly used for clickstream funnel analysis, and in this blog post we’ll deep dive into how you can collect and analyze funnel data. While there are applications designed for clickstream analysis, such as Google Analytics and Adobe SiteCatalyst (previously Omniture), Druid is ideal when you have significant scale.

Apache Druid helps Zeotap Master Multi-Channel Attribution at Scale

by Chaitanya Bendre, Zeotap · in User stories · August 22, 2019

Below is a transcript of a short interview we conducted with Chaitanya Bendre, Lead Data Engineer at Zeotap, where we discussed their use of Druid to help address the difficult problem of identity resolution and multi-channel attribution.

Blueshift: Scaling real-time campaign analytics with Apache Druid

by Anuraj Pandey, Blueshift · in User stories · August 8, 2019

Blueshift is an AI-powered customer data activation platform enabling CRM and product marketers to intelligently manage their audiences and orchestrate large-scale personalized messaging campaigns. Blueshift offers real-time campaign analytics as a core capability in the platform. Campaign analytics break down engagement metrics like impressions, clicks, and conversions by dimensions such as channel, trigger, and experiment. Currently two billion+ user interactions are tracked on a monthly basis.

Independent Performance Benchmark: Apache Druid versus Presto and Apache Hive

by Rick Bilodeau · in Industry · July 17, 2019

A recent paper by independent researchers at the University of Minho in Portugal compared the performance of Apache Druid to well-known SQL-on-Hadoop technologies Hive and Presto. In the tests, Druid outperformed Presto by 10X to 59X (90% to 98% speed improvement) and Hive by over 100X.

Hadoop Indexing for Apache Druid at Scale - Configuration Best Practices

by Rommel Garcia · in Solutions · July 12, 2019

When Hadoop is pushing data into Druid, Hadoop indexer performance is key and becomes challenging at scale. There are quite a few things to consider when running large-scale Hadoop indexing.

Introducing Apache Druid 0.15.0

by Gian Merlino · in Apache Druid · June 27, 2019

Today, the Apache Druid community released Druid 0.15.0-incubating. Druid is known as an extremely high-performance database and much of the early design work has been focused on providing speed at scale. Lately we have made a pivot towards those “ease of” factors that help users get productive with Druid quickly.

Apache Druid vs. Time-Series Databases

by Rick Bilodeau · in Industry · June 11, 2019

Although Druid draws ideas from a number of TSDB concepts, it is designed for a wider range of analytic use cases than those for which a TSDB is usually employed.

Announcing Druid 0.14.0

by Gian Merlino · in Apache Druid · April 9, 2019

Today the Apache Druid community released Druid 0.14.0, our second release under the Apache umbrella and the first major release of 2019. I thought I'd take this opportunity to talk about what's new in this release and what's coming in the future.

Apache Cassandra vs. Apache Druid

by Rachel Pedreschi · in Industry · March 6, 2019

If you are reading this because you are considering whether to use Apache Cassandra/DSE/ScyllaDB or Apache Druid/Imply, then you can just stop right now.

Securing Druid

by Jon Wei · in Apache Druid · September 13, 2018

Security is a critical requirement in every deployment of a system that holds and processes data. In this blog post, we will discuss how we secured Apache Druid, and validated our implementation.

March 2018 Druid Meetup - eBay Monitoring Platform

by Gian Merlino · in User stories · March 22, 2018

March 2018 Druid Bay Area Meetup - eBay Monitoring Platform, links to slides

End-to-end Security in Druid

by Jon Wei · in Imply platform · December 18, 2017

It is now possible to deploy a Druid cluster in a secure setting.

November 2017 Druid Bay Area Meetup - MZ, Slack, Druid Roadmap

by Gian Merlino · in User stories · November 29, 2017

November 2017 Druid Bay Area Meetup video, with talks from MZ and Slack, and a Druid roadmap update from Imply

Druid February Meetup Slides

by Fangjin Yang · in User stories · February 23, 2017

Our last Druid meetup had great talks about how Druid is used at Branch, the ongoing work around better integrating Druid with the Hadoop ecosystem, and our roadmap plans.

Druid December Meetup Videos

by Fangjin Yang · in User stories · December 21, 2016

Our last Druid meetup at Sift Science had 3 great talks about use cases with Druid & Imply, and the upcoming Druid roadmap.

Compressing Longs in Druid

by David Li · in Apache Druid · December 7, 2016

With the Druid 0.9.2 release, Druid has added additional column compression methods for longs to significantly improve query performance in certain use cases. In this blog post, we’ll highlight how these various compression methods impact data storage size and query performance.

Farewell Lambda Architectures: Exactly-Once Streaming Ingestion in Druid

by David Lim · in Industry · July 5, 2016

Today, many companies are turning to streaming solutions which are enabling them to understand and make business decisions from their data immediately, resulting in an operational agility that was unthinkable only a few years ago. The new Kafka indexing service is an exciting milestone in the maturity of Druid's ingestion technology, giving users a way to stream data into Druid with exactly-once correctness.
