The Imply Blog

We write about our product, technology, and company

Clickstream Funnel Analysis with Apache Druid

Mike McLaughlin · September 11, 2019

Apache Druid is commonly used for clickstream funnel analysis, and in this blog post we’ll take a deep dive into how you can collect and analyze funnel data. While there are applications designed for clickstream analysis, such as Google Analytics and Adobe SiteCatalyst (previously Omniture), Druid is ideal when you have significant scale.

Read More

Apache Druid helps Zeotap Master Multi-Channel Attribution at Scale

Chaitanya Bendre · August 22, 2019

Below is a transcript of a short interview we conducted with Chaitanya Bendre, Lead Data Engineer at Zeotap, where we discussed their use of Druid to help address the difficult problem of identity resolution and multi-channel attribution.

Read More

Blueshift: Scaling real-time campaign analytics with Apache Druid

Anuraj Pandey · August 8, 2019

Blueshift is an AI-powered customer data activation platform enabling CRM and product marketers to intelligently manage their audiences and orchestrate personalized messaging campaigns at scale. Blueshift offers real-time campaign analytics as a core capability in the platform. Campaign analytics break down engagement metrics such as impressions, clicks, and conversions by channel, trigger, experiment, and so on. Currently, more than two billion user interactions are tracked each month.

Read More

Independent Performance Benchmark: Apache Druid versus Presto and Apache Hive

Rick Bilodeau · July 17, 2019

A recent paper by independent researchers at the University of Minho in Portugal compared the performance of Apache Druid to the well-known SQL-on-Hadoop technologies Hive and Presto. In the tests, Druid outperformed Presto by 10X to 59X (a 90% to 98% speed improvement) and Hive by over 100X.

Read More

Announcing Imply 3.0

Vadim Ogievetsky · July 15, 2019

We are delighted to announce that Imply 3.0 is now available! It contains many usability features such as a visual data loader, on-premises cluster and alerts functionality.

Read More

Hadoop Indexing for Apache Druid at Scale - Configuration Best Practices

Rommel Garcia · July 12, 2019

When Hadoop is pushing data into Druid, Hadoop indexer performance is key and becomes challenging at scale. There are quite a few things to consider when running large-scale Hadoop indexing.

Read More

Druid @ Zscaler - A Retrospective

Subramanian Srinivasan · Kevin Fletcher · July 8, 2019

This document outlines the journey Zscaler made in building Zscaler Private Access (ZPA) and focuses on the analytics component of our solution. We will discuss some of our early requirements, why we picked certain technologies, such as Druid/Imply, and how we run things today.

Read More

Interactive Analytics at MoPub (Twitter): Using Druid and Imply to Query Terabytes of Data in Seconds

Rick Bilodeau · July 5, 2019

MoPub, a Twitter company, has just launched a new solution called MoPub Analytics based on Apache Druid and using Imply Pivot as the drag-and-drop UI. The solution allows users to determine the root cause of new data trends by interactively analyzing the data across many different time slices, dimensions, and metrics.

Read More

Introducing Apache Druid 0.15.0

Gian Merlino · June 27, 2019

Today, the Apache Druid community released Druid 0.15.0-incubating. Druid is known as an extremely high-performance database and much of the early design work has been focused on providing speed at scale. Lately we have made a pivot towards those “ease of” factors that help users get productive with Druid quickly.

Read More

Results of the first Apache Druid (incubating) community survey

Gian Merlino · June 25, 2019

We recently conducted our first Druid community survey. Every so often we’ll be asking our community a short set of questions to understand how they use Druid, and how they would like to see it improved.

Read More

Clickstream Analysis - An Open Source Architecture

Mike McLaughlin · Peter Marshall · June 12, 2019

A triad of open source projects - Divolte, Apache Kafka and Apache Druid - can power real-time collection, streaming and interactive visualisation of clickstreams, so you can investigate and explore what’s happening on your digital channels as easily as looking out of your office window.

Read More

Apache Druid vs. Time-Series Databases

Rick Bilodeau · June 11, 2019

Although Druid draws ideas from a number of TSDB concepts, it is designed for a wider range of analytic use cases than those for which a TSDB is usually employed.

Read More

Tutorial: Using Apache Druid and Imply With Google Cloud Dataproc For Hadoop Indexing

Rommel Garcia · June 6, 2019

To help you get to know GCP and Druid, the tutorial below will walk you through how to install and configure Druid to work with Dataproc (GCP’s managed Hadoop offering) for Hadoop Indexing. Then it will show you how to ingest and query data as well.

Read More

Tutorial: An End-to-end Streaming Analytics Stack for Juniper Streaming Telemetry

Eric Graham · May 23, 2019

In this tutorial, we will step through how to set up Imply, Kafka, and Open-NTI to build an end-to-end streaming analytics stack that can handle Juniper Native streaming telemetry data.

Read More

When a Data Warehouse Can’t Keep it Real-Time

Rick Bilodeau · May 6, 2019

This short post is my attempt to describe how Druid compares against enterprise data warehouses. If it’s not obvious by now, Druid is not a data warehouse and isn’t designed to replace every use case for which a data warehouse can be used.

Read More

Tutorial: An End-to-end Streaming Analytics Stack for syslog Data

Eric Graham · April 18, 2019

In this tutorial, we will step through how to set up Imply, Kafka, and syslog-ng kafka to build an end-to-end streaming analytics stack that can handle many different forms of log data.

Read More

Swimming in the Data River or: How We Got to Streaming Analytics

Rachel Pedreschi · April 16, 2019

Businesses need to understand how their metrics change across many facets of their operations, and this is the core idea behind data analytics.

Read More

Announcing Imply 2.9

Vadim Ogievetsky · April 10, 2019

Imply 2.9 is based on the just-announced Druid 0.14. Druid 0.14 contains many new features, improvements, and bug fixes. This blog post will focus on the new Imply components and features not available in Druid 0.14.

Read More

Announcing Druid 0.14.0

Gian Merlino · April 9, 2019

Today the Apache Druid community released Druid 0.14.0, our second release under the Apache umbrella and the first major release of 2019. I thought I'd take this opportunity to talk about what's new in this release and what's coming in the future.

Read More

How WalkMe uses Druid and Imply Cloud to Analyze Clickstreams and User Behavior

Yotam Spenser · April 3, 2019

WalkMe uses Imply Cloud to monitor behavioral analytics for its leading Digital Adoption Platform.

Read More

Tutorial: An End-to-end Streaming Analytics Stack for Network Telemetry Data

Eric Graham · March 26, 2019

In this tutorial, we will step through how to set up Imply, Kafka, and pmacct to build an end-to-end streaming analytics stack that can handle many different forms of networking data.

Read More

How to analyze AWS VPC logs with Imply

Eric Graham · March 14, 2019

Have you ever wanted more visibility into your AWS network traffic? This how-to blog covers how to analyze VPC flow logs with Imply.

Read More

Apache Cassandra vs. Apache Druid

Rachel Pedreschi · March 6, 2019

If you are reading this because you are considering whether to use Apache Cassandra/DSE/ScyllaDB or Apache Druid/Imply, then you can just stop right now.

Read More

Using Druid to fight ad fraud

Raigon Jolly · February 27, 2019

TrafficGuard helps some of the world’s biggest digital advertisers and agencies protect their ad spend from fraud. Our clients need access to reliable reporting in real-time to allow them to optimise their ad campaigns with current insights.

Read More

Why GameAnalytics migrated to Apache Druid

Ramón Lastres Guerrero · February 14, 2019

At GameAnalytics, our user base has grown several times over in the past 12 months, and this growth has prompted us to rethink our user experience analytics system.

Read More

Kappa architecture at NTT Com: Building a streaming analytics stack with Druid and Kafka

Paolo Lucente · January 8, 2019

One of the key activities at the heart of any internet backbone is flow analytics, which enables visibility into global traffic for many technical, economical, and security use cases. By providing real-time traffic visibility and rapid explanation capabilities for this data, we unlock tremendous business value for the whole organization.

Read More

Announcing Imply 2.8

Vadim Ogievetsky · December 18, 2018

Imply 2.8 comes with the first Apache release of Druid and a host of features aimed at performance improvements and ease of use.

Read More

Imply lookups for enhanced network flow visibility

Eric Graham · November 26, 2018

Within Druid there are multiple ways to enhance visibility for existing network flow records. This how-to blog covers one way to do this using Druid lookup tables.

Read More

Modernizing Rubicon Project’s Analytics Stack for Programmatic Advertising

Ken Lin · October 22, 2018

Rubicon Project, one of the world’s largest digital advertising exchanges, has modernized their analytics stack with Druid and Imply.

Read More

Securing Druid

Jon Wei · September 13, 2018

Security is a critical requirement in every deployment of a system that holds and processes data. In this blog post, we will discuss how we secured Apache Druid, and validated our implementation.

Read More

Announcing Imply 2.7

Vadim Ogievetsky · September 11, 2018

In Imply 2.7, we have added a selection of new visualizations and a new Explain feature that allows you to discover the contributing factors to any slice of data. We are also introducing advanced access control features, and have made several improvements to loading and managing data.

Read More

Who is knocking on our door? Analyzing AWS Netflows

Vadim Ogievetsky · June 27, 2018

We ingested our internal AWS VPC netflows into Imply and found something surprising.

Read More

Announcing Imply 2.6

Vadim Ogievetsky · June 19, 2018

Imply 2.6 introduces time compares, data export, advanced aggregation measures, and more!

Read More

Announcing Imply 2.5

Vadim Ogievetsky · March 23, 2018

Imply 2.5 comes with an improved streaming data loader, a sunburst visualization, many dashboard improvements, and more.

Read More

March 2018 Druid Meetup - eBay Monitoring Platform

Gian Merlino · March 22, 2018

March 2018 Druid Bay Area Meetup - eBay Monitoring Platform, links to slides

Read More

Imply Raises Series A funding

Fangjin Yang · March 13, 2018

I'm excited to announce that Imply has raised a $13.3M Series A, led by Andreessen Horowitz, and joined by our seed investor, Khosla Ventures.

Read More

Announcing Imply Cloud: Managed Service for AWS

Vadim Ogievetsky · March 13, 2018

Today, we are excited to announce that Imply Cloud, a fully managed service for AWS, is now generally available.

Read More

Announcing Imply 2.4

Vadim Ogievetsky · December 19, 2017

Imply 2.4 comes with a preview of a dataset manager, new features, improvements, bug fixes, and more!

Read More

End-to-end Security in Druid

Jon Wei · December 18, 2017

It is now possible to deploy a Druid cluster in a secure setting.

Read More

November 2017 Druid Bay Area Meetup - MZ, Slack, Druid Roadmap

Gian Merlino · November 29, 2017

November 2017 Druid Bay Area Meetup video, with talks from MZ and Slack, and a Druid roadmap update from Imply

Read More

Announcing Imply 2.3: An Integrated Platform

Vadim Ogievetsky · August 23, 2017

Imply 2.3 comes with brand new apps, new features, improvements, bug fixes, and more!

Read More

Announcing Imply 2.2: Pivot Features Galore

Vadim Ogievetsky · May 16, 2017

We're excited to announce Imply 2.2, with tons of new features for Pivot.

Read More

Announcing Imply 2.1

Vadim Ogievetsky · April 19, 2017

We are extremely excited to announce Imply 2.1, a feature-packed release, available now for download. Native SQL comes to Druid and dashboarding is now possible in Pivot.

Read More

Druid February Meetup Slides

Fangjin Yang · February 23, 2017

Our last Druid meetup had great talks about how Druid is used at Branch, the ongoing work around better integrating Druid with the Hadoop ecosystem, and our roadmap plans.

Read More

Druid December Meetup Videos

Fangjin Yang · December 21, 2016

Our last Druid meetup at Sift Science had 3 great talks about use cases with Druid & Imply, and the upcoming Druid roadmap.

Read More

Compressing Longs in Druid

David Li · December 7, 2016

With the Druid 0.9.2 release, Druid has added additional column compression methods for longs to significantly improve query performance in certain use cases. In this blog post, we’ll highlight how these various compression methods impact data storage size and query performance.

Read More

Pivot 0.10.27 Release

Vadim Ogievetsky · December 1, 2016

It’s been some time since our last release of Pivot. Today, after much hard work, we are excited to announce version 0.10.27, which represents a significant step towards a more holistic data exploration experience.

Read More

Druid 0.9.2 Release

Gian Merlino · December 1, 2016

The Druid community is pleased to announce our next major release, 0.9.2. We’ve added hundreds of performance improvements, stability improvements, and bug fixes.

Read More

Announcing Imply 2.0

Fangjin Yang · December 1, 2016

We are extremely excited to announce Imply 2.0, our largest release ever, available now for download. This release contains significant updates to both Druid and Pivot.

Read More

Farewell Lambda Architectures: Exactly-Once Streaming Ingestion in Druid

David Lim · July 5, 2016

Today, many companies are turning to streaming solutions which are enabling them to understand and make business decisions from their data immediately, resulting in an operational agility that was unthinkable only a few years ago. The new Kafka indexing service is an exciting milestone in the maturity of Druid's ingestion technology, giving users a way to stream data into Druid with exactly-once correctness.

Read More

Announcing Imply 1.3.0

Fangjin Yang · June 29, 2016

We are extremely excited to announce the next version of Imply Analytics Platform, IAP 1.3.0, available immediately on our download page. This is one of our biggest releases to date and includes major updates for both Druid and Pivot.

Read More

Announcing Imply 1.2.1

Vadim Ogievetsky · May 9, 2016

We are pleased to announce the latest version of the Imply Analytics Platform. This release focuses on improvements to Pivot and Tranquility as well as adding programmatic querying options to PlyQL.

Read More

Programmatic PlyQL via HTTP, ODBC, and JDBC

Vadim Ogievetsky · May 4, 2016

Some time ago at Imply, we launched PlyQL, a command-line utility that provides an SQL-like interface to Druid via Plywood. We heard a lot of positive feedback, as many people prefer to use SQL over Druid’s native JSON-over-HTTP interface. The most common question we hear about PlyQL is how one can interface with it programmatically, either from user-created apps or from existing SQL-based BI tools.

Read More

Architecting Distributed Databases for Failure

Fangjin Yang · December 10, 2015

Everything is going to fail. If this is your first time working with or building out a distributed system, the fact that everything is going to fail may seem like an extremely scary concept, but it is one you will always have to keep in mind.

Read More

A Tour Through the "Big Data" Zoo

Fangjin Yang · November 4, 2015

I recently read a great article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky that showcases the various techniques for visualizing and interacting with diverse data sets. I thought it might be useful to write something similar to showcase the various open source systems that exist in the “big data” space, including Druid, which is an open source data store I work on.

Read More

Pivot: A Fast Data Exploration UI for Druid

Vadim Ogievetsky · October 26, 2015

A large part of what we do at Imply is help organizations build custom applications and visualizations on top of their data. While Druid is a powerful backend for powering applications, there are aspects of the development process that could definitely be easier. To enable people to better understand the power of Druid, we have released Pivot, an exploration UI that makes the most of the power of the Druid database.

Read More

Announcing Imply: An Enterprise Solution for Druid and Interactive Analytics at Scale

Fangjin Yang · October 19, 2015

Today, Gian Merlino, Vadim Ogievetsky, and I are extremely excited to announce Imply, a company for interactive analytics at scale, centered around the Druid open source data store. We first began working on Druid at a startup called Metamarkets, and over the last few years, we’ve been proud to watch the project grow and take on a life of its own.

Read More
