Apache Druid on Google Cloud Platform (GCP) Reference Architecture

Jan 29, 2020
Matt Sarrel

When I approach a new distributed technology, I usually find it helpful to read through a reference architecture and a quickstart or two before I get my hands dirty. It helps me prepare for potential snags, such as missing dependencies, permissions issues, or provisioning unsuitable instances, so I can avoid some and minimize the rest.

Muthu Lalapet, a Solutions Architect at Imply, recently wrote a reference architecture for Apache Druid on Google Cloud Platform (GCP) that includes some best practices for leveraging GCP services such as Compute Engine, Cloud Storage and Cloud SQL. The document describes example cluster architectures and their accompanying machine types and configurations. As such, it’s a helpful resource for planning and implementing Druid on GCP.

Apache Druid is a real-time analytics database designed for ultrafast query response on large datasets. Druid can scale to ingest millions of events per second, store trillions of events (petabytes of data), and perform queries with sub-second response times at scale. Druid is most commonly used where real-time (streaming) ingestion, fast query performance, and no downtime are critical. This makes Druid a good choice for operational analytics projects that provide real-time intelligence: delivering information as events occur so that businesses can gain immediate insight. Some of the largest online entities in the world rely on Druid for use cases such as clickstream analytics, application/device/network performance monitoring, and BI/OLAP. While Druid ingests data from a variety of sources, it is commonly paired with Kafka on GCP for event monitoring, financial analysis, and IoT monitoring.
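To make the Kafka pairing concrete, streaming ingestion in Druid is configured by submitting a supervisor spec to the cluster. The sketch below shows the general shape of such a spec; the datasource, topic, broker address, and column names are hypothetical placeholders, not values from the reference architecture:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "clickstream",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["user_id", "page", "country"] },
      "granularitySpec": { "segmentGranularity": "HOUR", "queryGranularity": "NONE" }
    },
    "ioConfig": {
      "topic": "clickstream-events",
      "consumerProperties": { "bootstrap.servers": "kafka-broker:9092" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

A spec like this is POSTed to the Druid Overlord, which then manages ingestion tasks that consume the topic continuously and hand completed segments off to deep storage.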

Druid is cloud-native and runs as server types that host groups of processes. At a high level, there is the Master server, which coordinates data ingestion and storage; the Data server, which stores and ingests data; and the Query server, which acts as the endpoint for users and client applications. Druid also relies on external metadata storage, deep storage, and Apache ZooKeeper to coordinate its processes.

There’s a lot of detail (and years of development) underlying this simple explanation, and you can learn all about it when you download the reference architecture.

Google Cloud Platform is a combination of services and resources built and operated by Google in their global data centers. Google’s many offerings include IaaS, PaaS, and serverless computing options for data storage and data analytics. You can subscribe to replace part or all of an enterprise IT infrastructure, leverage an automated machine learning environment, and run open source technology on Google’s hardware.

Many GCP customers are drawn to the platform because of its tight integration with open source frameworks, making them easier to learn, develop for, and operate in production. Druid is one such open source technology, and there are a number of very large Druid deployments on GCP.

While there are many services offered on Google’s infrastructure, Compute Engine, Cloud Storage, and Cloud SQL are the most important components when it comes to running Druid. Compute Engine provides high-performance virtual machines that can be used to run the Druid server types described above (Master, Data, and Query servers). Druid leverages Google Cloud Storage as deep storage and Cloud SQL as metadata storage. Cloud SQL allows you to run MySQL, PostgreSQL, or SQL Server to house Druid metadata, and it automates tasks to create a high-availability metadata environment. Because Druid stores segments in deep storage, Google Cloud Storage provides durability and high availability for data as long as the Druid processes can connect to it.
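As a minimal sketch of how these pieces come together, a Druid cluster's common runtime properties might wire Cloud Storage in as deep storage and a Cloud SQL PostgreSQL instance in as metadata storage. The bucket name, prefix, connection address, and credentials below are hypothetical placeholders:

```properties
# common.runtime.properties (sketch; bucket and connection details are hypothetical)

# Load the extensions for GCS deep storage and PostgreSQL metadata storage
druid.extensions.loadList=["druid-google-extensions", "postgresql-metadata-storage"]

# Deep storage: Google Cloud Storage
druid.storage.type=google
druid.google.bucket=my-druid-deep-storage
druid.google.prefix=druid/segments

# Metadata storage: Cloud SQL running PostgreSQL
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://10.0.0.5:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=<password>
```

In practice, credentials would come from the Compute Engine instance's service account or a secrets manager rather than a plaintext properties file, and the reference architecture itself is the authoritative source for recommended settings.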

The reference architecture provides guidance on the type of instance to choose when getting started, and for those who want to dig deeper, here’s a more general discussion of Druid server sizing and cluster tuning.

Here’s a tip from the reference architecture: make sure to use the Standard Storage class for best availability and performance. How’s that for something helpful to learn before provisioning resources?

Get this and more tips in the Druid Reference Architecture for GCP.
