Articles

Gain a clear understanding of the data ecosystem, best practices, design patterns, and key decision criteria for various technical objectives and use cases.

Larissa Klitzke

2024 Product Updates: Technical Summary

We released over 1,500 updates to Druid and Imply last year, focused on four key pillars that we’ll continue to build upon in 2025. From here, we’ll dive deep into these four strategic pillars to define the purpose and value of the top 2024 product updates, plus a few extra updates that you may be keen to explore further.

#architecture #deployment

Adhip Gupta

Building a Real-Time Analytics Architecture with Imply Polaris on Azure

Imply Polaris is now available on Azure! Learn more about what Polaris can do for your Azure-based applications.

#architecture #implypolaris #polaris #polarisonazure

David Wang

Things to Consider When Scaling Analytics for High QPS

In the era of analytics where query volume is the sine qua non “V” of data, how should we think about system architecture – what matters and why?

#architecture #concurrency #qps

David Wang

Why Analytics Needs More than a Data Warehouse

For decades, analytics were defined by business intelligence and executive-style reports powered by read-optimized data warehouses. This article dives into how analytics are shifting from batch reporting workflows to real-time application workflows.

#architecture #datawarehouse #realtimeanalytics

Darin Briskman

Why Data needs more than CRUD

After over 30 years of working with data analytics, we’ve been witness (and sometimes participant) to three major shifts in how we find insights from data – and now we’re looking at the fourth.

#architecture #transactional-db

Matt Morrissey

Overcome tradeoffs with schemaless databases

In this article, we explore the challenges posed by schemaless databases and introduce Druid, a groundbreaking database that seamlessly combines schema flexibility with high-performance capabilities, eliminating the need for trade-offs.

#architecture #schema #streaming

William To

Three Ways to Use Apache Druid for Machine Learning Workflows

Apache Druid is an excellent addition to any machine learning environment and can facilitate analytics, streamline monitoring, and add real-time data to operations and training.

#architecture #ai #machinelearning

Will Xu

Distributed by Nature: Druid at Scale

This blog explains how Druid’s architecture and built-in automation make it easy to operate and scale in cloud and k8s environments.

#architecture #operations

David Wang

Real-Time Analytics: Building Blocks and Architecture

There’s an increasing need for immediacy in data analytics, and it’s happening at scale on large data sets. This post unpacks the key building blocks and data architecture for real-time analytics.

#architecture #realtimeanalytics

David Wang

Apache Druid: Making 1000+ QPS for Analytics Look Easy

This post dives into Apache Druid’s architecture with details on how it can efficiently handle analytics applications needing high QPS.

#architecture #concurrency #qps

Matt Morrissey

Keeping up with changing schemas in streaming data

Discover how Apache Druid delivers a unique solution for managing schema changes in streaming data. Its approach helps alleviate challenges, solidifying Druid’s position as the top database for real-time analytics.

#ingestion #kafka #schema #streaming

Soumyava Das

Exploring Unnest in Druid

This article shows how Druid supports multi-value strings through multi-value dimensions (MVDs), which are automatically flattened during a group-by.

#ingestion #modeling

Senthil Vallinayagam

The Promise (and Limitations) of Range Partitions

Learn how to improve read times with range partitions.

#modeling #msq #rangepartitions #sql

Kumar Abhishek

An Introduction to Window Functions

Learn all about window functions.

#modeling #sqlsyntax #windowfunctions

Kashif Faraz

Multi-dimensional range partitioning in Druid

Druid always partitions data by the timestamp dimension to benefit time-based analytical queries. A secondary partitioning is available to further break down the time chunks into manageable partition sizes.

#modeling

Gian Merlino

Joins in Apache Druid

This blog explores Druid’s multiple options for joins, including ingestion-time and query-time joins, catering to different use cases and data scenarios.

#modeling

Sergio Ferragut

Learn how to achieve sub-second responses with Apache Druid

A review of Druid’s query processing engine with an eye on performance. Provides many data modeling and query tips that improve response times.

#modeling #performance

Matt Morrissey

The Significance of Schema Auto-Discovery in Apache Druid

This article provides a technical overview of the schema auto-discovery feature in Apache Druid through a practical IoT telemetry use case.

#modeling #schema

Peter Marshall

Imply Polaris: Powered by Apache Druid

A guide to how Apache Druid features and concepts relate to Imply Polaris. Imply Polaris serves as the “easy button” for Apache Druid, delivering key advantages that simplify operations, enhance performance, and reduce costs. Welcome to Imply Polaris! Conceived as a cloud data warehouse for easily building real-time data applications, Polaris has been adopted by […]

#development #operations #apachedruid #polaris

David Wang

Four Key Considerations for Customer-Facing Analytics

Analytics aren’t just for internal stakeholders anymore. If you’re building an analytics application for customers, then you’re probably wondering…what’s the right database backend?

#development #customer-facing #usecases

David Wang