Introducing Apache Druid 0.19

Jul 16, 2020
Will Xu

The Apache Druid community released Druid 0.19 on July 21st, 2020. This release contains over 200 new features, performance enhancements, bug fixes, and major documentation improvements from 47 contributors.

As always, you can visit the Apache Druid download page to download the software and read the full release notes detailing every change. This Druid release is also available as part of the Imply distribution, which includes Imply Pivot.

We encourage you to try this release and explore what is new in Druid.

Performance enhancements

Druid has always cared deeply about performance. Vectorized execution, a technique used by many modern databases, speeds up queries by processing data in batches rather than one row at a time. In some cases, you can see a performance gain of 2-5X.

Vectorized queries were introduced in the Druid 0.16 release but were kept opt-in. Over the past few releases, the feature has been stabilized and battle-tested, so in 0.19 we are turning on vectorized queries by default.

Currently, vectorization applies only to GroupBy and Timeseries queries, and only to data served by historicals.
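If you want to control vectorization for an individual query, you can set the `vectorize` query context parameter, which accepts `true`, `false`, or `force`. The following is a minimal sketch rather than a definitive recipe: the helper name `timeseries_query`, the `wikipedia` data source, and the interval are hypothetical placeholders, and the script only builds and prints the query JSON instead of sending it to a cluster.

```python
import json

# Values accepted by the `vectorize` query context parameter.
VECTORIZE_MODES = ("true", "false", "force")


def timeseries_query(datasource, interval, vectorize="true"):
    """Build a native Timeseries query with an explicit `vectorize` setting.

    `datasource` and `interval` are placeholders supplied by the caller;
    nothing here contacts a Druid cluster.
    """
    if vectorize not in VECTORIZE_MODES:
        raise ValueError(f"vectorize must be one of {VECTORIZE_MODES}")
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "rows"}],
        # "force" makes the query fail if it cannot be vectorized,
        # which is handy for checking whether a query qualifies.
        "context": {"vectorize": vectorize},
    }


# Hypothetical data source name and interval, for illustration only.
query = timeseries_query("wikipedia", "2020-07-01/2020-07-02", vectorize="force")
print(json.dumps(query, indent=2))
```

You would then POST the printed JSON to the native query endpoint (`/druid/v2`) on your broker. Setting `vectorize` to `false` restores the pre-0.19 behavior for that query.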

Easier and more flexible ingestion

As the usage of Druid expands, its data sources are becoming more diverse. In 0.19, we’ve added support for new data sources. Specifically, a new SqlInputSource allows you to ingest data from MySQL and Postgres databases. Native batch ingestion has also been enhanced to support Avro Object Container Files.

Previously, Druid users had to translate data from those sources into intermediate file formats that Druid could consume. This intermediate step is error-prone, complex to maintain, and generally slower.

With this release, if you have data in MySQL, Postgres, or Avro format, you can load it directly into Druid in a single step.
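To give a feel for what a SQL-based ingestion looks like, here is a rough sketch of the `inputSource` portion of a native batch ingestion spec, built in Python. Treat it as an illustration under stated assumptions: the helper name `sql_input_source`, the connection URI, the credentials, and the table name are all hypothetical, and you should check the field names against the SQL input source documentation for your Druid version. We also assume the matching MySQL or PostgreSQL extension is loaded on the cluster.

```python
import json


def sql_input_source(db_type, connect_uri, user, password, sqls):
    """Build the `inputSource` block for reading rows through SQL queries.

    All arguments are placeholders; nothing here connects to a database.
    """
    if db_type not in ("mysql", "postgresql"):
        raise ValueError("db_type must be 'mysql' or 'postgresql'")
    return {
        "type": "sql",
        "database": {
            "type": db_type,
            "connectorConfig": {
                "connectURI": connect_uri,
                "user": user,
                "password": password,
            },
        },
        # One or more SELECT statements whose results are ingested.
        "sqls": list(sqls),
    }


# Hypothetical host, schema, credentials, and table, for illustration only.
input_source = sql_input_source(
    "mysql",
    "jdbc:mysql://db.example.com:3306/sales",
    "druid_reader",
    "changeme",
    ["SELECT * FROM orders"],
)
print(json.dumps(input_source, indent=2))
```

The resulting block would go under `ioConfig.inputSource` in an `index_parallel` ingestion spec, in place of a file-based input source.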

As we continue to mature Druid’s ingestion system, we are improving the feature coverage of native batch ingestion. When setting up Druid data sources, users often use range or hash partitioning to improve query performance. However, neither partition type previously supported appending late-arriving data. In 0.19, we’ve added support for appending late data to data sources that are stored with range or hash partitions. With this, you can ingest late-arriving data while keeping the performance benefits of range and hash partitioning.

Apache Ranger authorization integration

Apache Ranger is an open-source security solution for the Hadoop ecosystem. Its integration with Druid enables cluster administrators to restrict access to data sources by granting read-only or read-write permissions. This integration gives users who already have a Ranger deployment, or are considering one, an integrated security management experience.

More diverse cloud support

As Druid is being adopted by more and more users, the underlying platform support is also widening. It’s very exciting to see more integration with various cloud infrastructure providers.

In 0.19, Druid has added support for using Alibaba Object Storage Service, the native object storage solution offered by Alibaba Cloud, as Druid deep storage.

Also, on the Google Compute Engine platform, the Druid overlord now supports autoscaling using managed instance groups.

Both of these features enable Druid to better leverage the underlying cloud infrastructure. Because they are contrib extensions, they are not packaged as part of Druid by default. You can follow the Druid extensions guide to install and use them.

Other items

For a full list of all new functionality in Druid 0.19.0, head over to the Apache Druid download page and check out the release notes!
