Jul 16, 2020

Introducing Apache Druid 0.19

The Apache Druid community released Druid 0.19 on July 21st, 2020. This release contains over 200 new features, performance enhancements, bug fixes, and major documentation improvements from 47 contributors.

As always, you can visit the Apache Druid download page to download the software and read the full release notes detailing every change. This Druid release is also available as part of the Imply distribution, which includes Imply Pivot.

We encourage you to try this release and explore what it adds to Druid.

Performance enhancements

Druid has always cared deeply about performance. Vectorized execution, a technique used by many modern databases, speeds up queries by processing batches of rows at a time instead of one row at a time, which reduces per-row overhead. In some cases, you can see a performance gain of 2-5x.

Vectorized queries were introduced in the Druid 0.16 release but were kept opt-in. Over the past few releases, the feature has been stabilized and battle-tested, so in 0.19 we are turning on vectorized queries by default.

Currently, vectorization applies only to GroupBy and Timeseries queries, and only to data served by historicals.
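Vectorization is controlled per query through the `vectorize` query context parameter, which accepts "true", "false", or "force". The following is a minimal sketch of building a native Timeseries query with that setting made explicit, for example to compare timings with vectorization on and off. The helper name `timeseries_query`, the `wikipedia` data source, and the interval are hypothetical and used only for illustration.

```python
import json

# The three values the "vectorize" context parameter accepts.
VECTORIZE_MODES = ("true", "false", "force")


def timeseries_query(datasource, interval, vectorize="true"):
    """Build a native Druid Timeseries query as a dict.

    `timeseries_query` is a hypothetical helper for this post, not part of
    any Druid client library. It only assembles the JSON body.
    """
    if vectorize not in VECTORIZE_MODES:
        raise ValueError(f"vectorize must be one of {VECTORIZE_MODES}")
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "hour",
        "aggregations": [{"type": "count", "name": "rows"}],
        # "true" is the default as of 0.19; "false" opts out for this query.
        "context": {"vectorize": vectorize},
    }


# Hypothetical data source and interval, with vectorization switched off.
query = timeseries_query("wikipedia", "2020-07-01/2020-07-02", vectorize="false")
print(json.dumps(query, indent=2))
```

The printed JSON can be posted to the Broker's native query endpoint. Setting "force" instead makes the query fail when it cannot be vectorized, which is a handy way to check whether a given query is eligible.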

Easier and more flexible ingestion

As the usage of Druid expands, its data sources are getting more diverse. In 0.19, we’ve added support for new data sources. Specifically, there is a newly added SqlInputSource that allows you to ingest data from MySQL and Postgres databases. There is also an enhancement to native batch ingestion to support Avro Object Container Files.

Previously, Druid users had to translate those data sources into intermediate file formats that Druid could consume. That intermediate step is error-prone, complex to maintain, and generally slower.

With this release, if you have data in MySQL, Postgres, or Avro format, you can load it directly into Druid in a single step.
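As a rough illustration of the single-step load, here is a sketch that assembles the `ioConfig` portion of a native batch ingestion spec using the SQL input source. The field layout follows the documented `sql` input source as we understand it (`database.type`, `connectorConfig`, `sqls`); the helper name, connection URI, credentials, and table name are all hypothetical placeholders, and the rest of the spec (`dataSchema`, `tuningConfig`) is omitted.

```python
import json

# Database types we assume the SQL input source accepts.
SUPPORTED_DATABASES = ("mysql", "postgresql")


def sql_io_config(db_type, connect_uri, user, password, sqls):
    """Build the ioConfig section of a parallel batch ingestion spec.

    Hypothetical helper for illustration; it only produces a dict and does
    not talk to Druid or to the database.
    """
    if db_type not in SUPPORTED_DATABASES:
        raise ValueError(f"db_type must be one of {SUPPORTED_DATABASES}")
    if not sqls:
        raise ValueError("at least one SQL query is required")
    return {
        "type": "index_parallel",
        "inputSource": {
            "type": "sql",
            "database": {
                "type": db_type,
                "connectorConfig": {
                    "connectURI": connect_uri,
                    "user": user,
                    "password": password,
                },
            },
            # Each query's result set is read as input rows.
            "sqls": list(sqls),
        },
    }


# Placeholder connection details and table name, for illustration only.
io_config = sql_io_config(
    "mysql",
    "jdbc:mysql://db.example.com:3306/sales",
    "druid_reader",
    "change-me",
    ["SELECT * FROM orders WHERE created_at >= '2020-07-01'"],
)
print(json.dumps(io_config, indent=2))
```

Splitting a large table into several queries in `sqls` (for example, one per date range) gives the parallel task more than one unit of work to distribute.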

As we continue to mature Druid’s ingestion system, we are improving the feature coverage of native batch ingestion. When setting up Druid data sources, users often use range or hash partitioning to improve query performance. Until now, however, neither partition type supported appending late-arriving data. In 0.19, we’ve added support for appending late data to a data source that is already partitioned by range or hash. With this, you can ingest late-arriving data while keeping the performance benefits of range and hash partitioning.

Apache Ranger authorization integration

Apache Ranger is an open-source security solution for the Hadoop ecosystem. The new integration with Druid lets cluster administrators restrict access to data sources by granting read-only or read-write permissions. It gives users who already run Ranger, or are considering it, an integrated security management experience.

More diverse cloud support

As Druid is being adopted by more and more users, the underlying platform support is also widening. It’s very exciting to see more integration with various cloud infrastructure providers.

In 0.19, Druid adds support for using Alibaba Object Storage Service as Druid deep storage. This is the native object storage solution offered by Alibaba Cloud.

Also, on the Google Compute Engine platform, the Druid Overlord now supports autoscaling using managed instance groups.

Both of those features enable Druid to better leverage the underlying cloud infrastructure. Because they are contrib extensions, they are not packaged as part of Druid by default. You can use this guide to install and use those extensions.

Other items

For a full list of all new functionality in Druid 0.19.0, head over to the Apache Druid download page and check out the release notes!
