Druid 0.9.2 Release

Dec 01, 2016
Gian Merlino

The Druid community is pleased to announce our next major release, 0.9.2. We’ve
added hundreds of performance improvements, stability improvements, and bug
fixes.

You can find the full list of changes here
and documentation for this release here.

Major Highlights

New groupBy engine

Druid now includes a new groupBy engine, rewritten from the ground up for better performance and memory management. Benchmarks show a 2–5x performance boost on our test datasets. The new engine also supports strict limits on memory usage and the option to spill to disk when memory is exhausted, avoiding result set row count limitations and potential OOMEs generated by the previous engine.

The new engine is off by default, but you can enable it through configuration
or query context parameters. We intend to enable it by default in a future
version of Druid.
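
As an illustration, here is a minimal sketch of a groupBy query that opts into the new engine through the query context. It assumes the “groupByStrategy” context key and “v2” strategy name described in the groupBy documentation linked below; the data source, dimension, and metric names are hypothetical.

    {
      "queryType": "groupBy",
      "dataSource": "events",
      "granularity": "all",
      "dimensions": ["country"],
      "aggregations": [
        {"type": "longSum", "name": "total_edits", "fieldName": "edits"}
      ],
      "intervals": ["2016-11-01/2016-12-01"],
      "context": {"groupByStrategy": "v2"}
    }

Setting the strategy per query like this makes it easy to compare the two engines side by side before changing any cluster-wide configuration.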

For documentation and configuration details, see the implementation details section at
http://druid.io/docs/0.9.2/querying/groupbyquery.html#implementation-details.

Ability to disable rollup

Since its inception, Druid has had a concept of “dimensions” and “metrics” that
applies both at ingestion time and at query time. Druid is one of the only
databases that supports aggregation at data loading time, which we call
“rollup”. But for some use cases, ingestion-time rollup is not desired, and
it’s better to load the original data as-is. With rollup disabled, Druid
creates one row for each input row.

Query-time aggregation
is, of course, still supported through the groupBy, topN, and timeseries
queries.

For additional information, see the documentation for the “rollup” flag at http://druid.io/docs/0.9.2/ingestion/index. By default, rollup remains enabled.
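
As a rough sketch of where this flag lives, an ingestion spec’s granularitySpec might look like the following with rollup turned off; the segment granularity, query granularity, and interval shown here are placeholders.

    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE",
      "rollup": false,
      "intervals": ["2016-11-01/2016-12-01"]
    }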

Ability to filter on longs

Druid now supports sophisticated filtering on integer-typed columns, including
long metrics and the special __time column. This opens up a number of new
capabilities:

  • Filtered aggregations on time, useful for time comparison queries using two filtered aggregators and a post-aggregator. This can also be used for retention analysis with theta sketches. You can find examples here.
  • Filtering on integer-typed columns, which is especially useful when rollup is disabled with the new rollup flag (a sketch appears after this list).

Druid does not yet support grouping on longs. We intend to add this capability in a future release.
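
To illustrate the second bullet above, here is a rough sketch of a filtered aggregator that counts rows where a long-typed column exceeds a threshold. It assumes the bound filter’s numeric ordering option described in the 0.9.2 filter documentation; the column and aggregator names are hypothetical, and the same approach applies to the special __time column.

    {
      "type": "filtered",
      "filter": {
        "type": "bound",
        "dimension": "response_time",
        "lower": "500",
        "ordering": "numeric"
      },
      "aggregator": {"type": "count", "name": "slow_requests"}
    }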

New long encodings

Until now, all integer-typed columns in Druid, including long metrics and the
special __time column, were stored as 64-bit longs optionally compressed in
blocks with LZ4. Druid 0.9.2 adds new encoding options which, in many cases,
can reduce file sizes and improve performance:

  • Long encoding option “auto”, which can use table or delta encoding to store values in fewer than 64 bits per row. The default “longs” option always uses the full 64 bits.
  • Compression option “none”, which is like the old “uncompressed” option, except it offers a speedup by bypassing block copying.

The default remains “longs” encoding + “lz4” compression. In our testing, two
options that often yield useful benefits are “auto” + “lz4” (generally smaller
than longs + lz4) and “auto” + “none” (generally faster than longs + lz4, file
size impact varies). See the PR for full test results. See “metricCompression”
and “longEncoding” on
http://druid.io/docs/0.9.2/ingestion/batch-ingestion.html
for documentation.
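
As a sketch, both options are set in the indexSpec of an ingestion spec’s tuningConfig, roughly as shown below; treat the exact nesting as an assumption and confirm it against the documentation linked above.

    "tuningConfig": {
      "type": "hadoop",
      "indexSpec": {
        "longEncoding": "auto",
        "metricCompression": "none"
      }
    }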

We’re preparing a follow-up blog post about this work; stay tuned to our blog for more information.

Release Notes

In addition to these major highlights, Druid 0.9.2 contains a number of other
improvements. For more information about these changes, bug fixes, and more,
check out the full release notes here.

Thanks!

The Druid community’s efforts made all the great improvements in this release
possible. Thanks to all the members of the community who contributed to this
release.
