Druid 0.23 - Features And Capabilities For Advanced Scenarios

By Abhishek Agarwal & Will Xu

Features and capabilities for advanced scenarios

This is part two of the Druid 0.23.0 release blog. Many of Druid's improvements focus on building a solid foundation: making the system more stable, easier to use, faster to scale, and better integrated with the rest of the data ecosystem. This post is intended for advanced users, as well as for potential and existing contributors to the Druid project who want a peek behind the scenes.

Streaming: Druid improves integration with Kinesis

Kinesis supports dynamic re-sharding to accommodate traffic growth. Re-sharding creates empty intermediate shards, and these empty shards could cause Druid ingestion to get stuck. In this release, we've added the ability to ignore those empty shards. To enable it, set skipIgnorableShards to true in the Druid common settings or in the ingestion context.

Druid now also supports newer, faster Kinesis APIs for listing Kinesis shards. To use them, set useListShards to true.

If you use Kinesis ingestion, we recommend enabling both settings. We plan to make them the default in a future release.
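As a rough illustration, the sketch below enables both flags in the "context" map of a Kinesis supervisor spec, which is one of the two places the text above says they can go. The trimmed spec skeleton and the enable_kinesis_shard_flags helper are hypothetical, and the exact placement of the flags is an assumption; check the Kinesis ingestion documentation for your Druid version. Serializing with json.dumps also shows why the values appear as lowercase true in a spec: JSON booleans are lowercase, unlike Python's True.

```python
import copy
import json

# Flag names come from the Druid 0.23 release notes; placing them in the
# supervisor spec's top-level "context" map is an assumption for illustration.
KINESIS_SHARD_FLAGS = ("skipIgnorableShards", "useListShards")


def enable_kinesis_shard_flags(supervisor_spec):
    """Return a copy of a supervisor spec with both Kinesis shard flags set to true."""
    spec = copy.deepcopy(supervisor_spec)  # leave the caller's spec untouched
    context = spec.setdefault("context", {})
    for flag in KINESIS_SHARD_FLAGS:
        context[flag] = True
    return spec


# Hypothetical, heavily trimmed Kinesis supervisor spec.
base_spec = {
    "type": "kinesis",
    "spec": {
        "ioConfig": {"type": "kinesis", "stream": "example-stream"},
    },
}

patched = enable_kinesis_shard_flags(base_spec)
print(json.dumps(patched["context"], sort_keys=True))
# {"skipIgnorableShards": true, "useListShards": true}
```

Copying the spec before editing it keeps the helper safe to call on a template you reuse for several streams.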

Task system: Task reports for parallel tasks

Druid now publishes task reports for parallel tasks. These reports are useful for monitoring parallel tasks and are a necessary feature for moving to native batch ingestion. The following image shows the parent task and the associated sub-tasks:

Auto-compaction improvements

The auto-compaction system helps you keep segment files optimally sized for good query performance. In this release, we have introduced two changes to make this system more useful.

The first change supports auto-compaction of overlapping intervals with mixed segment granularities, which was previously not possible. This paves the way for changing the granularity of data based on its age in the future.

The second change lets you adjust the resources used by the auto-compaction system independently of other Coordinator duties. This means you can run auto-compaction more frequently than other duties, such as segment balancing and dropping data.
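As we understand it, running compaction on its own schedule is done by placing the compactSegments duty in a custom Coordinator duty group. The sketch below renders Coordinator runtime properties for such a group. The property names (druid.coordinator.dutyGroups, druid.coordinator.<group>.duties, druid.coordinator.<group>.period) reflect our reading of Druid's custom Coordinator duty configuration, and the group name "compaction" and the PT60S period are example values; verify all of them against the Coordinator configuration docs for your version.

```python
import json


def compaction_duty_group_properties(group="compaction", period="PT60S"):
    """Render Coordinator runtime properties that run compactSegments in its own duty group.

    Property names are assumed from Druid's custom Coordinator duty settings;
    `group` and `period` (an ISO-8601 duration) are example values only.
    """
    props = {
        # List of custom duty groups; Druid expects a JSON array here.
        "druid.coordinator.dutyGroups": json.dumps([group]),
        # Duties that belong to the group.
        f"druid.coordinator.{group}.duties": json.dumps(["compactSegments"]),
        # How often this group runs, independently of other Coordinator duties.
        f"druid.coordinator.{group}.period": period,
    }
    # runtime.properties uses one key=value pair per line.
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(compaction_duty_group_properties())
# druid.coordinator.dutyGroups=["compaction"]
# druid.coordinator.compaction.duties=["compactSegments"]
# druid.coordinator.compaction.period=PT60S
```

Rendering the properties from a function makes it easy to try different periods per environment without hand-editing JSON arrays inside a properties file.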

We have also changed the web console's segment view to help you visualize segment fragmentation. Specifically, if you see significant variance in segment sizes, it's a good indication that your data is fragmented. Again, this is where applying auto-compaction can improve your overall performance.
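One way to put a number on "significant variance" is the coefficient of variation (standard deviation divided by mean) of segment sizes within an interval. This is our own illustrative heuristic, not something the web console computes; the 0.5 threshold and the example sizes are hypothetical.

```python
import statistics


def size_coefficient_of_variation(segment_sizes):
    """Population standard deviation of segment sizes divided by their mean (0.0 = all equal)."""
    if not segment_sizes:
        raise ValueError("need at least one segment size")
    mean = statistics.fmean(segment_sizes)
    if mean == 0:
        return 0.0
    return statistics.pstdev(segment_sizes) / mean


def looks_fragmented(segment_sizes, threshold=0.5):
    """Hypothetical heuristic: flag an interval whose segment sizes vary widely."""
    return size_coefficient_of_variation(segment_sizes) > threshold


# Made-up segment sizes in bytes for two intervals.
even = [500_000_000] * 4
uneven = [700_000_000, 650_000_000, 3_000_000, 1_500_000, 800_000]
print(looks_fragmented(even), looks_fragmented(uneven))
# False True
```

A value near zero does not rule out fragmentation on its own: an interval made entirely of uniformly tiny segments also benefits from compaction, so it is worth checking absolute sizes as well.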

Over the next few releases, we aim to enable the auto-compaction system by default so that segment files stay optimized.

Querying: Better JDBC

This release includes a number of improvements to JDBC connections, such as handling trailing slashes, better logging, and sanitization of exceptions. If you use JDBC today, we recommend that you upgrade and give the new version a try.

New things for Druid contributors

Overview for contributors

Below are the highlights from the release notes. For full details, check out the complete release notes.

Easier development

Added the SQL query ID to the response header for failed SQL queries (#11756)
Improved query IDs to make it easier to link queries and sub-queries for end-to-end query visibility (#11809)
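To illustrate how a client might use the query ID on failed SQL queries, the sketch below tags each request with its own sqlQueryId context parameter and, when the server returns an HTTP error, reads the ID back from the X-Druid-SQL-Query-Id response header so it can be matched against server-side logs. The /druid/v2/sql endpoint, the sqlQueryId context parameter, and the header name are, to our knowledge, part of Druid's SQL HTTP API; the helper names and any base URL you pass in are hypothetical, so confirm the details for your version.

```python
import json
import urllib.error
import urllib.request
import uuid

# Response header in which Druid reports the SQL query ID.
SQL_QUERY_ID_HEADER = "X-Druid-SQL-Query-Id"


def build_sql_request(base_url, sql, query_id=None):
    """Build a POST to the SQL endpoint, tagged with an explicit query ID."""
    query_id = query_id or str(uuid.uuid4())
    body = {"query": sql, "context": {"sqlQueryId": query_id}}
    return urllib.request.Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_id_from_error(err):
    """Read the query ID from a failed response, if the server sent one."""
    return err.headers.get(SQL_QUERY_ID_HEADER) if err.headers is not None else None


def run_sql(base_url, sql):
    """Run a query; on HTTP failure, report the status code and the server-side query ID."""
    request = build_sql_request(base_url, sql)
    try:
        with urllib.request.urlopen(request) as response:
            return {"ok": True, "rows": json.load(response)}
    except urllib.error.HTTPError as err:
        return {"ok": False, "status": err.code, "query_id": query_id_from_error(err)}
```

For example, run_sql("http://localhost:8888", "SELECT COUNT(*) FROM wikipedia") against a hypothetical local cluster would return either the result rows or the status code and query ID of the failure. Setting the ID on the client side means you know it even if the request never reaches the server.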

Better internal typing systems

Added ARRAY_CONCAT_AGG to aggregate array inputs together into a single array (#12226)
Added a query context option to use an internally generated SegmentMetadata query (#11429)
Added support for Druid complex types to the native expression processing system to make all Druid data usable within expressions (#12016)
Added the ability to store null columns in segments (#12279)
Druid now returns an empty result when a GROUP BY query is optimized into a timeseries query (#12065)

Better memory management to reduce OOM during ingestion

Fixed OOM failures in the dimension distribution phase of parallel indexing (#12331)
Druid no longer materializes the list of segment files when finding a partition file during shuffle, which reduces OOM issues (#11903)

Metrics

The Druid 0.23.0 release includes the following metrics and metric dimensions to help you better monitor and operate a Druid cluster:

New metrics

Auto-compaction duty group
Whether a query is vectorized
Shenandoah GC
CPU and CPU sets for cgroups
Jetty server thread pool
Batch tasks finish waiting for segment

New metric dimensions

Auto-compaction duty cycle
Work category for tasks

Druid now also includes a Prometheus emitter by default and supports emitting data through an HTTP proxy.

Looking for contributors

We're very thankful to all 81 contributors who made Druid 0.23.0 possible, but we need more!

Are you a developer? A tech writer? Someone who is just interested in databases, analytics, streams, or anything else Druid? Join us! Take a look at the Druid Community to see what is needed and jump in.

Try this out today

For a full list of all new functionality in Druid 0.23.0, head over to the Apache Druid download page and check out the release notes!
