Extensions

Druid implements an extension system that allows for adding functionality at runtime. Extensions are commonly used to add support for deep storages (like HDFS and S3), metadata stores (like MySQL and PostgreSQL), new aggregators, new input formats, and so on.

Production clusters generally use at least two extensions: one for deep storage and one for a metadata store. Many clusters use additional extensions as well.

Loading extensions

Loading bundled extensions

Imply bundles many commonly used extensions out of the box, including all core Druid extensions. See the list of extensions below for your options. To load a bundled extension, add its name to the druid.extensions.loadList property in common.runtime.properties. For example, to load the postgresql-metadata-storage and druid-hdfs-storage extensions, use the configuration:

druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]

Note that Imply bundles two sets of configurations: one for the quickstart and one for a clustered configuration. Make sure you are updating the correct common.runtime.properties for your setup.
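
One quick way to see which file you are about to edit is to list every common.runtime.properties under the distribution together with its current loadList line. The following is a minimal shell sketch, not part of the Imply tooling: the function name show_load_lists is hypothetical, and it assumes each file keeps druid.extensions.loadList on a single line.

```shell
# Hypothetical helper: print "<file>:<loadList line>" for every
# common.runtime.properties found under the given directory (default: .).
show_load_lists() {
  find "${1:-.}" -name common.runtime.properties \
    -exec grep -H '^druid\.extensions\.loadList' {} +
}
```

Run it from the top of the distribution, for example with show_load_lists . and then edit the file that belongs to the configuration you actually start.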

Loading community and third-party extensions

You can also load community and third-party extensions that are not bundled with the Imply distribution. To do this, first download the extension and then install it into your dist/druid/extensions/ directory. You can download extensions directly from their distributors or, if they are available from Maven, use the included pull-deps tool to download them for you. To use pull-deps, specify the full Maven coordinate of the extension in the form groupId:artifactId:version. For example, for the (hypothetical) extension com.example:druid-example-extension:1.0.0, run:

java \
  -cp "dist/druid/lib/*" \
  -Ddruid.extensions.directory="extensions-tmp" \
  -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies-tmp" \
  io.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -c "com.example:druid-example-extension:1.0.0"

You can install downloaded extensions by copying them into dist/druid/extensions. For example:

cp -R extensions-tmp/druid-example-extension dist/druid/extensions/druid-example-extension
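
In the example above, the directory under extensions-tmp has the same name as the artifactId part of the Maven coordinate. Assuming that holds for the extension you downloaded, the directory name can be derived from the coordinate with shell parameter expansion. This is a small sketch, and the function name coordinate_artifact_id is hypothetical:

```shell
# Hypothetical helper: extract the artifactId from a Maven coordinate
# of the form groupId:artifactId:version.
coordinate_artifact_id() {
  rest=${1#*:}                  # drop the leading "groupId:"
  printf '%s\n' "${rest%%:*}"   # drop the trailing ":version"
}
```

For the coordinate used above, coordinate_artifact_id "com.example:druid-example-extension:1.0.0" prints druid-example-extension, which is the directory copied by the cp command.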

You only have to install the extension once. Then, add "druid-example-extension" to druid.extensions.loadList in common.runtime.properties to instruct Druid to load the extension. If you used pull-deps, you can remove the extensions-tmp and hadoop-dependencies-tmp directories that it created once the extension is installed.
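
Before restarting Druid, it can be useful to confirm both halves of the installation: that the extension directory exists and that its name appears in the loadList. The following shell sketch shows one way to check this. The function name check_extension and its argument order are hypothetical, and the check assumes druid.extensions.loadList is written on a single line.

```shell
# Hypothetical helper: verify that an extension is installed and listed.
#   $1 = extensions directory (for example dist/druid/extensions)
#   $2 = path to common.runtime.properties
#   $3 = extension name (for example druid-example-extension)
check_extension() {
  ext_dir=$1
  props=$2
  name=$3
  # The extension must exist as a directory under the extensions directory.
  if [ ! -d "$ext_dir/$name" ]; then
    echo "not installed: $ext_dir/$name"
    return 1
  fi
  # The quoted name must appear on the loadList line of the properties file.
  if ! grep '^druid\.extensions\.loadList' "$props" | grep -qF "\"$name\""; then
    echo "not in loadList: $name"
    return 1
  fi
  echo "ok: $name"
}
```

A call such as check_extension dist/druid/extensions path/to/common.runtime.properties druid-example-extension prints ok: druid-example-extension when both conditions hold and returns a non-zero status otherwise.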

List of extensions

Some of these extensions are bundled with the Imply distribution, and some are not. To load bundled extensions, see Loading bundled extensions. To load non-bundled extensions, see Loading community and third-party extensions.

For additional documentation on these extensions, see the Druid documentation.

Core extensions

Core extensions are maintained by Druid committers. Some are in experimental status and some are fully production-tested Druid components.

Name | Description | Docs
druid-avro-extensions | Support for data in Apache Avro data format. | link
druid-caffeine-cache | A local cache implementation backed by Caffeine. | link
druid-datasketches | Support for approximate counts and set operations with DataSketches. | link
druid-hdfs-storage | HDFS deep storage. | link
druid-histogram | Approximate histograms and quantiles aggregator. | link
druid-kafka-eight | Kafka ingest firehose (high-level consumer). | link
druid-kafka-extraction-namespace | Kafka-based namespaced lookup. Requires the namespace lookup extension. | link
druid-kafka-indexing-service | Supervised, exactly-once Kafka ingestion. | link
druid-kerberos | Kerberos authentication for Druid nodes. | link
druid-lookups-cached-global | Lookup module providing JVM-global eager caching. It provides JDBC and URI implementations for fetching lookup data. | link
druid-lookups-cached-single | Per-lookup caching module for use cases where a lookup needs to be isolated from the global pool of lookups. | link
druid-s3-extensions | Interfacing with data in AWS S3, and using S3 as deep storage. | link
druid-stats | Statistics-related module including variance and standard deviation. | link
mysql-metadata-storage | MySQL metadata store. | link
postgresql-metadata-storage | PostgreSQL metadata store. | link

Community extensions

Community extensions are contributed by Druid community members but are not necessarily maintained on an ongoing basis by Druid committers. The Druid documentation contains a list of community extensions.

Imply does not provide support for community extensions.
