Introducing hands-on developer tutorials for Apache Druid

Jun 06, 2023
Katya Macedo

At Imply, we are always looking for innovative ways to help you learn Apache Druid. To get you started with the Druid APIs, we’ve developed a set of interactive tutorials focused on Druid API fundamentals. These tutorials are available as Jupyter Notebooks and can be downloaded individually or as a Docker container. 

If you're not familiar with Jupyter Notebook, it is an open-source interactive web application developed by Project Jupyter.

Notebooks are great for creating interactive tutorials because they combine computer code with Markdown text, making it possible to call APIs and run commands from the same page. No more context switching!

Explore the notebooks

The following notebook tutorials work with the Druid 25.0 release and later.

Learn the basics of the Druid API

This notebook introduces you to the basics of the Druid REST API. You’ll learn how to retrieve basic cluster information, ingest data, and query data.
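
To give a flavor of what calling the REST API looks like, here is a minimal sketch using only the Python standard library. It is not taken from the notebook. It assumes a Druid router listening on http://localhost:8888 (the quickstart default), and the helper names status_request, sql_request, and send are made up for this illustration.

```python
import json
import urllib.request

# Assumption: the Druid router listens on localhost:8888 (quickstart default).
DRUID_URL = "http://localhost:8888"


def status_request(base_url=DRUID_URL):
    """Build a GET request for /status, which reports the service version."""
    return urllib.request.Request(f"{base_url}/status", method="GET")


def sql_request(query, base_url=DRUID_URL):
    """Build a POST request that submits a Druid SQL query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a cluster running, send(status_request())["version"] should return the Druid version, and send(sql_request("SELECT 1 AS example")) should return the query result as a list of rows. Building the request separately from sending it keeps the first two helpers usable without a live cluster.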

Visit Learn the basics of the Druid API to view the notebook on GitHub.

Learn the Druid Python API

This notebook provides a quick introduction to the Druid Python API, a Python wrapper around the Druid REST API. Although the Druid Python API is primarily intended to help with the Jupyter-based Druid tutorials, you can use it in your own notebooks, or in a regular Python program. 

Visit Learn the Druid Python API to view the notebook on GitHub.

Learn the basics of Druid SQL

This notebook introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.
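
As a rough illustration of what sets Druid SQL apart, the sketch below builds a SELECT statement that filters on the built-in __time column and buckets rows by hour with TIME_FLOOR. It is not taken from the notebook. The function name hourly_count_query is hypothetical, and the "wikipedia" datasource in the usage note is assumed to exist only if you have loaded the quickstart sample data.

```python
def hourly_count_query(datasource, start, end):
    """Return a Druid SQL query that counts rows per hour in [start, end).

    Illustration only: the arguments are interpolated directly into the SQL
    text, so pass trusted values.
    """
    return (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS \"hour\", COUNT(*) AS \"events\"\n"
        f"FROM \"{datasource}\"\n"
        f"WHERE __time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}'\n"
        "GROUP BY 1\n"
        "ORDER BY 1"
    )
```

For example, hourly_count_query("wikipedia", "2016-06-27 00:00:00", "2016-06-28 00:00:00") produces a query you can paste into the Druid console or submit through the SQL API. Filtering on __time matters because Druid partitions data by time, so a time filter lets it skip segments outside the requested interval.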

Visit Learn the basics of Druid SQL to view the notebook on GitHub.

Run the notebooks

You can run the notebooks locally on your system or in Docker using the Docker Compose file. The Docker Compose file provides a custom Jupyter container that includes all of the Jupyter-based Druid tutorials and prerequisites. In addition to the Jupyter container, you can run the containers for Druid and Apache Kafka.

Jupyter in Docker requires that you have Docker and Docker Compose. We recommend installing these through Docker Desktop.

Docker Compose setup

Ready to hit the ground running? This method gets you started with the tutorials in no time!

You can run the containers for Jupyter and Druid using the Docker Compose file provided in the Druid GitHub repo.

To get started, download docker-compose.yaml and environment from tutorial-jupyter-docker.zip.

Alternatively, you can clone the apache/druid repo and access the files in druid/examples/quickstart/jupyter-notebooks/docker-jupyter.

In the same directory as docker-compose.yaml, start the application with the following command:

DRUID_VERSION=26.0.0 docker-compose --profile druid-jupyter up -d

The first time you run the compose environment, it can take several minutes to load.

Tip: You pass in the version of Druid as an environment variable that gets read into the docker-compose file. When new versions of Druid come out, update the variable when you launch the tutorials, for example, DRUID_VERSION=27.0.0.

Another benefit of using Docker Compose is that you can run different combinations of services, based on what you have specified in the profile flag. For example, if you already have Druid running locally, you can just run the Jupyter container as follows:

docker-compose --profile jupyter up -d
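
If you would rather launch the environment from a script, the two invocations above can be wrapped in a small Python helper. This is a sketch under stated assumptions: the function names compose_command, compose_env, and start are made up for this example, and it assumes docker-compose is on your PATH and that you run it from the directory containing docker-compose.yaml.

```python
import os
import subprocess


def compose_command(profile):
    """Build the docker-compose invocation for a profile such as 'druid-jupyter'."""
    return ["docker-compose", "--profile", profile, "up", "-d"]


def compose_env(druid_version=None):
    """Copy the current environment, setting DRUID_VERSION when one is given."""
    env = dict(os.environ)
    if druid_version is not None:
        env["DRUID_VERSION"] = druid_version
    return env


def start(profile, druid_version=None):
    """Start the containers for a profile. Requires Docker and Docker Compose."""
    return subprocess.run(
        compose_command(profile),
        env=compose_env(druid_version),
        check=True,  # raise CalledProcessError if docker-compose exits non-zero
    )
```

Calling start("druid-jupyter", "26.0.0") mirrors the first command, and start("jupyter") mirrors the Jupyter-only command.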

For detailed instructions on how to run the notebooks in Docker, see Docker for Jupyter Notebook tutorials.

——

We’re continuously adding more tutorials to our library. If you have an idea for your own notebook tutorial, please make a contribution! We’ll work with you to merge it to the repo.
In the meantime, don’t be a stranger. Check out our Jupyter Notebook-based Druid tutorials in the apache/druid repo and share your feedback.
