If you ever need a Druid cluster deployment on your laptop, here's how to do it. This is a quick guide to deploying Apache Druid® using Kubernetes (minikube) and MinIO on your laptop. Why not use the standalone or docker-compose versions instead, you ask? Good point: for the most part it is probably easier to do that. But if you are learning Apache Druid® and expect at some point to scale out on a Kubernetes service in the cloud or on-premises, then it is good to understand how to deploy and manage Druid in a Kubernetes context. This blog post is a starting point that lets you work with Kubernetes without any cloud fees or extra hardware. These instructions are for macOS with at least 6 CPUs and 8.5 GB of memory to spare.
We use the Apache Druid Helm chart to deploy and manage the Druid database on Kubernetes. Our intent is to get you from zero to a running Druid cluster in just a few minutes, with no prior knowledge required. These instructions deploy Apache Druid as a distributed cluster on minikube, adding MinIO on Kubernetes to provide deep storage backed by local disk. The end result is a deployment that does not require access to the internet.
Throughout this post, the Kubernetes namespace is set to dev; you can change it to whatever you want, as long as you change it everywhere. The same goes for all such parameters (names, credentials, regions): they are changeable as long as you substitute your own values consistently.
So without further ado, the step-by-step instructions:
Minikube can use different VM drivers, such as docker and hyperkit. In this test we used hyperkit.
Step 1 – You'll need Homebrew for these steps. If you don't already have it installed, you can find instructions at https://brew.sh.
Step 2 – Open a terminal window and run:
brew install hyperkit
brew install minikube
brew install kubectl
brew install helm
Step 3 – Create the minikube cluster with the following command from the same terminal window. We'll need enough memory and CPUs:
minikube start --memory 8192 --cpus 6 --vm-driver=hyperkit
This will take some time and the output should look like this:
😄  minikube v1.23.2 on Darwin 11.3.1
✨  Automatically selected the hyperkit driver
👍  Starting control plane node minikube in cluster minikube
🔥  Creating hyperkit VM (CPUs=6, Memory=8192MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.22.2 on Docker 20.10.8 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
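Before moving on, it's worth a quick sanity check that the cluster is actually up and responding:

# Confirm the minikube VM and its Kubernetes components are running
minikube status

# Confirm the node reports Ready
kubectl get nodes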
Step 1 – Open another terminal window for the MinIO setup.
Step 2 – Add the min.io chart repository for helm:
helm repo add minio https://charts.min.io/
Step 3 – Create a minio_values.yaml file with the following content:
mode: standalone
replicas: 1
rootUser: rootuser
rootPassword: rootpass123
persistence:
  size: 50Gi
  enabled: true
resources:
  requests:
    memory: 512M
environment:
  MINIO_SITE_REGION: "us-west-1"
Step 4 – Install MinIO using helm:
helm install -n dev --create-namespace -f minio_values.yaml minio minio/minio
NAME: minio
LAST DEPLOYED: Fri Feb 18 15:04:35 2022
NAMESPACE: dev
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
MinIO can be accessed via port 9000…
...
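You can confirm the MinIO pod is running before continuing. The chart labels its pods with release=minio, which we also rely on in the next step:

kubectl get pods -n dev -l release=minio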
Step 5 – Set up port forwarding from localhost to the MinIO console service:
export POD_NAME=$(kubectl get pods --namespace dev -l "release=minio" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 9001 --namespace dev
Forwarding from 127.0.0.1:9001 -> 9001 Forwarding from [::1]:9001 -> 9001
This kubectl command runs as a foreground process; the terminal will continue to display port-forwarding activity for as long as you need access to the MinIO UI.
Step 6 – Configure the deep storage bucket. In a web browser, go to http://localhost:9001, log in with the rootUser and rootPassword from minio_values.yaml (rootuser / rootpass123), and create a bucket named druidlocal; this must match the druid_storage_bucket value in the Druid configuration below.
Step 7 – In Users, select the console user, then create a set of credentials matching the druid_s3_accessKey and druid_s3_secretKey values used in the Druid configuration below (access123 / secret1234567890).
Step 8 – Restart the MinIO service using the UI.
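If you'd rather script this step than click through the UI, the MinIO client (mc) can create the same bucket and credentials. This is a minimal sketch, assuming the bucket name and keys match the Druid configuration below; note that the port forward above exposes the console on 9001, so we forward the API port 9000 separately, and the mc admin policy syntax can differ slightly between mc versions:

# In a separate terminal, forward the MinIO API port (reuse the POD_NAME lookup from Step 5)
kubectl port-forward $POD_NAME 9000 --namespace dev &

# Install the MinIO client and point it at the forwarded API endpoint
brew install minio/stable/mc
mc alias set local http://localhost:9000 rootuser rootpass123

# Create the deep storage bucket and the access/secret key pair Druid will use
mc mb local/druidlocal
mc admin user add local access123 secret1234567890
mc admin policy set local readwrite user=access123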
Now that we have an S3 API-compatible storage manager running in the Kubernetes cluster, we can deploy Apache Druid. All other dependencies of Druid are already part of the project's helm chart. So let's start another terminal window and execute the following:
Step 1 – Clone the latest Apache Druid code:
git clone https://github.com/apache/druid
cd druid
Step 2 – Retrieve chart dependencies for ZooKeeper and PostgreSQL:
helm dependency update helm/druid
Getting updates for unmanaged Helm repositories...
...Successfully got an update from the "https://charts.helm.sh/incubator" chart repository
...Successfully got an update from the "https://charts.helm.sh/stable" chart repository
...Successfully got an update from the "https://charts.helm.sh/stable" chart repository
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "minio" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 3 charts
Downloading zookeeper from repo https://charts.helm.sh/incubator
Downloading mysql from repo https://charts.helm.sh/stable
Downloading postgresql from repo https://charts.helm.sh/stable
Deleting outdated charts
Step 3 – Create the parameter overrides needed to deploy locally by creating a file called k8s_minikube.yaml with the following content:
configVars:
  druid_extensions_loadList: '["druid-histogram","druid-datasketches","druid-lookups-cached-global","postgresql-metadata-storage","druid-s3-extensions"]'
  druid_storage_type: s3
  druid_storage_bucket: druidlocal
  druid_storage_baseKey: k8s-minikube/segments
  druid_s3_accessKey: access123
  druid_s3_secretKey: secret1234567890
  AWS_REGION: "us-west-1"
  druid_s3_forceGlobalBucketAccessEnabled: "false"
  druid_storage_disableAcl: "true"
  druid_indexer_logs_type: s3
  druid_indexer_logs_s3Bucket: druidlocal
  druid_indexer_logs_s3Prefix: k8s-minikube/logs
  druid_indexer_logs_disableAcl: "true"
  druid_s3_endpoint_signingRegion: "us-west-1"
  druid_s3_endpoint_url: "http://minio:9000"
  druid_s3_protocol: "http"
  druid_s3_enablePathStyleAccess: "true"
One other useful property is the Druid image version being deployed. Add this section to the yaml file to change it to whichever version you need to work with:
image:
  tag: 0.22.1
Step 4 – Deploy the Druid cluster:
helm install druid helm/druid --namespace dev -f k8s_minikube.yaml
W0218 15:36:46.492231 25538 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0218 15:36:46.754347 25538 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
NAME: druid
LAST DEPLOYED: Fri Feb 18 15:36:46 2022
NAMESPACE: dev
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The NOTES section includes port-forwarding instructions, but we use a slightly different command to find the router pod (see below).
Step 5 – We'll need to redirect port 8888 from localhost into the cluster's router service, which provides the Apache Druid UI:
export PODNAME=$(kubectl get po -n dev | grep router | cut -d" " -f1)
kubectl port-forward pod/$PODNAME 8888 -n dev
Step 6 – You may need to wait a bit until all pods are ready. Some may restart as they time out waiting for dependencies to start, but they will ultimately all reach the READY 1/1 state. You can monitor the startup with the following command:
kubectl get pods -n dev
We're done! You can now use Druid by accessing http://localhost:8888 and running through some tutorials.
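You can also verify from the command line that the router is serving requests; Druid processes expose a /status endpoint that returns version and module information:

# Should return a JSON payload once the router is ready
curl http://localhost:8888/status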
We’ll leave you with a few useful commands to manage the cluster.
You can see which pods are running on the cluster by using:
kubectl get pods -n dev
NAME                                 READY   STATUS    RESTARTS       AGE
druid-broker-744c5f46b7-crxbg        1/1     Running   1 (4m8s ago)   7m28s
druid-coordinator-7c79f9c6c9-4wg67   1/1     Running   1 (4m8s ago)   7m28s
druid-historical-0                   1/1     Running   0              7m28s
druid-middle-manager-0               1/1     Running   0              7m28s
druid-postgresql-0                   1/1     Running   0              7m28s
druid-router-84d7cc6d87-qkl9m        1/1     Running   0              7m28s
druid-zookeeper-0                    1/1     Running   0              7m28s
druid-zookeeper-1                    1/1     Running   0              4m26s
druid-zookeeper-2                    1/1     Running   0              3m54s
minio-844b956d8b-tfcq6               1/1     Running   0              39m
Notice that the middle managers and historicals are deployed as stateful sets and therefore have a predetermined naming convention, while the router, broker, and coordinator use ephemeral names. If you feel adventurous and want to see what's inside, you can open a shell into any of the pods using:
kubectl exec -it <podname> -n dev -- /bin/sh
You can retrieve the logs of any of the pods with:
kubectl logs <podname> -n dev
Add -f to tail the log indefinitely; this is very useful when debugging!
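For example, to follow the middle manager's log (from the pod listing above) while ingestion tasks run:

kubectl logs -f druid-middle-manager-0 -n dev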
minikube stop – stops the whole Kubernetes cluster. Use this if you want to be able to use the same deployments later; any loaded data will still be available at restart.
minikube start – restarts the cluster and brings back any installed deployments (minio and druid).
minikube delete – this one is destructive: it wipes out everything, including the Kubernetes cluster itself. After a delete, restarting will require a new set of helm install commands to redeploy.
If you want to keep a Druid cluster handy without it consuming resources while you aren't using it, use minikube stop to pause and minikube start to resume. You will need to rerun the port-forwarding commands to regain access to the corresponding UIs after the restart completes.
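As a convenience, here's a small sketch that resumes the cluster and re-establishes both port forwards in the background. The pod names are looked up again because deployment-managed pods (the router and minio) get new ephemeral names when they are recreated:

minikube start

# Wait until all pods report READY 1/1 (kubectl get pods -n dev) before forwarding

# Look the pod names up again
export MINIO_POD=$(kubectl get pods -n dev -l "release=minio" -o jsonpath="{.items[0].metadata.name}")
export ROUTER_POD=$(kubectl get po -n dev | grep router | cut -d" " -f1)

# Re-establish both port forwards in the background
kubectl port-forward $MINIO_POD 9001 -n dev &
kubectl port-forward pod/$ROUTER_POD 8888 -n dev &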
To remove a deployment you can use helm uninstall:
helm uninstall druid -n dev removes the Druid deployment.
helm uninstall minio -n dev removes the MinIO deployment.
If you want to make changes to the configuration, update the corresponding yaml file and apply the changes live with:
helm upgrade druid helm/druid --namespace dev -f k8s_minikube.yaml
helm upgrade minio minio/minio --namespace dev -f minio_values.yaml
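If an upgrade doesn't behave as expected, helm keeps a revision history per release that you can inspect and roll back to; for example:

# List the revisions of the druid release, then roll back to revision 1
helm history druid -n dev
helm rollback druid 1 -n dev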
We’re planning a series of Druid on Kubernetes related blog posts. Coming soon, we’ll delve into how to use the helm chart to deploy to the cloud using more CPU, memory and storage. Also feel free to reach out with comments or suggestions at sergio.ferragut@imply.io.