Tutorial: Tranquility Kafka

This tutorial shows you how to load data from Kafka using Tranquility Kafka. Imply also supports the experimental Kafka indexing service; see the Loading from Kafka page for help choosing between the two options.

Prerequisites

You will need:

  • Java 7 or better
  • Node.js 4.x or better
  • Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
  • At least 4GB of RAM

On Mac OS X, you can install Java with Oracle's JDK 8 and Node.js with Homebrew.

On Linux, your OS package manager should be able to install both Java and Node.js. If your Ubuntu-based OS does not have a recent enough version of Java, WebUpd8 offers packages for those OSes. If your Debian, Ubuntu, or Enterprise Linux OS does not have a recent enough version of Node.js, NodeSource offers packages for those OSes.
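If you want to confirm the Java and Node.js requirements before continuing, a small shell sketch like the one below can help. It only checks Java and Node.js (not RAM or OS), assumes `java` and `node` are on your PATH, and the helper names `java_major`, `node_major`, and `check_prereqs` are hypothetical, not part of Imply.

```shell
# Hypothetical helpers; paste into a bash shell or source them from a file.

# Major Java version from a version string:
# "1.8.0_121" -> 8 (Java 8 and earlier), "11.0.2" -> 11 (Java 9 and later).
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  echo "${v%%[._]*}"      # keep the part before the first "." or "_"
}

# Major Node.js version from a string such as "v4.8.7" -> 4.
node_major() {
  local v="${1#v}"
  echo "${v%%.*}"
}

# Compare installed versions with the minimums listed above (Java 7, Node.js 4).
# Returns non-zero if a tool is missing or too old.
check_prereqs() {
  local rc=0 jv nv
  if command -v java >/dev/null 2>&1; then
    # `java -version` prints to stderr, e.g.: java version "1.8.0_121"
    jv=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    if [ "$(java_major "$jv")" -ge 7 ] 2>/dev/null; then
      echo "Java $jv: OK"
    else
      echo "Java $jv: need Java 7 or better"; rc=1
    fi
  else
    echo "java: not found"; rc=1
  fi
  if command -v node >/dev/null 2>&1; then
    nv=$(node --version)
    if [ "$(node_major "$nv")" -ge 4 ] 2>/dev/null; then
      echo "Node.js $nv: OK"
    else
      echo "Node.js $nv: need Node.js 4.x or better"; rc=1
    fi
  else
    echo "node: not found"; rc=1
  fi
  return "$rc"
}
```

After defining the functions, run check_prereqs; it prints one line per tool and returns a non-zero status if anything needs attention.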

Start Imply

If you've already installed and started Imply using the quickstart, you can skip this step.

First, download Imply 2.0.0 from imply.io/download and unpack the release archive.

tar -xzf imply-2.0.0.tar.gz
cd imply-2.0.0

Next, you'll need to start up Imply, which includes Druid, Pivot, and ZooKeeper. You can use the included supervise program to start everything with a single command:

bin/supervise -c conf/supervise/quickstart.conf

You should see a log message printed out for each service that starts up. You can view detailed logs for any service by looking in the var/sv/ directory using another terminal.

Later on, if you'd like to stop the services, press CTRL-C in the terminal running supervise. If you want a clean start after stopping the services, remove the var/ directory.

Start Kafka

Apache Kafka is a high-throughput message bus that works well with Druid. For this tutorial, we will use Kafka 0.10.1.0. To download Kafka, issue the following commands in your terminal:

curl -O http://www-us.apache.org/dist/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
tar -xzf kafka_2.11-0.10.1.0.tgz
cd kafka_2.11-0.10.1.0

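Start a Kafka broker by running the following command in a new terminal. Kafka's default config/server.properties connects the broker to ZooKeeper at localhost:2181, which is the ZooKeeper instance started by Imply, so make sure Imply is running first: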

./bin/kafka-server-start.sh config/server.properties

Run this command to create a Kafka topic called tutorial-tranquility-kafka, to which we'll send data:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic tutorial-tranquility-kafka
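If you go through the tutorial more than once, the create command fails because the topic already exists. The sketch below checks the output of `kafka-topics.sh --list` first and only creates the topic when it is missing. The function name `ensure_topic` and the `KAFKA_HOME` and `ZOOKEEPER` variables are hypothetical; `KAFKA_HOME` is assumed to point at your unpacked kafka_2.11-0.10.1.0 directory.

```shell
# Hypothetical helper: create a Kafka topic only if it does not exist yet.
# Assumes KAFKA_HOME points at the unpacked Kafka directory and that
# ZooKeeper (started by Imply) listens on localhost:2181 unless ZOOKEEPER is set.
ensure_topic() {
  local topic="$1"
  local zk="${ZOOKEEPER:-localhost:2181}"
  local topics="$KAFKA_HOME/bin/kafka-topics.sh"

  # --list prints one topic name per line; -x matches the whole line and
  # -F treats the name literally (topic names may contain ".").
  if "$topics" --list --zookeeper "$zk" | grep -qxF -- "$topic"; then
    echo "Topic $topic already exists"
  else
    "$topics" --create --zookeeper "$zk" \
      --replication-factor 1 --partitions 1 --topic "$topic"
  fi
}

# Example:
#   KAFKA_HOME=~/kafka_2.11-0.10.1.0 ensure_topic tutorial-tranquility-kafka
```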

Enable Tranquility Kafka service

Imply includes Tranquility Kafka to support loading data from Kafka. To enable this in the Imply quickstart-based configuration:

  • In your conf/supervise/quickstart.conf, uncomment the tranquility-kafka line.
  • Stop your bin/supervise command (CTRL-C or bin/service --down) and then start it up again by running bin/supervise -c conf/supervise/quickstart.conf.
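If you prefer to make the configuration change from the command line, the sketch below removes the leading `#` from any line in quickstart.conf that mentions `tranquility-kafka`. It assumes that line is commented out with a `#` at the start of the line, so check the printed result before restarting. The `-i.bak` form works with both GNU and BSD (Mac OS X) sed and keeps a backup copy; the function name is hypothetical.

```shell
# Hypothetical helper: uncomment the tranquility-kafka line in a supervise config.
# Run it from the Imply directory, or pass the config path explicitly.
enable_tranquility_kafka() {
  local conf="${1:-conf/supervise/quickstart.conf}"
  # Only lines containing "tranquility-kafka" lose their leading "#"
  # (plus any whitespace after it). The original is saved as "$conf.bak".
  sed -i.bak '/tranquility-kafka/ s/^#[[:space:]]*//' "$conf"
  # Show the affected line(s) so you can confirm the edit.
  grep -n 'tranquility-kafka' "$conf"
}
```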

As part of the output of supervise you should see something like:

Running command[tranquility-kafka], logging to[/home/imply/imply-2.0.0/var/sv/tranquility-kafka.log]: bin/tranquility kafka -configFile conf-quickstart/tranquility/kafka.json

You can check the log file in var/sv/tranquility-kafka.log to confirm that the server is starting up properly.

Send data

Let's launch a console producer for our topic and send some data!

In your Imply directory, generate some metrics by running:

bin/generate-example-metrics

In your Kafka directory, run:

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic tutorial-tranquility-kafka

The kafka-console-producer command is now waiting for input. Copy the generated example metrics, paste them into the kafka-console-producer terminal, and press Enter. You can paste more messages if you like; when you're done, press CTRL-D to exit the console producer.
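Because the console producer sends each line it reads from standard input as one message and exits at end of input, you can also skip the copy and paste and pipe the generator straight into it. This is a sketch that assumes bin/generate-example-metrics writes its events to standard output, one per line; `IMPLY_HOME`, `KAFKA_HOME`, and `send_example_metrics` are hypothetical names for your two unpacked directories and the helper.

```shell
# Hypothetical helper: send freshly generated example metrics to the topic.
# Assumes IMPLY_HOME and KAFKA_HOME point at the unpacked Imply and Kafka
# directories and that the Kafka broker is listening on localhost:9092.
send_example_metrics() {
  local topic="${1:-tutorial-tranquility-kafka}"
  # Each line printed by the generator becomes one Kafka message;
  # the producer exits once the generator finishes.
  "$IMPLY_HOME/bin/generate-example-metrics" |
    "$KAFKA_HOME/bin/kafka-console-producer.sh" \
      --broker-list localhost:9092 --topic "$topic"
}
```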

Once the data is sent to Druid, you can immediately query it.

Query data

After sending data, you can immediately query it using any of the supported query methods. To start off, try a SQL query with PlyQL:

bin/plyql -h localhost:8082 -q 'SELECT server, COUNT(*) FROM `tutorial-tranquility-kafka` GROUP BY server'
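PlyQL turns SQL into Druid's native JSON queries. If you want to see what such a request looks like, the sketch below sends a roughly equivalent groupBy query directly to the Druid broker on localhost:8082 (the same address used with PlyQL) at its native /druid/v2/ endpoint. The very wide interval is a placeholder meant to cover all of your data, and the `count` aggregator counts stored Druid rows, which can be fewer than the events you sent if Druid rolled some of them up. The function names are hypothetical.

```shell
# Hypothetical helper: print a Druid native groupBy query that counts
# rows per "server" value in the given datasource.
server_count_query() {
  local datasource="${1:-tutorial-tranquility-kafka}"
  cat <<EOF
{
  "queryType": "groupBy",
  "dataSource": "$datasource",
  "granularity": "all",
  "dimensions": ["server"],
  "aggregations": [{ "type": "count", "name": "rows" }],
  "intervals": ["1000-01-01/3000-01-01"]
}
EOF
}

# POST the query to the broker; --data-binary @- reads the body from stdin.
run_server_count_query() {
  server_count_query "$@" |
    curl -s -X POST -H 'Content-Type: application/json' \
      --data-binary @- 'http://localhost:8082/druid/v2/?pretty'
}
```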

Next, try configuring a data cube in Pivot:

  1. Navigate to Pivot at http://localhost:9095.
  2. Click the gear icon in the top right of the header bar and select "General settings".
  3. Click "Data Cubes" in the left sidebar to bring up the Data Cubes tab.
  4. In the Data Cubes tab click on "Create new data cube".
  5. Select the source "druid: tutorial-tranquility-kafka" and ensure "Auto-fill dimensions and measures" is checked.
  6. Click "Next: configure data cube".
  7. Click "Create cube". You should see the confirmation message "Data cube created".
  8. View your new data cube by clicking the Home icon in the top right and selecting the cube you just created.

Load your own Kafka topics

So far, you've loaded data into Imply using an ingestion spec that we've included in the distribution. Each ingestion spec is designed to work with a particular dataset. You can load your own data types into Imply by writing a custom ingestion spec.

To customize Tranquility Kafka ingestion, you can edit the conf-quickstart/tranquility/kafka.json configuration file. See the Tranquility documentation for more details about how to interpret and modify the configuration. After updating the configuration, you can restart Tranquility Kafka by running:

bin/service --restart tranquility-kafka
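A typo in kafka.json can keep Tranquility Kafka from starting, so you may want to check that the file is still valid JSON before restarting. The sketch below does that with `python3 -m json.tool` (part of the Python standard library); it assumes Python 3 is installed and that you run it from the Imply directory. The function name is hypothetical, and a JSON syntax check does not validate the Tranquility settings themselves.

```shell
# Hypothetical helper: restart Tranquility Kafka only if the config parses as JSON.
# Run from the Imply directory; pass a different config path if needed.
restart_tranquility_kafka() {
  local conf="${1:-conf-quickstart/tranquility/kafka.json}"
  if python3 -m json.tool "$conf" >/dev/null; then
    bin/service --restart tranquility-kafka
  else
    # json.tool has already printed the parse error to stderr.
    echo "Not restarting: $conf is not valid JSON" >&2
    return 1
  fi
}
```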

Note that when loading from your own Kafka topic, Tranquility will only read messages where the timestamp is recent enough (within windowPeriod of the current time). Older events will not be sent to Druid. See the Segment granularity and windowPeriod section of the Tranquility documentation for more details.
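As a rough illustration of that rule, the sketch below checks whether an event timestamp (in epoch seconds) falls within a window of the current time. It is a simplified model, not Tranquility's actual code: the real acceptance window also depends on segment granularity, as the Tranquility documentation explains. The 600-second default (PT10M) is only an example value, not necessarily what your kafka.json uses, and `within_window` is a hypothetical name.

```shell
# Simplified model of the windowPeriod check: succeed if the event timestamp
# is within WINDOW seconds of NOW (in either direction).
# Usage: within_window EVENT_EPOCH_SECONDS [WINDOW_SECONDS] [NOW_EPOCH_SECONDS]
within_window() {
  local event="$1" window="${2:-600}" now="${3:-$(date +%s)}"
  local diff=$(( now - event ))
  # Treat slightly-future timestamps the same as slightly-past ones.
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  [ "$diff" -le "$window" ]
}
```

For example, an event stamped 15 minutes ago falls outside a 10-minute window, so within_window $(( $(date +%s) - 900 )) 600 returns a non-zero status, while an event stamped 5 minutes ago passes.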

Further reading

To read more about loading data with Imply, see our ingestion documentation.
