Tutorial: An End-to-end Streaming Analytics Stack for syslog Data

Read Part 1. This is Part 2 of our ongoing series on using Imply for network telemetry data.

In this tutorial, we will step through how to set up Imply, Kafka, and syslog-ng kafka to build an end-to-end streaming analytics stack that can handle many different forms of log data. The setup described here uses a single AWS instance for simplicity, but it can serve as a reference architecture for a fully distributed production deployment.

Prerequisites

A bare metal server or cloud instance (such as an AWS m5d.xlarge instance) with 16GB RAM, 100GB of disk, and an ethernet interface.
The server should be running Linux.
You should have sudo or root access on the server.
A router, switch, firewall, or host that can send syslog data.

The architecture we will be setting up looks like the following:

Install librdkafka, syslog-ng and syslog-ng kafka

Follow the installation steps for librdkafka, syslog-ng, and the syslog-ng kafka connector at the following URL, but skip the configuration steps described there:
https://syslogng-kafka.readthedocs.io/en/latest/readme.html

For the syslog-ng configuration use the following as an example. Fill in the source IP and destination port of your device sending syslog data. This setup is specific for UDP syslog and can be used to collect from external devices, like routers, sending syslog.

sudo vim /etc/syslog-ng/syslog-ng.conf

########################
# Sources
########################
# This is the default behavior of sysklogd package
# Logs may come from unix stream, but not from another machine.
#
source s_src {
#  system();
#  internal();
  udp(ip(<your source router IP>) port(514));
};

Configure the syslog-ng Apache Kafka destination

sudo vim /etc/syslog-ng/conf.d/kafka.conf

destination syslog_to_kafka {
  python(
    class("syslogng_kafka.kafkadriver.KafkaDestination")
    on-error("fallback-to-string")
    options(
      hosts("<host_ip_for_kafka>:9092")
      topic("syslog")
      msg_key("src_ip")
      verbose("True")
      display_stats("True")
    )
  );
};
log {
  source(s_src);
  destination(syslog_to_kafka);
};

Modify kafkadriver.py to send JSON formatted messages

Edit the file

sudo vim /usr/local/lib/python2.7/dist-packages/syslogng_kafka/kafkadriver.py

Change the following line

msg_string = str(msg)

to

msg_string = json.dumps(msg)

If the file does not already import the json module, add import json near the top.

After this change, the messages that syslog-ng sends to Kafka will be valid JSON, with every key and value correctly quoted.
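To see why this one-line change matters, here is a minimal sketch that compares the two calls on a plain dictionary. The field names and values are made up for illustration; the fields in a real message depend on your syslog-ng setup.

```python
import json

# Hypothetical message; real field names depend on your syslog-ng setup.
msg = {"HOST": "192.168.1.1", "PROGRAM": "sshd", "MESSAGE": "session opened"}

as_str = str(msg)          # Python repr: single-quoted, not valid JSON
as_json = json.dumps(msg)  # JSON text: double-quoted keys and values

print(as_str)
print(as_json)

# Only the json.dumps output can be parsed back by a JSON consumer.
try:
    json.loads(as_str)
    str_is_valid_json = True
except ValueError:
    str_is_valid_json = False

print("str() output parses as JSON:", str_is_valid_json)
print("json.dumps() output round-trips:", json.loads(as_json) == msg)
```

A downstream JSON parser rejects the first form, so only the second form can be ingested as structured data.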

Install Imply

Download the most recent Imply distribution by going to the following URL:
https://imply.io/get-started
Refer to the following quickstart for installation help and system requirements:
https://docs.imply.io/on-prem/quickstart
Modify conf-quickstart/druid/_common/common.runtime.properties with the right directories for segments and logs. If you have plenty of local disk, you can keep the default configuration. The quickstart documentation linked above is a good reference.
Start Imply from the Imply directory with the quickstart configuration by typing the following:

sudo bin/supervise -c conf/supervise/quickstart.conf &

Install Kafka

Download the most recent Kafka distribution from the following URL:
https://kafka.apache.org/downloads

Note: The Imply distribution already includes Apache Zookeeper, which Kafka will use when you start it.

Start Kafka with the following command from within the Kafka directory:
```
sudo ./bin/kafka-server-start.sh config/server.properties &
```
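Because the broker is started in the background, a startup failure is easy to miss. The following is a minimal sketch of a TCP reachability check, not part of Kafka or Imply. The helper name port_is_open is hypothetical, and ports 2181 (ZooKeeper) and 9092 (Kafka) are the ones used by the commands in this tutorial.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("localhost", 2181) and port_is_open("localhost", 9092) should both return True once ZooKeeper and Kafka are up. A True result only shows that something is listening on the port, not that the service is healthy.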
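Create a Kafka topic with the following command, replacing <topic name> with the name you want, such as syslog. The name should match the topic() value in the syslog-ng Kafka destination configured earlier. Run the command from the Kafka installation directory (on Kafka 3.0 and later, where kafka-topics.sh no longer accepts --zookeeper, pass --bootstrap-server localhost:9092 instead):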
```
sudo ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <topic name>
```

Start syslog-ng with Kafka

sudo service syslog-ng start

To see messages as they are received, run syslog-ng in the foreground:
```
sudo syslog-ng -F
```
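If no router is available yet, a hand-crafted datagram can confirm that the UDP source is listening. The sketch below builds a minimal BSD-style syslog payload and sends it over UDP. The function names are hypothetical. The payload omits the timestamp and hostname, on the assumption that the receiver fills them in.

```python
import socket


def build_syslog_line(facility, severity, tag, text):
    """Build a minimal BSD-style syslog payload: <PRI>tag: text."""
    pri = facility * 8 + severity  # PRI value as defined by the syslog protocol
    return "<{}>{}: {}".format(pri, tag, text)


def send_udp_syslog(host, port, line):
    """Send one syslog line as a single UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("utf-8"), (host, port))
    finally:
        sock.close()
```

For example, send_udp_syslog("<collector address>", 514, build_syslog_line(16, 6, "test", "hello")) sends a local0.info message to port 514. Replace the placeholder with the address your syslog-ng source listens on.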
Start sending syslog to the system you have just set up. Make sure your security rules allow the source IP of the syslog sender and the destination port that you configured the router to send to. If everything is working properly, you should see syslog messages displayed on your console. Once messages are arriving, you can check that they reach Kafka by running the console consumer from the Kafka installation directory:
```
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic name> --from-beginning
```
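Since the data loader will be asked to read these records as JSON, it can help to confirm that each consumed line actually parses. The following is a minimal sketch, assuming one message per line as printed by the console consumer. The helper name split_valid_json is hypothetical.

```python
import json


def split_valid_json(lines):
    """Separate lines that parse as JSON objects from those that do not."""
    good, bad = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:
            bad.append(line)
            continue
        # Only JSON objects (key/value records) count as usable messages.
        if isinstance(record, dict):
            good.append(record)
        else:
            bad.append(line)
    return good, bad
```

One way to use it is to pipe the consumer output into a script that calls split_valid_json(sys.stdin) and prints the rejected lines. A line with single-quoted keys in the rejected list suggests the driver is still using str(msg) rather than json.dumps(msg).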

Connect Kafka and Imply

Open the Imply UI in a browser by going to localhost:9095 (if the browser is running on the same machine) or <public_ip>:9095. Remember to modify your security rules to allow destination port 9095 from your source IP.
Select Data / + Load Data (upper right), and the following options will be displayed.

Select Apache Kafka.
Fill in the details for the Kafka process, including the broker IP and port (typically 9092, e.g. 192.168.1.2:9092) and the topic name that you created previously.

Select “Sample and continue”
Select “Next” for the remaining screens to start loading your syslog data into Imply.

Once your data is loaded, you can slice and dice your syslog data at amazing speeds.
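Besides the UI, the loaded data can also be queried over Druid's SQL HTTP endpoint (/druid/v2/sql). The sketch below only builds the request. The datasource name "syslog" and the helper name are assumptions, so substitute the datasource name you chose in the data loader and the host and port of your own Druid broker or router.

```python
import json
import urllib.request


def build_sql_request(base_url, sql):
    """Build (but do not send) a POST request for the Druid SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed datasource name; use the one chosen in the data loader.
SQL = (
    'SELECT COUNT(*) AS events FROM "syslog" '
    "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR"
)
```

Passing build_sql_request("http://<druid_host>:<port>", SQL) to urllib.request.urlopen sends the query. The response body is JSON, and here it would contain the number of events ingested over the past hour.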

A great way to get hands-on with Druid is through a Free Imply Download or Imply Cloud Trial.
