Integrations

Analyze stream data at scale

Druid was designed from the outset for rapid ingestion and immediate querying of stream data. No connectors are needed, as Druid includes inherent exactly-once ingestion for data streams using Apache Kafka® and Amazon Kinesis APIs.

See documentation

Streaming analytics with Druid

Apache Druid is purpose-built for stream ingestion. It ingests event by event, not a series of batched data files sent sequentially to mimic a stream. This means Druid supports query-on-arrival: true real-time analytics with no wait for data to be batched and then delivered.

  • Massive scalability

    Druid easily handles data streams of up to millions of events per second, making it ideal for highly dynamic data.

  • Exactly-once semantics

    Druid guarantees data consistency - preventing duplicates or data loss - through its native indexing service.

  • Continuous backup

    Druid ensures no loss of streaming data, as it automatically persists data segments to deep storage.

Apache Kafka

Kafka is natively integrated with Druid, so there is no need for a connector. Data is loaded into Druid from a Kafka stream using Druid’s Kafka indexing service; the connection to Kafka topics is built into Druid. Define an ingestion spec with “type”: “kafka” that specifies the topic and parameters you want.
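As a sketch, a minimal Kafka supervisor spec might look like the following. The topic name, broker address, datasource name, and schema fields here are illustrative assumptions, not values from this page:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "localhost:9092"
      },
      "topic": "clickstream",
      "useEarliestOffset": true,
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "clickstream",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "useSchemaDiscovery": true },
      "granularitySpec": { "rollup": false }
    }
  }
}
```

Submitting a spec like this to Druid’s supervisor API starts a long-running supervisor that manages the indexing tasks reading from the topic.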

Whenever an event is added to the topic, it immediately becomes available to your queries in Druid. After an interval, the events are indexed, added to a segment, and committed to both data nodes and durable deep storage. Once an event is fully committed, Druid advances its committed offset past it, so every event is written to Druid once and only once.

Learn more about Kafka and Druid


Apache Kafka APIs

Any data stream that supports the Kafka APIs can be used for Druid stream ingestion, just like open-source Kafka.

This includes:

  • Confluent Enterprise
  • Confluent Cloud
  • Amazon Managed Streaming for Apache Kafka (MSK)
  • Azure Event Hubs
  • Redpanda
  • Aiven Managed Kafka
  • Alibaba
  • Instaclustr

Amazon Kinesis

An inherent connection to Kinesis data streams is part of Druid. Define an ingestion spec with “type”: “kinesis” that specifies the stream and parameters you want.
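A minimal Kinesis supervisor spec follows the same shape as its Kafka counterpart, swapping the topic for a stream and endpoint. The stream name, region endpoint, datasource name, and schema fields below are illustrative assumptions:

```json
{
  "type": "kinesis",
  "spec": {
    "ioConfig": {
      "type": "kinesis",
      "stream": "clickstream",
      "endpoint": "kinesis.us-east-1.amazonaws.com",
      "useEarliestSequenceNumber": true,
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "clickstream",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "useSchemaDiscovery": true },
      "granularitySpec": { "rollup": false }
    }
  }
}
```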

Whenever an event is added to the stream, it immediately becomes available to your queries in Druid. After an interval, the events are indexed, added to a segment, and committed to both data nodes and durable deep storage. Once an event is fully committed, Druid checkpoints its position in the stream past it, so every event is written to Druid once and only once.

“Druid has native Kafka integration out of the box…we don't need anything to make Apache Kafka and Apache Druid work together. It just works.”
Harini Rajendran  |  Software Engineer, Confluent

Let us help with your analytics apps

Request a Demo