Integrations

Import database objects easily

Druid easily ingests data from object stores and files in common data formats including JSON, CSV, TSV, Parquet, ORC, Avro, and Protobuf. Simply execute an INSERT INTO statement for SQL-based ingestion and in-database transformations.
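
For example, a minimal SQL-based ingestion statement might look like the following sketch; the datasource name, source URL, and columns are illustrative, not part of any real deployment:

  -- Sketch: ingest a remote JSON file into a new "example_events" datasource
  -- (datasource, URL, and column names are illustrative)
  INSERT INTO "example_events"
  SELECT
    TIME_PARSE("timestamp") AS "__time",
    "page",
    "user",
    "added"
  FROM TABLE(
    EXTERN(
      '{"type":"http","uris":["https://example.com/events.json.gz"]}',  -- input source
      '{"type":"json"}',                                                 -- input format
      '[{"name":"timestamp","type":"string"},{"name":"page","type":"string"},{"name":"user","type":"string"},{"name":"added","type":"long"}]'
    )
  )
  PARTITIONED BY DAY

The same pattern applies to the other supported formats by changing the input format portion of EXTERN.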

See documentation

Batch ingestion with Druid

Druid supports blazing-fast batch ingestion via SQL. Multiple options exist for most databases, including writing data to a file for batch ingestion, using ELT tools such as Apache NiFi, Fivetran, or Ascend.io, or exposing data changes as a stream.

[Diagram: database integrations for batch ingestion with Druid]
  • Simple, fast SQL ingestion

    Use common SQL statements for easier, faster, auto-tuned data ingestion.

  • Flexible data schema

    Choose to either flatten data or ingest nested data directly from JSON files.

  • Flexible data transformation

    Transform data during ingestion, or load raw data into Druid and transform it in-database (see the example after this list).
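
As a hypothetical illustration of the last two points, the sketch below ingests nested JSON directly by declaring a COMPLEX<json> column and also flattens and transforms fields at ingestion time (all names, paths, and columns are assumptions):

  -- Sketch: keep the nested "shipTo" object as-is and also flatten one of its fields
  INSERT INTO "orders"
  SELECT
    TIME_PARSE("timestamp") AS "__time",
    UPPER("product") AS "product",                        -- transformed during ingestion
    "shipTo",                                             -- stored as a nested JSON column
    JSON_VALUE("shipTo", '$.country') AS "ship_country"   -- flattened during ingestion
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports","filter":"orders-*.json"}',
      '{"type":"json"}',
      '[{"name":"timestamp","type":"string"},{"name":"product","type":"string"},{"name":"shipTo","type":"COMPLEX<json>"}]'
    )
  )
  PARTITIONED BY DAY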

Integrations for Batch Ingestion

Oracle

The initial copy of the data from Oracle can be completed using Oracle SQL Developer, which provides a wizard for exporting data and metadata from the database. That data is then ingested into Druid via SQL. Subsequent changes to the Oracle analytical data can then be replicated in Druid by using a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, by using various ELT tools, or by using the Debezium Oracle connector to automate the CDC (Change Data Capture) process.
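
As a rough sketch, if the SQL Developer export is saved as CSV files on disk, the initial load could look like this; the directory, file pattern, datasource, and columns are assumptions:

  -- Sketch: ingest CSV files exported from Oracle SQL Developer (illustrative names and paths)
  -- Assumes ISO 8601 timestamps; pass a format pattern to TIME_PARSE otherwise
  INSERT INTO "oracle_orders"
  SELECT
    TIME_PARSE("LAST_UPDATED") AS "__time",
    "ORDER_ID",
    "CUSTOMER_ID",
    "STATUS"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/oracle","filter":"orders-*.csv"}',
      '{"type":"csv","findColumnsFromHeader":true}',
      '[{"name":"LAST_UPDATED","type":"string"},{"name":"ORDER_ID","type":"long"},{"name":"CUSTOMER_ID","type":"long"},{"name":"STATUS","type":"string"}]'
    )
  )
  PARTITIONED BY DAY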

Snowflake

The initial copy of data from Snowflake to Apache Druid can be completed using the Snowflake bulk unloader and ingesting that data into Druid via SQL. After the initial ingest, data updates can be handled by a simple change data capture process that uses a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, or by using various ELT tools.
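
For instance, if the bulk unloader writes CSV files to an S3 location, the initial ingest could be sketched as follows; the bucket, prefix, datasource, and columns are assumptions, and reading from S3 requires the druid-s3-extensions:

  -- Sketch: ingest CSV files unloaded from Snowflake into S3 (illustrative names)
  INSERT INTO "snowflake_sales"
  SELECT
    TIME_PARSE("LAST_UPDATED") AS "__time",
    "SALE_ID",
    "REGION",
    "AMOUNT"
  FROM TABLE(
    EXTERN(
      '{"type":"s3","prefixes":["s3://example-bucket/snowflake-unload/"]}',
      '{"type":"csv","findColumnsFromHeader":true}',
      '[{"name":"LAST_UPDATED","type":"string"},{"name":"SALE_ID","type":"long"},{"name":"REGION","type":"string"},{"name":"AMOUNT","type":"double"}]'
    )
  )
  PARTITIONED BY DAY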

PostgreSQL

Data in PostgreSQL can be exported using the pg_dumpall command-line utility. The .sql export can then be imported into Druid using SQL ingestion. After the initial ingest, data updates can be handled by a simple change data capture process that uses a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, by using various ELT tools, or by using the Debezium PostgreSQL connector, which automates the CDC (Change Data Capture) process.
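
As one hedged sketch of that incremental step, changed rows could be exported to CSV on the PostgreSQL side (for example with COPY, rather than a full pg_dumpall dump) and then appended to the Druid datasource; every table, path, and column name here is an assumption:

  -- PostgreSQL side (runs on the database server): export rows changed since the last sync
  COPY (SELECT order_id, status, last_updated
        FROM orders
        WHERE last_updated > '2024-01-01 00:00:00')
    TO '/data/exports/postgres/orders-delta.csv' WITH (FORMAT csv, HEADER true);

  -- Druid side: append the delta file to the existing datasource
  INSERT INTO "pg_orders"
  SELECT
    TIME_PARSE("last_updated", 'yyyy-MM-dd HH:mm:ss') AS "__time",
    "order_id",
    "status"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/postgres","filter":"orders-delta.csv"}',
      '{"type":"csv","findColumnsFromHeader":true}',
      '[{"name":"last_updated","type":"string"},{"name":"order_id","type":"long"},{"name":"status","type":"string"}]'
    )
  )
  PARTITIONED BY DAY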

MySQL

Data in MySQL can be exported using the mysqldump utility. The .sql export can then be imported into Druid using SQL ingestion. After the initial ingest, data updates can be handled by a simple change data capture process that uses a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, by using various ELT tools, or by using the Debezium MySQL connector, which automates the CDC (Change Data Capture) process.
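
A hedged sketch of the Druid side, assuming the table was exported to a headerless CSV file (for example with MySQL's SELECT ... INTO OUTFILE) rather than a .sql dump; all names, paths, and the timestamp format are illustrative:

  -- Sketch: ingest a headerless CSV export from MySQL by naming the columns explicitly
  INSERT INTO "mysql_customers"
  SELECT
    TIME_PARSE("last_updated", 'yyyy-MM-dd HH:mm:ss') AS "__time",
    "customer_id",
    "email"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/mysql","filter":"customers.csv"}',
      '{"type":"csv","columns":["customer_id","email","last_updated"]}',
      '[{"name":"customer_id","type":"long"},{"name":"email","type":"string"},{"name":"last_updated","type":"string"}]'
    )
  )
  PARTITIONED BY DAY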

MariaDB

Data in MariaDB can be exported using the mariadb-dump/mysqldump utility. The .sql export can then be imported into Druid using SQL ingestion. After the initial ingest, data updates can be handled by a simple change data capture process that uses a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, or by using various ELT tools.

SQL Server

Data in SQL Server can be exported using the SQL Server Import and Export Wizard. The data can then be imported into Druid. For example, a .csv export can be ingested into Druid via the load data option in the Druid UI. After the initial ingest, data updates can be handled by a simple change data capture process that uses a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, by using various ELT tools, or by using the Debezium SQL Server connector, which automates the CDC (Change Data Capture) process.
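
The same .csv export can also be loaded with a SQL statement instead of the UI; as a sketch, with the datasource, path, columns, and clustering column all assumed:

  -- Sketch: SQL alternative to the "load data" UI flow for a CSV export from SQL Server
  INSERT INTO "mssql_sales"
  SELECT
    TIME_PARSE("LAST_UPDATED") AS "__time",
    "SaleID",
    "Store",
    "Total"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/mssql","filter":"sales.csv"}',
      '{"type":"csv","findColumnsFromHeader":true}',
      '[{"name":"LAST_UPDATED","type":"string"},{"name":"SaleID","type":"long"},{"name":"Store","type":"string"},{"name":"Total","type":"double"}]'
    )
  )
  PARTITIONED BY MONTH
  CLUSTERED BY "Store"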

InfluxDB

Data stored in InfluxDB buckets can be ingested into Druid using Druid batch ingestion. Since the indexing is parallel, extremely large files will ingest faster if they are split into multiple smaller files. For incremental ingestion, a data pipeline can be established using ELT tools such as Apache NiFi, Fivetran, or Ascend.io.
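
A hedged sketch, assuming the bucket has been exported to plain CSV files split by day so the parallel indexing can fan out across them; all names and paths are illustrative:

  -- Sketch: ingest many smaller per-day CSV exports in one parallel batch job
  INSERT INTO "influx_metrics"
  SELECT
    TIME_PARSE("time") AS "__time",
    "host",
    "cpu_usage"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/influx","filter":"metrics-2024-*.csv"}',  -- one file per day
      '{"type":"csv","findColumnsFromHeader":true}',
      '[{"name":"time","type":"string"},{"name":"host","type":"string"},{"name":"cpu_usage","type":"double"}]'
    )
  )
  PARTITIONED BY DAY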

DataStax Cassandra

Data in Apache Cassandra can be exported using the COPY command. The data can then be imported into Druid. For example, a .csv export can be ingested into Druid via the load data option in the Druid UI. After the initial ingest, data updates can be handled by a simple change data capture process that uses a LAST_UPDATED datetime field to determine updated or new records along with code to facilitate the migration, by using various ELT tools, or by using the Debezium Cassandra connector, which automates the CDC (Change Data Capture) process.
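
As a sketch, the cqlsh export and a corresponding SQL ingestion might look like this; the keyspace, table, paths, and columns are assumptions:

  -- Exported beforehand with cqlsh, e.g.:
  --   COPY shop.users (user_id, country, signup_ts) TO '/data/exports/cassandra/users.csv' WITH HEADER = TRUE;
  -- Sketch: ingest that CSV export into Druid
  INSERT INTO "cassandra_users"
  SELECT
    TIME_PARSE("signup_ts") AS "__time",
    "user_id",
    "country"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/cassandra","filter":"users.csv"}',
      '{"type":"csv","findColumnsFromHeader":true}',
      '[{"name":"signup_ts","type":"string"},{"name":"user_id","type":"string"},{"name":"country","type":"string"}]'
    )
  )
  PARTITIONED BY MONTH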

Elasticsearch

Objects stored in Elasticsearch can be exported in JSON format using the export API. The data can then be imported into Druid using Druid batch ingestion. After the initial ingest, data updates can be handled using various ELT tools, such as Apache NiFi, Fivetran, or Ascend.io.
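
For example, if the documents land on disk as newline-delimited JSON files, a sketch of the batch ingestion could look like this; the index fields and paths are assumptions:

  -- Sketch: ingest newline-delimited JSON documents exported from Elasticsearch
  INSERT INTO "es_logs"
  SELECT
    TIME_PARSE("@timestamp") AS "__time",
    "level",
    "message"
  FROM TABLE(
    EXTERN(
      '{"type":"local","baseDir":"/data/exports/elasticsearch","filter":"logs-*.json"}',
      '{"type":"json"}',
      '[{"name":"@timestamp","type":"string"},{"name":"level","type":"string"},{"name":"message","type":"string"}]'
    )
  )
  PARTITIONED BY DAY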

AWS S3

Data in AWS S3 buckets can be ingested into Druid using Druid batch ingestion. Since the indexing is parallel, extremely large files will ingest faster if they are split into multiple smaller files. For incremental ingestion, a data pipeline can be established using ELT tools such as Apache NiFi, Fivetran, or Ascend.io.
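
As a sketch, splitting the data into multiple objects under a common prefix lets the parallel indexing fan out across them; the bucket, prefix, format, and schema below are assumptions, and reading S3 Parquet objects requires the druid-s3-extensions and druid-parquet-extensions:

  -- Sketch: ingest Parquet objects from an S3 prefix (many smaller files are read in parallel)
  INSERT INTO "s3_clickstream"
  SELECT
    TIME_PARSE("event_time") AS "__time",
    "user_id",
    "url"
  FROM TABLE(
    EXTERN(
      '{"type":"s3","prefixes":["s3://example-bucket/clickstream/2024/"]}',
      '{"type":"parquet"}',
      '[{"name":"event_time","type":"string"},{"name":"user_id","type":"string"},{"name":"url","type":"string"}]'
    )
  )
  PARTITIONED BY DAY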

Azure Blob & Azure Data Lake Store

Objects stored in Azure Blob Storage and ADLS can be ingested into Druid using Druid batch ingestion. Since the indexing is parallel, extremely large files will ingest faster if they are split into multiple smaller files. For incremental ingestion, a data pipeline can be established using ELT tools such as Apache NiFi, Fivetran, or Ascend.io.
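
A hedged sketch for Azure, assuming the druid-azure-extensions is loaded and the data sits as JSON blobs under a container prefix; the exact input source spec can vary by Druid version, and all names are illustrative:

  -- Sketch: ingest JSON blobs from Azure Blob Storage / ADLS
  INSERT INTO "azure_events"
  SELECT
    TIME_PARSE("timestamp") AS "__time",
    "device_id",
    "reading"
  FROM TABLE(
    EXTERN(
      '{"type":"azure","prefixes":["azure://example-container/events/"]}',
      '{"type":"json"}',
      '[{"name":"timestamp","type":"string"},{"name":"device_id","type":"string"},{"name":"reading","type":"double"}]'
    )
  )
  PARTITIONED BY DAY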

Let us help with your analytics apps

Request a Demo