Imply Announces Automatic Schema Discovery for Apache Druid, Reinforcing Druid’s Leadership for Real-Time Analytics on Streaming Data

The third milestone of Imply’s Project Shapeshift brings industry-leading developer ease of use and operational efficiency to Apache Druid in the Druid 26.0 release

Jun 06, 2023

June 06, 2023 09:00 AM Eastern Daylight Time

BURLINGAME, Calif.–(BUSINESS WIRE)–Imply, the company founded by the original creators of Apache Druid®, today unveiled the third milestone in Project Shapeshift, an initiative designed to evolve Apache Druid and solve the most pressing issues developers face when building real-time analytics applications. This milestone introduces the following:

  • Schema auto-discovery: the ability for Druid to discover data fields and data types and continuously update tables automatically as they change
  • Shuffle joins: the ability to join large distributed tables without affecting query performance, powered by the new multi-stage query engine
  • Global expansion and new enhancements to Imply Polaris, the cloud database service for Apache Druid

Apache Druid, the analytics database when real-time matters, is a popular open source database and a 2022 Datanami Readers’ Choice winner, used by developers at thousands of companies, including Confluent, Salesforce, and Target. Because of its performance at scale and under load, along with its comprehensive features for analyzing streaming data, Druid is relied on for operational visibility, rapid data exploration, customer-facing analytics, and real-time decisioning.

Project Shapeshift was announced at Druid Summit 2021 as a strategic initiative from Imply to transform the developer experience for Druid across three pillars: cloud-native, simple, and complete. In March 2022, Imply announced the first milestone with the introduction of Imply Polaris, a cloud database service for Druid. In September 2022, Imply announced the second milestone, the largest architectural expansion of Druid in its history, with the addition of a multi-stage query engine.

“Druid has always been engineered for speed, scale, and streaming data. It’s why developers at Confluent, Netflix, Reddit and 1000s of other companies choose Druid over other database alternatives,” stated FJ Yang, Co-Founder and CEO of Imply. “For the past year, the community has come together to bring new levels of operational ease of use and expanded functionality. This makes Druid not only a powerful database, but one developers love to use too.”

Companies including Atlassian, Reddit, and PayTM use Imply for Druid because its commercial distribution, software, and services simplify operations, eliminate production risks, and lower the overall cost of running Druid. As a value-add for existing open source users, Imply guarantees a reduction in the cost of running Druid through its Total Value Guarantee.

Project Shapeshift Milestone 3 includes the following major contributions to Apache Druid and new features for Imply Polaris:

Automatic Schema Discovery in Druid

Schema definition plays an essential role in query performance, because a strongly-typed data structure makes it possible to store data in columns, index it, and optimize compression. But defining the schema when loading data places an operational burden on engineering teams, especially with ever-changing event data flowing through Apache Kafka and Amazon Kinesis. Databases such as MongoDB use a schemaless data structure because it offers developer flexibility and ease of ingestion, but at a cost to query performance.
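
To make the trade-off concrete: once a column is known to hold strings, a columnar store can replace each value with a small integer id and keep one bitmap of matching rows per distinct value, which is broadly how Druid stores and indexes string dimensions. The sketch below is a simplified illustration under that assumption, not Druid's segment format; the function names are hypothetical.

```python
# Simplified illustration of dictionary encoding plus bitmap indexing for a
# typed string column. Hypothetical helper names; not Druid's segment format.


def dictionary_encode(values):
    """Replace each string with the id of its position in a sorted dictionary."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    encoded = [ids[value] for value in values]
    return dictionary, encoded


def build_bitmap_index(encoded, cardinality):
    """Build one bitmap per dictionary id; a Python int serves as the bitset."""
    bitmaps = [0] * cardinality
    for row, value_id in enumerate(encoded):
        bitmaps[value_id] |= 1 << row
    return bitmaps


def rows_matching(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


country = ["US", "DE", "US", "FR", "US", "DE"]
dictionary, encoded = dictionary_encode(country)
bitmaps = build_bitmap_index(encoded, len(dictionary))

# A filter such as country = 'US' becomes a single bitmap lookup.
us_rows = rows_matching(bitmaps[dictionary.index("US")])
print(dictionary, encoded, us_rows)  # ['DE', 'FR', 'US'] [2, 0, 2, 1, 2, 0] [0, 2, 4]
```

Knowing the type up front is what makes this encoding safe; in a schemaless store, the engine cannot assume that every value in a field is a string.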

Today, Imply announces a new capability that makes Druid the first analytics database to provide the performance of a strongly-typed data structure with the flexibility of a schemaless data structure. Schema auto-discovery, now available in Druid 26.0, enables Druid to automatically discover data fields and data types and to update tables to match changing data without administrator intervention.
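
As a hedged sketch of what this looks like in practice: our understanding of the Druid 26.0 documentation is that schema auto-discovery is switched on with the `useSchemaDiscovery` flag in an ingestion spec's `dimensionsSpec`, leaving the dimension list empty. The Python below only builds such a Kafka supervisor spec as JSON; the datasource, topic, broker address, and timestamp column are hypothetical placeholders, and field names should be checked against the docs for your Druid version.

```python
import json


def kafka_supervisor_spec(datasource, topic, bootstrap_servers):
    """Build a Kafka supervisor spec that relies on schema auto-discovery.

    No dimensions are listed: with useSchemaDiscovery enabled (flag name as we
    understand it from the Druid 26.0 docs), Druid is expected to infer column
    names and types from the incoming events.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Hypothetical timestamp field name and format.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"useSchemaDiscovery": True},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical names; replace with your own datasource, topic, and brokers.
spec = kafka_supervisor_spec("clickstream", "clicks", "localhost:9092")
payload = json.dumps(spec, indent=2)
print(payload)
```

The resulting JSON would typically be submitted to the Overlord's supervisor endpoint (`/druid/indexer/v1/supervisor`). Leaving `dimensions` out entirely is the point: there is no column list to maintain as producers change their events.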

  • Auto-detection of new tables: Druid can now auto-discover column names and data types during ingestion. For example, Druid inspects the ingested data to determine which dimensions to create and the data type of each dimension’s column.
  • Maintenance of existing tables: As schemas change (for example, when dimensions are added or dropped, or their data types change in the source data), Druid automatically detects the change and adjusts Druid tables to match the new schema without requiring the existing data to be reprocessed.
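
The two behaviors above can be modeled in a few lines. The toy Python below infers a type for each field of incoming JSON events, keeps a separate schema per batch (standing in for segments), and derives the table schema by merging them, so older batches are never rewritten. The type names and the widening rule (LONG plus DOUBLE becomes DOUBLE, any other conflict falls back to STRING) are our assumptions for illustration, not Druid's actual algorithm.

```python
# Toy model of schema discovery and evolution. The type names and widening
# rule are illustrative assumptions, not Druid's actual inference logic.


def infer_type(value):
    """Guess a column type from a single JSON value."""
    if isinstance(value, bool):
        return "STRING"  # assumption: booleans kept as strings in this toy
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):
        return "JSON"
    return "STRING"


def widen(a, b):
    """Reconcile two types seen for the same column."""
    if a == b:
        return a
    if {a, b} == {"LONG", "DOUBLE"}:
        return "DOUBLE"
    return "STRING"


def merge(schema, column, column_type):
    """Add a column to a schema, widening its type if it already exists."""
    schema[column] = widen(schema[column], column_type) if column in schema else column_type


def discover_schema(events):
    """Infer a schema for one batch of events (stands in for one segment)."""
    schema = {}
    for event in events:
        for column, value in event.items():
            merge(schema, column, infer_type(value))
    return schema


def table_schema(segment_schemas):
    """Derive the table schema from per-segment schemas without rewriting them."""
    merged = {}
    for segment in segment_schemas:
        for column, column_type in segment.items():
            merge(merged, column, column_type)
    return merged


# Later events add a column and change "clicks" from integers to decimals.
seg1 = discover_schema([{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}])
seg2 = discover_schema([{"user": "c", "clicks": 2.5, "country": "DE"}])
table = table_schema([seg1, seg2])
print(seg1)   # {'user': 'STRING', 'clicks': 'LONG'}
print(table)  # {'user': 'STRING', 'clicks': 'DOUBLE', 'country': 'STRING'}
```

Here `seg1` keeps its original LONG type, mirroring the idea that existing data need not be reprocessed; only the table-level view changes.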

“Now with Apache Druid you can have a schemaless experience in a high-performance, real-time analytics database,” said Gian Merlino, PMC Chair for Apache Druid and CTO of Imply. “You don’t have to give up having strongly-typed data in favor of flexibility as schema auto-discovery can do it for you. Net, you get great performance whether or not you define a schema ahead of time.”

“Druid handling real-time schema changes is a big step forward for the streaming ecosystem,” stated Anand Venugopal, Director of ISV Alliances at Confluent. “We see streaming data typically ingested in real-time and often coming from a variety of sources, which can lead to more frequent changes in data structure. Imply has now made Apache Druid simple and scalable to deliver real-time insights on those streams, even as data evolves.”

Large Complex Joins Now Supported in Druid During Ingestion

In Druid 26.0, Apache Druid expands its join capabilities and now supports large, complex joins. Druid has supported joins since version 0.18, but those join capabilities were deliberately limited to preserve CPU efficiency and query performance. When a workload required joining large data sets, teams used external ETL tools to pre-join the data before loading it into Druid.

Druid now supports large joins at ingestion time, implemented architecturally as shuffle joins. This simplifies data preparation, reduces reliance on external tools, and extends Druid’s capabilities for in-database data transformation. The new shuffle joins are powered by Druid’s multi-stage query engine, and the community plans to extend them to join large data sets at query time as well as at ingestion time.
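
A shuffle join avoids loading one whole table into memory on every worker: both inputs are repartitioned by a hash of the join key so that matching rows land in the same partition, and each partition is then joined independently. The sketch below shows that idea in plain Python with a per-partition hash join. As far as we understand, Druid's multi-stage engine performs a sort-merge join within each partition instead, so treat this as an illustration of the shuffle step rather than Druid's implementation. All names are hypothetical.

```python
# Illustration of a shuffle (repartitioned) join. Hypothetical names; a
# per-partition hash join stands in for Druid's sort-merge join for brevity.
from collections import defaultdict


def shuffle(rows, key, num_partitions):
    """Route each row to a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def local_join(left, right, key):
    """Inner-join two lists that hold the same slice of join keys."""
    by_key = defaultdict(list)
    for right_row in right:
        by_key[right_row[key]].append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in by_key.get(left_row[key], [])
    ]


def shuffle_join(left, right, key, num_partitions=4):
    """Repartition both inputs on the join key, then join partition by partition."""
    left_parts = shuffle(left, key, num_partitions)
    right_parts = shuffle(right, key, num_partitions)
    # Each (left, right) partition pair could be processed by a different worker.
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_join(left_part, right_part, key))
    return result


orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 5},
    {"user_id": 1, "amount": 7},
]
users = [
    {"user_id": 1, "name": "ana"},
    {"user_id": 2, "name": "bo"},
    {"user_id": 3, "name": "cy"},
]
joined = shuffle_join(orders, users, "user_id")
for row in joined:
    print(row)
```

Because both sides use the same hash function and partition count, every order meets its matching user in exactly one partition, and no worker ever needs the full `users` table. In Druid itself, our reading of the 26.0 documentation is that this join strategy is selected for multi-stage queries by setting the `sqlJoinAlgorithm` query context parameter to `sortMerge`; confirm the parameter name against the docs for your version.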

Continued Innovation for Imply Polaris

Imply Polaris, the cloud database service for Apache Druid, is the easiest deployment model for developers. It delivers all of Druid’s speed and performance without requiring expertise, management, or configuration of Druid or the underlying infrastructure.

This cloud database was built to do more than cloudify Druid; it also optimizes data operations and delivers an end-to-end service from stream ingestion to data visualization.

Today, Imply announces a series of product updates to Polaris that enhance the developer experience, including:

  • Global Expansion – In addition to the US region, Polaris is now available in Europe, enabling customers to run across multiple availability zones as well as multiple regions for improved fault tolerance.
  • Enhanced Security – Polaris adds private networking options by ingesting data over AWS PrivateLink from customers’ Kafka or Confluent clusters in AWS. Customers who want to lower their data transfer costs can also choose VPC Peering for ingestion with Polaris.
  • Expanded integrations – In addition to native, connectorless support for Confluent Cloud, Polaris adds the same native support for Apache Kafka and Amazon Kinesis to easily ingest streaming data from anywhere. Polaris also now provides an API to export performance metrics to observability tools including Datadog, Prometheus, Elastic, and more.

About Imply

At Imply, we are on a mission to help developers become the new heroes of analytics. Our unique database, built from Apache Druid, enables them to develop the next generation of analytics applications. With Imply, developers can build without constraints as our database lets them create interactive data experiences on streaming and batch data with limitless scale and at the best economics. Backed by leading investors including Thoma Bravo, a16z and Bessemer Venture Partners, Imply is on a fast growth trajectory – disrupting the $100B database market – with customers including Salesforce, Atlassian, Reddit, and Intercontinental Exchange. To learn more, please visit: https://imply.io/.

© 2023 Imply. All rights reserved. Imply, the Imply logo, and Polaris are trademarks of Imply Data, Inc. in the U.S. and/or other countries. Apache Kafka, Apache Druid, Druid and the Druid logo are either registered trademarks or trademarks of the Apache Software Foundation in the USA and/or other countries. All other marks and logos are the property of their respective owners.

Contacts

Nicole Gorman
Touchdown PR
imply@touchdownpr.com
(508) 397-0131