At Imply, we’ve talked often about how the world of analytics is changing (a16z podcast, future of analytics, the new analytics hero). More and more companies are building modern fit-for-purpose analytics applications for either internal or external consumption. These applications need a fast real-time analytical database that can handle any scale and thousands of concurrent users. For example, Atlassian (whitepaper) and Citrix (video) are embedding these applications into their products for their customers. While others, like Netflix (blog), Twitch (video), NTT (blog), and Charter (whitepaper), are building standalone internal analytics applications.
Developers at 1000+ leading companies have adopted Apache Druid as their database for analytics applications, many of whom look to Imply to deliver a simplified experience. Today, we’re excited to announce a major leap forward in ease-of-use with the introduction of Imply Polaris, our fully-managed, database-as-a-service. With Polaris, developers can get started with Druid in minutes and build massively scalable and interactive analytics applications without expertise or management overhead.
It’s quick and easy enough to download the Apache Druid open-source package and run through the quickstart with sample data. But things become more involved when you need to load and analyze your own real-world data which could run into the hundreds of terabytes or petabytes.
The challenges
We wanted to make Apache Druid’s power more easily accessible to individual developers through their journey, from setup and initial application development to production deployment and then scale without depending on anyone else. We identified three challenges: infrastructure, operations, and integrations.
First, we had to abstract the infrastructure so that developers could focus on the application rather than deploying and securing a distributed database. It can be a full-time job in itself to size hardware, tune Druid to ever-changing application needs, and ensure system security. Managing distributed systems and keeping them up-to-date takes time that could be spent focused on application development instead. No one wants to build the “undifferentiated heavy lifting”.
Second, getting the best out of real-time databases requires tuning on multiple layers. Some are around how the database works itself—how it should store data, what indices to use, how to prioritize queries, or how to balance data across servers. Other tuning concerns are more application-centric and involve aligning your data model and queries with the database’s architecture to get the best performance. Either way, an engineer needs to spend valuable time to extract the best performance from Druid.
Third, developers need to get data into and out of the database and present it to users. They need to think about how the data flows out of their application, how it gets transformed, loaded into the database, and ultimately presented to the end user. They may need additional data infrastructure to build these pipelines. That’s more overhead to build and manage, just to start with a basic application. The other direction also requires some thought. How does a developer present the data to the user? How does it provide them the ability to drill into the data and explore it when they have questions? Will they need to build a whole visualization system before they get to “hello world”?
Introducing Imply Polaris: the easiest way to build analytical applications
Imply Polaris is a fully managed Database-as-a-Service and more. Most similar offerings focus only on deployment management capabilities. Sure, Polaris has all the SaaS-y features you’d expect:
Polaris can get you started with your own database in 5 minutes
It follows all the best practices for security and is SOC2 (type 1 now, and type 2 in a few months) and HIPAA compliant
It easily scales up and down your cluster with a couple clicks and no downtime
It autoscales ingestion resources to ensure that you always load your files quickly and can maintain a lag for streaming data of less than 10 seconds.
We consider these capabilities table stakes for such an offering. But what we’re really most excited about is how Polaris rethinks the developer experience end-to-end.
For example, we know that developers who want to start building an application don’t want to have to set up additional data infrastructure or punch through a firewall to connect an external service before loading a single byte of data. On the other hand, it’s easy for a developer to simply push the data they want directly from their application to Polaris, so we made push-based streaming available in Polaris.
At the same time, we wanted to make it easy for developers to get the best performance. Imply Polaris automatically implements best practices for performance. For example, Polaris automatically chooses the right partitioning scheme based on the state of the table and automatically compacts data as streaming data lands in. Built-in performance monitoring allows developers to drill into the performance of their queries. Over time, Polaris will become even more intelligent as we build in tooling to help developers understand the impact of various performance tuning knobs. That way they will know, for example, which queries will get faster and which will get slower with one partitioning setup versus another. We will give developers the visibility to understand the performance impacts of pre-aggregation (aka rollup) and how to maximize it, and, more generally, how best to wield Druid’s superpowers and bend them to their will.
The last point I’ll touch on is our visualization layer. Users familiar with Imply will recognize Imply Pivot embedded in Polaris. Others will see a powerful UI capable of slicing and dicing highly dimensional data at the speed of thought. This is important because we want developers to be able to deliver immediate value to end users out of Polaris without building a whole custom application. As soon as data is in Polaris, users can start creating visualizations and drilling down into interesting data patterns to understand what’s going on and why. Over the next year, we’ll be improving Pivot’s ability to be embedded in external applications or to be used by customers wholesale as an application offering for their own customers.
What’s coming up
We’re rapidly developing and expanding Polaris’s capabilities. We’ll soon be adding more regions, more project sizes, more capabilities around ingestion including pull-based ingestion, and more autoscaling capabilities. We’ll be expanding our performance monitoring capabilities and the available workflows to optimize performance. For now, we’re focusing on making sure that every new capability we add to Polaris comes with an amazing developer experience.
How to get started with Imply Polaris
It’s easy to get started with Polaris. Simply sign up for a free trial at https://imply.io/polaris-signup, no credit card required, and within a few minutes you’ll be up and going building apps to slice and dice data like a ninja.
Other blogs you might find interesting
No records found...
Nov 14, 2024
Recap: Druid Summit 2024 – A Vibrant Community Shaping the Future of Data Analytics
In today’s fast-paced world, organizations rely on real-time analytics to make critical decisions. With millions of events streaming in per second, having an intuitive, high-speed data exploration tool to...
Pivot by Imply: A High-Speed Data Exploration UI for Druid
In today’s fast-paced world, organizations rely on real-time analytics to make critical decisions. With millions of events streaming in per second, having an intuitive, high-speed data exploration tool to...