We’ve added some new features, functionality and connectivity to Imply Polaris over the last two months. We’ve expanded ingestion capabilities, simplified operations, and increased reliability by running across multiple availability zones.
If you’re not familiar, let me give you a quick recap of what Polaris is all about. It starts with Druid-as-a-Service, with all the advantages of a fully managed cloud service plus additional built-in capabilities for data ingestion and visualization. It enables you to start extracting insights from your data within minutes—without the need to procure any infrastructure. You can use the same cloud database to manage your data from start to scale, with automatic tuning and continuous upgrades that ensure the best performance at every stage of your application’s life.
Polaris continues to advance, helping developers build real-time analytics applications faster, cheaper, and with less effort. Here’s what’s new in February 2023:
MSQ is now the default batch engine for Polaris
The Multi-Stage Query (MSQ) engine that Imply contributed to the open source community in Druid 24.0 is now the default engine for Polaris batch ingestion. For more info on MSQ, check out the latest Tales at Scale episode, which dives into the engine.
This is important because MSQ is part of Druid’s evolution toward a query engine that keeps Druid’s high performance and high scale while adding more versatility. With MSQ, Polaris users instantly get more reliability and speed.
MSQ in Polaris also gives us the foundation to drive further alignment with evolving Druid capabilities, with SQL-based ingestion coming very soon.
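To give a flavor of what that will look like, here is a minimal sketch of the SQL-based batch ingestion syntax that MSQ introduced in open-source Druid 24.0. The table name, source file, and columns below are hypothetical, and the exact form this takes in Polaris may differ once SQL-based ingestion arrives:

-- Ingest a hypothetical JSON file over HTTP into a table named "pageviews"
INSERT INTO "pageviews"
SELECT
  TIME_PARSE("timestamp") AS __time,  -- parse the event time into Druid's __time column
  "page",
  "country"
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/pageviews.json.gz"]}',  -- where the input lives
    '{"type": "json"}',                                                     -- how to parse it
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "country", "type": "string"}]'  -- input columns
  )
)
PARTITIONED BY DAY  -- time partitioning of the resulting segments

The EXTERN table function describes where the input data lives and how to parse it, and PARTITIONED BY controls how the resulting segments are partitioned by time.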
Pull ingestion from Open Source Kafka
Since streaming data pipelines are the foundation for instant decision-making, we’re excited to support more ways to mobilize streams in Polaris.
Polaris already dynamically scales ingestion resources based on actual event flow for data streams from Confluent Cloud, Apache Kafka, and Amazon Kinesis.
Many customers have enjoyed using our push API endpoint to ingest data without having to stand up and manage a streaming service. For customers who already run their own Kafka clusters, Polaris can now ingest directly from those clusters.
With native, connector-less support for Kafka, Confluent Cloud, and Amazon Kinesis, Imply Polaris ingests data with exactly-once semantics to ensure that each event is ingested once and only once. Every event is immediately available for querying, as discussed here.
Polaris now available in Europe
In addition to the US region, Imply Polaris is now available in Europe, specifically in the AWS eu-central-1 region based in Frankfurt.
This means anyone can choose to start a Polaris free trial in the US region (AWS us-east-1) or the EU region (AWS eu-central-1) and deploy to production later in one or both regions.
Be on the lookout for Polaris expansion across more regions to meet the needs of our global customers. Stay tuned for more updates.
Schema Registry for easier and faster data ingestion
Connections to schema registries let developers centrally discover, control, and evolve data stream schemas. A schema defines the structure and format of a data record.
This is important because Kafka, at its core, only transfers data as bytes. No data verification is done at the Kafka cluster level; in fact, Kafka doesn’t even know what kind of data it is sending or receiving, whether it is a string or an integer. Producers and consumers never communicate directly with each other; instead, information transfer happens via Kafka topics.
But the consumer still needs to know what type of data the producer is sending in order to deserialize it. Imagine the producer starts sending bad data to Kafka, or the data type of a field changes: your downstream consumers will start breaking.
Using a schema as a data format contract between producers and consumers leads to improved data governance, higher quality data, and enables data consumers to be resilient to compatible upstream changes.
A schema registry allows these disparate systems to share a schema for serialization and deserialization. For example, assume you have a producer and a consumer of data. The producer knows the schema when it publishes the data. With a registry, tables can be automatically created using the right schema for the incoming data.
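For illustration only, a registered schema is often just an Avro record definition like the sketch below; the record and field names here are hypothetical:

{
  "type": "record",
  "name": "ViewEvent",
  "namespace": "com.example.streaming",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "video_id", "type": "string"},
    {"name": "event_time", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "action", "type": "string"}
  ]
}

Once this schema is registered, consumers (and Polaris tables) can count on every ViewEvent record carrying exactly these fields and types, and the registry can check that later schema changes remain compatible.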
DATE_EXPAND and UNNEST
If you are a developer dealing with a lot of data and timestamps, then DATE_EXPAND combined with UNNEST will make your life easier.
With DATE_EXPAND, developers can easily add extra time granularity by expanding a time range into an array of timestamps. And with the UNNEST function, it’s now easy to create a corresponding row for each of those finer-grained intervals.
Why is this important? Let’s say your company delivers online streaming services and it’s your job to analyze viewer patterns. Not only do you need to analyze the unique viewer count during a ‘session’, you also need a finer granularity, because you need to understand the exact moment when user drop-off occurred.
With DATE_EXPAND and UNNEST this becomes a much easier process. With a single SQL statement, you can convert an array of timestamps into multiple rows, one for each interval (such as one second), with the associated data for that interval.
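Here is a rough sketch of the pattern, assuming a hypothetical viewer_sessions table with session_start and session_end timestamp columns; check the Polaris docs for the exact DATE_EXPAND arguments:

SELECT
  "user_id",
  t."view_second"                          -- one row per second of the session
FROM "viewer_sessions"
CROSS JOIN UNNEST(
  DATE_EXPAND(
    TIMESTAMP_TO_MILLIS("session_start"),  -- start of the viewing session
    TIMESTAMP_TO_MILLIS("session_end"),    -- end of the viewing session
    'PT1S'                                 -- expand to one-second granularity
  )
) AS t("view_second")

Counting distinct user_id values grouped by view_second then gives a per-second unique viewer count, which makes the exact moment of drop-off visible.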
Learn more and get started for free!
Ready to get started? Sign up for a free 30-day trial of Imply Polaris—no credit card required! As always, we’re here to help—if you want to learn more or simply have questions, set up a demo with an Imply expert.