White Paper

How to conduct your own Imply Polaris Proof of Concept

Imply Polaris

Welcome to Imply Polaris! You might be reading this because:

  • You need ease of operations and management for your current open-source Apache® Druid cluster
    • If that’s the case, focus on how easy it is to deliver insights to your business stakeholders using Imply Polaris, without the need to worry about database operations, management and monitoring
  • You already built an analytics application and have run into issues, such as queries that are too slow, concurrency demands your application cannot meet, or data that cannot be ingested in real time
    • This document provides a structured way to validate these key criteria for your application
  • You are building a new analytics application
    • Any analytics application needs capabilities such as interactive analytics at any scale, support for high query concurrency, and/or analytics on real-time data
    • This document provides a structured way to validate these key criteria for your application

Polaris Introduction

Imply Polaris is a fully managed cloud database as a service (DBaaS) built from Apache® Druid to support building highly interactive modern analytics
applications.

Developers and architects choose Apache Druid because of its unique capabilities to deliver:

  • Sub-second analytics at any scale
  • High concurrency at the lowest cost
  • Insights and analytics on both streaming and batch data

Try Imply Polaris for free and start building analytics applications in as little as five minutes at https://imply.io/polaris-signup/

  • Fully-Managed Cloud Service
    • Build modern analytics apps without worrying about the underlying
      infrastructure
  • Database Optimization
    • Get high performance out of Druid without being an expert
  • Single Development Experience
    • Start fast with built-in push-based streaming service and visualization engine integrated into a single UI

30 Day Free Trial

Free Trial Introduction

This guide will help you use our free, self-service, proof-of-concept (POC) trial to evaluate Imply Polaris, the fully managed cloud database as a service for building modern analytics applications. The guide offers tips and advice, and it is aligned with key test milestones to help you get the most out of your 30-day free trial. The trial allows you to store up to 200GB of data in Polaris (input size can be up to 1 TB of JSON, CSV, and TSV data), and the database can be used continuously for the entire 30 days without incurring any additional charges.

For most analytics applications and business cases, following the advice in this guide will give you a strong indication of how Imply Polaris can fulfill your business requirements and the needs of your analytics application. If you need more help investigating the requirements of your business case or analytics application, please contact Imply at contact@imply.io

Getting Started

Introduction

After you sign up for a free trial with Polaris, here are three steps to take before you start working with Polaris:

  1. Read about Polaris and watch the short video here
  2. Get started with Polaris by following the quickstart here
    1. Quickstart covers the basics of how to load data, create tables and query
    2. It also covers Polaris’ data cube and visualization features
  3. Bookmark the developer guide here, which will be handy when working with Polaris

A Proof of Concept Plan

Now that you understand what Polaris is and have reviewed the quickstart, here are guidelines and best practices for setting up a proof of concept approach that will allow you to properly evaluate Polaris for building your modern analytics application.

The criteria below represent the features most needed for building modern data applications. They might not be equally important for your application, so prioritize them based on your evaluation and application needs.

Recommended Overall POC Process

Here’s our recommended approach:

  • Define your requirements and success criteria
  • Execute the POC
    • Define the scope of the data, beginning with a small dataset, and then iterating
    • In each iteration, estimate the overall data size. Your Polaris trial allows you up to 200 GB of data in Polaris
  • Complete your POC
    • Achieve success criteria
    • Sign up for the full experience by providing billing information

With all of that in mind, let’s dig in!

Guidelines for PoC Criteria

Any modern analytics application requires some combination of the following requirements:

  • Interactive analytics at any scale
  • High concurrency at the best value
  • Insights on streaming and batch data

The following guidelines give you a blueprint for structuring a PoC to validate the above requirements.

Real Time Ingestion Testing

Understand the throughput your analytics application needs, in terms of the number of messages per second and the message size. Either generate synthetic data or use production data that matches those needs. Polaris supports real-time ingestion via the push ingestion API, Apache Kafka® (and Kafka-compatible APIs), and Amazon Kinesis. Use one of these methods to ingest real-time data and watch how data appears instantly in Imply tables.

Refer to the Polaris documentation to learn more about the options for loading real-time data.

Criteria:
Ingest X amount of messages per second (or per minute) with an average message size of Y bytes.
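
To turn this criterion into concrete numbers, a minimal Python sketch can generate synthetic JSON events and estimate the byte throughput implied by a target message rate. The event fields (timestamp, user_id, event_type, value), the function names, and the example rate are all hypothetical; substitute the schema and rate of your own application.

```python
import json
import random
from datetime import datetime, timezone


def make_event(rng: random.Random) -> dict:
    """Build one synthetic event; all field names are illustrative assumptions."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": rng.randint(1, 10_000),
        "event_type": rng.choice(["view", "click", "purchase"]),
        "value": round(rng.uniform(0, 100), 2),
    }


def average_message_bytes(sample_size: int = 1_000, seed: int = 42) -> float:
    """Average UTF-8 encoded size (Y) of a sample of synthetic events."""
    rng = random.Random(seed)
    sizes = [
        len(json.dumps(make_event(rng)).encode("utf-8"))
        for _ in range(sample_size)
    ]
    return sum(sizes) / len(sizes)


def required_throughput(messages_per_second: float, avg_bytes: float) -> float:
    """Bytes per second the ingestion path must sustain: X messages/s * Y bytes."""
    return messages_per_second * avg_bytes


if __name__ == "__main__":
    y = average_message_bytes()
    x = 5_000  # hypothetical target rate in messages per second
    print(f"average message size: {y:.0f} bytes")
    print(f"required throughput: {required_throughput(x, y) / 1e6:.2f} MB/s")
```

The same generator can feed whichever ingestion method you choose, so the measured message size matches what is actually sent.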

Batch Ingestion

If ingesting data via batch is key to your application needs, consider the following questions:

  • How much data do you need to ingest in a batch?
  • What is the format of the data?
  • How often will you need to ingest each batch of data?

Your trial allows you to store 200GB of data in Polaris. This refers to the amount of data stored after compression, which happens automatically as part of the ingestion process. Because of this compression, your raw data may be much larger than 200GB.

Refer to the docs below to learn more about the options in Polaris for loading batch data.

  • Files API – https://docs.imply.io/polaris/api-ingest-files/
  • Ingest from Amazon S3 – https://docs.imply.io/polaris/api-ingest-s3/

Criteria:
Ingest data of size X GB every Y hours and validate the same in the Polaris tables.
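
Before committing to values for X and Y, it can help to check whether the planned schedule fits within the trial's 200GB stored-data limit. The sketch below is a rough estimate only: the raw-to-stored ratio is an assumption you should measure on a small sample during your first iteration, not a figure published for Polaris, and the function names are hypothetical.

```python
TRIAL_STORAGE_LIMIT_GB = 200  # stored (compressed) data allowed in the trial
TRIAL_DAYS = 30


def batches_in_trial(every_hours: float, days: int = TRIAL_DAYS) -> int:
    """Number of batches loaded during the trial, one every `every_hours` hours."""
    return int(days * 24 // every_hours)


def projected_stored_gb(
    batch_raw_gb: float, every_hours: float, raw_to_stored_ratio: float
) -> float:
    """Projected stored size at the end of the trial.

    raw_to_stored_ratio is raw size divided by stored size. It is an
    assumption here; measure it on your own data before relying on it.
    """
    total_raw_gb = batches_in_trial(every_hours) * batch_raw_gb
    return total_raw_gb / raw_to_stored_ratio


def fits_in_trial(
    batch_raw_gb: float, every_hours: float, raw_to_stored_ratio: float
) -> bool:
    """True if the projected stored size stays within the trial limit."""
    stored = projected_stored_gb(batch_raw_gb, every_hours, raw_to_stored_ratio)
    return stored <= TRIAL_STORAGE_LIMIT_GB


if __name__ == "__main__":
    # Hypothetical plan: 2 GB of raw data every 6 hours, assumed 4:1 ratio.
    print(batches_in_trial(6))
    print(projected_stored_gb(2, 6, 4))
    print(fits_in_trial(2, 6, 4))
```

If the projection exceeds the limit, reduce the batch size or the retained time range for the trial rather than the ingestion frequency, so the schedule under test stays realistic.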

Interactive Analytics

Providing interactive drill down analytics is key for many analytics applications. Visualization features are integrated into your Polaris environment. There is more
information here.

Before validating this requirement, plan what is most important for your users to see and what kind of interactive exploration you want to enable for them. Once you have the flow ready, execute the exploration on your data using Imply Polaris.

Criteria:
Users should be able to analyze data over a period of time (such as 7 days) to find anomalies and should be able to drill down into the data using other dimensions to find the cause of the anomaly.
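
The exploration flow in this criterion can also be written down as SQL ahead of time, which makes it reusable for later concurrency testing. The following Python sketch builds two Druid SQL statements: an hourly trend over the last 7 days, and a drill-down into a single hour by one dimension. The table name, dimension name, and helper names are hypothetical, and the sketch assumes the table's timestamp column is Druid's standard `__time` column. The helpers do not escape identifiers, so use them only with trusted names.

```python
from datetime import datetime, timedelta


def trend_query(table: str, days: int = 7) -> str:
    """Hourly event counts over the last `days` days, for spotting anomalies."""
    return f"""
SELECT TIME_FLOOR("__time", 'PT1H') AS "hour", COUNT(*) AS "events"
FROM "{table}"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '{days}' DAY
GROUP BY 1
ORDER BY 1
""".strip()


def drill_down_query(
    table: str, dimension: str, start: datetime, hours: int = 1
) -> str:
    """Top values of `dimension` inside the anomalous time window."""
    end = start + timedelta(hours=hours)
    fmt = "%Y-%m-%d %H:%M:%S"
    return f"""
SELECT "{dimension}", COUNT(*) AS "events"
FROM "{table}"
WHERE "__time" >= TIMESTAMP '{start.strftime(fmt)}'
  AND "__time" < TIMESTAMP '{end.strftime(fmt)}'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
""".strip()


if __name__ == "__main__":
    # Hypothetical table and dimension names.
    print(trend_query("web_events"))
    print(drill_down_query("web_events", "country", datetime(2024, 1, 3, 10)))
```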

Concurrency Testing

Understand the most frequently used query patterns for your application. If using Polaris visualization, these queries can be obtained from the query monitoring section. Create a workload framework that includes a mix of the query patterns in tools like JMeter or Locust.

Be realistic about the queries per second that your application needs. It’s common to overestimate the query frequency, and the target rate you choose will impact query patterns and data model design.

Leverage the built-in monitoring in Polaris to understand how your cluster performs during this test.

Criteria:
Support X queries per second, each completing in Y seconds (typically 1s or 2s) with the representative set of queries that is required by the application.
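
For a real test, a dedicated tool such as JMeter or Locust is the better choice. To illustrate how the X and Y in this criterion can be measured, here is a minimal standard-library sketch that replays a mix of queries from a fixed pool of worker threads and reports achieved queries per second and 95th-percentile latency. The `run_query` function is a hypothetical placeholder that only sleeps; replace it with your own client call. Because the workers send requests back to back, the sketch measures achieved throughput rather than holding a fixed request rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable, Dict, List


def run_query(sql: str) -> None:
    """Hypothetical placeholder for a real query call; it only simulates latency."""
    time.sleep(0.01)


def load_test(
    queries: List[str],
    total_requests: int,
    workers: int,
    execute: Callable[[str], object] = run_query,
) -> Dict[str, float]:
    """Replay `queries` round-robin and collect throughput and latency."""

    def timed(sql: str) -> float:
        start = time.perf_counter()
        execute(sql)
        return time.perf_counter() - start

    workload = [queries[i % len(queries)] for i in range(total_requests)]
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, workload))
    wall = time.perf_counter() - wall_start

    # With n=20 the last cut point is the 95th percentile.
    p95 = quantiles(latencies, n=20, method="inclusive")[-1]
    return {"qps": total_requests / wall, "p95_seconds": p95}


def meets_criteria(
    result: Dict[str, float], min_qps: float, max_latency_seconds: float
) -> bool:
    """Check 'X queries per second, each completing in Y seconds' against p95."""
    return (
        result["qps"] >= min_qps
        and result["p95_seconds"] <= max_latency_seconds
    )


if __name__ == "__main__":
    result = load_test(["SELECT 1"], total_requests=200, workers=10)
    print(result, meets_criteria(result, min_qps=10, max_latency_seconds=1.0))
```

Judging the criterion on the 95th percentile rather than the average is a design choice of this sketch; pick the percentile that matches what your users would consider acceptable.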

API Support

API support in any database is essential to build an application around the database or to automate the ingestion and other processes. Take a look at the APIs Polaris provides and the security built into those APIs, including token generation and retirement. Polaris documentation has a detailed overview of all the APIs supported.

The following criteria assume the need for executing queries via APIs. Modify the criteria as needed based on the priority of your analytics application.

Criteria:
Validate the support for different kinds of queries in the Polaris query API.
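
One way to keep this validation repeatable is a small checklist of representative queries that is run through whatever client code you write against the query API. The sketch below deliberately leaves the API call itself out: `execute` is a caller-supplied function, and the table name, column names, and check names are hypothetical. Consult the Polaris documentation for the actual endpoint and authentication details.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical checklist; replace with the query kinds your application needs.
QUERY_CHECKS: List[Tuple[str, str]] = [
    ("row count", 'SELECT COUNT(*) FROM "my_table"'),
    (
        "time filter",
        'SELECT COUNT(*) FROM "my_table" '
        "WHERE \"__time\" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY",
    ),
    (
        "group by",
        'SELECT "country", COUNT(*) FROM "my_table" GROUP BY 1',
    ),
    (
        "top values",
        'SELECT "country", COUNT(*) AS "c" FROM "my_table" '
        "GROUP BY 1 ORDER BY 2 DESC LIMIT 10",
    ),
]


def validate_queries(
    execute: Callable[[str], object],
    checks: List[Tuple[str, str]] = QUERY_CHECKS,
) -> Dict[str, str]:
    """Run each check through `execute` and record 'ok' or the failure message."""
    results: Dict[str, str] = {}
    for name, sql in checks:
        try:
            execute(sql)
            results[name] = "ok"
        except Exception as exc:  # report every failure instead of stopping
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    # Stand-in executor that accepts every query; swap in your API client.
    for name, status in validate_queries(lambda sql: None).items():
        print(f"{name}: {status}")
```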

Executing the PoC

First iteration

The goal of the first iteration is testing functionality. Functional testing doesn’t usually require large amounts of data, so start with a small sample dataset of 1GB or less. The goal is to ingest data, translate query patterns into SQL, test out the APIs, and verify that Polaris addresses functional requirements.

The SQL statements generated here can and should be used in subsequent iterations to drive concurrency testing.

Second iteration

Next, test with real timeframes and required ingestion methods, as well as higher concurrency in analytic queries.

  • Load a significant portion of test data using the preferred ingestion methods (batch or real time)
    • 1-5% is a good starting point
    • Make sure it covers the full timeframe of your production needs
  • Initiate ongoing incremental loads
  • Test and tune the queries produced in the initial iteration. Here are some performance tuning resources:
  • Drive concurrency using JMeter or a similar tool
    • Your trial has limited resources, so contact Imply if you need to execute more than 10 QPS
  • Before considering this iteration complete:
    • Tune both ingestion and queries such that they meet your success criteria

Analytics Application in Polaris

Imply Polaris has an analytics application, giving anyone in your organization or
your external customers the ability to explore and analyze data with visually rich,
easy-to-use tools. These tools include dashboards, query builders, and data
cubes. You can use this as your application instead of building a brand new
analytics application. See below for more details:

Monitoring

Imply Polaris includes built-in monitoring to support your production workloads
and ensure your database is optimized for your application. Monitoring provides
the details below:

  • User queries monitoring
  • Streaming monitoring
  • Detailed metrics

Next Steps

After your trial, if you are satisfied with what Imply Polaris can provide, you can continue to use your Polaris database right away with pay-as-you-go billing. You can also continue to test larger data sets and workloads, if needed, with pay-as-you-go billing.

If you still have questions or need more time for evaluation, contact Imply to get answers to your questions.

Also, see the following resources:
