Make Imply Polaris the New Home for your Rockset Data

Jul 01, 2024
William To

Co-author: Charles Smith

By now, you’ve heard the news: OpenAI has purchased Rockset, and has chosen to sunset Rockset services by September 2024 for existing customers. This tight timeline means that Rockset users are under immense pressure to migrate to another database, learn the new technology, and go live—within weeks. 

Any replacement for Rockset must:

  • Provide compatibility with ANSI SQL, a popular language that remains the preferred way for most developers to query and work with data, as well as REST API support and SDKs for popular languages, like Python.
  • Support real-time analytics from Kafka/Kinesis with end-to-end latency under 10 seconds in production at scale.
  • Accommodate popular data formats, including nested JSON, CSV, Parquet, Avro, and protobuf.
  • Support nested structures, a way to model complex relationships between data (and typically associated with JSON).
  • Effortlessly autoscale to keep pace with demand—while keeping costs down.
  • Offer continuous uptime, as disruptions, slowdowns, or outages can lead to customer churn or financial penalties for organizations.
  • Include a managed service that can automatically configure, run, and monitor database clusters, so that developers don’t have to. 
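To illustrate the nested-structure requirement above, here is a minimal Python sketch of how a nested JSON event can be flattened into dotted column paths. This is only illustrative: Polaris ingests nested JSON natively, and the helper below is not part of its API.

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested JSON document into dotted column paths,
    similar to how analytical stores expose nested fields as columns."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

event = json.loads('{"user": {"id": 42, "geo": {"country": "US"}}, "action": "click"}')
print(flatten(event))
# {'user.id': 42, 'user.geo.country': 'US', 'action': 'click'}
```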

What is Polaris—and why is it an ideal choice for Rockset users?

Built on Apache Druid, one of the world’s fastest analytical databases, Imply Polaris is a fully managed database-as-a-service—and an ideal alternative for Rockset users. 

From the very beginning, Polaris was designed for high-concurrency (think 1,000+ QPS), low-latency queries (think milliseconds) at petabyte scale. Polaris also provides native Kafka and Kinesis integration, enabling a true real-time query capability. This makes Imply ideal for real-time analytics applications and systems, as response times remain consistent even under challenging conditions.

In contrast to other databases, Polaris also provides infinite zoom, or the ability to preserve granularity even while reams of data are rolled up or summarized—a hugely important feature for data teams that need both high-level and detailed views into their environments.
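To make the idea concrete, here is a toy Python sketch of rollup: summarizing the same raw events at two different time granularities. In Polaris, rollup is configured at ingestion time rather than written by hand; the field names and bucketing logic below are illustrative assumptions, not Polaris code.

```python
from collections import defaultdict

def rollup(events, bucket_seconds):
    """Summarize raw (timestamp, country, bytes) events into time buckets,
    keeping an event count and a byte sum per (bucket, dimension) pair --
    the core idea behind rollup."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, country, nbytes in events:
        bucket = ts - (ts % bucket_seconds)   # floor to the bucket boundary
        summary[(bucket, country)]["count"] += 1
        summary[(bucket, country)]["bytes"] += nbytes
    return dict(summary)

events = [
    (1000, "US", 512),
    (1030, "US", 256),
    (1065, "DE", 128),
]
by_minute = rollup(events, 60)   # coarse, summarized view
by_second = rollup(events, 1)    # fine-grained view of the same data
print(by_minute)
```

The point of "infinite zoom" is that both views stay available: the summarized table answers dashboard-style queries cheaply, while the fine granularity remains there when you drill in.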

Polaris also provides asynchronous queries for retrieving infrequently accessed, “cold” data in deep storage at a very low cost. This means you don’t need a separate data warehouse, and your query results are always consistent because they all come from the same system. Rockset users who depend heavily on async queries will find this feature extremely useful.

Like Rockset, Druid uses standard ANSI SQL for querying data, easing the learning curve. In fact, Polaris supports a wide range of ANSI SQL expressions—the same syntax that Rockset users are familiar with. In addition, Polaris includes APIs for programmatic access (so you can more easily work with resources such as tables, files, and ingestion jobs) as well as an SDK that can generate OpenAPI clients in Python, Java, and TypeScript/JavaScript.

In addition, Imply can automatically discover schema changes during ingestion, an ability that is similar (but superior) to Rockset’s automatic indexing capability. With the inferred schema, Druid will also automatically index the data, alleviating the need to manually manage indices. This also has the advantage of providing the flexibility of a schema-on-read database alongside the performance of a schema-on-write solution—not unlike Rockset’s flexible, schema-free developer experience. 
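As a rough illustration of what schema auto-discovery does, the sketch below infers a column-to-type mapping from sample JSON events. Polaris performs this automatically during ingestion; the helper is hypothetical, and the type names are only loosely modeled on Druid's.

```python
def infer_schema(sample_events):
    """Infer a field -> type mapping from sample JSON events,
    loosely mimicking schema auto-discovery at ingestion time."""
    type_names = {bool: "boolean", int: "long", str: "string", float: "double"}
    schema = {}
    for event in sample_events:
        for field, value in event.items():
            inferred = type_names.get(type(value), "unknown")
            # Widen long -> double if a later event has a float in the same field.
            if schema.get(field) == "long" and inferred == "double":
                schema[field] = "double"
            else:
                schema.setdefault(field, inferred)
    return schema

events = [
    {"user_id": 7, "latency_ms": 12, "path": "/home"},
    {"user_id": 8, "latency_ms": 9.5, "path": "/cart"},
]
print(infer_schema(events))
# {'user_id': 'long', 'latency_ms': 'double', 'path': 'string'}
```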

Lastly, Polaris abstracts away most of the operational work associated with provisioning, deploying, and troubleshooting clusters. Polaris can autoscale ingestion based on demand, rightsizing resources so that users consume only what they need, and thus pay only for what they use. As a plus, a Rockset to Polaris migration requires less effort than moving to a self-managed open source database: simply set up Polaris, move your data from Rockset to S3, and then ingest it into Polaris.

What Polaris can—and cannot—do

Polaris is fantastic for real-time analytics across any industry and use case, and will benefit any organization that needs subsecond queries on large amounts of fast-moving data. 

For example, Internet of Things (IoT) providers can use Polaris to rapidly return results on solar panel performance or help analyze power consumption from residential households. Companies can use Polaris to power customer-facing applications, such as for media or video game performance. Ecommerce platforms can use Polaris to support their retail operations, personalize and recommend experiences or goods to shoppers, and assess product performance.

Polaris also comes with Pivot, a visualization layer with built-in authentication and authorization. Pivot includes an SDK that makes it easy to embed visualizations into web applications or share them securely with your users.

While Polaris is suitable for many users, there are some areas where alternative databases may be a better choice. One example is vector search, which falls under the umbrella of artificial intelligence and machine learning. In short, each piece of unstructured data, such as an image, text document, email, or video, is assigned a string of numbers (a vector), and vectors are grouped by similarity for easy retrieval.

Another area is full-text search, which is used to retrieve documents by matching words or phrases. Today, most full-text search databases are built on Apache Lucene, an open source technology that includes abilities such as fuzzy search, recommendations, and more. Although Polaris does not currently have a full-text search capability, we are alpha testing this feature. 

As a real-time analytical database, Polaris is not suitable for transactional workloads, which comprise most daily business operations, such as online sales, flight bookings, or hotel reservations. Transactional workloads involve large numbers of computationally light operations (creates, reads, updates, or deletes), and are ideal for databases such as MongoDB, PostgreSQL, or Microsoft SQL Server. Instead, Polaris complements transactional databases by executing fast aggregations across large datasets.

How to migrate from Rockset to Polaris

With minimal effort, you can transition your data to Imply Polaris. If you have any questions, our dedicated professional services team is on hand to assist you at every step of the way.

Sign up for Polaris

To migrate your data onto Polaris, you need a Polaris organization. 

Sign up for a Polaris trial.

Start reading data from your stream

If you use a streaming service like Apache Kafka or Amazon Kinesis, you can start reading streaming events directly into Polaris; because Polaris is natively compatible with both technologies, no additional plugins or workarounds are required.

If you push data directly into Rockset, you should integrate with the Polaris API, which can directly push data from an application source into Polaris over HTTP(S). The process is fairly straightforward, and consists of creating a connection through the UI or API, starting a job to ingest via the connection, and finally, sending event data to Polaris.
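The push flow above can be sketched as follows, using only the Python standard library. Note that the endpoint URL and auth header are illustrative assumptions, not the documented Polaris API; consult the Polaris docs for the real connection and event endpoints.

```python
import json
import urllib.request

# ASSUMPTION: this URL shape and auth scheme are placeholders, not the
# documented Polaris push endpoint -- check the Polaris docs before use.
PUSH_URL = "https://ORGANIZATION.api.imply.io/v1/events/demo-connection"

def batch_events(events, max_batch=500):
    """Group events into fixed-size batches before pushing."""
    return [events[i:i + max_batch] for i in range(0, len(events), max_batch)]

def push_batch(batch, api_token):
    """POST one batch of newline-delimited JSON events (sketch, not called here)."""
    body = "\n".join(json.dumps(e) for e in batch).encode("utf-8")
    req = urllib.request.Request(
        PUSH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

events = [{"ts": i, "action": "click"} for i in range(1200)]
print([len(b) for b in batch_events(events)])  # three batches: 500, 500, 200
```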

Backfill data

The next step is to backfill, or to move, data from Rockset into Polaris. In essence, the backfill process occurs by exporting data to S3 and then loading it into Polaris—for the full process, visit our dedicated documentation page.
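As a rough sketch of the export side, the helper below writes documents as newline-delimited JSON files sized for bulk upload to S3, the shape batch ingestion typically expects. The file names and chunk size are illustrative assumptions; follow the dedicated documentation page for the supported export path.

```python
import json
import tempfile
from pathlib import Path

def export_ndjson(documents, out_dir, rows_per_file=100_000):
    """Write documents as newline-delimited JSON files sized for bulk upload.
    The resulting files can be uploaded to S3 and ingested by a batch job."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for part, start in enumerate(range(0, len(documents), rows_per_file)):
        path = out_dir / f"export-{part:05d}.json"
        with path.open("w") as f:
            for doc in documents[start:start + rows_per_file]:
                f.write(json.dumps(doc) + "\n")
        paths.append(path)
    return paths

docs = [{"_id": i, "value": i * 2} for i in range(250)]
files = export_ndjson(docs, tempfile.mkdtemp(), rows_per_file=100)
print([p.name for p in files])
# ['export-00000.json', 'export-00001.json', 'export-00002.json']
```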

As a note, you can simultaneously backfill data and migrate queries. You can even go further and serve production traffic and backfill at the same time.

Connect your client

To run queries, see the Polaris query documentation: https://docs.imply.io/polaris/query

To connect your apps, use the JDBC API or HTTP API.
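For the HTTP route, a minimal Python sketch using only the standard library is shown below. The /druid/v2/sql path is Druid's standard SQL endpoint; the host, token handling, and table name are placeholders, so check your Polaris connection details before use.

```python
import json
import urllib.request

# ASSUMPTION: the host is a placeholder; /druid/v2/sql is Druid's standard
# SQL-over-HTTP endpoint, which accepts a JSON payload with a "query" field.
QUERY_URL = "https://ORGANIZATION.api.imply.io/druid/v2/sql"

def build_query(sql, **context):
    """Build the JSON payload for a Druid SQL query over HTTP."""
    payload = {"query": sql, "resultFormat": "object"}
    if context:
        payload["context"] = context
    return payload

def run_query(payload, api_token):
    """POST the query (sketch only -- not executed here)."""
    req = urllib.request.Request(
        QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_query("SELECT COUNT(*) AS n FROM events", timeout=30000)
print(payload["query"])
```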

Migrate your queries

Use the Query API to rewrite your queries in Druid SQL.
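As a hypothetical example of such a rewrite, the pair of queries below shows a time-bucketed aggregation in Rockset-style SQL and a hand-translated Druid SQL equivalent. The table and column names are assumptions; the main differences in a case like this are the time column (Druid's __time vs Rockset's _event_time) and the time function (TIME_FLOOR vs DATE_TRUNC).

```python
# A Rockset-style query and a hand-rewritten Druid SQL equivalent.
# Table and column names here are hypothetical.
rockset_sql = """
SELECT DATE_TRUNC('HOUR', _event_time) AS hour, COUNT(*) AS clicks
FROM events
WHERE action = 'click'
GROUP BY 1
ORDER BY 1
"""

druid_sql = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour, COUNT(*) AS clicks
FROM events
WHERE action = 'click'
GROUP BY 1
ORDER BY 1
"""

print(druid_sql.strip())
```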

Imply will help you migrate your queries for free! If you have any questions at all, please don’t hesitate to contact our professional services team—they’re here to make migrations as easy as possible.

Conclusion

OpenAI’s purchase of Rockset, and the subsequent deprecation of its service, doesn’t necessarily have to be a setback for your organization. In fact, Imply Polaris, built on the tested Apache Druid architecture, is a fantastic alternative for existing Rockset users. With Polaris, you can support massive growth in user numbers, query traffic, and data volumes; maintain subsecond response times for aggregations and other operations; and enjoy improved performance at a lower cost.

Polaris also offers a lower learning curve (thanks to its extensive SQL support) and a lighter developer workload (due to its heavy automation). Most importantly, because Polaris is built on an open source technology, users will never again have to worry about losing access to their core database technology.

Any Rockset migrations are fast and straightforward, so get started today—and contact us if you have any questions at all.
