Things to Consider When Scaling Analytics for High QPS

This 2-part blog post explores key technical considerations to support high QPS for analytics and the strengths of Apache Druid.

For some, the thought of analytics and high QPS (queries per second) together may seem unlikely. After all, we typically think of analytics as the occasional report and dashboard into business metrics.

But, analytics use cases are evolving with high volume, low latency queries on the rise. Companies like Confluent, Target, and Pinterest use analytics for much more than weekly executive summaries. They’re making analytics available across their organizations; their teams are exploring high-dimensional raw data in a free-flowing, ad-hoc nature; and they’re powering analytics applications and data products for 1000s to millions of external users and customers.

In the era of analytics where query volume is the sine qua non “V” of data, how should we think about system architecture – what matters and why?

High QPS or high concurrency: Is there a difference?

First, let’s clear up a common misconception between high QPS and high concurrency. While they are both related to the number of simultaneous queries, they are different metrics when it comes to database performance and system architecture.

QPS is the key technical objective: process a specific number of queries in one second, start to finish. On the other hand, high concurrency is not so much a technical objective but an attribute of a system that can handle multiple simultaneous queries.

In other words, high concurrency is an aspect of system design that enables high QPS. But a database designed for high concurrency (ie. most modern databases today) does not ensure high QPS. There are many more factors at play here.

Factors that make high QPS for analytics hard

The ability of an analytics database – actually, any database for that matter – to handle high QPS is a function of the type of queries and the amount of data that needs to be processed for each query and of course the architecture of the database itself.

Type of Queries: Different types of queries have different levels of complexity, and some queries may require more processing power, memory, or disk I/O than others. For example, a simple query that selects a single row from a table can be processed quickly, while a complex query that performs aggregations will take longer to complete.

Amount of Data: In addition to query complexity, the amount of data that needs to be processed for each query also plays a role in determining the QPS that a database can handle. Queries that involve large amounts of data require more time to scan and process, and require more computing resources than small data sets.

High QPS is less of an issue for transactional databases like MongoDB, Oracle, and MySQL as they’re dealing with simple CRUD operations on specific rows of data. But, high QPS for analytics queries on larger data sets is a much different animal.

So then, let’s consider data warehouses – the classic analytics workhorse for complex queries – like Snowflake, BigQuery, and Redshift. The problem here is that these systems are built for infrequent queries like reports and static dashboards for executive status checks. They’re built to minimize cost per query and do things like separate compute and storage, but that comes at a cost to query latency, making them not ideal for high QPS requirements.

And of course, pre-computing all your aggregations into cache is an option albeit a highly restrictive and labor intensive one.

This is why architecting high QPS for analytics is hard.

High QPS for analytics needs more than hardware

It’s easy to think that throwing more hardware at a problem will make it go away. After all, more resources – CPU, memory, and disk I/O – will increase the number of concurrent connections possible for a database but that alone doesn’t give the performance guarantee for sub-second queries…or the baggage that comes with scaling. More on that in a second.

So solving for high QPS for analytics needs to take in more considerations. Let’s look at them.

Processing efficiency:

The key to designing for fast read-intensive queries lies in minimizing the amount of work and scanning required. And it requires a database with the smarts to know how to take a complex query and determine what to scan or not: don’t read the full data set if it can read the smallest index; don’t send data unnecessarily from server to server; don’t perform large complex joins at query-time if you can do it earlier.

The ideal architecture accepts an analytics query and minimizes the amount of computation required by storing the data and querying it in a highly optimized way, even if querying petabytes of data. So while parallel processing is an obvious way to have a system designed for high concurrency, the system itself needs to have the smarts in its storage engine to read less data to begin with.

Operational burden:

Virtually any database today can scale-out for concurrency. But that doesn’t mean it’ll be fun to do so!

Take for example, PostgreSQL: a classic, general purpose database that developers like to use for everything from transactional processing to analytics too. It’s natively designed as a scale-up architecture and it’s pretty fast for analytics queries when the data set and the concurrency is on the low end. But as your application grows, what can be done?

This is generally what’s involved in configuring PostgreSQL and other OLTP databases for high concurrency:

Database design: ensure that the database schema and table structures are optimized – e.g. manual indexing, distribution, and normalization techniques.
Partitioning: divide larger tables into smaller tables based on a user-defined partition key, with manual sharding to horizontally partition across multiple instances
Replicate the data: create data copies from the primary server to replicas to reduce the load on the primary server and increase concurrency.
Load balance: now with replicas, set up a load balancer to distribute incoming queries across multiple PostgreSQL instances with a connection pooling mechanism to handle the concurrent load.
Query rewriting: analyze query plans and performance metrics to figure out how to optimize the queries
Monitor…optimize…grow…rinse…and repeat

So while “anything is possible” might be true, there’s definitely pain when stretching a database beyond its core use.

Amount of infrastructure:

Why use 1000 vCPU when you can use 100? This might sound obvious, but it requires the right system architecture to minimize the processing horsepower required. So again, storage design, query efficiency, and resource management at scale all play a key role in meeting the performance SLA for high QPS and keeping the costs down.

There’s an easy way…Apache Druid

As we’ve seen here, achieving high QPS for analytics isn’t so easy. There are shortcuts like pre-computing and caching aggregations, but that’s highly restrictive.

That’s where Apache Druid comes in. It’s a high performance, real-time analytics database specifically designed for applications that need ultra-low latency on high volume queries. For example, Target uses Druid for analytics applications that execute over 4 million queries per day with an average response of 300ms; Confluent powers a customer-facing application requiring 350 QPS on data streaming in at 5 million events per second.

And it’s an economical solution too: for example, a performance benchmark comparing Google BigQuery to Apache Druid shows a 12X price for performance difference as Druid fundamentally uses less infrastructure under load.

If you’re curious to know how Apache Druid does it, let’s dive into part 2 to explain how Druid achieves high QPS with ease.