Imply is most often used in cases where real-time ingestion, fast query performance, and high uptime are important. As such, Imply is commonly used to power the GUIs of analytical applications, as a backend for highly concurrent APIs, or as a complete end-to-end solution for use cases that need fast aggregations.
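Common application areas for Imply include user activity and behavior analysis, network flow analysis, application performance monitoring, server and device metrics, digital advertising analytics, and business intelligence.
These use cases are described in more detail below.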
Imply is often used to collect user-generated data such as clickstreams, viewstreams, and activity streams. This data is typically produced as users interact with digital products. Imply is used to measure user engagement, track A/B test data for product releases, and troubleshoot anomalies in usage patterns.
Imply’s core engine, Druid, can compute user metrics such as distinct counts both exactly and approximately. This enables Imply to compute approximate measures such as daily active users in under a second (with 99% accuracy) to view general trends, and to compute exact values to present to key stakeholders as needed. Imply can also be used for funnel analysis, for example to measure how many users took one action but did not take another.
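Druid's approximate distinct counts are typically backed by sketches such as HyperLogLog or Apache DataSketches theta sketches. To illustrate the idea, here is a minimal from-scratch K-minimum-values (theta-style) sketch in Python. It is not Druid's implementation, and the names `KMVSketch` and `a_not_b` are hypothetical. The sketch keeps only the k smallest hashes, so its relative error is roughly 1/sqrt(k). Because theta-style sketches support set operations, the same structure can estimate "took action A but not action B", which is the core of a simple funnel query.

```python
import hashlib
import heapq

MAX_HASH = 2**64  # hashes are treated as uniform integers in [0, 2**64)


def hash64(value: str) -> int:
    """Map a value to a pseudo-uniform 64-bit integer."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


class KMVSketch:
    """K-minimum-values (theta-style) sketch for approximate distinct counts.

    Keeps only the k smallest distinct hashes; theta is the cut-off below
    which every hash seen so far is retained.
    """

    def __init__(self, k: int = 4096):
        self.k = k
        self.theta = MAX_HASH
        self._heap: list[int] = []  # negated hashes, so -heap[0] is the largest
        self.hashes: set[int] = set()

    def update(self, value: str) -> None:
        h = hash64(value)
        if h >= self.theta or h in self.hashes:
            return
        heapq.heappush(self._heap, -h)
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Evict the largest retained hash and lower theta to it.
            evicted = -heapq.heappop(self._heap)
            self.hashes.remove(evicted)
            self.theta = evicted

    def estimate(self) -> float:
        # Retained hashes cover a fraction theta / MAX_HASH of the hash space;
        # below k distinct values, theta stays at MAX_HASH and the count is exact.
        return len(self.hashes) / (self.theta / MAX_HASH)


def a_not_b(a: KMVSketch, b: KMVSketch) -> float:
    """Estimate |A - B|: users who took action A but not action B."""
    theta = min(a.theta, b.theta)
    kept = [h for h in a.hashes if h < theta and h not in b.hashes]
    return len(kept) / (theta / MAX_HASH)


# Usage: 50,000 hypothetical users viewed a product; every fifth one bought it.
users = [f"user{i}" for i in range(50_000)]
viewed, bought = KMVSketch(), KMVSketch()
for u in users:
    viewed.update(u)
for u in users[::5]:
    bought.update(u)

print("exact viewers:", len(set(users)), "approx:", round(viewed.estimate()))
print("viewed but did not buy, approx:", round(a_not_b(viewed, bought)))
```

The exact count requires holding every user ID, while each sketch holds at most k hashes regardless of how many users it has seen; raising k trades memory for accuracy.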
Imply’s search and filter capabilities make it fast and easy to group users along any set of demographics, so you can measure and compare user activity by age, gender, location, and much more.
Imply is commonly used to collect and analyze netflows. It can slice and dice flow data along any set of attributes, such as service name, interface name, address, port, and protocol, and calculate complex metrics such as quantiles of packet size, bytes per second, and average flow rate.
Imply leverages Druid’s ability to roll up raw data at ingestion time, which substantially reduces the amount of data that needs to be stored for queries. In existing production clusters, netflow data stored in Druid is up to 1000x smaller than the raw data ingested. This yields substantial storage savings and, because queries scan fewer rows, direct performance benefits.
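Rollup works by truncating each row's timestamp to a query granularity and merging rows that share the same truncated timestamp and dimension values, pre-aggregating their metrics. The Python sketch below mimics that idea at minute granularity on a few hypothetical flow records; the `rollup` function and the field names are illustrative assumptions, not Druid's ingestion API.

```python
from datetime import datetime


def rollup(rows, dimensions, metrics):
    """Roll up raw rows at minute granularity.

    Rows that share the same truncated timestamp and dimension values are
    merged into one row whose metrics are summed; 'count' records how many
    raw rows were merged.
    """
    merged = {}
    for row in rows:
        minute = row["timestamp"].replace(second=0, microsecond=0)
        key = (minute, *(row[d] for d in dimensions))
        acc = merged.setdefault(key, {"count": 0, **{m: 0 for m in metrics}})
        acc["count"] += 1
        for m in metrics:
            acc[m] += row[m]
    return [
        {"timestamp": key[0], **dict(zip(dimensions, key[1:])), **acc}
        for key, acc in merged.items()
    ]


# Hypothetical raw flow records.
raw_flows = [
    {"timestamp": datetime(2024, 1, 1, 12, 0, 5), "src_ip": "10.0.0.1", "dst_port": 443, "bytes": 1200, "packets": 3},
    {"timestamp": datetime(2024, 1, 1, 12, 0, 40), "src_ip": "10.0.0.1", "dst_port": 443, "bytes": 800, "packets": 2},
    {"timestamp": datetime(2024, 1, 1, 12, 1, 10), "src_ip": "10.0.0.2", "dst_port": 53, "bytes": 90, "packets": 1},
]
rolled = rollup(raw_flows, ["src_ip", "dst_port"], ["bytes", "packets"])
for row in rolled:
    print(row)
```

The first two flows collapse into a single stored row, so three raw rows become two. With real netflow traffic, where many flows share the same dimension values within a time bucket, the reduction can be far larger.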
Netflow queries often involve grouping or ranking across dozens of dimensions to measure key metrics or diagnose issues. Many traditional technologies cannot provide interactive queries when the number of dimensions is high or when the cardinality of those dimensions is high. Druid’s architecture scales to interactive queries on complex data sets with hundreds of dimensions and billions of unique values per dimension.
Netflow analytics often requires computing complex metrics to measure performance or identify issues. For example, to compute burstable billing, you first group flows into 5-minute buckets. Next, you find the 95th percentile flow rate across those buckets. Finally, you identify the top devices related to these flows. Druid is built from the ground up to calculate complex measures like these.
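The three burstable-billing steps can be sketched in plain Python. This follows the common convention of computing one rate per 5-minute bucket and then taking the nearest-rank 95th percentile across buckets. Which devices count as "related" is an assumption here: devices are ranked by bytes sent in the buckets at or above the 95th percentile. All function names and the sample data are hypothetical.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute buckets


def bucket_rates(flows):
    """Total bytes per 5-minute bucket, converted to bits per second."""
    totals = defaultdict(int)
    for ts, _device, nbytes in flows:
        totals[ts // BUCKET_SECONDS] += nbytes
    return {bucket: total * 8 / BUCKET_SECONDS for bucket, total in totals.items()}


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p / 100 * n) with integer math
    return ordered[rank - 1]


def top_devices(flows, buckets, n=3):
    """Rank devices by bytes sent during the given buckets."""
    per_device = defaultdict(int)
    for ts, device, nbytes in flows:
        if ts // BUCKET_SECONDS in buckets:
            per_device[device] += nbytes
    return sorted(per_device.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical flows: (unix_seconds, device, bytes) over 20 five-minute buckets.
flows = []
for i in range(20):
    flows.append((i * BUCKET_SECONDS + 10, "edge-1", 1000 * (i + 1)))
    flows.append((i * BUCKET_SECONDS + 20, "edge-2", 500))

rates = bucket_rates(flows)                               # step 1
p95 = percentile_nearest_rank(rates.values(), 95)         # step 2
peak_buckets = {b for b, rate in rates.items() if rate >= p95}
print("95th percentile rate (bps):", p95)
print("top devices in peak buckets:", top_devices(flows, peak_buckets))  # step 3
```

With 20 buckets, the nearest-rank 95th percentile is the 19th-smallest rate, so only the single highest bucket lies above the billed rate.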
Leverage Imply to track the operational data generated by applications. Similar to the user activity use case, this data can be about how users are interacting with an application or it can be the metrics emitted by the application itself. You can use Imply to quickly and visually drill into how different components of an application are performing, identify bottlenecks, and troubleshoot issues.
Unlike many traditional solutions, Imply places no practical limits on the volume, complexity, or throughput of the data. Rapidly analyze application events with thousands of attributes, and compute complex metrics on load, performance, and usage. For example, rank API endpoints by 95th percentile query latency, then slice and dice how these metrics change across any ad-hoc set of attributes such as time of day, user demographic, or datacenter location.
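The endpoint-ranking example amounts to grouping events by an arbitrary set of attributes, computing a 95th percentile per group, and sorting. The sketch below is plain Python over in-memory events, not an Imply or Druid API; the function name `rank_by_p95_latency`, the event fields, and the sample values are hypothetical.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[(95 * len(ordered) + 99) // 100 - 1]


def rank_by_p95_latency(events, group_by=("endpoint",)):
    """Group events by any ad-hoc set of attributes and rank groups by p95 latency."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[attr] for attr in group_by)
        groups[key].append(event["latency_ms"])
    ranked = [(key, p95(latencies)) for key, latencies in groups.items()]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)


# Hypothetical events: /search is served from two datacenters, /login from one.
events = [
    {"endpoint": "/search", "datacenter": "us-east" if i % 2 else "eu-west", "latency_ms": i}
    for i in range(1, 101)
]
events += [{"endpoint": "/login", "datacenter": "us-east", "latency_ms": 2 * i} for i in range(1, 21)]

print(rank_by_p95_latency(events))
print(rank_by_p95_latency(events, group_by=("endpoint", "datacenter")))
```

Changing `group_by` re-slices the same events along a different combination of attributes without any precomputed schema for that combination.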
Imply is used in production at some of the world’s largest companies to ingest millions of events per second, representing data from thousands of applications.
Imply can be used as a powerful timeseries solution for server and device metrics. Ingest machine-generated data in real time, and perform rapid ad-hoc analytics to measure performance, optimize hardware resources, or identify issues.
Unlike many traditional timeseries databases, Imply’s core engine, Druid, is an analytics engine at heart. Druid combines ideas from timeseries databases, column-oriented analytic databases, and logging/monitoring solutions, bringing time-based partitioning, column-oriented storage, and search indexes together in a single system. As a result, time-based queries, numerical aggregations, and search and filter queries are all extremely fast.
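One reason filters are fast in this design is that Druid dictionary-encodes string columns and keeps a bitmap index per distinct value, so a filter becomes bitwise operations on bitmaps. The Python sketch below illustrates the idea using plain integers as uncompressed bitmaps (Druid uses compressed formats such as Roaring); the class `BitmapIndexedColumn`, the helper `rows_in`, and the sample columns are hypothetical.

```python
class BitmapIndexedColumn:
    """Dictionary-encoded string column with one bitmap per distinct value.

    Bitmaps are plain Python ints: bit r is set if row r holds that value.
    """

    def __init__(self, values):
        self.dictionary = sorted(set(values))
        codes = {value: code for code, value in enumerate(self.dictionary)}
        self.encoded = [codes[v] for v in values]  # compact integer code per row
        self.bitmaps = {value: 0 for value in self.dictionary}
        for row, value in enumerate(values):
            self.bitmaps[value] |= 1 << row

    def matching(self, value):
        """Bitmap of rows equal to value (0 if the value never occurs)."""
        return self.bitmaps.get(value, 0)


def rows_in(bitmap):
    """List the row numbers whose bit is set."""
    return [row for row in range(bitmap.bit_length()) if bitmap >> row & 1]


country = BitmapIndexedColumn(["US", "DE", "US", "US", "FR"])
device = BitmapIndexedColumn(["mobile", "mobile", "desktop", "mobile", "mobile"])
bytes_sent = [100, 200, 300, 400, 500]  # a numeric metric column

# country = 'US' AND device = 'mobile' is a bitwise AND of two bitmaps.
us_mobile = country.matching("US") & device.matching("mobile")
# country IN ('US', 'FR') is a bitwise OR.
us_or_fr = country.matching("US") | country.matching("FR")

print(rows_in(us_mobile), sum(bytes_sent[r] for r in rows_in(us_mobile)))
print(rows_in(us_or_fr), sum(bytes_sent[r] for r in rows_in(us_or_fr)))
```

The aggregation then reads only the matching rows of the metric column, which is where column-oriented storage pays off.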
Imply unlocks the power of Druid and, in doing so, enables workflows with device and server metrics beyond what a traditional timeseries database can do. You can include thousands of tags with your metrics and arbitrarily group and filter on any combination of tags. You can group and rank on tags, and compute a variety of complex metrics. Furthermore, you can search and filter on tag values orders of magnitude faster than in traditional timeseries databases.
Imply is commonly used to store and query online advertising data. This data typically comes from ad servers and is critical to measure and understand advertising campaign performance, click-through rates, conversion rates (attrition rates), and much more.
Imply’s core engine, Druid, was initially designed to power a user-facing analytics application for digital advertising data. Druid has seen substantial production use for this type of data, and the largest Druid clusters in the world hold petabytes of data on thousands of servers. Imply provides a powerful UI on top of Druid that lets users rapidly slice and dice this data.
Leverage Imply to instantly compute impressions, clicks, eCPM, and key conversion metrics, filtered on publisher, campaign, and user information. Slice and dice this data ad hoc, present it visually in dashboards and reports, and easily share it with your team.
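The advertising metrics named above are ratios of summed counters: click-through rate is clicks divided by impressions, and eCPM is revenue per 1,000 impressions. The Python sketch below aggregates hypothetical ad-server rows with optional exact-match filters and derives those ratios. Defining conversion rate as conversions per click is an assumption (some teams use conversions per impression), and the function `ad_metrics` and all field names are illustrative.

```python
from collections import defaultdict


def ad_metrics(rows, group_by="campaign", **filters):
    """Aggregate ad-server rows and derive CTR, eCPM, and conversion rate.

    filters: exact-match conditions, e.g. publisher="news.example".
    """
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0, "conversions": 0, "revenue": 0.0})
    for row in rows:
        if any(row[attr] != value for attr, value in filters.items()):
            continue
        t = totals[row[group_by]]
        for field in t:
            t[field] += row[field]
    report = {}
    for key, t in totals.items():
        imps, clicks = t["impressions"], t["clicks"]
        report[key] = {
            **t,
            "ctr": clicks / imps if imps else 0.0,
            # eCPM: effective revenue per 1,000 impressions.
            "ecpm": 1000 * t["revenue"] / imps if imps else 0.0,
            # Assumed definition: conversions per click.
            "conversion_rate": t["conversions"] / clicks if clicks else 0.0,
        }
    return report


# Hypothetical pre-aggregated ad-server rows.
ad_rows = [
    {"publisher": "news.example", "campaign": "spring", "impressions": 10000, "clicks": 150, "conversions": 6, "revenue": 25.0},
    {"publisher": "blog.example", "campaign": "spring", "impressions": 5000, "clicks": 50, "conversions": 2, "revenue": 10.0},
    {"publisher": "news.example", "campaign": "summer", "impressions": 2000, "clicks": 40, "conversions": 4, "revenue": 8.0},
]

print(ad_metrics(ad_rows))
print(ad_metrics(ad_rows, publisher="news.example"))
```

Summing the counters first and dividing afterwards keeps the ratios correct when rows are merged; averaging per-row ratios would weight small and large rows equally.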
Imply is commonly used for BI use cases, where organizations deploy it to accelerate queries and power applications. Unlike many SQL-on-Hadoop engines such as AWS Athena (Presto) or Hive, Imply is designed for sub-second queries, where users can interactively explore data through a UI. SQL-on-Hadoop solutions such as Presto and Hive are designed for complex, data warehousing-oriented use cases, where queries may involve many complex joins and results may take hours to return.
Imply is a great fit if you are developing a user-facing application and want your users to run their own self-service, drill-down queries interactively.
Imply works well with any event-driven data set such as clickstreams, timeseries data, or telemetry data. Imply is designed for workloads where users analyze this data through interactive UIs, where performance and uptime are critical.
Imply is commonly paired with a message bus such as Apache Kafka or AWS Kinesis, or a stream processor such as Apache Flink or Apache Spark Streaming, and acts as a sink for such systems. In these setups, Imply is the query and visualization layer for the stream delivery and stream processing systems.
Imply is also often paired with a file system or object store such as HDFS, AWS S3, or other cloud blob stores. In this setup, static files are batch loaded into Imply.