Imply works well with any event-driven data set such as clickstreams,
timeseries data, or telemetry data. Imply is designed for workloads
where users analyze this data through interactive UIs, where performance
and uptime are critical.
Imply is commonly paired with a message bus such as Apache Kafka or AWS
Kinesis, or a stream processor such as Apache Flink or Apache Spark
Streaming, and acts as a sink for such systems. In these setups, Imply
is the query and visualization layer for the stream delivery and stream
processing layers.
Imply is also often paired with a file system such as HDFS or a cloud object store such as AWS S3. In this setup, static files are batch loaded into Imply.
Imply is often used to collect user-generated data such as clickstreams,
viewstreams, and activity streams. This data is often produced as a
result of users interacting with digital products. Imply is used to
measure user engagement, track A/B test data for product releases, and
troubleshoot anomalies in usage patterns.
Imply’s core engine, Druid, can compute user metrics such as distinct
counts both exactly and approximately. This enables Imply to compute
approximate measures such as daily active users in under a second (with
roughly 99% accuracy) to view general trends, and to compute exact
values to present to key stakeholders as needed. Furthermore, Imply can
be used for funnel analysis: measuring how many users took one action
but did not take another. Such analysis is useful in tracking user
signups for a product.
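The funnel step described above (users who took one action but not another) reduces to a set difference over per-action user sets. The sketch below models it in plain Python over in-memory rows; the event names and user IDs are made up for illustration, and in a real deployment this would be a filtered aggregation query against a Druid datasource rather than a Python scan.

```python
from collections import defaultdict

def funnel_dropoff(events, first_action, second_action):
    """Return users who performed first_action but never second_action.

    `events` is an iterable of (user_id, action) pairs -- a stand-in
    for rows in an event datasource.
    """
    users_by_action = defaultdict(set)
    for user_id, action in events:
        users_by_action[action].add(user_id)
    # Set difference: started the funnel, never finished it.
    return users_by_action[first_action] - users_by_action[second_action]

events = [
    ("alice", "visited_signup"), ("alice", "completed_signup"),
    ("bob", "visited_signup"),
    ("carol", "visited_signup"), ("carol", "completed_signup"),
    ("dave", "visited_signup"),
]

# bob and dave visited the signup page but never completed signup
print(sorted(funnel_dropoff(events, "visited_signup", "completed_signup")))
```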
Imply’s search and filter capabilities enable rapid, easy groupings of users along any set of demographics. Measure and compare user activity by age, gender, location, and much more.
Imply is commonly used to collect and analyze netflows. Imply is used to
arbitrarily slice and dice flow data along any set of attributes such as
service name, interface name, address, port, protocol, and more. Imply
is also used to calculate complex metrics such as quantiles on packet
size, bytes per second, average flow rate, and more.
Imply leverages Druid’s ability to roll up raw data at ingestion time to
substantially reduce the amount of data that needs to be stored for
queries. In existing production clusters, netflow data stored in Druid
can be up to 1000x smaller than the raw data ingested. This leads to
substantial storage savings and direct performance benefits.
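Rollup works by collapsing raw events that share a truncated timestamp and the same dimension values into a single row with summed metrics. The following is a simplified in-memory model of that idea, not Druid's actual ingestion code; the field names and granularity are illustrative assumptions.

```python
from collections import defaultdict

def rollup(raw_events, granularity_secs=60):
    """Roll up raw flow records at 'ingestion time'.

    Each raw event is (timestamp_secs, src, dst, bytes). Events sharing
    the same truncated timestamp and dimension values collapse into one
    row with summed metrics -- a toy model of Druid rollup.
    """
    rolled = defaultdict(lambda: {"bytes": 0, "count": 0})
    for ts, src, dst, nbytes in raw_events:
        bucket = ts - ts % granularity_secs  # truncate to the granularity
        key = (bucket, src, dst)
        rolled[key]["bytes"] += nbytes
        rolled[key]["count"] += 1
    return dict(rolled)

raw = [
    (0, "10.0.0.1", "10.0.0.2", 100),
    (10, "10.0.0.1", "10.0.0.2", 200),  # same minute, same dimensions
    (70, "10.0.0.1", "10.0.0.2", 50),   # next minute
]
rolled = rollup(raw)
# three raw rows stored as two rolled-up rows
print(len(raw), "->", len(rolled))
```

With high-cardinality timestamps and repetitive dimension values, this is where the large compression ratios come from: the more raw rows share a (time bucket, dimensions) key, the fewer rows need to be stored.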
Netflow queries often involve grouping or ranking dozens of dimensions
to measure key metrics or diagnose issues. Many traditional technologies
are unable to provide interactive queries when the number of dimensions
is high, or when the cardinality of the dimensions is high. Druid’s
innovative architecture is able to scale to handle interactive queries
on complex data sets with hundreds of dimensions, and billions of unique
values per dimension.
Netflow analytics often requires computing complex metrics to measure performance or identify issues. For example, to compute burstable billing, you first need to group flows into 5-minute buckets. Next, you need to find the 95th percentile flow rate for each bucket. Finally, you need to identify the top devices related to these flows. Druid is built from the ground up to calculate these complex measures with ease.
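The burstable-billing recipe above can be sketched in a few lines of Python: sum bytes per (device, 5-minute bucket), take the 95th percentile of each device's bucket totals, then rank devices by that figure. This uses an exact nearest-rank percentile for clarity, whereas Druid would typically use an approximate quantile sketch; the device names and numbers are invented for the example.

```python
from collections import defaultdict

BUCKET = 300  # 5-minute buckets, in seconds

def p95(values):
    """Exact nearest-rank 95th percentile (Druid would use a sketch)."""
    ordered = sorted(values)
    rank = max(0, -(-95 * len(ordered) // 100) - 1)  # ceil(0.95 * n) - 1
    return ordered[rank]

def burstable_rates(flows):
    """flows: iterable of (timestamp_secs, device, bytes).

    Sums bytes per (device, 5-minute bucket), takes the 95th percentile
    of each device's bucket totals, and returns devices ranked by that
    figure, largest first.
    """
    per_bucket = defaultdict(int)
    for ts, device, nbytes in flows:
        per_bucket[(device, ts - ts % BUCKET)] += nbytes
    samples = defaultdict(list)
    for (device, _), total in per_bucket.items():
        samples[device].append(total)
    return sorted(((p95(v), d) for d, v in samples.items()), reverse=True)

flows = [(0, "a", 100), (100, "a", 50), (300, "a", 10), (0, "b", 500)]
print(burstable_rates(flows))
```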
Leverage Imply to track the operational data generated by applications.
Similar to the user activity use case, this data can be about how users
are interacting with an application or it can be the metrics emitted by
the application itself. You can use Imply to quickly and visually drill
into how different components of an application are performing, identify
bottlenecks, and troubleshoot issues.
Unlike many traditional solutions, there are no limits to the volume,
complexity, and throughput of the data. Rapidly analyze application
events with thousands of attributes, and compute complex metrics on
load, performance, and usage. For example, rank API endpoints based on
95th percentile query latency, and slice and dice how these metrics
change based on any ad-hoc set of attributes such as time of day, user
demographic, or datacenter location.
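The endpoint-ranking example above is a group-by plus quantile computation. Here is a minimal Python model of it, assuming request-log rows of (endpoint, latency in milliseconds); the endpoint names and latencies are made up, and the exact nearest-rank percentile stands in for the approximate sketches a production system would use.

```python
from collections import defaultdict

def p95(latencies):
    """Exact nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(latencies)
    return ordered[max(0, -(-95 * len(ordered) // 100) - 1)]

def rank_endpoints(samples):
    """samples: iterable of (endpoint, latency_ms) rows.

    Groups latencies by endpoint and ranks endpoints by 95th-percentile
    latency, worst first -- the GROUP BY + quantile query described in
    the text.
    """
    by_endpoint = defaultdict(list)
    for endpoint, latency in samples:
        by_endpoint[endpoint].append(latency)
    return sorted(((p95(v), e) for e, v in by_endpoint.items()), reverse=True)

samples = [("/search", 120), ("/search", 900), ("/login", 40),
           ("/login", 55), ("/search", 150), ("/login", 45)]
print(rank_endpoints(samples))
```

Slicing by time of day, demographic, or datacenter amounts to adding those attributes to the grouping key or filtering the rows before grouping.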
Imply is used in production at some of the world’s largest companies to ingest millions of events per second, representing data from thousands of applications.
Imply can be used as a powerful time series solution for server and
device metrics. Ingest machine-generated data in real time, and perform
rapid ad-hoc analytics to measure performance, optimize hardware
resources, or identify issues.
Unlike many traditional timeseries databases, Imply’s core engine,
Druid, is an analytics engine at heart. Druid combines ideas from
timeseries databases, column-oriented analytic databases, and
logging/monitoring solutions, bringing together time-based
partitioning, column-oriented storage, and search indexes in a single
system. This means time-based queries, numerical aggregations, and
search and filter queries are all extremely fast.
Imply unlocks the power of Druid, and in doing so enables workflows with device and server metrics beyond what a traditional timeseries database can do. You can include thousands of tags with your metrics, and arbitrarily group and filter on any combination of tags. You can group and rank on tags, and compute a variety of complex metrics. Furthermore, you can search and filter on tag values orders of magnitude faster than in traditional timeseries databases.
Imply is commonly used to store and query online advertising data. This
data typically comes from ad servers and is critical to measure and
understand advertising campaign performance, click through rates,
conversion rates (attrition rates), and much more.
Imply’s core engine, Druid, was initially designed to power a
user-facing analytics application for digital advertising data. Druid
has seen substantial production use for this type of data and the
largest clusters in the world hold petabytes of data on thousands of
servers. Imply provides a powerful UI on top of Druid to unlock Druid’s
capability to rapidly slice and dice data.
Leverage Imply to instantly compute impressions, clicks, eCPM, and key conversion metrics, filtered on publisher, campaign, and user information. Ad-hoc slice and dice this data and present the data visually in dashboards and reports. Easily share this data with your team.
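Metrics like eCPM and click-through rate are ratios over summed columns: eCPM is revenue per thousand impressions, and CTR is clicks divided by impressions. A small sketch of that arithmetic, with invented numbers; in Imply these would be post-aggregations computed over the summed impression, click, and revenue columns rather than Python code.

```python
def ad_metrics(impressions, clicks, revenue):
    """Compute common ad-server metrics from raw aggregates.

    eCPM (effective cost per mille) is revenue per thousand
    impressions; CTR is clicks divided by impressions.
    """
    if impressions == 0:
        return {"ecpm": 0.0, "ctr": 0.0}  # avoid division by zero
    return {
        "ecpm": revenue / impressions * 1000,
        "ctr": clicks / impressions,
    }

# 250k impressions, 1,250 clicks, $375 revenue ->
# eCPM = 375 / 250000 * 1000 = 1.5, CTR = 1250 / 250000 = 0.005
print(ad_metrics(impressions=250_000, clicks=1_250, revenue=375.0))
```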
Imply is commonly used for BI use cases. Organizations have deployed
Imply to accelerate queries and power applications. Unlike many
SQL-on-Hadoop engines such as AWS Athena (Presto) or Hive, Imply is
designed for sub-second queries, where users can interactively explore
data through a UI. SQL-on-Hadoop solutions such as Presto and Hive are
designed for complex, data warehousing oriented use cases, where queries
may involve many complex joins, and results may take hours to return.
Imply is a great fit if you are developing a user-facing application and
you want your users to be able to run their own self-service, drill-down
queries.