Security threats have existed since the very beginning of the internet. As technologies become more complex—and our society becomes increasingly digitized—security challenges have evolved to keep pace.
Today, the list of potential security risks is long. A disgruntled employee may leak sensitive customer data (such as addresses or financial information) in exchange for compensation. Scammers may gain control of credit card or bank accounts through phishing (or smishing) and engage in fraudulent commercial activity. Hackers may break into smart home networks through IoT devices to make off with credentials like WiFi passwords, logins, and more.
In order to prevent or mitigate these dangers, security professionals must use data collection, aggregation, and analysis to identify and investigate potential anomalies in real time. By detecting these threats in their early stages, teams can stop malicious actors before they infiltrate vital network infrastructure, compromise valuable data and assets, or cause lasting damage to an organization’s reputation.
Given the volume and the variety of data required for security analytics, analyzing and acting on possible threats can feel like filtering out a drop of water from a firehose—a feat beyond human capabilities. As a result, many teams build algorithms to assist in early detection and warning, relying on machine learning to conduct threat and data analysis in near real time.
To complicate matters, the security landscape can be expansive—comprising billions of endpoints across hundreds of regions and hundreds to thousands of users. A single attack can leave digital fingerprints across every part of the landscape, including network activity, logs, and connection requests, which may reside in different databases or datasets with different schemas.
To effectively safeguard digital environments, teams must build a real-time analytics application that can sift through the torrent of security data, pinpoint potential hazards, and flag them for further investigation. This application requires a database that can:
Ingest, organize, and query data in real time and at scale. Given the time-sensitive nature of many security threats, teams require rapid results from their queries. Delays can mean the difference between stopping a threat early and suffering a full-blown security breach.
Answer ad-hoc questions across huge volumes of data. Due to the complexity of digital environments and emerging risks, it’s not always possible for security teams to know what clues to look for, or which questions to ask of their data. As a result, any database has to be capable of unrestricted, open-ended exploration across a wide range of dimensions, filters, and aggregations. Security teams will have to drill down, slice and dice, and dissect their data through a variety of criteria, such as users, geographic region, and time range.
Provide always-on reliability. Hackers and malware can strike at any time, so security organizations cannot take breaks—nor can their databases. Ideally, teams will use data technologies that don’t require downtime for operations like upgrades, scaling, or rebalancing, and won’t lose or duplicate data.
Manage both real-time and historical data in the same platform. Traditionally, these workloads were split across two database types: transactional (OLTP) for instantaneous access to data and high performance under load, and analytical (OLAP) for multidimensional analysis. In this paradigm, OLTP databases could not scale massively or perform complex analytical operations, while OLAP architectures could not return results quickly or support large numbers of concurrent users and queries.
The unique nature of digital security, however, now requires a data technology that can do both—scale seamlessly and cheaply like an analytical database, support the high traffic and real-time queries of a transactional database, and most importantly, enable access to both types of data in a side-by-side format.
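As a concrete sketch, the open-ended drill-down described above might look like the following: a plain SQL string, built in Python, that slices failed logins by user and region over a recent time range. The table and column names here are hypothetical, chosen only to illustrate the shape of such a query.

```python
import json

# Hypothetical table and column names; the filters, grouping, and time
# range mirror the slice-and-dice exploration security teams need.
adhoc_sql = """
SELECT user_id, region, COUNT(*) AS failed_logins
FROM security_events
WHERE action = 'login_failed'
  AND event_time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY user_id, region
ORDER BY failed_logins DESC
LIMIT 20
"""

# Many analytics databases accept SQL over HTTP as a JSON payload:
payload = json.dumps({"query": adhoc_sql})
```

An analyst could vary the dimensions, filters, and time range on the fly—the point is that the database must answer whichever variant the investigation demands, quickly.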
As the database built for speed, scale, and streaming data, Apache Druid empowers teams to prevent fraud and defend against security breaches in real time. With native support for technologies like Apache Kafka and Amazon Kinesis, Druid can effortlessly support streaming data without using additional workarounds or costly connectors.
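A minimal sketch of what that native streaming support looks like in practice: Druid consumes a Kafka topic through an ingestion supervisor spec submitted to its API, with no external connector in between. The topic, datasource, and column names below are hypothetical placeholders.

```python
import json

# Hedged sketch of a Druid Kafka ingestion supervisor spec. Topic,
# datasource, and dimension names are hypothetical; the overall shape
# (type "kafka", ioConfig, dataSchema, tuningConfig) follows Druid's
# supervisor spec format.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "security-events",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "useEarliestOffset": False,
        },
        "dataSchema": {
            "dataSource": "security_events",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user_id", "region", "action"]},
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# A client would submit this via POST to the Overlord's
# /druid/indexer/v1/supervisor endpoint.
print(json.dumps(supervisor_spec, indent=2))
```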
One key advantage of Druid is query on arrival. Whereas many other databases ingest events in batches and persist them to files before users can access them, Druid ingests streaming data event by event, directly into memory at data nodes, where it can be queried instantly. This ensures that important, time-sensitive data is immediately available.
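The difference can be illustrated with a toy in-memory table—purely a sketch, not Druid's actual implementation—where every ingested event is visible to queries the moment it arrives, with no batch or flush step in between:

```python
class StreamTable:
    """Toy illustration of query-on-arrival: each event becomes
    queryable the moment it is ingested, with no batch/flush step.
    (Illustrative only -- not Druid's real data-node internals.)"""

    def __init__(self):
        self.events = []  # stands in for an in-memory row buffer

    def ingest(self, event):
        # Event-by-event ingestion: visible to queries immediately.
        self.events.append(event)

    def count_where(self, **filters):
        # A minimal filter query over everything ingested so far.
        return sum(
            1 for e in self.events
            if all(e.get(k) == v for k, v in filters.items())
        )


table = StreamTable()
table.ingest({"user": "alice", "action": "login_failed"})
# The event is countable immediately, before any batch completes:
print(table.count_where(action="login_failed"))
```

In a batch-oriented pipeline, that same count would read zero until the next flush—exactly the gap that matters during an active attack.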
From the beginning, Druid was designed for both reliability and durability. After events are ingested, they are processed and organized into columns and segments before being persisted into a deep storage layer, which serves as continuous backup. Should a node become unavailable, its workload will be spread across other existing nodes, and the data previously stored on the failed node will be pulled from deep storage and loaded onto the other functioning nodes. This ensures that data is never inaccessible.
In addition, Druid can power a wide range of visualizations, providing the unlimited, rapid data exploration that security teams need. To enhance the Druid experience and add further flexibility to data investigation, Imply also provides Pivot, a visualization engine that simplifies the creation of dashboards and graphics like heat maps, stacked areas, sunbursts, and more. Pivot can also accommodate different data types and automatically optimize schemas for faster, more efficient query results—even when faced with an influx of traffic and users.
Lastly, Druid can simultaneously manage real-time and historical data without sacrificing speed or efficiency, enabling teams to access and compare key data in one platform. This simplifies workflows and makes it easier to fine-tune the algorithms that discover and flag potential security risks.
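One way to exploit that side-by-side access is a single query that compares the live window against a historical baseline. The Druid SQL below is a hedged sketch with hypothetical table and column names (`__time` is Druid's standard timestamp column); the baseline arithmetic is deliberately rough, dividing the prior week's events by its hour count.

```python
# Illustrative Druid SQL comparing real-time and historical data in one
# query; table and column names other than __time are hypothetical.
comparison_sql = """
SELECT
  region,
  COUNT(*) FILTER (
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
  ) AS last_hour,
  COUNT(*) FILTER (
    WHERE __time < CURRENT_TIMESTAMP - INTERVAL '1' HOUR
  ) * 1.0 / 167 AS hourly_baseline_7d  -- prior week = 167 hours
FROM security_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
  AND action = 'login_failed'
GROUP BY region
"""
```

A monitoring job could flag any region where `last_hour` far exceeds `hourly_baseline_7d`, without stitching results from two separate systems.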
With its intelligent automation and powerful machine learning, Sift is a leader in the cybersecurity space, regularly deploying advanced algorithms into production. Sift customers use the scores generated by these machine learning models to assess the potential risk of events or transactions—and to determine whether they should be blocked, accepted, or investigated. Because each customer has unique traffic and decision patterns, Sift needed a tool that could automatically learn what each customer would perceive as normal for their environment.
As a result, Sift built Watchtower, an automated, real-time monitoring platform that can train itself on historical customer data in order to identify and alert on irregularities. Sift chose Apache Druid to power Watchtower, due to Druid’s proven record of providing highly interactive visualizations and real-time data experiences. Using Druid, Watchtower can aggregate data from thousands of servers by a variety of dimensions—and query this data across a moving time window with real-time analysis and visualization.
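The core idea—learning a per-customer baseline from history and alerting on deviations—can be sketched with a simple moving-window z-score detector. This is a generic illustration, not Sift's actual Watchtower models, and the window size and threshold are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, stdev

def make_detector(window_size=20, threshold=3.0):
    """Flag a value as anomalous if it deviates from the moving-window
    mean by more than `threshold` standard deviations. A generic sketch
    of baseline-and-deviation monitoring, not Watchtower's real models."""
    window = deque(maxlen=window_size)

    def check(value):
        anomalous = False
        if len(window) >= 5:  # wait for a minimal baseline first
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalous = True
        if not anomalous:
            window.append(value)  # only normal values update the baseline
        return anomalous

    return check
```

Fed a metric such as per-minute transaction counts, the detector stays quiet on normal fluctuation and fires on a sudden spike, which is the moment to alert the customer.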
Now, Sift can notify customers when outliers are detected, helping them get ahead of possible threats and mitigate any potential business impacts.
Quote: As the leader in Digital Trust and Safety, we enable online businesses to prevent fraud and abuse while streamlining customer experiences. We built an anomaly detection engine called Watchtower, which uses machine learning models to detect unusual activity. Apache Druid and Imply help us analyze data with an interactive experience that provides us with on-demand analysis and visualization.
—Neeraj Gupta, SVP of Engineering and Cloud Operations at Sift.
To learn more about how Sift uses Apache Druid, read our blog post.
For more information about Druid, read our architecture guide. For the easiest way to get started with real-time analytics, start a free trial of Polaris, the fully managed, Druid database-as-a-service by Imply.