Combating financial fraud and money laundering at scale with Apache Druid

Apr 26, 2023
Julia Brouillette

Apache Druid provides the backbone for fast, ad hoc investigations at extreme scale—something you won’t find in legacy data stores or off-the-shelf case management and regulatory reporting products. 

Perhaps no other industry has a more complex relationship with data than financial services. In an era of accelerated data growth, financial services institutions (FSIs) and FinTech vendors must process and share customer, risk, and transaction data quickly, securely, and under strict compliance requirements that vary, sometimes widely, between regulators.

Gone are the days when overnight batch processing, once the status quo, was good enough. Internal and external stakeholders expect immediate insights from streaming and historical data to make decisions. FSIs are under pressure to expand the scope of data they can deliver to clients and regulators while narrowing the gap between data creation and access.

That window between creation and access is especially important when combating threats like fraud and money laundering. Cybercriminals are becoming more sophisticated, making fraud a moving target, and the costs of fighting it, both to build trust with customers and to prove regulatory compliance, are high and rising.

The current state of anti-fraud and AML: what’s missing?

For decades, anti-fraud and anti-money laundering (AML) solutions have relied on after-the-fact batch processing. AML was considered an investigative process to be executed after money laundering activity was suspected. 

Today, in most cases, security approaches at an FSI or FinTech company are driven by a combination of off-the-shelf products and batch data warehouse systems, which are used to generate reports, feed static dashboards, and provide analysts with a means to run ad hoc queries on data. FSIs also use these systems to power machine learning tools that can help automate client verification and know your customer (KYC) processes, which are an integral part of meeting every FSI’s compliance needs. 

Even with these tools and processes in place, security breaches are a more persistent threat than ever—and when they happen, it’s no longer acceptable to wait days or even minutes to react. Requirements are evolving toward proactive response: enabling analysts to detect threats at an early stage and defeat each attack before it infiltrates network infrastructure, compromises valuable data and assets, or otherwise harms the organization.

Fraud analysis involves gathering data from various sources—real-time application data, historical transaction logs, KYC systems, and more—to trace exactly what happened, when, and which systems may be affected. For these ad hoc investigations, off-the-shelf tools and static data warehouses work well only when the volume of data is relatively moderate, as it is for a regional bank, for instance.

So, when does it make sense to upgrade to a real-time analytics database for anti-fraud and AML use cases? There are a few things to keep in mind:

  1. Subsecond latency at scale: Larger FSIs or FinTech software providers often need to handle high throughput, up to millions of real-time events ingested per second. If you need to run complex queries on top of that and get answers back in seconds or less, you’ll likely overextend the limits of any transactional database or data warehouse (unless you have an unlimited budget) and sacrifice performance.
  2. Streaming data in context: As FSIs respond to the increasing pace of modern requirements, data movement is becoming focused on streaming technologies like Apache Kafka and Amazon Kinesis. This drives a need to analyze streaming events as they are created at a tremendous scale, and a need to look at those streams in context alongside historical data. Without a database purpose-built to analyze both streaming and batch data in real time, data consistency and time-to-insight could suffer.
  3. Interactive ad hoc investigations: Ideally, security and compliance teams can query raw data freely and without delay, so they can fly through investigations and find the answers they need instead of waiting minutes, or even hours, for data to load (see the sketch after this list). This isn’t possible with traditional BI tools, which are good at automating report creation but can’t support ad hoc queries at scale.
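
To make this concrete, here is a minimal sketch of what an ad hoc investigation query might look like against Druid’s SQL API. The datasource name, column names, and router address are illustrative assumptions, not a reference implementation:

```python
# A minimal sketch of an ad hoc investigation query against Druid's SQL API.
# Assumes a Druid router at localhost:8888 and a hypothetical "transactions"
# datasource fed by both streaming and batch ingestion.
import requests

DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Group recent high-risk activity by hour, country, and merchant category.
QUERY = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour_bucket,
  country,
  merchant_category,
  COUNT(*) AS txn_count,
  SUM(amount) AS total_amount
FROM transactions
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
  AND risk_score > 0.8  -- hypothetical model-scored column
GROUP BY 1, 2, 3
ORDER BY txn_count DESC
LIMIT 100
"""

response = requests.post(
    DRUID_SQL_URL,
    json={"query": QUERY, "resultFormat": "object"},
    timeout=30,
)
response.raise_for_status()

# Each row arrives as a JSON object keyed by the SELECT aliases.
for row in response.json():
    print(row["hour_bucket"], row["country"], row["txn_count"])
```

Because Druid makes streaming rows queryable as they arrive, a query like this spans events ingested seconds ago and records that are months old, which is what keeps investigations interactive.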

Digital native FinTech companies and a growing number of FSIs are going beyond batch-based data warehouses (e.g. Snowflake, Google BigQuery) and off-the-shelf tools (e.g. Hummingbird, Alloy) by building analytics applications to power their anti-fraud and AML projects using powerful open source databases.

Enabling slice and dice AML investigations at DBS Bank

Development Bank of Singapore, a leading financial services firm, is a great example. 

DBS is using Apache Druid to power AML investigations for compliance. Before Druid, teams shared giant data files back and forth for batch processing, which took too much time to meet the needs of AML investigations. Today, Druid ingests more than 4 million transactions per day and tracks alerts generated by the AML workflow, and investigators perform ad hoc, slice-and-dice analysis on millions of rows of data in seconds.
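
As a rough illustration of what the streaming side of such a pipeline involves, the sketch below registers a Kafka ingestion supervisor with Druid, after which arriving transactions become queryable within seconds. The topic, broker address, and schema are hypothetical placeholders, not DBS’s actual configuration:

```python
# A minimal sketch of wiring a Kafka topic into Druid by submitting a
# supervisor spec. All names (topic, brokers, datasource, columns) are
# hypothetical placeholders.
import requests

SUPERVISOR_URL = "http://localhost:8888/druid/indexer/v1/supervisor"

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "aml-transactions",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "transactions",
            "timestampSpec": {"column": "event_time", "format": "iso"},
            "dimensionsSpec": {
                "dimensions": ["account_id", "counterparty", "channel", "country"]
            },
            "granularitySpec": {"segmentGranularity": "hour", "rollup": False},
        },
        "tuningConfig": {"type": "kafka"},
    },
}

resp = requests.post(SUPERVISOR_URL, json=supervisor_spec, timeout=30)
resp.raise_for_status()
print(resp.json())  # Druid returns the new supervisor's id on success
```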

In addition to ad hoc investigations, Druid serves aggregated data required for DBS’ machine learning models, which inform their AML processes. No other data store matched Druid’s performance for their AML use cases, according to DBS vice president Arpit Dubey. 

“Druid’s real-time capabilities and integration with Kafka without writing any complex code are one of the key reasons Druid fits into our ecosystem so well. Another reason for going real time was to move toward building an integrated system—where security screening, forensic investigations, customer authentication checks, and other processes all become more responsive and on-demand.”  – Arpit Dubey, VP, Platform Architect, Data and Analytics at DBS Bank

Accurately identifying anomalies in real time at Sift

Another example is Sift, a sophisticated fraud prevention platform that helps hundreds of businesses block fraudulent transactions.

Sift’s customers, who include DoorDash, Poshmark, HelloFresh, and other high-volume processors of fraud-vulnerable transactions, use the scores generated by machine learning models to decide whether to accept, block, or watch events and transactions. Since each customer has unique traffic and decision patterns, Sift needed a tool that could automatically learn what “normal” looks like for each customer.

Sift built Watchtower, an automated monitoring tool that uses anomaly detection algorithms to learn from past data and trigger real-time alerts on unusual changes. Watchtower is powered by Apache Druid, which gives Sift’s customers interactive experiences with data at scale. Using Druid, Watchtower users can aggregate data from thousands of servers by a variety of dimensions, then query it across a moving time window for real-time analysis and visualization.
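
The sketch below shows, under assumed names and with a crude threshold standing in for a real anomaly model, the kind of moving-window aggregation a monitoring tool like Watchtower might poll from Druid:

```python
# A minimal sketch of polling a moving one-hour window of per-customer,
# per-minute event counts from Druid and flagging sharp deviations. The
# "events" datasource and its columns are illustrative assumptions.
from collections import defaultdict

import requests

DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

QUERY = """
SELECT
  TIME_FLOOR(__time, 'PT1M') AS minute_bucket,
  customer_id,
  decision,  -- e.g. accept / block / watch
  COUNT(*) AS events
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1, 2, 3
ORDER BY 1
"""

rows = requests.post(
    DRUID_SQL_URL, json={"query": QUERY, "resultFormat": "object"}, timeout=30
).json()

# Collect time-ordered counts per (customer, decision) series.
series = defaultdict(list)
for r in rows:
    series[(r["customer_id"], r["decision"])].append(r["events"])

# Flag series whose latest minute deviates sharply from the window mean;
# a real deployment would use a learned model instead of a fixed multiple.
for key, counts in series.items():
    mean = sum(counts) / len(counts)
    if counts[-1] > 3 * mean:
        print("possible anomaly:", key)
```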

“As the leader in Digital Trust & Safety, we enable online businesses to prevent fraud and abuse while streamlining customer experiences. We built an anomaly detection engine called Watchtower, which uses machine learning models to detect unusual activity. Apache Druid and Imply help us analyze data with an interactive experience that provides us with on-demand analysis and visualization.” – Neeraj Gupta, SVP of Engineering and Cloud Operations at Sift

Sift can now proactively contact customers when anomalies are detected, preventing potential business impact.

To read more about how Apache Druid supports real-time security and fraud analytics, check out this page.
