Target and Apache Druid: Real-Time Analytics at Massive Scale


Overview

Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations.

Challenge

As a data-driven organization, Target needed a data analytics platform that could address the unique needs of various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.

The enterprise analytics platform had to let non-technical users consume data in a high-performance, easy-to-use way. Even with an abundance of vendor products and open-source tools available, Target needed a platform built around three main principles: speed, discovery and collaboration, and scalability. For speed, the platform had to support self-service development, data ingestion, and querying, so that business users could get answers quickly. For discovery and collaboration, it had to foster data sharing and be embeddable in various tools and applications. Scalability was critical, because the platform had to serve Target's large teams and respond to surges in demand cost-effectively.

Solution

Target custom-built a data analytics platform using a simple architecture stack that ingests data from various sources and offers an interface for users to interact with the platform. They chose Apache Druid as a central component of the platform to store business data (such as sales, planning, and HR information), while the top of the stack is where users see and interact with the platform.

“What we’re doing with this analytics platform is giving people the ability to use that data. Data can be consumed in a high performance, easy to use way. [It’s] very likely we’re making that available to non technical users, folks that are maybe marginally good at SQL, but they’re very well versed in business, the enterprise portion of this. We don’t want to limit this to a single domain because this isn’t just a marketing tool or a merchandising tool. Every area of the business that can take advantage of this platform.”

Their custom analytics platform relies on a powerful query generation engine that translates user-authored content into Druid queries. This platform enforces security measures and maintains a metadata database to keep track of various objects and states. One of the key features of the platform is its flexibility, allowing users to bring their own databases and integrate them seamlessly. The platform’s user interface is adaptable, capable of being utilized in back-office applications and mobile apps.
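
The source does not describe how Target's query generation engine works internally. As a minimal sketch of the general idea, the Python below turns a small, hypothetical report definition into a request body for Druid's SQL API, which accepts a JSON object with `query` and `parameters` fields. The `ReportSpec` class, its field names, and the example datasource and column names are all assumptions made for illustration, not Target's schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReportSpec:
    """Hypothetical user-authored report definition (illustrative only)."""
    datasource: str
    dimension: str   # column to group by
    metric: str      # numeric column to sum
    start: str       # ISO 8601 timestamp, inclusive
    end: str         # ISO 8601 timestamp, exclusive
    filters: dict = field(default_factory=dict)  # column -> required value


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sql_request(spec: ReportSpec) -> dict:
    """Build a Druid SQL request body from a report definition.

    Values are passed as dynamic parameters (?) rather than spliced
    into the SQL text, so user input never becomes part of the query string.
    """
    conditions = ["__time >= TIME_PARSE(?)", "__time < TIME_PARSE(?)"]
    parameters = [
        {"type": "VARCHAR", "value": spec.start},
        {"type": "VARCHAR", "value": spec.end},
    ]
    # Sort filters so the generated SQL is deterministic.
    for column, value in sorted(spec.filters.items()):
        conditions.append(f"{quote_ident(column)} = ?")
        parameters.append({"type": "VARCHAR", "value": str(value)})

    dim = quote_ident(spec.dimension)
    query = (
        f"SELECT {dim}, SUM({quote_ident(spec.metric)}) AS metric_total "
        f"FROM {quote_ident(spec.datasource)} "
        f"WHERE {' AND '.join(conditions)} "
        f"GROUP BY {dim} ORDER BY metric_total DESC"
    )
    return {"query": query, "parameters": parameters}
```

A request built this way would be sent as JSON to the cluster's SQL endpoint (`/druid/v2/sql`). A production engine would also need the security checks and metadata lookups the platform is described as performing, which this sketch omits.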

Target chose Druid to power their platform due to its ability to handle time series data efficiently. Druid excels in aggregating and filtering data, as well as providing horizontal scalability. This means that when someone needs to increase query speed, add more users, or store more data, they can simply add hardware. Druid also supports sophisticated operations and complex aggregations, and boasts an active community and comprehensive documentation.

Target’s Druid cluster began as a few virtual machines and has since grown into a robust on-premises system with hundreds of servers. The cluster has multiple brokers and a load-balanced configuration, with tens of thousands of cores and hundreds of terabytes of RAM. Backed by HDFS for deep storage and MySQL for metadata storage, their Druid cluster offers over a petabyte of storage when factoring in RAM and SSDs.

Results

As of 2020, Target had 3,500 data sources in their platform, totaling around 3 trillion rows of data. The custom analytics platform, powered by Druid, offers speed, scalability, and adaptability, making it an invaluable tool for the organization.

“We see about 4 million queries run on this platform a day against Druid. 4 million is a sizable number, but that’s a fraction of the queries that our users ask. So that’s 200 queries a second at peak.”
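
To put the two figures in the quote side by side, the short calculation below converts the daily total into an average rate and compares it with the stated peak. It assumes the 4 million queries are counted over a full 24-hour day, which the source does not specify.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(queries_per_day: float) -> float:
    """Average queries per second, assuming load is spread over 24 hours."""
    return queries_per_day / SECONDS_PER_DAY


daily_queries = 4_000_000   # from the quote
peak_qps = 200              # from the quote

avg = average_qps(daily_queries)
ratio = peak_qps / avg
print(f"average: {avg:.1f} qps, peak/average: {ratio:.2f}x")
# prints "average: 46.3 qps, peak/average: 4.32x"
```

Under that assumption, the peak is roughly four times the average rate, which is the kind of surge in demand the platform was designed to absorb.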

Target also achieved their goal of building a platform that both technical and non-technical users across the organization can use. The platform serves about 70,000 daily active users in Target's stores, distribution centers, and headquarters locations, including offices on the other side of the planet.

“There are folks from pretty much every pocket of the business that are using this platform. And then, as we said, we also wanted the data to be embeddable. We’ve got at least 15 examples of back office or mobile applications that are taking data from Druid and surfacing it to those users.”

Source:
https://imply.io/videos/summit/enterprise-scale-analytics-platform-powered-by-druid-at-target/
