Real-time data and analytics for fantasy sports

by Pradip Thoke, Dream11 · September 17, 2020

Sports have long enjoyed immense popularity, and recent technological advances have brought fans even closer to the games they follow. One such innovation is fantasy sports, which puts the fan at the centre. In addition to cheering for their teams, fans can act as team owner or manager, making their sports experience more entertaining and wholesome. Online fantasy sports is a skill-based form of digital entertainment played over the Internet, in which sports fans create their own virtual dream teams made up of real-life players from upcoming matches.

Dream11 - India’s biggest fantasy sports platform

Dream11 currently serves over 2.5 million concurrent users and handles 40 million requests per minute across fantasy sports contests for professional sports including cricket, football (soccer), kabaddi, baseball, and basketball.

Needless to say, these real-world games generate a deluge of event data and statistics for each contest, player, tournament, and league, along with in-game statistics of every sort. All of it has to be ingested, stored, processed, and analyzed across the individual fantasy leagues and every aspect of the contests. This amounts to capturing and crunching over 3 billion events, or more than 4.5 TB of data, per day.

Dealing with growth and rationalizing systems

Many companies dealing with explosive growth find that systems that worked perfectly well with a small user base, a limited service offering, and few employees can’t scale to keep up with new product introductions, rising customer expectations, and a growing number of internal users. At some point the pain gets too great, and a company has to bite the bullet and re-architect its systems to accommodate growth and lay the foundation for a prosperous future.

Envisioning in-house analytics

For Dream11, that project took the form of an In-House Analytics capability and center of excellence that would address existing pain points, streamline operations, improve performance, and add the features the business needs to launch new products and services that delight customers.

Challenges of Dream11’s previous architecture

Dream11 has two types of primary data: interactional and transactional. Previously, these data types were stored in separate databases, but they needed to be brought together in one system in order to map user actions to specific transactions, revise the definitions of business metrics, build interaction-based audience profiles to drive promotions to users, and more.

To accomplish these goals, Dream11 first decided to adopt a time series database. An evaluation led the team to open source Apache Druid, but they then realized they needed a managed service and moved to Imply Cloud.

The next step was to figure out how to calculate real-time metrics, which combine user traffic and transactions. The existing web analytics platform, from a large search vendor, was being used for user impressions, while transactional data was stored in AWS. Because the team wanted to have the data in their own hands before passing it to the web analytics platform, they built a JavaScript-based capture to Kafka, based on an in-house ETL using Presto and Redshift. Fortunately, by using Imply for real-time aggregation and analytics, they were able to replace the search vendor’s web analytics platform for that purpose.

Another problem was that access to the raw event data was delayed by anywhere from 24 hours to 5 days because of the time needed for ingestion. The previous setup also could not support real-time views of the data when user concurrency surpassed 2 million.

Other objectives of the modernization program were to create a centralized reporting platform, to lower costs, to enable new capabilities, and to create a main pipeline for real-time business requirements that could not be supported by the web analytics platform.

Achieving operational data excellence with Imply and Apache Druid

By implementing Imply, Dream11 now has direct access to the raw, real-time data via over 900 Kafka topics. The data streams in from web and mobile applications and is then ingested into Druid, where all the raw data is kept and made available for analytics and reporting. Before, the product team had to submit a request for analytics information that could take hours to days to produce. Now, with Imply, business users who aren’t trained data or business analysts have direct self-service analytics capabilities.
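As a rough illustration of the Kafka-to-Druid step described above, the sketch below builds a Druid Kafka supervisor spec for a single topic. It is a minimal example under stated assumptions, not Dream11's actual configuration: the topic name, datasource name, dimension names, broker address, and the `timestamp` field are all hypothetical.

```python
import json


def build_kafka_supervisor_spec(topic, datasource, dimensions, bootstrap_servers):
    """Return a Druid Kafka supervisor spec as a plain dict.

    All argument values are hypothetical; the article does not give real
    topic names, datasource names, or broker addresses.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each event carries an ISO 8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,  # keep raw events rather than pre-aggregating
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": False,
                "taskCount": 1,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical values for illustration only.
spec = build_kafka_supervisor_spec(
    topic="app-events",
    datasource="app_events",
    dimensions=["event_name", "user_id", "platform"],
    bootstrap_servers="kafka-broker:9092",
)
print(json.dumps(spec, indent=2))
```

The resulting JSON is the kind of document that would be submitted to Druid's supervisor API to start streaming ingestion. With hundreds of topics, one plausible approach is to generate a spec per topic from a shared template like this one, though the article does not say how Dream11 manages its specs.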

Raw events are pushed into Druid with the event name and a massive set of properties for each event. Druid dimensions are created from these properties. One of the most critical dashboards for Dream11 is Top N, such as Top 10 users. Druid analytics makes it a snap to produce these Top N dashboards. At peak volume, Druid is processing 20 million requests per minute (RPM), a figure expected to grow to 80 million RPM by the end of the year. The Druid analytics are currently based on only 10 dimensions, but given the scalability achievable with Druid, they are expecting to add many more dimensions and a lot more data. For the time being, they are using 15 TB of data that is retained for 10 days, with 5 TB being loaded per day.
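To make the Top N idea concrete, here is a minimal sketch that builds a Druid native `topN` query for the ten most active users over the previous 24 hours. The datasource name `app_events` and the dimension `user_id` are hypothetical; the article does not describe Dream11's actual schema or queries.

```python
import json
from datetime import datetime, timedelta, timezone


def build_top_n_query(datasource, dimension, threshold, end, lookback_hours=24):
    """Build a Druid native topN query ranking `dimension` by event count.

    `datasource` and `dimension` are hypothetical names for illustration.
    `end` must be a timezone-aware datetime.
    """
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(hours=lookback_hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    interval = f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}"
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "dimension": dimension,
        "threshold": threshold,  # the N in Top N
        "metric": "events",      # rank by the aggregation defined below
        "granularity": "all",    # one result bucket for the whole interval
        # The count aggregator counts Druid rows, which equals the number
        # of events only when rollup is disabled at ingestion time.
        "aggregations": [{"type": "count", "name": "events"}],
        "intervals": [interval],
    }


query = build_top_n_query(
    datasource="app_events",
    dimension="user_id",
    threshold=10,
    end=datetime(2020, 9, 17, tzinfo=timezone.utc),
)
print(json.dumps(query, indent=2))
```

A query like this would be posted as JSON to a Druid broker. Druid's `topN` is designed for speed and can return approximate rankings on high-cardinality dimensions, which is usually an acceptable trade-off for a dashboard.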

With this approach, Imply supports Dream11’s need for ‘Hot Analytics’, where data needs to be understood and acted upon in real time, while also being able to push aggregated data to other platforms where it can be retained for longer time horizons and used for less time-sensitive purposes (‘warm’ or even ‘cold’ analytics). By taking control of their data with Druid, they can now support the requirements of the business better and more cost-efficiently.

Setting a foundation for future growth and improvement

With the new In-House Analytics architecture, Dream11 has set the stage for numerous improvements:

  • Better centralized reporting and accelerated speed of reporting
  • Mapping users across multiple platforms and accurately tracking user journey funnels
  • Eliminating data sampling, since the new platform can ingest all of the raw data
  • Correlating user interaction and transaction data
  • Improving data security and compliance with new data protection standards
  • Significant cost savings
  • More flexible data integration with other sources

Clearly, by taking the time and effort to implement a scalable real-time analytics capability with Imply, Dream11 is poised for even greater success as the fantasy sports market continues to grow.
