Oct 5, 2020

Zeotap: Data Modeling in Druid for Non temporal and Nested Data

Druid has been the production workhorse for the past 2+ years at Zeotap powering the core Audience planning across our Connect and Targeting products. Though Druid is best suited for data having time as a dimension as it partitions data based on time first, we have used Druid to serve ML powered enhanced insights and Estimation of potential dataset sizes, to assist us with our core business case of Audience planning. These are datasets without timestamp a.k.a non-temporal with high scale and having nested dimensions. These have been achieved using nuanced data modelling to store the data sets and achieve millisecond latency retrieval on top of the same. The core of the presentation would be on the data modelling journey to achieve these use cases detailing the query access patterns. We also delve upon the architecture – ingestion into druid sink and processing including ML. In the end we go over the production setup and configurations and provide the performance tunings applied.