Imply Videos

Dec 13, 2022

Percentiles at Scale

At Netflix, our engineers often need to see metrics as distributions to get a full picture of how our users are experiencing playback. One example is “play delay”, the time from hitting play to seeing the video start; measuring it only as an average would lose a lot of detail. We have tried various data types for storing these distributions so that we can query a massive dataset and get results within a few seconds, fast enough to remain interactive.
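To illustrate why an average hides detail, here is a small sketch with two invented sets of play-delay samples (the numbers are made up for illustration and are not Netflix data). Both sets have the same mean, but their medians and 99th percentiles differ sharply. The helper `nearest_rank_percentile` is a hypothetical name using the simple nearest-rank definition.

```python
import statistics

# Hypothetical play-delay samples in milliseconds (invented numbers).
steady = [1000] * 100                  # every session starts in 1 second
long_tail = [500] * 90 + [5500] * 10   # most are fast, 10% are very slow


def nearest_rank_percentile(values, p):
    """Return the smallest sample with at least p percent of samples at or below it.

    p is an integer from 1 to 100; integer math avoids float rounding issues.
    """
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # ceiling of p * n / 100
    return ordered[rank - 1]


for name, samples in (("steady", steady), ("long_tail", long_tail)):
    print(
        name,
        "mean:", statistics.mean(samples),
        "p50:", nearest_rank_percentile(samples, 50),
        "p99:", nearest_rank_percentile(samples, 99),
    )
```

Both sets report a mean of 1000 ms, yet only the percentiles reveal that one in ten sessions in the second set waits 5.5 seconds.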

T-Digest and DataSketches don’t keep up with our needs, so we came up with our own storage format that can merge hundreds of billions of rows with sub-second query times.

In this session we’ll introduce Spectator Histogram, a storage format that gives good approximations of distributions, scales to trillions of rows while maintaining good query performance, and supports percentile queries when used with Netflix OSS Atlas and the Druid-Bridge module.
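To make the idea of a mergeable, approximate distribution concrete, here is a minimal sketch of a fixed-bucket histogram. It is an illustration under stated assumptions, not the Spectator Histogram format: the power-of-two bucket bounds and the names `BOUNDS`, `to_histogram`, `merge` and `approx_percentile` are all hypothetical. The point it shows is that when every row uses the same bucket boundaries, merging rows is just adding counts, and a percentile can be estimated from the merged counts alone.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical bucket upper bounds in milliseconds: 2, 4, 8, ..., 1048576.
# This layout is an assumption for illustration only.
BOUNDS = [2 ** i for i in range(1, 21)]


def to_histogram(values):
    """Count each value in the first bucket whose upper bound is >= the value."""
    hist = Counter()
    for v in values:
        idx = min(bisect_left(BOUNDS, v), len(BOUNDS) - 1)  # clamp overflow
        hist[idx] += 1
    return hist


def merge(a, b):
    """Merge two histograms by summing per-bucket counts."""
    return a + b


def approx_percentile(hist, p):
    """Estimate the p-th percentile (integer p) as a bucket upper bound."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    seen = 0
    for idx in sorted(hist):
        seen += hist[idx]
        if seen * 100 >= p * total:  # integer comparison, no float rounding
            return BOUNDS[idx]
    return BOUNDS[-1]


# Two "rows" of invented play delays, aggregated separately and then merged.
row_a = to_histogram([500] * 90)
row_b = to_histogram([5500] * 10)
merged = merge(row_a, row_b)
print(approx_percentile(merged, 50), approx_percentile(merged, 99))
```

The estimate is only as precise as the bucket width (here a 500 ms sample is reported as 512 ms), which is the trade-off for a merge step that costs a handful of integer additions per row.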