Zillow Group Delivers Real-Time Visibility for Self-Serve Analytics
Zillow enables product and business teams to self-serve ad hoc analytics with 5-minute onboarding and effortless scaling, reducing overhead while accelerating decisions.
Learn More
Metaimpact Delivers Real-Time Visibility into Data Stack Performance
“Dimension Tables have provided a more flexible, scalable, performant, and future-proof solution for our needs. The transition to Imply Polaris was far better than any other database migration we have experienced in the past.”
Metaimpact™ is revolutionizing customer lifecycle management with an outcome-based collaboration tool for software buyers and suppliers. Formerly known as MetaCX, Metaimpact connects SaaS companies with their customers throughout the sales and delivery processes, aiming to drive shared success and better business outcomes.
The development team at Metaimpact manages real-time data and analytics for B2B customers across a broad range of use cases such as sustainability, healthcare, and economic development for IoT. This requires processing large volumes of customer event data with complex mappings that must be consistently and accurately updated.
They initially explored Google Bigtable, storing pre-computed results for real-time analytics, which means that certain computations or aggregations were performed after ingestion but before querying. This was great for query performance, but inadequate when the customer didn’t know in advance which questions to explore.
This led Metaimpact to switch to Imply Enterprise (on-premise) as a solution to ingest and transform streaming data from Kafka into customer-facing analytics. Although Imply Enterprise made an immediate impact on query speed and solved the issue of managing arbitrary queries across multiple data sources at scale, Metaimpact ran into a few challenges that ultimately led them to upgrade to Imply Polaris (fully managed SaaS).
The first challenge was related to infrastructure management. Even with Helm chart-based distribution, the Metaimpact team was spending a lot of time dealing with version upgrades, rolling restarts and capacity management. They knew that Imply Polaris would solve this with a fully managed cloud solution, but before they could make the switch from Imply Enterprise, they had to find a better solution for managing upserts.
Handling complex mappings across thousands of rows of data was complicated by the lack of normalized identifiers, as Metaimpact’s customers weren’t always certain of the correct mapping at the time of ingestion. Denormalizing the data would have required expensive re-ingestion operations, which was infeasible not only because of the cost, but also because they wanted to avoid batch changes in order to maintain near-real-time updates. Injecting the data as inline tables was also ruled out, as this would not scale to millions of rows.
Metaimpact initially explored Kafka-driven lookup tables. With the Kafka ingestion format, updates would be applied in near real time, the approach would support a future state with millions of rows, and it would avoid expensive join operations because all of the data would already be in memory. However, they ran into two core challenges:
`LOOKUP(CONCAT(mappingId, '::', columnName), "combinedLookupTable")`. This was functional, but was memory inefficient because the map ID was a repeated string on every entry within the lookup column.
“Polaris Dimension Tables make it possible for us to perform live identity normalization without having to re-ingest raw events when identity mappings are updated. With support for multiple columns, they’re the logical successor to Lookup functions.”
– Jason Schmidt, Software Architect | Metaimpact
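To make the memory concern with the compound-key lookup concrete, here is a minimal Python sketch. It is an illustration only, not Druid or Polaris code: the mapping names and values are hypothetical, and a plain dictionary stands in for the combined lookup table. It shows how every entry in a flattened `mapId::key` table repeats the map ID string.

```python
# Hypothetical mappings: each map ID translates raw values to normalized ones.
MAPPINGS = {
    "customer-region-map": {"us-e": "US East", "us-w": "US West"},
    "device-type-map": {"ph": "Phone", "tb": "Tablet"},
}


def build_combined_lookup(mappings):
    """Flatten every mapping into one key -> value table keyed by 'mapId::key'."""
    combined = {}
    for map_id, entries in mappings.items():
        for raw_key, value in entries.items():
            combined[f"{map_id}::{raw_key}"] = value
    return combined


def lookup(combined, map_id, raw_key):
    """Stand-in for LOOKUP(CONCAT(mappingId, '::', columnName), ...).

    Returns None when the compound key is not present.
    """
    return combined.get(f"{map_id}::{raw_key}")


def prefix_overhead(mappings):
    """Count the characters spent repeating 'mapId::' across all compound keys."""
    return sum((len(map_id) + 2) * len(entries) for map_id, entries in mappings.items())


combined = build_combined_lookup(MAPPINGS)
print(lookup(combined, "device-type-map", "tb"))  # Tablet
print(prefix_overhead(MAPPINGS), "characters are repeated map ID prefixes")
```

The overhead grows linearly with the number of entries, which is why a flattened table of this shape becomes costly as mappings approach millions of rows.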
Metaimpact transitioned to Polaris Dimension Tables as a more efficient and scalable solution. Dimension Tables are designed to optimize the upserts of high-cardinality datasets by storing dimensions separately from fact data, joining large datasets in real time without the overhead associated with traditional lookup tables (notably, Kafka-based lookup tables do not support multi-field keys or multiple attributes).
With Dimension Tables, Metaimpact no longer needed a compound key. As the map ID, key and expected output could be stored in separate columns, the low cardinality of map IDs could be represented efficiently and wouldn’t lead to memory issues for a very long time.
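The idea of keeping the map ID, key, and output in separate columns and joining at query time can be sketched as follows. This is a rough analogy that uses SQLite from the Python standard library, not Polaris SQL; the table names, column names, and sample rows are all hypothetical. It shows a two-column join key and an upsert to the mapping that changes query results without touching the raw events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Fact data: raw events carry an unnormalized identifier.
    CREATE TABLE events (map_id TEXT, raw_id TEXT, amount INTEGER);
    -- Dimension data: map ID, key, and output live in separate columns.
    CREATE TABLE identity_map (
        map_id TEXT, raw_id TEXT, normalized_id TEXT,
        PRIMARY KEY (map_id, raw_id)
    );
    INSERT INTO events VALUES
        ('crm', 'acct-17', 5), ('crm', 'acct-42', 7), ('billing', 'acct-17', 3);
    INSERT INTO identity_map VALUES
        ('crm', 'acct-17', 'Acme'), ('crm', 'acct-42', 'Globex'),
        ('billing', 'acct-17', 'Initech');
""")

# Join on both key columns instead of concatenating them into one string.
QUERY = """
    SELECT m.normalized_id, SUM(e.amount)
    FROM events AS e
    JOIN identity_map AS m
      ON m.map_id = e.map_id AND m.raw_id = e.raw_id
    GROUP BY m.normalized_id
    ORDER BY m.normalized_id
"""


def totals(connection):
    """Return {normalized_id: summed amount} using the current mappings."""
    return dict(connection.execute(QUERY).fetchall())


before = totals(conn)

# Upsert one mapping row: 'acct-42' turns out to belong to Acme.
conn.execute(
    "INSERT OR REPLACE INTO identity_map VALUES (?, ?, ?)",
    ("crm", "acct-42", "Acme"),
)
after = totals(conn)

print(before)
print(after)
```

Only the mapping row changes between the two queries; the `events` table is never rewritten, which mirrors the goal of normalizing identities without re-ingesting raw events.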
There were a few hurdles to overcome to switch to Dimension Tables, but Metaimpact found it to be a very reasonable transition:
`LOOKUP(expression, lookupTableName)` to a JOIN syntax. Since they were still working on the transition from Imply Enterprise to Polaris, they needed a format that would support both lookup tables and Dimension Tables. They gave the Dimension Table the same name as the combined lookup table had, and the only difference was needing to prefix the lookup table name with the “lookup” string.
By switching to Dimension Tables with Imply Polaris, Metaimpact improved memory efficiency by at least 2x. By reducing data costs, this effectively lengthens the runway for Metaimpact to expand their Imply node size.
The integration with Polaris also simplified management and support, allowing Metaimpact to use Dimension Tables like any other data source and benefit from Imply’s expert assistance. Beyond Dimension Tables, the switch to Polaris significantly reduced the time their team spent on infrastructure management, thanks to elastic autoscaling, committer-led support, automatic updates, and high availability that eliminated the aforementioned rolling restart issue. Taking these time savings into account, Imply Polaris reduced their total cost of ownership (TCO) by over 10% compared to Imply Enterprise.
Metaimpact is now fully operational on Imply Polaris, benefiting from faster, more scalable analytics with real-time computation and an average query latency under 120 ms. They anticipate future improvements as Imply continues to innovate on upserts and other core capabilities.
*This story was written by Larissa Klitzke (Senior Product Marketing Manager, Imply) in collaboration with Jason Schmidt (Software Architect, Metaimpact).
Citrix Delivers Real-Time Observability for Threat Detection and Environment Health
Citrix ingests 3B events daily, delivers 99.9% uptime, and meets SLOs on 90% of queries—providing proactive insider threat detection and visibility into its environment.
Learn More
Paytm Delivers Real-Time Visibility into Customer Behavior at Petabyte Scale
Paytm cut infra costs by 50%, boosted performance 10×, and freed 12 weekly engineering hours by enabling real-time behavioral analytics at petabyte scale.
Learn More
WalkMe Delivers Real-Time Observability Across Device Performance
WalkMe replaced slow search systems with a real-time platform to monitor billions of client devices, giving teams cost-effective visibility into performance and usage.
Learn More
Reddit Delivers Real-Time Analytics for Ad Campaign Performance
Reddit accelerated query performance and improved availability to 99.9%, enabling advertisers to explore six months of ad event data in real time to optimize campaign outcomes.
Learn More
PepsiCo Delivers Real-Time Operational Visibility Across Sales, Supply Chain & Marketing
PepsiCo enables subsecond queries over massive event data, improving responsiveness in sales, predictive out-of-stock detection, and embedding operational insights into tools with minimal overhead.
Learn More
Poshmark Delivers Real-Time Observability for User Behavior & A/B Testing
Poshmark supports 80M users and 60+ data dimensions, cutting dashboards to <5 seconds and enabling A/B test insights in <2 seconds for fast visibility into platform usage.
Learn More
Pinterest Delivers Real-Time Advertising Analytics at Massive Scale
Pinterest processes 500,000 ad events per second and answers 99% of advertiser queries in ~250 ms, reducing event-to-ingestion time to under a minute for near-instant advertiser insights.
Learn More
Amobee Delivers High-Performance Ad Analytics with Real-Time Query Flexibility
Amobee runs subsecond queries over trillions of rows with hundreds of concurrent users, enabling advertisers to explore any market or time period in real time while reducing costs.
Learn More
Orb Delivers Real-Time Observability into Billing Operations
Orb scaled its billing platform with Kafka event streams, optimized query workloads, and saved 20–30 engineering hours/month while extending visibility into revenue reporting.
Learn More
Adikteev Delivers Real-Time Visibility and Alerts for Campaign Health
Adikteev cut dashboard latency from over 7 minutes to seconds, enabling self-service exploration of campaign telemetry and proactively alerting clients to DAU shifts two days earlier than internal pipelines.
Learn More
RixEngine Delivers Real-Time Observability for Global Ad Exchanges
RixEngine reduced costs by over 50% and halved query latency with real-time observability, giving partners instant, actionable insights.
Learn More
Atlassian Delivers Real-Time Visibility for Usage Analytics
Atlassian improved analytics with 5× faster performance, ≤100 ms queries, and 5+ years of retained data, giving customers instant visibility into long-term usage trends.
Learn More
Target Delivers Real-Time Visibility for Front-Line Business Decisions
Target runs 4M queries per day for 70,000 users with 300–600 ms response times, empowering teams to explore data interactively and act on insights at the edge of operations.
Learn More
The R&A Delivers Real-Time Observability During Major Sporting Events
The R&A ingested 250M new rows every 15 minutes and processed 25B rows with subsecond queries—maintaining 100% uptime and enabling teams to respond instantly during live events.
Learn More
Rakuten Delivers Real-Time Visibility Across Event Streams
Rakuten supports hundreds of concurrent users and thousands of daily queries over millions of records per second, enabling near real-time insights across business units.
Learn More
Blis Delivers Real-Time Visibility into Massive Ad Auction Scale
Blis processes hundreds of thousands of ad auction requests per second across 35,000 publishers and 200 million consumers, giving customers instant insights like ad-opportunity counts by place and time.
Learn More
Netflix Delivers Real-Time Observability Across Playback Quality
Netflix ingests 2 million events per second and queries 1.5 trillion rows in milliseconds, enabling real-time monitoring of playback quality for a consistent viewing experience.
Learn More
Twitch Delivers Real-Time Visibility for Usage and Performance
Twitch processes ~80B events per day and supports ~70,000 queries from 500+ internal users with <500 ms response times, enabling teams to observe usage and performance trends in real time.
Learn More
Confluent Delivers Real-Time Observability into Streaming Cluster Operations
Confluent ingests 5M+ events/sec and supports hundreds of subsecond queries on high-cardinality metrics, giving real-time visibility into thousands of Kafka clusters.
Learn More
TrueCar Delivers Real-Time Observability with Dashboards & Anomaly Detection
TrueCar ingests 20M records daily across ~80 TB, delivers <1 ms query times, and provides anomaly detection dashboards with minimal engineering overhead.
Learn More
GameAnalytics Delivers Real-Time Observability for Game Developers
GameAnalytics processes 25B events/day and 100,000 queries/hour with subsecond latency, giving developers real-time observability into player behavior and game performance.
Learn More
PayPal Delivers Real-Time Visibility into the User Journey
PayPal ingests 5.5B events daily and runs tens of thousands of queries to monitor KPIs and identify pain points, giving teams fast insights to improve user experience.
Learn More
NTT Delivers Real-Time Observability for Global IP Traffic
NTT ingests 100k+ events/sec to provide real-time network visibility across its global IP backbone, enabling both technical and non-technical users to explore traffic trends.
Learn More
IronSource Delivers Real-Time Visibility for Dashboard Performance
IronSource cut query latency from 35s to <5s while handling up to 1.5M events/sec, enabling non-SQL users to interactively explore streaming data via dashboards.
Learn More
Expedia Delivers Real-Time Visibility for Traveler Segmentation
Expedia cut query latency from 24 hours to <5 seconds, enabling dynamic traveler segmentation across multiple datasets and giving marketing and ops teams near-instant insights.
Learn More
Roblox Delivers Real-Time Visibility Across Millions of Gaming Experiences
Roblox runs subsecond queries over high-cardinality gameplay data across 10M+ experiences and 86 TB ingested daily, giving creators fast, actionable insights without high cost or overhead.
Learn More
Ibotta Delivers Real-Time Observability for Fraud & Incident Detection
Ibotta built real-time fraud detection and incident response, cutting costs by 25% and enabling 30× more users with self-service analytics for faster visibility into anomalies.
Learn More
Splunk Delivers Real-Time Observability for Data Investigation & Monitoring
Splunk ingests 500M rows per minute and reduced storage 13.5×, enabling users to monitor, investigate, and act on their data in real time with far smaller footprint.
Learn More
Walmart Delivers Real-Time Visibility into Competitor Pricing
Walmart processes over 1B events daily with near subsecond query latency to monitor competitor pricing, enabling teams to respond instantly to market shifts.
Learn More
Salesforce Delivers Edge Observability Across Releases & Performance
Salesforce ingests 5B events per day, cuts storage 47%, and speeds queries by ~30%, enabling release comparisons and performance visibility for 150,000+ customers.
Learn More
Nielsen Marketing Cloud Delivers Real-Time Audience Visibility at Scale
Nielsen enables customers to drill into 80,000 attributes with real-time queries over terabytes of daily data, using distinct-count sketches for fast, accurate insights into audience trends.
Learn More
Yahoo Delivers Real-Time Visibility into User Behavior at Massive Scale
Yahoo processes 100B events daily across 10B sessions, managing 20+ PB of data and enabling both internal users and customers to access behavioral insights on demand.
Learn More
Charter Communications Delivers Real-Time Visibility into Product Performance
Charter scaled to 100–150k messages/sec, boosted storage 10×, and doubled throughput, enabling real-time visibility into product performance and customer experience.
Learn More
Imply is the easiest way to build with Druid through our cloud service and committer-driven expertise. For existing Apache Druid users, we can guarantee it.
Get started