Background
Metaimpact™ is revolutionizing customer lifecycle management with an outcome-based collaboration tool for software buyers and suppliers. Formerly known as MetaCX, Metaimpact connects SaaS companies with their customers throughout the sales and delivery processes, aiming to drive shared success and better business outcomes.
The development team at Metaimpact manages real-time data and analytics for B2B customers across a broad range of use cases, such as sustainability, healthcare, economic development, and IoT. This requires processing large volumes of customer event data with complex mappings that must be updated consistently and accurately.
They initially explored Google Bigtable, storing pre-computed results for real-time analytics: certain computations and aggregations were performed after ingestion but before querying. This worked well for query performance, but fell short whenever a customer didn’t know in advance what questions they wanted to explore.
This led Metaimpact to switch to Imply Enterprise (on-premise) as a solution to ingest and transform streaming data from Kafka into customer-facing analytics. Although Imply Enterprise made an immediate impact on query speed and solved the issue of managing arbitrary queries across multiple data sources at scale, Metaimpact ran into a few challenges that ultimately led them to upgrade to Imply Polaris (fully managed SaaS).
Challenge
The first challenge was related to infrastructure management. Even with Helm chart-based distribution, the Metaimpact team was spending a lot of time dealing with version upgrades, rolling restarts and capacity management. They knew that Imply Polaris would solve this with a fully managed cloud solution, but before they could make the switch from Imply Enterprise, they had to find a better solution for managing upserts.
Handling complex mappings across thousands of rows of data was complicated by the lack of normalized identifiers, as Metaimpact’s customers weren’t always certain of the correct mapping at the time of ingestion. Denormalizing the data would have required expensive re-ingestion operations, which was infeasible not only because of the cost, but also because they wanted to avoid batch changes in order to maintain near-real-time updates. Injecting the data as inline tables was also ruled out, as that approach wouldn’t scale to millions of rows.
Metaimpact initially explored Kafka-driven lookup tables. With the Kafka ingestion format, updates would be applied in near real time, the approach could support a future state with millions of rows, and it avoided expensive join operations because all of the data would already be in memory. However, they ran into two core challenges:
- With many individual mappings from their customers, they originally envisioned a separate lookup table for each customer-defined map, but at scale, initializing all of the lookup tables had the unintended effect of timing out regular ingestion tasks.
- Next, they created a compound key approach, where the customer’s map identifier would prefix the key to be matched. This could be evaluated in the query with a simple expression like `LOOKUP(CONCAT(mappingId, '::', columnName), 'combinedLookupTable')`. This was functional, but memory inefficient, because the map ID was a repeated string on every entry within the lookup column (a sketch of this query shape appears below).
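To make the compound-key approach concrete, a query against the combined lookup might have looked roughly like the sketch below. Only the `LOOKUP(CONCAT(...))` expression comes from the approach described above; the `events` table, its columns, and the time filter are illustrative assumptions rather than Metaimpact’s actual schema.

```sql
-- Hypothetical sketch of the compound-key lookup approach.
-- Every key in combinedLookupTable carries the map ID as a prefix
-- (e.g. 'acme-map-01::device-7f3a'), so the map ID string is stored
-- redundantly on every lookup entry.
SELECT
  LOOKUP(CONCAT(mappingId, '::', columnName), 'combinedLookupTable') AS normalized_id,
  COUNT(*) AS event_count
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
```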
“Polaris Dimension Tables make it possible for us to perform live identity normalization without having to re-ingest raw events when identity mappings are updated. With support for multiple columns, they’re the logical successor to Lookup functions.”
– Jason Schmidt, Software Architect, Metaimpact
Solution
Metaimpact transitioned to Polaris Dimension Tables as a more efficient and scalable solution. Dimension Tables are designed to optimize upserts for high-cardinality datasets by storing dimensions separately from fact data, joining large datasets in real time without the overhead associated with traditional lookup tables (notably, Kafka-based lookup tables do not support multi-field keys or multiple attributes).
With Dimension Tables, Metaimpact no longer needed a compound key. Because the map ID, key, and expected output could be stored in separate columns, the low cardinality of map IDs could be represented efficiently and wouldn’t lead to memory issues for the foreseeable future.
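As a minimal sketch, a join on a multi-column Dimension Table key might look like the following. The Dimension Table name (`identityMappings`), its columns (`mapping_id`, `raw_key`, `normalized_id`), and the `events` fact table are illustrative assumptions, not Metaimpact’s actual schema.

```sql
-- Minimal sketch: a multi-column join against a Dimension Table, with the
-- map ID kept in its own low-cardinality column instead of a concatenated key.
SELECT
  d.normalized_id,
  COUNT(*) AS event_count
FROM events e
JOIN identityMappings d
  ON  d.mapping_id = e.mappingId
  AND d.raw_key    = e.columnName
GROUP BY d.normalized_id
```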
There were a few hurdles to overcome to switch to Dimension Tables, but Metaimpact found it to be a very reasonable transition:
- Metaimpact switched from the inline functional syntax `LOOKUP(expression, lookupTableName)` to a JOIN syntax. Since they were still working on the transition from Imply Enterprise to Polaris, they needed a format that would support both lookup tables and Dimension Tables. They gave the Dimension Table the same name as the combined lookup table, and the only difference was needing to prefix the lookup table name with the “lookup” string.
- Next, they extended the split in their code to begin taking advantage of the structure of the Dimension Tables, so that they could limit the number of rows processed within the Dimension Table itself.
- Metaimpact identified an opportunity to optimize Dimension Table queries, which initially resulted in broadcast joins on every row. By moving to a CTE query with an upfront filter, they reduced the number of sub-query rows from each customer map. This allowed the query engine to focus on relevant rows, significantly improving JOIN efficiency and ensuring smooth query execution (see the sketch after this list).
- Finally, they successfully transitioned to running all of their queries on Polaris, and around the same time, Imply included Dimension Tables in the Imply Quickstart they use for local development. This meant they could fully retire lookup tables!
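A rough sketch of the optimized query shape is shown below. The upfront filter in the CTE narrows the Dimension Table to a single customer map before the join. The column names, the map ID value, and the `events` table are illustrative assumptions; during the transition, the Imply Enterprise version of the query referenced the lookup-prefixed table name as described in the first step above.

```sql
-- Sketch of the CTE-with-upfront-filter pattern: only the rows for the
-- customer map in play participate in the broadcast join.
WITH mapping AS (
  SELECT raw_key, normalized_id
  FROM combinedLookupTable            -- Dimension Table, same name as the old lookup
  WHERE mapping_id = 'acme-map-01'    -- upfront filter on the customer-defined map
)
SELECT
  m.normalized_id,
  COUNT(*) AS event_count
FROM events e
JOIN mapping m
  ON m.raw_key = e.columnName
WHERE e.__time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY m.normalized_id
```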
Impact
By switching to Dimension Tables with Imply Polaris, Metaimpact improved memory efficiency by at least 2x. The resulting reduction in data costs effectively lengthens the runway before Metaimpact needs to expand their Imply node size.
The integration with Polaris also simplified management and support, allowing Metaimpact to use Dimension Tables like any other data source and benefit from Imply’s expert assistance. Beyond Dimension Tables, the switch to Polaris significantly reduced the amount of time the team spent on infrastructure management, thanks to benefits such as elastic autoscaling, committer-led support, automatic updates, and high availability that eliminated the aforementioned rolling restart issue. Taking these time savings into account, Imply Polaris reduced their total cost of ownership (TCO) by over 10% compared to Imply Enterprise.
Metaimpact is now fully operational on Imply Polaris, benefiting from faster, more scalable analytics with real-time computation and average query times under 120 ms. They anticipate further improvements as Imply continues to innovate on upserts and other core capabilities.
*This story was written by Larissa Klitzke (Senior Product Marketing Manager, Imply) in collaboration with Jason Schmidt (Software Architect, Metaimpact).