Metaimpact

Imply Polaris’ Dimension Tables streamline real-time analytics for Metaimpact

“Dimension Tables have provided a more flexible, scalable, performant, and future-proof solution for our needs. The transition to Imply Polaris was far better than any other database migration we have experienced in the past.”
Jason Schmidt, Software Architect, Metaimpact

Summary

Metaimpact improved its real-time analytics by transitioning from Apache Kafka-driven lookup tables to Dimension Tables with Imply Polaris. This allowed Metaimpact to manage large datasets more effectively, reducing memory usage and improving query performance by eliminating the need for compound keys. The near-real-time data updates provided by Dimension Tables streamlined operations for customer analytics. After migrating, Metaimpact saw significant improvements, including faster queries, more efficient memory usage, and time and cost savings from simpler infrastructure management.

Highlights

  • Imply supports real-time computation at scale with an average query latency under 120 ms
  • Upgrading from Kafka-based lookups to Imply Polaris Dimension Tables improved memory efficiency by at least 2x
  • Saved >10% TCO via Polaris support and streamlined infrastructure management

Background

Metaimpact™ is revolutionizing customer lifecycle management with an outcome-based collaboration tool for software buyers and suppliers. Formerly known as MetaCX, Metaimpact connects SaaS companies with their customers throughout the sales and delivery processes, aiming to drive shared success and better business outcomes.

The development team at Metaimpact manages real-time data and analytics for B2B customers across a broad range of use cases such as sustainability, healthcare, and economic development for IoT. This requires processing large volumes of customer event data with complex mappings that must be consistently and accurately updated.

They initially explored Google Bigtable to store pre-computed results for real-time analytics, meaning that certain computations or aggregations are performed after ingestion but before query time. This delivered strong query performance, but was inadequate when the customer didn’t know in advance which questions they would want to explore.

This led Metaimpact to switch to Imply Enterprise (on-premises) as a solution to ingest and transform streaming data from Kafka into customer-facing analytics. Although Imply Enterprise made an immediate impact on query speed and solved the issue of managing arbitrary queries across multiple data sources at scale, Metaimpact ran into a few challenges that ultimately led them to upgrade to Imply Polaris (fully managed SaaS).

Challenge

The first challenge was related to infrastructure management. Even with Helm chart-based distribution, the Metaimpact team was spending a lot of time dealing with version upgrades, rolling restarts and capacity management. They knew that Imply Polaris would solve this with a fully managed cloud solution, but before they could make the switch from Imply Enterprise, they had to find a better solution for managing upserts.

Handling complex mappings across thousands of rows of data was complicated by the lack of normalized identifiers, as Metaimpact’s customers weren’t always certain of the correct mapping at the time of ingestion. Denormalizing the data would have required expensive re-ingestion operations, which was infeasible not only because of the cost, but also because they wanted to avoid batch changes in order to maintain near-real-time updates. Injecting the data as inline tables was also ruled out, as this approach would not scale to millions of rows.

Metaimpact initially explored Kafka-driven lookup tables. With the Kafka ingestion format, updates would be applied in near-real-time, the approach would support a future state with millions of rows, and it would avoid expensive join operations because all of the data would already be in memory. However, they ran into two core challenges:

  1. With many individual mappings from their customers, they originally envisioned a separate lookup table for each customer-defined map. At scale, however, initializing all of the lookup tables had the unintended effect of timing out regular ingestion tasks.
  2. Next, they tried a compound key approach, where the customer’s map identifier would prefix the key to be matched. This could be evaluated in the query with a simple expression like `LOOKUP(CONCAT(mappingId, '::', columnName), "combinedLookupTable")`. This was functional but memory-inefficient, because the map ID was repeated as a string on every entry within the lookup column.
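
To make the compound key trade-off concrete, here is a minimal Python sketch. It is not Metaimpact's code and does not call Druid or Polaris: the map IDs, entries, and helper names are hypothetical, and a plain dictionary stands in for the key/value lookup table. It shows how several customer maps collapse into one lookup, and how the map ID and separator are repeated on every entry.

```python
# Hypothetical customer-defined maps: raw identifier -> normalized identifier.
customer_maps = {
    "map-001": {"acme inc": "ACME", "acme corp": "ACME"},
    "map-002": {"acme inc": "Acme Holdings"},
}

SEPARATOR = "::"


def build_combined_lookup(maps):
    """Flatten per-customer maps into one key/value lookup using compound keys."""
    combined = {}
    for map_id, entries in maps.items():
        for raw_key, value in entries.items():
            combined[f"{map_id}{SEPARATOR}{raw_key}"] = value
    return combined


def lookup(combined, map_id, raw_key):
    """Mimic LOOKUP(CONCAT(mappingId, '::', columnName), ...); None when unmatched."""
    return combined.get(f"{map_id}{SEPARATOR}{raw_key}")


def prefix_overhead_chars(maps):
    """Characters spent repeating the map ID and separator on every entry."""
    return sum(
        (len(map_id) + len(SEPARATOR)) * len(entries)
        for map_id, entries in maps.items()
    )


combined_lookup = build_combined_lookup(customer_maps)
```

In this toy data, every entry carries the same nine-character prefix for its map. With millions of entries, that repetition is the memory cost described above.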

“Polaris Dimension Tables make it possible for us to perform live identity normalization without having to re-ingest raw events when identity mappings are updated. With support for multiple columns, they’re the logical successor to Lookup functions.”

– Jason Schmidt, Software Architect, Metaimpact

Solution

Metaimpact transitioned to Polaris Dimension Tables as a more efficient and scalable solution. Dimension Tables are designed to optimize upserts of high-cardinality datasets by storing dimensions separately from fact data. This makes it possible to join large datasets in real time without the overhead associated with traditional lookup tables (notably, Kafka-based lookup tables do not support multi-field keys or multiple attributes).

With Dimension Tables, Metaimpact no longer needed a compound key. Because the map ID, key, and expected output could be stored in separate columns, the low-cardinality map IDs could be represented efficiently and wouldn’t lead to memory issues for a very long time.
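
The multi-column layout can be sketched as follows. This is an illustration only: SQLite (through Python's standard `sqlite3` module) stands in for Polaris, and the table names, column names, and data are hypothetical. It shows a two-column join key replacing the concatenated string, and a mapping update changing query results without touching the stored events.

```python
import sqlite3

# In-memory SQLite database as a stand-in; this is not the Polaris API.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (map_id TEXT, raw_name TEXT, amount INTEGER);
    CREATE TABLE identity_dim (
        map_id TEXT, raw_name TEXT, normalized_name TEXT,
        PRIMARY KEY (map_id, raw_name)
    );
""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("map-001", "acme inc", 10),
    ("map-001", "acme corp", 5),
    ("map-002", "acme inc", 7),
])
conn.executemany("INSERT INTO identity_dim VALUES (?, ?, ?)", [
    ("map-001", "acme inc", "ACME"),
    ("map-001", "acme corp", "ACME"),
    ("map-002", "acme inc", "Acme Holdings"),
])


def upsert_mapping(connection, map_id, raw_name, normalized_name):
    """Insert or overwrite one mapping row; the events table is left untouched."""
    connection.execute(
        "INSERT OR REPLACE INTO identity_dim VALUES (?, ?, ?)",
        (map_id, raw_name, normalized_name),
    )


def totals_by_normalized_name(connection):
    """Join events to the dimension on (map_id, raw_name) and sum the amounts."""
    return connection.execute("""
        SELECT e.map_id,
               COALESCE(d.normalized_name, e.raw_name) AS normalized,
               SUM(e.amount) AS total
        FROM events e
        LEFT JOIN identity_dim d
               ON d.map_id = e.map_id AND d.raw_name = e.raw_name
        GROUP BY e.map_id, normalized
        ORDER BY e.map_id, normalized
    """).fetchall()
```

Calling `upsert_mapping(conn, "map-002", "acme inc", "ACME")` and re-running `totals_by_normalized_name(conn)` returns the updated normalization for the same raw events, which is the behavior the upsert requirement calls for.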

There were a few hurdles to overcome to switch to Dimension Tables, but Metaimpact found it to be a very reasonable transition:

  1. Metaimpact switched from the inline functional syntax `LOOKUP(expression, lookupTableName)` to a JOIN syntax. Since they were still transitioning from Imply Enterprise to Polaris, they needed a format that would support both lookup tables and Dimension Tables. They gave the Dimension Table the same name as the combined lookup table, so the only difference between the two queries was the “lookup” prefix required on the lookup table name.
  2. Next, they extended the split in their code to begin taking advantage of the structure of the Dimension Tables, so that they could limit the number of rows processed within the Dimension Table itself.
  3. Metaimpact identified an opportunity to optimize Dimension Table queries, which initially led to broadcast joins on every row. By moving to a CTE query with an upfront filter, they reduced the number of sub-query rows from each customer map to streamline the process. This allowed the query engine to focus on relevant rows, significantly enhancing JOIN efficiency and ensuring smooth query execution.
  4. Finally, they successfully transitioned to running all of their queries on Polaris, and around the same time, Imply included Dimension Tables in the Imply Quickstart they use for local development. This meant they could fully retire lookup tables!
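
The CTE optimization in step 3 can be sketched along these lines. As before, SQLite stands in for Polaris, all table and column names are hypothetical, and the sketch says nothing about how Druid actually executes or broadcasts joins. It only shows the query shape: filter the dimension rows to one customer map first, then join against that smaller set.

```python
import sqlite3

# In-memory stand-in database with hypothetical tables; not the Polaris API.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE events (event_id INTEGER, map_id TEXT, raw_name TEXT);
    CREATE TABLE identity_dim (map_id TEXT, raw_name TEXT, normalized_name TEXT);
""")
db.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "map-001", "acme inc"),
    (2, "map-001", "unknown co"),
    (3, "map-002", "acme inc"),
])
db.executemany("INSERT INTO identity_dim VALUES (?, ?, ?)", [
    ("map-001", "acme inc", "ACME"),
    ("map-002", "acme inc", "Acme Holdings"),
    ("map-002", "globex", "Globex"),
])

FILTERED_JOIN = """
    WITH map_rows AS (
        -- Upfront filter: keep only the dimension rows for one customer map.
        SELECT raw_name, normalized_name
        FROM identity_dim
        WHERE map_id = :map_id
    )
    SELECT e.raw_name, COALESCE(m.normalized_name, e.raw_name)
    FROM events e
    LEFT JOIN map_rows m ON m.raw_name = e.raw_name
    WHERE e.map_id = :map_id
    ORDER BY e.event_id
"""


def normalized_events_for_map(connection, map_id):
    """Return (raw_name, normalized_name) pairs for a single customer map."""
    return connection.execute(FILTERED_JOIN, {"map_id": map_id}).fetchall()


def dimension_rows_for_map(connection, map_id):
    """Count the dimension rows that survive the upfront filter."""
    query = "SELECT COUNT(*) FROM identity_dim WHERE map_id = ?"
    return connection.execute(query, (map_id,)).fetchone()[0]
```

Here the join for `map-001` only sees one of the three dimension rows. Events without a mapping fall back to their raw name.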

Impact

By switching to Dimension Tables with Imply Polaris, Metaimpact improved memory efficiency by at least 2x. By reducing data costs, this effectively lengthens the runway before Metaimpact needs to expand their Imply node size.

The integration with Polaris also simplified management and support, allowing Metaimpact to use Dimension Tables like any other data source and benefit from Imply’s expert assistance. Beyond Dimension Tables, the switch to Polaris significantly reduced the time their team spent on infrastructure management, thanks to elastic autoscaling, committer-led support, automatic updates, and high availability that eliminated the aforementioned rolling restart issue. Taking these time savings into account, Imply Polaris reduced their total cost of ownership (TCO) by over 10% compared to Imply Enterprise.

Metaimpact is now fully operational on Imply Polaris, benefiting from faster, more scalable analytics with real-time computation at an average query speed under 120 ms. They anticipate future improvements as Imply continues to innovate on upserts and other core capabilities.

*This story was written by Larissa Klitzke (Senior Product Marketing Manager, Imply) in collaboration with Jason Schmidt (Software Architect, Metaimpact).
