Background
Metaimpact™ is revolutionizing customer lifecycle management with an outcome-based collaboration tool for software buyers and suppliers. Formerly known as MetaCX, Metaimpact connects SaaS companies with their customers throughout the sales and delivery processes, aiming to drive shared success and better business outcomes.
The development team at Metaimpact manages real-time data and analytics for B2B customers across a broad range of use cases, such as sustainability, healthcare, economic development, and IoT. This requires processing large volumes of customer event data with complex mappings that must be updated consistently and accurately.
They initially explored Google Bigtable, storing pre-computed results for real-time analytics: certain computations and aggregations were performed after ingestion but before querying. This worked well for query performance, but fell short whenever a customer didn’t know in advance what questions they wanted to explore.
This led Metaimpact to switch to Imply Enterprise (on-premise) as a solution to ingest and transform streaming data from Kafka into customer-facing analytics. Although Imply Enterprise made an immediate impact on query speed and solved the issue of managing arbitrary queries across multiple data sources at scale, Metaimpact ran into a few challenges that ultimately led them to upgrade to Imply Polaris (fully managed SaaS).
Challenge
The first challenge was related to infrastructure management. Even with Helm chart-based distribution, the Metaimpact team was spending a lot of time dealing with version upgrades, rolling restarts and capacity management. They knew that Imply Polaris would solve this with a fully managed cloud solution, but before they could make the switch from Imply Enterprise, they had to find a better solution for managing upserts.
Handling complex mappings across thousands of rows of data was complicated by the lack of normalized identifiers, as Metaimpact’s customers weren’t always certain of the correct mapping at the time of ingestion. Denormalizing the data would have required expensive re-ingestion operations, which was infeasible not only because of the cost, but also because they wanted to avoid batch changes in order to maintain near-real-time updates. Injecting the data as inline tables was also ruled out, as that approach wouldn’t scale to millions of rows.
Metaimpact initially explored Kafka-driven lookup tables. With the Kafka ingestion format, updates would be applied in near real time, the approach could support a future state with millions of rows, and it avoided expensive join operations because all of the data would already be in memory. However, they ran into two core challenges:
- With many individual mappings from their customers, they originally envisioned a separate lookup table for each customer-defined map, but at scale, initializing all of the lookup tables had the unintended effect of timing out regular ingestion tasks.
- Next, they created a compound key approach, where the customer’s map identifier would prefix the key to be matched. This could be evaluated in the query with a simple expression like `LOOKUP(CONCAT(mappingId, '::', columnName), 'combinedLookupTable')`. This was functional, but memory inefficient, because the map ID was a repeated string on every entry within the lookup column (a sketch of this query shape appears below).
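To make the compound-key approach concrete, a query against the combined lookup might have looked roughly like the sketch below. Only the `LOOKUP(CONCAT(...))` expression comes from the approach described above; the `events` table, its columns, and the time filter are illustrative assumptions rather than Metaimpact’s actual schema.

```sql
-- Hypothetical sketch of the compound-key lookup approach.
-- Every key in combinedLookupTable carries the map ID as a prefix
-- (e.g. 'acme-map-01::device-7f3a'), so the map ID string is stored
-- redundantly on every lookup entry.
SELECT
  LOOKUP(CONCAT(mappingId, '::', columnName), 'combinedLookupTable') AS normalized_id,
  COUNT(*) AS event_count
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
```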
“Polaris Dimension Tables make it possible for us to perform live identity normalization without having to re-ingest raw events when identity mappings are updated. With support for multiple columns, they’re the logical successor to Lookup functions.”
– Jason Schmidt, Software Architect, Metaimpact
Solution
Metaimpact transitioned to Polaris Dimension Tables as a more efficient and scalable solution. Dimension Tables are designed to optimize upserts for high-cardinality datasets by storing dimensions separately from fact data, joining large datasets in real time without the overhead associated with traditional lookup tables (notably, Kafka-based lookup tables do not support multi-field keys or multiple attributes).
With Dimension Tables, Metaimpact no longer needed a compound key. Because the map ID, key, and expected output could be stored in separate columns, the low cardinality of map IDs could be represented efficiently and wouldn’t lead to memory issues for the foreseeable future.
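As a minimal sketch, a join on a multi-column Dimension Table key might look like the following. The Dimension Table name (`identityMappings`), its columns (`mapping_id`, `raw_key`, `normalized_id`), and the `events` fact table are illustrative assumptions, not Metaimpact’s actual schema.

```sql
-- Minimal sketch: a multi-column join against a Dimension Table, with the
-- map ID kept in its own low-cardinality column instead of a concatenated key.
SELECT
  d.normalized_id,
  COUNT(*) AS event_count
FROM events e
JOIN identityMappings d
  ON  d.mapping_id = e.mappingId
  AND d.raw_key    = e.columnName
GROUP BY d.normalized_id
```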
There were a few hurdles to overcome to switch to Dimension Tables, but Metaimpact found it to be a very reasonable transition:
- Metaimpact switched from the inline functional syntax `LOOKUP(expression, lookupTableName)` to a JOIN syntax. Since they were still working on the transition from Imply Enterprise to Polaris, they needed a format that would support both lookup tables and Dimension Tables. They gave the Dimension Table the same name as the combined lookup table, and the only difference was needing to prefix the lookup table name with the “lookup” string.
- Next, they extended the split in their code to begin taking advantage of the structure of the Dimension Tables, so that they could limit the number of rows processed within the Dimension Table itself.
- Metaimpact identified an opportunity to optimize Dimension Table queries, which initially resulted in broadcast joins on every row. By moving to a CTE query with an upfront filter, they reduced the number of sub-query rows from each customer map. This allowed the query engine to focus on relevant rows, significantly improving JOIN efficiency and ensuring smooth query execution (see the sketch after this list).
- Finally, they successfully transitioned to running all of their queries on Polaris, and around the same time, Imply included Dimension Tables in the Imply Quickstart they use for local development. This meant they could fully retire lookup tables!
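A rough sketch of the optimized query shape is shown below. The upfront filter in the CTE narrows the Dimension Table to a single customer map before the join. The column names, the map ID value, and the `events` table are illustrative assumptions; during the transition, the Imply Enterprise version of the query referenced the lookup-prefixed table name as described in the first step above.

```sql
-- Sketch of the CTE-with-upfront-filter pattern: only the rows for the
-- customer map in play participate in the broadcast join.
WITH mapping AS (
  SELECT raw_key, normalized_id
  FROM combinedLookupTable            -- Dimension Table, same name as the old lookup
  WHERE mapping_id = 'acme-map-01'    -- upfront filter on the customer-defined map
)
SELECT
  m.normalized_id,
  COUNT(*) AS event_count
FROM events e
JOIN mapping m
  ON m.raw_key = e.columnName
WHERE e.__time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY m.normalized_id
```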
Impact
By switching to Dimension Tables with Imply Polaris, Metaimpact improved memory efficiency by at least 2x. The resulting reduction in data costs effectively lengthens the runway before Metaimpact needs to expand their Imply node size.
The integration with Polaris also simplified management and support, allowing Metaimpact to use Dimension Tables like any other data source and benefit from Imply’s expert assistance. Beyond Dimension Tables, the switch to Polaris significantly reduced the amount of time the team spent on infrastructure management, thanks to benefits such as elastic autoscaling, committer-led support, automatic updates, and high availability that eliminated the aforementioned rolling restart issue. Taking these time savings into account, Imply Polaris reduced their total cost of ownership (TCO) by over 10% compared to Imply Enterprise.
Metaimpact is now fully operational on Imply Polaris, benefiting from faster, more scalable analytics with real-time computation and average query times under 120 ms. They anticipate further improvements as Imply continues to innovate on upserts and other core capabilities.
*This story was written by Larissa Klitzke (Senior Product Marketing Manager, Imply) in collaboration with Jason Schmidt (Software Architect, Metaimpact).