Observability at a Breaking Point: How Decoupling Unlocks Speed, Scale, & Savings
Learn how decoupled observability helps you do more with your Splunk data, reduce costs, and scale efficiently with Federated Search.
Everyone, thanks for joining today's webinar. My name is Matt Moworsi, and later I'll be joined by my colleague, Peter Marshall. A quick housekeeping note before we get started: if questions come up at any point, please feel free to drop them into the chat. We'll keep an eye out for them throughout the session and address as many as we can, either live or during the Q&A at the end.

So with that, let's get into it. Today we're going to be talking about a challenge nearly every Splunk team eventually runs into: how do you continue to scale visibility as your data volumes keep growing, without letting cost get completely out of hand or changing how teams actually work? Keep in mind, I'm not talking about moving away from Splunk. It's about expanding what Splunk can do.

A quick bit of context on who we are. Imply is the company behind Druid, an open source real-time analytics database built for sub-second queries at massive scale. We've been around for about ten years now, and across our customer base, Druid processes hundreds of trillions of rows every year, with queries consistently coming back in under a hundred milliseconds. Because of that, we've had a front-row seat to how companies work with streaming and event data, and more importantly, where their observability stacks start to break down as data grows.

Let me be very clear up front: here at Imply, we are big fans of Splunk. Splunk is excellent at search. It's excellent at investigation. People love SPL, Splunk's native query language. At the same time, as you've probably heard, especially if you're listening to this webinar, Splunk is awesome but very expensive. And what often gets missed is why that's true. The issue isn't that Splunk is inefficient or poorly designed. Splunk is doing exactly what it was built to do, which is index data so it's always searchable, fast, and available. That model works really well until the data itself reaches a certain scale.

So this isn't really a Splunk problem. It's much more a data problem at scale. And once you look at it as a data problem, the rest starts to make much more sense. Logs and telemetry don't grow with the business; they grow with machines. Every microservice, every container, every cloud service is generating more data by default. And because Splunk is designed to index data up front, more data means more indexers, more infrastructure, and more operational overhead. At the same time, organizations are under real pressure to reallocate spend, especially toward AI and other new initiatives. So teams don't stop scaling Splunk because they want to. They stop because the math stops working for them.

When teams hit that point, they usually don't make a big architectural decision right away. Instead, they try to cope with the problem. The first path is often pruning data before it reaches Splunk: filtering or sampling to decide up front what's worth indexing and what's worth keeping. The second common path is offloading data to cloud storage, pushing high-volume sources out of Splunk to reduce costs. In that situation the data still exists, but it's no longer immediately searchable with SPL, and getting answers takes more time and more steps.
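To make that first workaround concrete, pruning usually boils down to something like the minimal sketch below. This is purely illustrative Python, not any specific product's pipeline; the field names and the keep-errors-sample-the-rest policy are assumptions for illustration.

import random

def should_index(event: dict, sample_rate: float = 0.1) -> bool:
    """Decide whether a log event gets indexed at all.

    Keep anything that looks like an error; sample the rest.
    The INFO lines dropped here are exactly the context an
    investigation may later wish it had.
    """
    if event.get("level") in ("ERROR", "CRITICAL"):
        return True
    return random.random() < sample_rate

events = [
    {"level": "INFO", "msg": "request served in 42ms"},
    {"level": "ERROR", "msg": "upstream timeout"},
]
indexed = [e for e in events if should_index(e)]

The cost savings are real, but so is the loss: whatever the sampler drops is gone before anyone knows whether it mattered.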
Both of these approaches absolutely reduce cost, but they do it by reducing visibility, which ultimately makes investigations much harder to complete. These aren't strategic solutions; they're workarounds. So the obvious question becomes: is there a way to keep more data searchable without paying full indexing costs for all of it?

That's where the third option comes in. What we've built with Lumi is a foundation optimized specifically for logs and events, a warehouse designed to sit underneath the tools teams already use. This isn't just an archive solution or a net-new observability platform. Lumi is designed around a different assumption: machine-generated data is going to keep growing. That's inevitable. What's new is that the system underneath those observability tools has to be built for this reality. So instead of indexing everything the same way, Lumi starts with aggressive compression that's purpose-built for log and event data. On top of that, we layer indexing and query execution so the data stays interactive, not just archival. And then we make that data accessible in multiple ways, including directly from the tools teams already use, like Splunk using SPL.

So this brings us to Imply Lumi. Lumi is a high-performance, high-efficiency observability warehouse for logs, designed to sit underneath the tools teams already use. From Splunk's point of view, Lumi looks like another federated search target. Splunk remains your home base: you continue to use the same SPL, you keep your dashboards, your workflows stay intact, and nothing changes for the end user. What does change is how much data you can actually keep searchable.

One of the first things teams notice with Lumi is storage efficiency. Lumi typically stores raw log data at around one third the size of traditional observability platforms, and it's often even smaller than simply gzipping that data into object storage. What that means is that you can keep more data online, and keep it searchable for longer, without blowing up your budget.

The second thing teams notice is the performance. Because Lumi indexes and executes queries directly on top of compressed data, searches stay interactive even on large historical datasets. Think about searches like broad time ranges, wildcards, or heavy aggregations: these are the kind of searches teams are often hesitant to run because they can be slow and disruptive. With Lumi, those queries become usable again. Ultimately, that directly reduces your investigation time and speeds up how quickly teams can get the answers they need.

Now, all of that performance only matters if people can actually use it. From a user's point of view, again, nothing changes. Your dashboards still work, your alerts still fire, and your investigations still happen in SPL. That's important because, let's be honest, most observability projects fail not for technical reasons but because people just don't want to adopt them. Once teams see that nothing changes for users, the next thing that becomes clear is that the economics start to look very different. Instead of being forced to choose between performance and cost, teams start getting more of both.
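A quick way to sanity-check that storage-efficiency claim against your own data is to establish a gzip baseline, since gzipped object storage is the comparison point above. A minimal Python sketch; the file name is a placeholder for one of your own log files:

import gzip
from pathlib import Path

raw = Path("app.log").read_bytes()          # substitute a real log file
compressed = gzip.compress(raw, compresslevel=9)

print(f"raw:     {len(raw):,} bytes")
print(f"gzip -9: {len(compressed):,} bytes "
      f"({len(compressed) / len(raw):.1%} of raw)")

If a stored footprint comes in below that number while the data stays directly queryable, that gap is what the economics discussion that follows rests on.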
Teams can keep more data online, search across longer time ranges, and do it without indexing everything at full cost. Splunk keeps doing what it does best: powering those dashboards, alerts, and investigations. And Lumi quietly takes the pressure off underneath, handling storage and large-scale historical queries much more efficiently. The result is that teams stop treating retention like some sort of luxury and stop dropping data just to stay within budget. They get broader visibility and a much more predictable cost structure without changing how people work.

Once teams see that the economics change, the next natural question is: what's actually driving the cost here? In most observability platforms today, storage and compute are tightly coupled. If you want data to be fast and searchable, you're paying for always-on compute whether anyone is querying that data or not. That works really well for recent, high-touch data. But as the data ages, the way it's used changes. Unfortunately, the cost model does not.

Lumi breaks that coupling. By separating compute from storage, you're no longer paying full performance costs for data that isn't actively being queried. That's where the idea of a dial comes in. Recent, high-touch data stays hot with always-on performance. And as that data ages, it doesn't just fall off a cliff or disappear behind rehydration steps. It stays searchable, but compute only spins up when a query actually runs. The goal here is simple: you only pay always-on costs when the workload truly needs it.

If this idea of separating storage and compute sounds familiar, it's because we've already seen this shift play out in analytics. For a long time, analytic systems tightly coupled their storage and compute, and as data volumes grew, costs exploded and flexibility all but disappeared. Snowflake was one of the first to change that by decoupling storage and compute, letting teams scale each independently and only pay for what they needed, when they needed it. Lumi applies that same architectural shift to logs and events, because observability data has hit the same scaling limits analytics did. Machine-generated data is massive, it's not going anywhere, it's business critical, and it's growing faster than traditional indexing models were ever designed to handle. What Lumi provides is a modern data foundation underneath observability tools, one built for this new reality.
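To make the "dial" idea tangible, here is the shape such a policy could take. This is purely illustrative Python, not Lumi's actual configuration syntax; the tier names, windows, and compute modes are all assumptions.

# Purely illustrative; not Lumi's actual configuration syntax.
retention_policy = {
    "hot":  {"window_days": 7,   "compute": "always_on"},   # recent, high-touch data
    "warm": {"window_days": 365, "compute": "on_demand"},   # still searchable, pay per query
}

def tier_for(age_days: int) -> str:
    """Pick the tier whose window covers an event's age."""
    if age_days <= retention_policy["hot"]["window_days"]:
        return "hot"
    return "warm"

print(tier_for(3))   # hot: served from always-on compute
print(tier_for(90))  # warm: compute spins up only when a query runs

The key property is that moving along the dial changes the cost model, not the searchability: warm data never needs a rehydration step to become queryable again.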
So here's what this looks like in practice. One team was generating roughly five terabytes of data per day that they simply couldn't afford to index in Splunk. This wasn't low-value data; it was data they wanted access to during investigations and incident reviews. But at the scale they were operating, the unit economics just didn't work, so like many teams out there, they were forced to make hard trade-offs. What changed with Lumi is that they stopped trying to force Splunk to do the most expensive part of the job. Instead of reducing data, sampling more aggressively, or pushing it out of reach in the cloud, they brought that five terabytes per day back into scope by storing it in Lumi underneath Splunk. Splunk stayed the place teams worked: same searches, same dashboards, same workflows.

And even with that additional data back online, they were able to reduce their overall Splunk-related costs by more than seventy percent. They ended up with more data, better visibility, and faster investigations, all while dramatically lowering costs.

Here's where finance comes into the picture. Before Lumi, the CFO for this organization was pushing toward a full migration. Again, not because Splunk wasn't valuable, but because the cost trajectory just didn't make sense, and there weren't many other levers they could pull. Once finance saw savings on the order of seventy percent, that conversation completely changed. By shifting the most expensive part of the problem underneath the stack, Lumi made the economics finally line up, and from a finance perspective, that was the turning point. Instead of debating whether to move off the platform, the finance team finally had a way to control cost growth while keeping the tools and workflows the operations team already liked and relied on.

At this point, I think the best thing to do is to switch gears and show you how this works, so I'm going to hand it over to Peter to walk through Lumi in action. As you watch the demo, there are three things I'd like you to keep an eye on. First, check how little changes for the end user: the same queries, the same fields, the same overall experience. Second, keep an eye on the performance, especially as the searches get more complex and cover larger time ranges. And finally, notice how this is all done with simple configuration. There are no agents, no migration project, no replatforming. It's very easy to get started with Lumi. And with that, I'll hand it over to Peter to show you Lumi in action.

Thanks for that, team. Great to meet you all, and welcome to the webinar. I'm Peter Marshall, director of developer relations here at Imply. I'm going to talk a little bit more about Lumi's approach to search and query handling, the thing that lets us search this data at very high speeds, and a little bit about the compression, the thing that reduces the storage footprint, because those two things together are what make Lumi really cost effective. My task today is to take all the good stuff you've been hearing about Imply and Imply Lumi from my colleagues and show you what that looks like in the real world.

So let's get going. I'm going to open a browser here and log into a leading observability platform. Note there are no add-ons and no custom applications; we're just looking at standard features that come with this tool off the shelf. In the background, my team has preloaded a demo dataset: some web data from a make-believe ecommerce website. So let me answer the first question you might have, which is: how easy is it to look at this data if it's in Lumi? Well, it's very easy. We're dual-loading this exact same dataset into Lumi, and to get to it, I'm going to take exactly the same search command I used to show you the local demo data and make one small change.
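That one small change is the index reference: the search points at a federated index backed by Lumi instead of the local one. Here's a hedged sketch of the same idea using the Splunk SDK for Python; the connection details and index names are hypothetical, and the federated index is assumed to be configured with Lumi as its remote search provider.

import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="splunk.example.com", port=8089,   # placeholder connection details
    username="admin", password="changeme",
)

# Same SPL either way; only the index reference changes.
local_spl     = "search index=web_demo status=500 | stats count by uri"
federated_spl = "search index=federated:lumi_web status=500 | stats count by uri"

for spl in (local_spl, federated_spl):
    reader = results.JSONResultsReader(
        service.jobs.oneshot(spl, output_mode="json")
    )
    rows = [r for r in reader if isinstance(r, dict)]  # skip diagnostic messages
    print(spl, "->", len(rows), "rows")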
When I hit go, these two technologies are now working together: they're searching for the matching records and computing the final results from a copy of the data in Lumi. And in the results, as you see here, I'm getting exactly the same raw data, the same fields, the same results, because the commands I use and the language I use stay the same even though I'm now using Lumi.

So let's now turn to that exciting advantage of Lumi: speed. To see it more clearly, let's get a bit more complicated. I'm going to run this more complicated command on the local data, and meanwhile open the same search pointed at Lumi. This is exactly the kind of command that any of your users who use these tools day in, day out will be very familiar with. And look how much faster the results came back when Imply Lumi is involved. Meanwhile, that local search, let's have a look. Oh yes, that's still working away.

My team undertook comprehensive benchmark testing across a range of different searches and queries, and observability tools consistently get at least a 4x performance boost when the data is stored in, and the answers are computed in partnership with, Imply Lumi. That kind of speed really counts when you care about MTTR, when you need speedy access to the latest data, when you're asking challenging questions, when you're trying to really dig into what you know about your environment.

So how do we set this all up? It's really quite easy. Lumi presents itself as part of the infrastructure these tools are used to. It's about maximizing fit and minimizing friction when it comes to integration. And that means in this main tool, all the searches, the dashboards, the alerts, everything benefits.

But Lumi doesn't just make searching and querying faster. It's also about how it stores the data. As you can see here, this sample dataset was compressed by over ninety percent, and that's really not unusual. Lumi's approach to intelligently compressing all sorts of observability data is what opens up options for rethinking how and where you store it. You might reconsider your retention policies, avoid the pain of rehydration, and ask whether it's now possible to collect and search things like Kubernetes or VPC flow logs. Maybe until today you've been thinking that would be way too expensive to store, let alone search. Again, Lumi isn't about replacing what you have. It's about doing more with what you have. It's built to fit really neatly into your existing architecture, both for collecting data and for generating insights. This gets to the heart of how we're going to deliver on what the market is calling decoupled observability.

So with that in mind, let's look at some of these integrations. This is the integrations page inside Lumi. To load data into this environment, we actually use OpenTelemetry. If you're Splunk users, you can see there's HEC and S2S connectivity here, which makes it really easy to configure your forwarders to push data through into Lumi. We also have a pull mechanism in this demo environment that picks up data automatically from S3 buckets. So how about getting the data out? Starting with Splunk, Lumi appears as a remote search head, so hooking it up to search the data stored in Lumi, as with all of our integrations, is just copy and paste. We're building Lumi to fit neatly into your architecture.
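Before moving on to the other integrations, here's what that HEC-compatible ingest path can look like in practice: a hedged Python sketch of pushing a single event by hand. The URL and token are placeholders; in a real deployment you would point your forwarders' outputs at the endpoint rather than posting events yourself.

import requests

LUMI_HEC_URL = "https://lumi.example.com:8088/services/collector/event"  # hypothetical endpoint
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                       # placeholder token

resp = requests.post(
    LUMI_HEC_URL,
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},  # standard HEC auth scheme
    json={
        "event": {"level": "INFO", "msg": "checkout completed"},
        "sourcetype": "webapp:json",
    },
    timeout=10,
)
resp.raise_for_status()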
We've also got Grafana, and in Grafana, Lumi appears as Loki. We've got a JDBC connector for BI tools, and we have an MCP server. I had a bit of fun with that one: I connected this demo environment to Claude Desktop and had a play around with natural-language search. I also have a colleague who's using LangChain modules to build a chatbot for anomaly detection, which is really quite cool as well.

So in summary, under the hood we've got Lumi's speed and its compression. That's what makes querying and searching all this data really cost effective. And that decoupled approach to integration is what opens up the possibility of fewer silos, better collaboration between teams, and more efficient workflows. You can preserve your workflows and use the tools you already have. Just make small configuration changes, and you can do more thanks to that high compression and that really great performance. When you're ready to activate this decoupled observability architecture, you can take advantage of those native integrations and open up your Lumi data to other tools as well. I hope that was all helpful, useful, and interesting for you. With that, I'm going to hand back over to the team. I'll stay in the background here and answer any questions you might have. Back over to you.

Alright, that was great. Thank you so much, Peter. Before we wrap up, I want to pause for a second and open it up for questions. If anything came up during the demo or earlier in the session, please feel free to drop your questions into the chat. And while folks are typing, I'll start off with two questions that come up a lot.

The first is: how does Lumi fit into an existing observability stack without forcing a migration? It's probably the most common question I get. The short answer is that Lumi sits underneath what you already have. All of your existing collectors keep working the same way, whether that's Splunk's universal and heavy forwarders, Cribl Stream, or OpenTelemetry; Lumi simply becomes another destination on ingest. And on the query side, we integrate with Splunk's native federated search, so you keep using SPL and your existing dashboards. Ultimately, what we've seen is that teams can adopt Lumi at their own pace.

The second question that comes up a lot is: where do teams usually start when they're evaluating Lumi? Most teams I talk to start with the data that's causing the most cost pressure today, usually high-volume logs or older data that's expensive to keep indexed. They'll run Lumi side by side, measure the performance, look at the cost comparison, and then expand from there once they feel more comfortable.

Alright, it looks like we've covered everything that came in through the chat, so before we sign off, just a quick wrap-up from me. What we showed today isn't about changing tools or forcing you into a new workflow. It's about giving observability teams a better data foundation, one that lets you keep more data searchable, control your costs, and scale without making those hard trade-offs. You keep the tools you already know and the workflows your teams rely on, and over time you get a much more sustainable path forward. So with that, I just want to say thanks again for joining us today.
If you'd like to continue the conversation or dig into how this would work in your environment, we'd be happy to follow up, so please feel free to reach out. Again, thanks for joining, and have a good day.
Learn how a decoupled architecture for Splunk—powered by Imply Lumi and Federated Search—helps you keep more data searchable, reduce costs, and scale efficiently without changing existing Splunk workflows.