Splunk SmartStore vs Lumi Loglake

Jun 16, 2026
David Gee

Lumi Loglake lets Splunk teams query logs directly in object storage (AWS S3, including data held in Delta Lake and Apache Iceberg tables) using standard SPL, with results returned as native Splunk events that work with existing dashboards, alerts, and applications, including Enterprise Security. No schemas to define, no pipelines to build, no data to move. Queries run on virtual compute pools that spin up when a query fires and spin back down after an idle timeout you define.

If you run Splunk, you may be thinking: doesn’t SmartStore already do this? My warm buckets are in S3 today.

They are. And that’s where the similarity ends. SmartStore and Loglake put data in the same place but are opposite architectures for what happens when you search it. SmartStore copies data out of object storage to query it on a fixed fleet of indexers. Loglake queries it where it lives, on compute pools that run only when queries do. That single difference cascades through performance, scaling, operations, and cost — and it’s worth walking through exactly how.

Side by side

| | SmartStore | Lumi Loglake |
|---|---|---|
| Where data is searched | Copied from S3 to local indexer cache first | In place, in object storage |
| Compute model | Fixed indexer fleet, provisioned for peak, always on | Virtual compute pools; spin up on query, spin down when idle |
| Long-lookback searches | Evict hot cache; degrade concurrent searches | Run on dedicated pools; zero impact on real-time workloads |
| Workload isolation | Shared indexers and shared cache | Separate compute pools per workload |
| Scaling | Add peer nodes, replicate configs, rebalance data | Resize a pool; no data movement |
| Cloud constraints | Indexers co-located with object store’s cloud | Any region |
| Query access | Splunk only (proprietary bucket format) | SPL, Spark SQL, LogQL, ANSI SQL |
| Cost model | Always-on infrastructure, regardless of usage | Compute runs only while queries do |


What SmartStore actually does with your data

SmartStore was the right idea in 2018. Moving warm data off expensive indexer disk and into object storage cut retention costs substantially, and Splunk Cloud runs on it today.

But look at the mechanics. Hot buckets are built on indexer disk, as they always were. When a bucket rolls to warm, it’s uploaded to the object store, which holds the master copy. So far, so decoupled.

Then comes the critical detail, stated plainly in Splunk’s own documentation: Splunk software can’t read data from S3 directly. Every bucket a search touches must first be localized — fetched from the object store and written back to indexer disk — before it’s searchable. A cache manager handles this, fetching buckets on demand and evicting others to make room. In SmartStore, object storage is a parking lot, not a query target.

The cache works because of a statistical bet. Per Splunk’s architecture documentation, 97% of searches look back 24 hours or less, so the cache manager favors recent buckets, and for everyday monitoring the hit rate stays high.

The 3% you bought retention for

The problem is the other 3% — and it’s not a random 3%. The searches that reach into warm data are incident investigations, threat hunts, compliance audits, and long-range trend analysis: precisely the workloads that justify retention in the first place. Under SmartStore, each one pays the copy-back tax: fetch buckets from S3, write them to local disk, evict whatever was cached.

Splunk’s Validated Architectures documentation describes the consequence directly: searches spanning many buckets, such as all-time or wildcard searches, evict recent and heavily used data from the cache, degrading search performance on recent data until the cache repopulates. One analyst running a 90-day investigation slows down everyone else on the cluster.
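A toy model reproduces the eviction dynamic. The sketch below makes several assumptions that are illustrative only: a plain least-recently-used policy, one bucket per hour, and a cache that holds 48 buckets. It is not Splunk's cache manager, which has its own eviction policies and settings for protecting recent buckets. In the model, routine 24-hour searches keep the hit rate high until a single 90-day scan pushes every recent bucket out.

```python
from collections import OrderedDict


class BucketCache:
    """Toy LRU cache of bucket IDs; a simplification, not Splunk's cache manager."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._buckets = OrderedDict()  # least recently used first
        self.hits = 0
        self.misses = 0

    def search(self, bucket_ids) -> None:
        for b in bucket_ids:
            if b in self._buckets:
                self.hits += 1
                self._buckets.move_to_end(b)  # now most recently used
            else:
                # Miss: "localize" the bucket from object storage to local disk.
                self.misses += 1
                self._buckets[b] = None
                if len(self._buckets) > self.capacity:
                    self._buckets.popitem(last=False)  # evict least recently used

    def reset_stats(self) -> None:
        self.hits = self.misses = 0

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run_scenario(capacity: int = 48) -> tuple:
    cache = BucketCache(capacity)
    recent = range(24)  # hypothetical layout: bucket i holds data from i hours ago
    for _ in range(10):  # routine monitoring re-reads the last 24 hours
        cache.search(recent)
    before = cache.hit_rate()
    cache.search(range(24 * 90))  # one 90-day investigation, newest bucket first
    cache.reset_stats()
    cache.search(recent)  # the next routine dashboard refresh
    return before, cache.hit_rate()


print(run_scenario())  # (0.9, 0.0)
```

The first number is the hit rate for routine monitoring, including the cold start. The second is the hit rate for the same dashboard immediately after the deep scan.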

The recommended mitigations are revealing: cap how far back a role can search with srchTimeWin, throttle long-running searches with workload management rules, tune the eviction algorithm. Every one of these is a restriction on users searching their own data. When the documented defense against an architecture’s failure mode is preventing the queries it was deployed to serve, the architecture is the problem.
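For reference, srchTimeWin is a per-role setting in Splunk's authorize.conf, expressed in seconds. The helper below only renders such a stanza. The role name and the 30-day cap are hypothetical examples. Under a 30-day cap, the 90-day investigation described above cannot cover its full window.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def srch_time_win_stanza(role: str, max_days: int) -> str:
    """Render an authorize.conf stanza that caps a role's search time range.

    srchTimeWin takes seconds; the role name passed in is a placeholder.
    """
    if max_days <= 0:
        raise ValueError("max_days must be a positive number of days")
    return f"[role_{role}]\nsrchTimeWin = {max_days * SECONDS_PER_DAY}\n"


print(srch_time_win_stanza("soc_analyst", 30), end="")
# [role_soc_analyst]
# srchTimeWin = 2592000
```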

Loglake has no equivalent failure mode because it has no cache to thrash. Queries against data in object storage run on their own virtual compute pool, fully separate from real-time monitoring — and you can define multiple pools of different sizes for different workloads. Put scheduled compliance reports on one pool and ad hoc investigations on another, and the 90-day deep-dive touches nothing else.
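Conceptually, per-workload pools amount to a routing table from workload class to pool. The sketch below is hypothetical throughout: the pool names, size labels, and the `route` and `assign` helpers are ours, not Loglake's configuration API. It demonstrates the property that matters here, which is that a long investigation only ever queues on its own pool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    size: str  # hypothetical sizing label


# Hypothetical layout: one pool per workload class. Not Loglake's config API.
POOLS = {
    "compliance": Pool("scheduled-compliance", "small"),
    "investigation": Pool("adhoc-investigations", "large"),
}


def route(workload: str) -> Pool:
    """Look up the pool dedicated to a workload class."""
    try:
        return POOLS[workload]
    except KeyError:
        raise ValueError(f"no pool configured for workload {workload!r}") from None


def assign(submissions):
    """Group (workload, query_name) pairs into per-pool queues."""
    queues = defaultdict(list)
    for workload, query in submissions:
        queues[route(workload).name].append(query)
    return dict(queues)


print(assign([
    ("compliance", "pci_daily_report"),
    ("investigation", "threat_hunt_90d"),
    ("compliance", "sox_weekly_report"),
]))
# {'scheduled-compliance': ['pci_daily_report', 'sox_weekly_report'], 'adhoc-investigations': ['threat_hunt_90d']}
```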

Storage decoupled; compute never did

The cache exists because SmartStore changed where data sits without changing what searches it. Indexers remain a fixed fleet, handling ingest and search on the same machines, provisioned for peak whether or not peak arrives. There is no workload-based scaling; growing the cluster means manually adding peer nodes and rebalancing data across them.

The operational requirements compound the rigidity. Per Splunk’s documentation, indexers must be hosted in the same cloud as the object store: AWS indexers for S3, GCP for GCS, Azure for Blob. All index settings must be identical across every peer. Some indexes.conf settings must stay at their defaults, and others are silently ignored. In multisite clusters, using data model acceleration (the foundation of Enterprise Security) requires disabling search affinity entirely.
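Two of these constraints can be expressed as pre-flight checks. This is a simplified sketch built on our own assumptions:

* It maps the `s3://`, `gs://`, and `azure://` remote-path schemes to their clouds. Every `s3://` path is treated as AWS, which ignores S3-compatible on-premises stores.
* It compares one index's settings as a flat dictionary per peer, rather than parsing real indexes.conf files.

The peer names and setting values in the usage are made up.

```python
from urllib.parse import urlparse

# Assumed mapping from remote-storage URI scheme to the cloud that must host the indexers.
SCHEME_TO_CLOUD = {"s3": "aws", "gs": "gcp", "azure": "azure"}


def check_colocation(remote_path: str, indexer_cloud: str) -> bool:
    """True if the indexers run in the same cloud as the remote store."""
    scheme = urlparse(remote_path).scheme
    required = SCHEME_TO_CLOUD.get(scheme)
    if required is None:
        raise ValueError(f"unrecognized remote storage scheme: {scheme!r}")
    return required == indexer_cloud


def mismatched_settings(peers: dict) -> dict:
    """Return {setting: {peer: value}} for every setting that differs across peers."""
    all_keys = set().union(*peers.values()) if peers else set()
    diffs = {}
    for key in sorted(all_keys):
        values = {peer: conf.get(key) for peer, conf in peers.items()}  # None if missing
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


peers = {
    "idx-1": {"maxDataSize": "auto", "frozenTimePeriodInSecs": "7776000"},
    "idx-2": {"maxDataSize": "auto", "frozenTimePeriodInSecs": "2592000"},
}
print(check_colocation("s3://example-bucket/smartstore", "aws"))  # True
print(mismatched_settings(peers))
```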

Splunk knows the compute layer is the gap: version 10.x introduces Ingest-Tier Scaling and Indexing/Replication Separation. But these are explicitly constrained by SmartStore behavior and limited to a single S3 region — incremental elasticity bolted onto the same architecture.

Loglake’s compute model inverts the economics. Queries run on virtual compute pools: you choose a pool size, the pool spins up when queries fire, and it spins back down after an idle timeout you set. Because the pools hold no data, resizing one is a configuration change — no peer nodes to add, no buckets to rebalance, no cluster maintenance window. Scaling SmartStore is a project; scaling Loglake is a dial. There’s no co-location requirement either — Loglake reaches object storage in any region — and compute runs only when queries do, so retention decisions stop driving infrastructure cost. Capacity planning for the data you might someday search becomes a legacy concern.
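The spin-up and idle-timeout behavior described above can be modelled in a few lines. This is a sketch under our own assumptions: a single pool, instant start-up, and per-second accounting. It is not Loglake's scheduler or billing logic, and the query times are made up for illustration. The point it shows is that billed time tracks query activity plus the idle tail, not the length of the day.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualPool:
    """Toy on-demand pool: starts on the first query, stops after idle_timeout seconds.

    Assumptions (not Loglake's actual scheduler or billing): one pool,
    instant start-up, and compute time accounted per second.
    """

    idle_timeout: float
    running_since: Optional[float] = None  # None means the pool is stopped
    last_activity: float = 0.0             # end time of the latest query seen
    billed_seconds: float = 0.0

    def _stop_if_idle(self, now: float) -> None:
        # If idle for at least idle_timeout, the pool stopped at
        # last_activity + idle_timeout; bill the run that just ended.
        if self.running_since is not None and now - self.last_activity >= self.idle_timeout:
            self.billed_seconds += self.last_activity + self.idle_timeout - self.running_since
            self.running_since = None

    def run_query(self, start: float, duration: float) -> None:
        self._stop_if_idle(start)
        if self.running_since is None:
            self.running_since = start  # spin up on demand
        self.last_activity = max(self.last_activity, start + duration)

    def close(self, now: float) -> None:
        """Bill any run still open at the end of the observation window."""
        if self.running_since is not None:
            stop_at = min(now, self.last_activity + self.idle_timeout)
            self.billed_seconds += stop_at - self.running_since
            self.running_since = None


def simulate_day() -> float:
    pool = VirtualPool(idle_timeout=300)  # hypothetical 5-minute idle timeout
    pool.run_query(start=3_600, duration=600)     # made-up query times, in seconds
    pool.run_query(start=4_000, duration=200)     # overlaps; pool already running
    pool.run_query(start=50_000, duration=1_200)  # pool restarts after idling out
    pool.close(now=86_400)
    return pool.billed_seconds


print(simulate_day())  # 2400.0, versus 86400 for a machine that never stops
```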

One platform, not a bolt-on

Loglake is part of Lumi, the observability warehouse — not a sidecar for archived data. Live data flows into Lumi’s hot tier through the plumbing Splunk teams already run: HEC, S2S from universal and heavy forwarders, Ingest Actions, and OpenTelemetry. Real-time detection and long-lookback investigation run on one platform, queried through one experience, with the SPL your team uses every day.
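Because ingestion reuses Splunk's own interfaces, a forwarder or application that already speaks HEC needs only a new endpoint and token. The sketch below builds a standard HEC event request. It assumes Lumi accepts Splunk's usual `/services/collector/event` path and `Authorization: Splunk <token>` header, as the paragraph implies. The host, port, token, index, and sourcetype are placeholders, and the request is built but not sent.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype, index):
    """Build (but do not send) a Splunk HEC event request.

    Assumes the receiver accepts Splunk's standard /services/collector/event
    endpoint and "Authorization: Splunk <token>" header.
    """
    payload = {"event": event, "sourcetype": sourcetype, "index": index}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, port, token, index, and sourcetype.
req = build_hec_request(
    "https://hec.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"action": "login_failed", "user": "alice"},
    sourcetype="app:auth",
    index="security",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```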

And because the data isn’t locked in a proprietary bucket format, the same datasets are queryable from Databricks via Spark SQL, Grafana via LogQL, and any BI or AI tool via ANSI SQL — no duplication, no second pipeline. The data stays yours: full export is available through support if an account ever terminates.

Your data is already in object storage

If you’re running SmartStore, you’ve already made the architectural admission: log data belongs in S3. SmartStore got it there, then kept a 2018 constraint — copy it back before you can search it, on compute you provision for peak and pay for around the clock.

Loglake removes the constraint. Same SPL, same dashboards, same Enterprise Security — querying the data where it already lives.

Request a demo to see Loglake query unstructured logs in S3 with standard SPL — no schemas, no catalogs, no copy-back.

Related reading: A First Look at Lumi Loglake · Lumi Loglake vs Splunk Federated Search for S3
