How Substreams Solves Solana’s 5 Indexing Pains for Real-Time Apps

Solana’s blazing-fast throughput and low-cost transactions have made it a powerhouse for DeFi, DePIN, and high-frequency applications. But its speed and unique architecture come with a hidden cost: a lack of reliable and performant indexing providers. Traditional indexing tools buckle under Solana’s high TPS, frequent re-orgs, fragmented account states, and broken decodings.

Enter Substreams, The Graph’s purpose-built indexing solution, which is well suited to Solana’s unique challenges. This post breaks down how Substreams solves 5 critical pain points developers face when indexing Solana data, and why leading protocols like Helium, Hivemapper, and PropellerHeads rely on Substreams as their preferred indexing solution.

Problem 1: High-TPS Chains Require Real-Time Indexing

Traditional RPCs often struggle to provide real-time data. Even custom RPC-based indexers can drift from the chain’s head slot by up to 5 seconds. Real-time apps (e.g., DEX aggregators, NFT mints) miss critical transactions or serve stale data.
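
To put that drift in perspective, the arithmetic below converts a wall-clock lag into slots. It is a rough sketch that assumes a slot time of about 400 ms (Solana’s target; real slot times vary). The names `slots_behind` and `is_fresh` are illustrative and not part of any SDK.

```rust
/// Assumed slot duration in milliseconds (Solana's target; real slots vary).
const ASSUMED_SLOT_MS: u64 = 400;

/// Converts a wall-clock lag into the approximate number of slots missed.
fn slots_behind(lag_ms: u64) -> u64 {
    // Round up: a partially elapsed slot is still a slot the indexer has not seen.
    (lag_ms + ASSUMED_SLOT_MS - 1) / ASSUMED_SLOT_MS
}

/// Returns true when the indexed slot is within `max_slots_behind` of the head.
fn is_fresh(head_slot: u64, indexed_slot: u64, max_slots_behind: u64) -> bool {
    head_slot.saturating_sub(indexed_slot) <= max_slots_behind
}
```

Under that assumption, a 5-second lag leaves an indexer roughly 13 slots behind the head, which is a long time for a DEX aggregator quoting prices.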

Firehose, the extraction layer underneath Substreams, streams blocks directly out of the Solana node. Blocks are transmitted as compact binary (Protobuf) messages over gRPC, which maximizes performance and makes efficient use of bandwidth.

Why It Matters:

  • DePIN apps (e.g., Helium) can track state reliably in real time.
  • AI agents can trigger actions based on real-time events.

Problem 2: Backfilling Historical Data Creates Major Bottlenecks

RPC-based indexing means paying for every historical query and maintaining redundant data pipelines. Substreams cuts out costly middlemen with direct blockchain processing, bypassing RPCs entirely to eliminate rate limits and per-query fees—giving you raw, unfiltered data access.

Archival RPCs charge premium fees and take days, or even weeks, to backfill, depending on the block range. Google Bigtable imposes sequential reprocessing unless the work is manually sharded and parallelized. Even the best competing streaming engines, like Yellowstone Geyser, have a key downside: they don’t provide any historical data.

Substreams processes historical blocks in parallel. The result is up to a 72,000% increase in indexing speed and a 70% reduction in total infrastructure costs. In Theoriq’s case, this meant time to market went from weeks to days.
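
The general idea behind parallel backfilling can be sketched as follows: split the block range into segments, process each segment independently, then merge the results in order. This is a minimal illustration using only the Rust standard library, not how Substreams is implemented internally. `segments`, `process_block`, and `backfill` are hypothetical names, and the per-block work is a placeholder.

```rust
use std::thread;

/// Splits the inclusive range [start, end] into segments of at most `size` blocks.
fn segments(start: u64, end: u64, size: u64) -> Vec<(u64, u64)> {
    assert!(size > 0, "segment size must be positive");
    let mut out = Vec::new();
    let mut lo = start;
    while lo <= end {
        let hi = end.min(lo.saturating_add(size - 1));
        out.push((lo, hi));
        if hi == end {
            break; // last segment; also avoids overflow when end == u64::MAX
        }
        lo = hi + 1;
    }
    out
}

/// Hypothetical stand-in for real per-block work; it just returns the block number.
fn process_block(block: u64) -> u64 {
    block
}

/// Processes each segment on its own thread and sums the per-segment results.
fn backfill(start: u64, end: u64, size: u64) -> u64 {
    let segs = segments(start, end, size);
    thread::scope(|scope| {
        let handles: Vec<_> = segs
            .iter()
            .map(|&(lo, hi)| scope.spawn(move || (lo..=hi).map(process_block).sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Spawning one thread per segment keeps the sketch short. A real pipeline would cap concurrency with a worker pool and would need each segment's output to be independent of the others, or merged deterministically.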

Why It Matters:

  • AI agents train on years of on-chain events in just hours.
  • Analytics platforms (e.g., Amberdata) build comprehensive datasets without RPC bottlenecks.
  • Substreams cuts infrastructure costs by as much as 70%.

Problem 3: Frequent Re-orgs Force a Trade-off Between Speed and Accuracy

Re-orgs are an inherent part of building on Solana, but handling them shouldn't fall on the developer. Since re-orgs occur frequently and vary in number of slots, the typical options are to wait for finalization or manually manage them by pruning and backfilling data.

Substreams abstracts this complexity by maintaining an in-memory representation of the chain. It processes blocks in canonical order, automatically rolls back and replays during re-orgs, and tracks a cursor for seamless reconnection. Each stream handles re-orgs in isolation, ensuring safety across users and enabling flexible starting points and module configurations. This approach makes real-time indexing safe, consistent, and deterministic.
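
From the consumer’s side, the rollback-and-replay pattern described above can be sketched like this. The `Message` and `Sink` types are simplified, hypothetical stand-ins rather than the actual Substreams message types. The sketch assumes the stream delivers either a new block or an undo signal naming the last valid block and its cursor.

```rust
/// Simplified stream messages: a new block, or a signal to roll back.
enum Message {
    New { number: u64, cursor: String, payload: String },
    Undo { last_valid_block: u64, last_valid_cursor: String },
}

/// Hypothetical sink that remembers applied blocks so forked ones can be dropped.
#[derive(Default)]
struct Sink {
    applied: Vec<(u64, String)>,
    cursor: Option<String>,
}

impl Sink {
    fn handle(&mut self, msg: Message) {
        match msg {
            Message::New { number, cursor, payload } => {
                // Apply the block and remember where to resume after a disconnect.
                self.applied.push((number, payload));
                self.cursor = Some(cursor);
            }
            Message::Undo { last_valid_block, last_valid_cursor } => {
                // Discard every block above the last valid one, then rewind the cursor.
                self.applied.retain(|(n, _)| *n <= last_valid_block);
                self.cursor = Some(last_valid_cursor);
            }
        }
    }
}
```

After an undo, the stream replays blocks from the canonical fork as ordinary `New` messages, so the sink converges on the canonical chain without custom pruning logic. Persisting `cursor` alongside the data is what makes reconnection seamless.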

Why It Matters:

  • Solvers (e.g., PropellerHeads) avoid acting on stale data from abandoned forks.

Problem 4: Fragmented Account State

Solana stores program data across multiple accounts. This design results in massive and frequent state changes per slot, making it economically and technically infeasible to persist full historical state due to size and performance constraints.

While RPC nodes offer access to the latest confirmed state, they don’t support historical lookups or diffs. Queries like getProgramAccounts force nodes to scan large datasets, leading to memory-intensive operations and timeouts. If the result set is too large, it may be truncated. Similarly, real-time streaming via Yellowstone Geyser exposes account changes as they happen but lacks persistence—disconnects or validator restarts can cause data loss.

Without reliable history, developers are left manually reconciling events like ownership transfers (e.g., unassignProxy), often needing redundant infrastructure to piece together incomplete state changes. Substreams makes this data accessible: developers can track account changes across entire programs, backed by a rolling 3-month data window. This enables deeper trend analysis and audits without millisecond time constraints.
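
To illustrate why a persisted stream of account changes matters, the sketch below keeps every observed change per account and answers "what did this account hold at slot N?". It is a toy in-memory model with hypothetical names (`AccountHistory`, `record`, `state_at`), not a Substreams API. A real consumer would feed it from decoded account-change data and store it durably.

```rust
use std::collections::{BTreeMap, HashMap};

/// Per-account history: slot -> raw account data observed after a change in that slot.
#[derive(Default)]
struct AccountHistory {
    by_account: HashMap<String, BTreeMap<u64, Vec<u8>>>,
}

impl AccountHistory {
    /// Records the data an account held after a change in `slot`.
    fn record(&mut self, account: &str, slot: u64, data: Vec<u8>) {
        self.by_account
            .entry(account.to_string())
            .or_default()
            .insert(slot, data);
    }

    /// Returns the most recent data at or before `slot`, if any change was seen by then.
    fn state_at(&self, account: &str, slot: u64) -> Option<&[u8]> {
        self.by_account
            .get(account)?
            .range(..=slot)
            .next_back()
            .map(|(_, data)| data.as_slice())
    }
}
```

With only the latest state from an RPC node, `state_at` for any past slot is unanswerable. With a gap-free change stream, it reduces to an ordered lookup.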

Why It Matters:

  • DeFi protocols manage rewards and liquidity more holistically.
  • NFT platforms track collections without manual account stitching.
  • Protocols can manage account data without redundant infrastructure.

Problem 5: Out-of-the-Box IDL Decoding

Interface Definition Language (IDL) files are critical for decoding program data on Solana. They define account structures, event formats, and instruction schemas. However, IDL versioning can introduce breaking changes, like altered event layouts or field names, making reliable decoding across time non-trivial.

Traditionally, developers manually import Rust IDLs and write custom logic to parse both events and instructions. Given the decoupling of top-level and inner instructions, manual parsing is a complex and error-prone task. Substreams simplifies this by auto-importing IDLs through the CLI and un-nesting all inner instructions into a unified InstructionView, exposing the resolved program_id and accounts. With native Rust support and built-in CPI event decoding, Substreams offers a reliable, fire-and-forget approach to indexing across evolving IDL versions, including community support for older Anchor versions.
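
The un-nesting step can be sketched roughly as follows. Solana transactions list top-level instructions separately from inner instructions, which are grouped by the index of the top-level instruction that triggered them, and both refer to accounts by index into the transaction’s account key list. The types below (`Compiled`, `InnerGroup`, `FlatInstruction`) are simplified, hypothetical shapes for illustration, not the actual InstructionView type.

```rust
/// An instruction whose program and accounts are indexes into the account key list.
struct Compiled {
    program_id_index: usize,
    account_indexes: Vec<usize>,
}

/// Inner instructions triggered by the top-level instruction at `parent_index`.
struct InnerGroup {
    parent_index: usize,
    instructions: Vec<Compiled>,
}

/// Flattened view with indexes resolved to actual keys.
#[derive(Debug, PartialEq)]
struct FlatInstruction {
    program_id: String,
    accounts: Vec<String>,
    /// Index of the top-level instruction this ran under; None if it is top-level.
    parent: Option<usize>,
}

/// Resolves one instruction; returns None if any index is out of range.
fn resolve(ix: &Compiled, keys: &[String], parent: Option<usize>) -> Option<FlatInstruction> {
    let program_id = keys.get(ix.program_id_index)?.clone();
    let accounts = ix
        .account_indexes
        .iter()
        .map(|&i| keys.get(i).cloned())
        .collect::<Option<Vec<_>>>()?;
    Some(FlatInstruction { program_id, accounts, parent })
}

/// Emits each top-level instruction followed by the inner instructions it triggered.
fn flatten(top: &[Compiled], inner: &[InnerGroup], keys: &[String]) -> Option<Vec<FlatInstruction>> {
    let mut out = Vec::new();
    for (i, ix) in top.iter().enumerate() {
        out.push(resolve(ix, keys, None)?);
        for group in inner.iter().filter(|g| g.parent_index == i) {
            for inner_ix in &group.instructions {
                out.push(resolve(inner_ix, keys, Some(i))?);
            }
        }
    }
    Some(out)
}
```

Once instructions are flattened in execution order with resolved keys, a decoder can filter by `program_id` without caring whether a call was made directly or via CPI. The sketch ignores details such as addresses loaded from lookup tables.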

Why It Matters:

  • Teams ship faster with shared modules for common protocols.
  • Developers skip writing custom IDL parsers for every protocol.

Use Cases: Substreams in Action

  1. DePIN (Helium): Stream IoT device states in real time, even during network surges.
  2. Trading: Backtest strategies against 6 months of DEX data in hours, not weeks.
  3. AI Agents (Theoriq): Ingest real-time DAO votes and NFT mints to trigger context-aware actions.
  4. Analytics (Amberdata): Build live dashboards with TVL, user activity, and trade volume—no RPC lag.

Conclusion: Indexing Shouldn’t Be Your Bottleneck

Solana’s speed is useless if your dapp can’t keep up. Substreams gives you real-time data at Solana’s native speed, historical insights without RPC jail, and pre-built tooling for complex protocols.

Ready to Build?

Stop wrestling with RPCs. Start building with Substreams.

About The Graph

The Graph is the leading indexing and query protocol powering the decentralized internet. Launched in 2018, it has enabled tens of thousands of developers to effortlessly build Subgraphs and Substreams across countless blockchains, including Ethereum, Solana, Arbitrum, Optimism, Base, Polygon, Celo, Soneium, and Avalanche.

Discover more about how The Graph is shaping the future of decentralized physical infrastructure networks (DePIN) and stay connected with the community. Follow The Graph on X, LinkedIn, Instagram, Facebook, Reddit, Farcaster, and Medium. Join the community on The Graph’s Telegram and join technical discussions on The Graph’s Discord.

The Graph Foundation oversees The Graph Network. Edge & Node, StreamingFast, Semiotic Labs, GraphOps, Pinax, Wonderland, and Geo are seven of the many organizations within The Graph ecosystem.


Categories
Graph Updates, Recommended
Published
June 3, 2025

StreamingFast
