Boost Indexing Performance with Substreams-Powered Subgraphs


In the dynamic world of web3, the advent of Substreams and Substreams-powered subgraphs signifies a transformative shift. This innovative technology redefines blockchain data indexing, offering unmatched composability and efficiency on The Graph Network.

Substreams is a unique, streaming-first system that employs advanced data transformation techniques to store and process blockchain data, making it readily available for various data stores or real-time systems. Substreams marks a new era in blockchain data indexing, reimagining what's possible with data that, until now, has been considered limited in speed or freshness. Substreams-powered subgraphs are a major step forward in advancing indexing performance.

Unveiling Substreams-powered Subgraphs: A New Paradigm in Data Indexing

Developed by core developers at StreamingFast, Substreams is an exceptionally powerful processing engine capable of consuming rich streams of blockchain data. Substreams allows you to refine and shape blockchain data for fast and seamless digestion by end-user applications. More specifically, Substreams is a blockchain-agnostic, parallelized, and streaming-first engine that serves as a blockchain data transformation layer. Powered by the Firehose (a files-based data streaming product by StreamingFast), it enables developers to write their own Rust modules, build upon existing community modules, achieve extremely high-performance indexing, and sink their data anywhere.

Substreams-powered subgraphs combine the power of Substreams with the queryability of subgraphs. With the release of Substreams-powered subgraphs, developers can now utilize Substreams to enhance syncing speeds in combination with subgraph functionalities for data querying. Substreams-powered subgraphs are served on The Graph Network and leverage advanced parallelization techniques for storing and processing blockchain data.
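To make the idea of parallelized processing concrete, here is a toy Python sketch of the general pattern: split a block range into segments, process the segments concurrently, and merge the partial results. This is only an illustration of the split-and-merge shape, not how Substreams schedules work internally. The function names, the segment size, and the fake "transfers per block" rule are all assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, segment_size):
    """Split the block range [start, stop) into consecutive (lo, hi) segments."""
    return [(lo, min(lo + segment_size, stop))
            for lo in range(start, stop, segment_size)]


def process_segment(segment):
    """Hypothetical per-segment work: count transfers in a range of fake blocks."""
    lo, hi = segment
    # Stand-in for decoding real blocks: pretend block n holds n % 3 transfers.
    return sum(n % 3 for n in range(lo, hi))


def index_parallel(start, stop, segment_size, workers=4):
    """Process every segment concurrently, then merge the partial counts."""
    segments = split_range(start, stop, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_segment, segments))
    return sum(partials)


def index_sequential(start, stop):
    """Baseline: walk the whole range as one segment."""
    return process_segment((start, stop))
```

Because each segment is independent, the merged result matches the sequential one. In a real system the gain comes from running segments on many machines at once; the thread pool here only demonstrates the structure.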

Developers who build using Substreams-powered subgraphs can reduce syncing time by more than 100x* while also improving overall performance, providing a new form of data agility. For Indexers, running Firehose and serving Substreams saves time and resources through horizontal scaling, which reduces processing and wait time.

You can start building with Substreams now – learn how via the StreamingFast documentation.

Benefits of Substreams and Substreams-powered Subgraphs

The integration of Substreams-powered subgraphs into The Graph Network represents a significant advancement, bringing about myriad benefits for developers and users. Indexing rewards have been enabled for Substreams-powered subgraphs after The Graph Council approved GIP-0053 (presently active on Ethereum mainnet only). As the community continues to push the boundaries, these benefits will revolutionize access to blockchain data.

Substreams and Substreams-powered subgraphs will transform the way we handle data in the realm of decentralized applications (dapps). This is because Substreams can load data into different kinds of data sinks, including subgraphs! By combining the benefits of Substreams with subgraphs, Substreams-powered subgraphs bring greater composability and high-performance indexing to The Graph. This will introduce many new use cases throughout The Graph ecosystem, as users can now reuse their Substreams modules to output to different sinks such as PostgreSQL, MongoDB, Kafka and more.

The arrival of Substreams-powered subgraphs introduces a list of benefits to The Graph Network, including:

  • High-performance indexing: Substreams makes indexing orders of magnitude faster by running large-scale clusters of parallel operations (think BigQuery).
  • Composability: With Substreams-powered subgraphs, you can stack Substreams modules like LEGO blocks, and build upon community modules, further refining public data.
  • Sink anywhere: Sink your data anywhere you want: PostgreSQL, MongoDB, Kafka, subgraphs, flat files, or even Google Sheets.
  • Programmable: Use code to customize extraction, do transformation-time aggregations, and model your output for multiple sinks.
  • All the benefits of Firehose: When using Substreams-powered subgraphs, you also get all the benefits of Firehose, such as the lowest latency and no polling, higher availability, the best data model, and flat files.
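The composability and "sink anywhere" points above can be pictured with a small Python sketch. Real Substreams modules are written in Rust against the Substreams tooling; the code below does not use that API. It is a plain-Python analogy in which every name (`Transfer`, `map_transfers`, `map_large_transfers`, `ListSink`, `run`) and the block layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    token: str
    amount: int


def map_transfers(block):
    """Hypothetical base module: extract transfers from one raw block (a dict)."""
    return [Transfer(t["token"], t["amount"]) for t in block["transfers"]]


def map_large_transfers(transfers, threshold=1000):
    """Second module stacked on the first: keep only large transfers."""
    return [t for t in transfers if t.amount >= threshold]


class ListSink:
    """Stand-in for a sink such as a database table, a queue, or a flat file."""

    def __init__(self):
        self.rows = []

    def write(self, transfers):
        self.rows.extend(transfers)


def run(blocks, sinks):
    """Feed each block through the stacked modules and fan out to every sink."""
    for block in blocks:
        output = map_large_transfers(map_transfers(block))
        for sink in sinks:
            sink.write(output)


# Usage: the same module output is written to two independent sinks.
blocks = [
    {"transfers": [{"token": "DAI", "amount": 5000}, {"token": "DAI", "amount": 10}]},
    {"transfers": [{"token": "USDC", "amount": 999}]},
]
database, archive = ListSink(), ListSink()
run(blocks, [database, archive])
```

The design point is that the transformation logic is written once and reused: adding another sink means adding another object to the list, not rewriting the modules.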

Substreams-powered Subgraphs in Action

The benefits of using Substreams-powered subgraphs are already demonstrating value in real-world applications! One of the most striking examples is the dramatic improvement in sync times. Consider the Uniswap v3 subgraph, which traditionally took 2 months to sync. With a Substreams-powered subgraph, this process was completed in just 20 hours. That's a staggering 72x speed improvement, reducing sync time from 1,440 hours to a mere 20. This kind of efficiency can revolutionize how users interact with data on The Graph Network. (Note that graph-node indexing performance has also massively improved recently: innovation on all fronts!)
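The 72x figure follows directly from the numbers quoted above, as this short Python check shows. The only assumption is that a "month" is counted as 30 days, which is what the 1,440-hour figure implies.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: one month is counted as 30 days


def speedup(before_hours, after_hours):
    """Ratio of the old sync time to the new sync time."""
    return before_hours / after_hours


before = 2 * DAYS_PER_MONTH * HOURS_PER_DAY  # 2 months expressed in hours
after = 20                                   # Substreams-powered sync time in hours

print(before)                  # 1440
print(speedup(before, after))  # 72.0
```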

Examples of Substreams-powered Subgraphs at Work

As development on Substreams continues, the transformative potential is one of continuous growth and evolution. Messari, a leading provider of crypto market insights, is already leveraging the power of Substreams and Substreams-powered subgraphs to enhance their services. By harnessing these technologies, Messari can provide more timely and accurate market data, empowering their users to make informed decisions in the fast-paced world of crypto trading.

Talking about the impact of Substreams and Substreams-powered subgraphs, Vincent Wen, Engineering Manager for on-chain data at Messari, says, “For our team, Substreams has been a game-changer in radically improving the indexing speed and composability of subgraphs. It’s given us much faster development velocity and shorter iteration cycles.” Wen adds, “And since Substreams-powered Subgraphs expose substreams data through an already familiar GraphQL API, they combine the best of both worlds.”



As already mentioned, the Uniswap v3 Substreams-powered subgraph is a prime example of what is possible with Substreams. Created by members of the Uniswap community, the Uniswap v3 Substreams-powered subgraph is also being used by Lido, a versatile liquid-staking solution. Thanks to Substreams and Substreams-powered subgraphs, Lido can provide an optimized, streamlined experience for its users served by data that is always up-to-date and readily available.

DappLooker, a platform for analyzing and visualizing blockchain data, is using Substreams-powered subgraphs to enable real-time analytics and no-code dashboards, providing users with immediate and accurate information for enhanced web3 analytics.

If you want to experience Substreams-powered subgraphs for yourself, there’s no better way than trying one out! You can get started with the composability and efficiency of Substreams-powered subgraphs by using Uniswap v3 today. Soon, there will be many more options to choose from, including an ERC-20 Substreams-powered subgraph.

If you feel inspired and want to get started building your own Substreams-powered subgraph, then apply for a grant or submit a proposal to The Graph Foundation - apply today! Additionally, if you need assistance or support, please email [email protected].

Each real-world use case underscores the benefits of Substreams and Substreams-powered subgraphs and, as demonstrated with Uniswap V3, opens the door to new and innovative applications.

The Road Ahead

Enabling Substreams-powered subgraphs on The Graph Network for Ethereum mainnet is the first milestone in a journey of expanding the decentralized network to serve new types of data services beyond traditional subgraphs. This initial step allows The Graph ecosystem to immediately harness the benefits of Substreams. Next, core dev teams will focus on expanding the network’s service offerings to include support for additional chains, Substreams, Firehose and many new sinks as a service.

Today, only subgraphs can be queried through the network. However, Indexers are already setting up their indexing nodes to support Substreams, and core devs are working to enable The Graph Network to serve Substreams. This leap forward involves having the infrastructure in place to support streaming-first systems like Substreams, the capacity to handle payments for continuous streaming, increased data-processing demands, and the ability to ensure seamless integration with existing systems and services.

Once the decentralized network is ready for more data services, Indexers will have the opportunity to compete to offer the best Substreams and Firehose services around, all serving users of The Graph Network. Naturally, Indexers will vary in what type of data they serve, with some focusing on historical processing and others on real-time data, depending on available compute resources and which data types they’re most interested in supporting. Either way, this competitive environment will drive innovation throughout web3, solidifying The Graph Network as the marketplace for all web3 data types.

Final thoughts

Today marks a significant milestone in the evolution of The Graph Network. Innovative technologies like Substreams-powered subgraphs represent a bold step towards the future, opening up new avenues for developers and web3 users alike. From dramatically improved sync times to redefining the landscape of data services in the decentralized world, it’s clear that The Graph is on track to organize more and more of the world’s public data.

The integration of Substreams will give rise to a novel economy of cache sharing between providers. This means that anyone wanting to trade off CPU time in exchange for pre-processed bundles of flat files from someone else will have the opportunity to do so. This innovative approach to resource sharing will further enhance the efficiency and flexibility of The Graph Network.

Developers are encouraged to explore Awesome Substreams or try it out online. You can also explore some examples, jump on the quickstart for Substreams-powered subgraphs, visit the quickstart for standalone Substreams, and learn more in the documentation - start maximizing the speed and flexibility of your dapp!

The journey towards a more efficient, dynamic, and powerful decentralized network has begun, and every network participant has a role to play in this exciting journey. Embrace the future with Substreams and Substreams-powered subgraphs, and be a part of the next evolution in data indexing and querying on The Graph Network!

*The improvement in performance will differ based on several factors, including the complexity of the Substreams modules once a subgraph is fully ported over to the new technology. The comparison is based on a Substreams-powered subgraph written to produce equivalent data to its traditional subgraph counterpart.

About StreamingFast

StreamingFast is one of the world’s foremost experts in processing and indexing blockchain data. Its core innovations, the Firehose and Substreams, take a files-based and streaming-first approach that enables high-performance indexing on high-throughput chains.

Launched in 2018 and originally backed by top investors, StreamingFast embarked on a journey towards becoming a fully employee-owned enterprise. This pivotal transformation was made possible through a core developer grant from The Graph in June 2021.

As a core developer on The Graph, StreamingFast is integrating its Firehose and Substreams products into The Graph stack, enabling extremely high-performance indexing and opening up the protocol to serving a wide range of exciting new data use cases.

You can follow StreamingFast on Twitter and on Discord.

About The Graph

The Graph is the source of data and information for the decentralized internet. As the original decentralized data marketplace that introduced and standardized subgraphs, The Graph has become web3’s method of indexing and accessing blockchain data. Since its launch in 2018, tens of thousands of developers have built subgraphs for dapps across 40+ blockchains, including Ethereum, Arbitrum, Optimism, Base, Polygon, Celo, Fantom, Gnosis, and Avalanche.

As demand for data in web3 continues to grow, The Graph enters a New Era with a more expansive vision including new data services and query languages, ensuring the decentralized protocol can serve any use case - now and into the future.

Discover more about how The Graph is shaping the future of decentralized physical infrastructure networks (DePIN) and stay connected with the community. Follow The Graph on X, LinkedIn, Instagram, Facebook, Reddit, and Medium. Join the community on The Graph’s Telegram, and join technical discussions on The Graph’s Discord.

The Graph Foundation oversees The Graph Network. The Graph Foundation is overseen by the Technical Council. Edge & Node, StreamingFast, Semiotic Labs, The Guild, Messari, GraphOps, Pinax and Geo are eight of the many organizations within The Graph ecosystem.


Category: Graph Protocol
Author: StreamingFast
Published: July 20, 2023
