Historical Ethereum Data Access After EIP-4444
After years of R&D, Ethereum 2.0 is just around the corner. One of the EIPs being considered for inclusion in Ethereum 2.0 is EIP-4444, Bound Historical Data in Execution Clients. EIP-4444 is a hot topic in the Ethereum community because it would add history pruning to Ethereum clients. Requiring nodes to store less history would benefit validators by making it easier to sync and verify chain state, unlocking higher gas limits without sacrificing security. Higher gas limits add capacity for rollups, increasing Ethereum’s throughput and reducing transaction costs. Pruning would also support greater decentralization by ensuring that, even as Ethereum state grows, people can continue to validate the chain on consumer-grade hardware.
This proposal would, however, remove some functionality from Ethereum clients. Once history is pruned, nodes would no longer be able to serve requests for that historical data. Yet many dapps require access to historical Ethereum data to show past user activity such as account balances, transactions, and votes from the distant past.
This post will show how The Graph fills the gap left by EIP-4444 for dapps by indexing historical data from genesis and serving queries for that data. We will highlight The Graph's approach to verifiability of data and the roadmap for removing trust throughout the protocol, drawing parallels between the trust models of Ethereum and The Graph. The roadmap is a reflection of The Graph’s long-standing commitment to decentralization and values alignment with the Ethereum community. But first, what is The Graph and how does Ethereum benefit from a separate query protocol?
Using The Graph for historical data
The Graph is an incentivised indexing and query protocol for blockchain data. Developers create subgraphs (open APIs) that extract, process, and store rich, derived data from Ethereum. Dapp developers use subgraphs to serve smart contract data in their front-end applications using ad-hoc GraphQL queries. Indexers sync subgraphs and process queries to serve that data to the end users.
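As a rough illustration of what an ad-hoc GraphQL query against a subgraph can look like from a dapp's side, the sketch below only builds the JSON request body. The endpoint URL, the `transfers` entity, and its fields are hypothetical placeholders that do not come from any real subgraph; an actual subgraph defines its own schema.

```python
import json

# Hypothetical endpoint: a placeholder, not a real subgraph URL.
SUBGRAPH_URL = "https://example.invalid/subgraphs/name/example/token"

# Hypothetical schema: the `transfers` entity and its fields are assumptions.
TRANSFERS_QUERY = """
query Transfers($account: Bytes!, $first: Int!) {
  transfers(
    first: $first
    where: { from: $account }
    orderBy: blockNumber
    orderDirection: desc
  ) {
    id
    to
    value
    blockNumber
  }
}
"""


def build_query(account: str, first: int = 5) -> dict:
    """Return the JSON body for a GraphQL POST request to a subgraph."""
    return {
        "query": TRANSFERS_QUERY,
        "variables": {"account": account.lower(), "first": first},
    }


body = build_query("0xAbC0000000000000000000000000000000000001")
# The encoded payload would be sent as an HTTP POST with a JSON content type.
payload = json.dumps(body).encode("utf-8")
```

The request is deliberately not sent here, since the endpoint is a placeholder; any HTTP client can POST `payload` to a real subgraph endpoint.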
From a software architect's point of view, there’s a big benefit to separating the concerns of writing data (Ethereum) and reading data (The Graph). The blockchain’s job is to provide consensus over transactions in blocks, ideally with an expressive smart contract language. There’s a tradeoff space to navigate between speed, security, and decentralization. Separating the query responsibility to a protocol higher in the stack allows Ethereum to focus on scaling transaction throughput and lowering gas fees. As a layer on top, The Graph removes would-be responsibilities and introduces unique capabilities like the ability to aggregate across data sources, smart Indexer selection based on consumer preferences, a language for expressing query prices, and a high-throughput, low-latency state channels implementation for fast and efficient micropayments.
Many dapps are already moving away from using JSON-RPC APIs to using The Graph’s decentralized network by having their queries served by an open and permissionless market of Indexers. Dapps benefit from increased reliability and performance, a decentralized community of network participants, cost savings, and more.
Verifiability in The Graph
There are two sides to verifiability. First, there are verifiable processes to compute data. Second, there are methods to make verifiable claims about that data. Using the "two sides to one coin" analogy: verifiable processes would be the coin's head, verifiable claims tails, and the metal keeping the two sides together would be a commitment to the data.
In Ethereum, the method to verify the process is to validate the proof-of-work chain. The data commitments are block headers and Merkle roots, and the verifiable claims come in the form of Merkle proofs. The analogous concepts in The Graph are Verifiable Indexing, Proofs of Indexing (PoI), and Verifiable Queries, respectively.
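To make the relationship between a commitment (a Merkle root) and a verifiable claim (a Merkle proof) concrete, here is a minimal sketch using a plain binary Merkle tree with SHA-256. This is a simplification chosen for brevity: Ethereum itself uses Merkle-Patricia tries with Keccak-256, and the function names below are illustrative only.

```python
import hashlib
from typing import List, Sequence, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    # Pair up nodes; `level` must already have even length.
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Commitment to all leaves (requires at least one leaf)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: Sequence[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each flagged if the sibling is on the left."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: Sequence[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one leaf and its proof; trust only `root`."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [f"tx-{i}".encode() for i in range(5)]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 4)
```

The verifier needs only the root, the leaf, and a proof whose size grows logarithmically with the number of leaves, which is what makes the claim cheap to check.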
Stages of Verifiability
The Graph is a continuously evolving system with new features being added regularly. Every subgraph feature progresses through stages of verifiability, each with progressively minimized trust assumptions. Those stages are 1. experimental, 2. arbitration, 3. fraud proofs, and 4. validity proofs.
This granular evolution of verifiability within The Graph enables each subgraph or query to autonomously trade off cutting-edge features for higher security. Because subgraphs are independent, a dapp developer may react quickly to business needs while not sacrificing security for the whole ecosystem of subgraphs. Rather than having to decide whether to "move fast and break things" or to "engineer slowly and deliberately," we get the best of both where developers can choose the degree of verifiability while development is ongoing. Once a feature reaches the final stages of verifiability, however, it stays there forever.
Experimental features are not yet implemented in a way that the protocol can enforce all security guarantees, for example if the implementation is not deterministic or the API is unstable. Instead, the user must trust in the reputation of the Indexer (or Indexers, if the user cross-checks results). Consumers may mitigate risk by not relying on experimental features in production or selecting Indexers who are long-term incentive-aligned with the protocol and have demonstrated good behavior for an extended period of time.
The current process for dispute resolution in The Graph Network follows an Arbitration Charter that guides Indexers on expected slashing and dispute management. For features currently in Arbitration, the required level of trust moves to The Graph's protocol governance: The Graph Council (6 of 10 multisig), and their elected Arbiters.
In this stage, Indexers sign their claims on their indexing. Claims thought to be in error (i.e., a query returns incorrect data) can be disputed, and the offending Indexer slashed. Here is an example of a dispute and arbitration on mainnet.
Even in this early stage, Indexers participate in adding security by sourcing data from their own Ethereum archive node to validate the work of other Indexers.
Fraud proofs minimize the trust assumption to a 1 of N model. Refereed games replace the role of the Arbitrators, allowing any honest participant to slash a malicious Indexer and collect a reward if they can show bad behavior.
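One common way to build a refereed game is bisection: two parties who disagree about the result of a long computation narrow the disagreement down to a single step, which a referee can then re-execute cheaply. The sketch below illustrates that general idea under stated assumptions; the toy `step` function, the trace format, and the verdict strings are invented for illustration and are not The Graph's actual design.

```python
from typing import List, Sequence

MODULUS = 1_000_003


def step(state: int, item: int) -> int:
    """Toy deterministic state transition standing in for one indexing step."""
    return (state * 31 + item) % MODULUS


def run(initial: int, inputs: Sequence[int]) -> List[int]:
    """Return the full trace: the initial state followed by the state after each input."""
    trace = [initial]
    for item in inputs:
        trace.append(step(trace[-1], item))
    return trace


def find_disputed_step(a: Sequence[int], b: Sequence[int]) -> int:
    """Binary search for i where the traces agree at i but differ at i + 1."""
    if len(a) != len(b) or a[0] != b[0] or a[-1] == b[-1]:
        raise ValueError("traces must share a start state and end in disagreement")
    lo, hi = 0, len(a) - 1  # invariant: agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if a[mid] == b[mid]:
            lo = mid
        else:
            hi = mid
    return lo


def referee(inputs: Sequence[int], challenger: Sequence[int], indexer: Sequence[int]) -> str:
    """Re-execute only the single disputed step and decide who was wrong."""
    i = find_disputed_step(challenger, indexer)
    expected = step(challenger[i], inputs[i])  # both parties agree on state i
    return "slash indexer" if indexer[i + 1] != expected else "reject challenge"


inputs = list(range(1, 11))
honest = run(0, inputs)
# A forged trace that diverges at state 7 and is self-consistent afterwards.
forged = honest[:7] + run(honest[7] + 1, inputs[7:])
```

Because the referee only re-executes one step, a single honest challenger is enough to expose a bad trace, which is the intuition behind the 1 of N trust model.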
Validity proofs are zero-knowledge proofs that succinctly verify a given claim. This stage is the ultimate standard in verifiability.
Once a subgraph feature utilizes validity proofs fully, the consumer only needs to trust a block hash. The Indexer can provide all other required information to give the dapp developer and consumer complete confidence in the result.
Verifiable Indexing Roadmap
Verifiable Indexing validates the process of transforming a stream of blockchain data to a database tailored for efficient GraphQL queries. Given the above options, where are we now in terms of verifiable Indexing?
The vast majority of subgraph features are in the Arbitration stage today, with the plan to move to Fraud Proofs in the short term. Some subgraph features are in the experimental phase due to the outsized effort required to implement them deterministically, but none of those features are required for EIP-4444. In the Arbitration stage, Indexers submit PoIs regularly to collect indexing rewards. Because the data is public and on-chain, Indexers indexing the same subgraph can cheaply verify that a competing Indexer produced the same result as them. Any conflict found will be brought to the chain in a dispute, and the offending Indexer slashed.
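The cross-checking described above can be sketched as a simple grouping problem: collect the PoIs submitted for the same subgraph at the same block and flag any group containing more than one distinct value. The tuple layout and the names used here are assumptions made for illustration, not the on-chain data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (indexer, subgraph_id, block_number, poi): a hypothetical record layout.
Submission = Tuple[str, str, int, str]


def find_poi_conflicts(
    submissions: Iterable[Submission],
) -> Dict[Tuple[str, int], Dict[str, List[str]]]:
    """Group PoIs by (subgraph, block) and return the groups that disagree."""
    by_key: Dict[Tuple[str, int], Dict[str, set]] = defaultdict(lambda: defaultdict(set))
    for indexer, subgraph_id, block_number, poi in submissions:
        by_key[(subgraph_id, block_number)][poi].add(indexer)
    return {
        key: {poi: sorted(indexers) for poi, indexers in pois.items()}
        for key, pois in by_key.items()
        if len(pois) > 1  # more than one distinct PoI means someone is wrong
    }


submissions = [
    ("indexer-a", "subgraph-1", 100, "0xaaa"),
    ("indexer-b", "subgraph-1", 100, "0xaaa"),
    ("indexer-c", "subgraph-1", 100, "0xbbb"),  # disagrees with a and b
    ("indexer-a", "subgraph-2", 100, "0xccc"),
    ("indexer-b", "subgraph-2", 100, "0xccc"),
]
conflicts = find_poi_conflicts(submissions)
```

A conflict only shows that at least one party is wrong; deciding which one is what the dispute process is for.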
Given that today's ZK-SNARKs do not meet the network’s performance goals for Indexing most subgraphs, it’s likely that Validity Proofs of Indexing won’t be available in the short term. Instead, the next step for Verifiable Indexing will be fraud proofs of Indexing, reducing trust to 1-of-N.
Verifiable Queries Roadmap
Verifiable queries give a consumer confidence that the result of their query is correct for the database produced by indexing. This step is the most important to prevent consumers from being served fraudulent information. Dapps make billions of queries to The Graph every day, so the means of verifiability has strict performance requirements.
Today all query features are in the Arbitration stage. In the Arbitration stage, Indexers sign attestations of EIP-712 structured hashes containing the subgraph id, the body of the query, and the response. Since each query has a single, deterministic response, any conflicting attestations can be brought to the chain for a dispute. Consumers can probabilistically cross-check Indexers themselves and submit queries to Fishermen to cross-check responses across many Indexers.
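The following sketch shows why deterministic responses make conflicting attestations easy to detect. It is a simplification under stated assumptions: SHA-256 stands in for the EIP-712 structured hash (which uses Keccak-256), signatures are omitted entirely, and the `Attestation` fields are illustrative names rather than the protocol's actual struct.

```python
import hashlib
from dataclasses import dataclass


def digest(text: str) -> str:
    # Stand-in hash only: real attestations use EIP-712 hashing and are signed.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Attestation:
    indexer: str
    subgraph_id: str
    request_hash: str
    response_hash: str


def attest(indexer: str, subgraph_id: str, query: str, response: str) -> Attestation:
    """Bind an Indexer to the response it gave for a query on a subgraph."""
    return Attestation(indexer, subgraph_id, digest(query), digest(response))


def in_conflict(a: Attestation, b: Attestation) -> bool:
    """Same subgraph and same query, but different responses."""
    same_question = (a.subgraph_id, a.request_hash) == (b.subgraph_id, b.request_hash)
    return same_question and a.response_hash != b.response_hash


query = "{ account(id: \"0x01\") { balance } }"
first = attest("indexer-a", "subgraph-1", query, '{"balance": "10"}')
second = attest("indexer-b", "subgraph-1", query, '{"balance": "99"}')
third = attest("indexer-c", "subgraph-1", query, '{"balance": "10"}')
```

Since each query has exactly one correct response, a pair for which `in_conflict` is true is evidence that at least one of the two Indexers attested to incorrect data.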
Over the last two years, core researchers in The Graph ecosystem have been focused on developing the most efficient ZK-SNARK prover to enable proofs over a flexible GraphQL API. Using this prover, an Indexer will be able to deliver trustless responses to complex queries at the speed required for a great browsing experience, powering the full web3 vision. As a result, the roadmap is to skip past Fraud Proofs for queries and go straight to Validity Proofs. You can follow core dev updates and progress on the protocol’s R&D on The Graph Forum.
Supporting Ethereum and EIP-4444
It’s important for the Ethereum community to always have reliable, verifiable access to historical blockchain data. This is something The Graph community can support.
There are a few adjustments to our roadmap that will allow us to better serve this goal sooner. First, the community needs to ship an Ethereum Network Subgraph. Currently subgraphs are application-specific but we’ve been planning to also introduce network subgraphs that expose raw blockchain data like all the blocks, transactions, accounts, and logs. This Ethereum Network Subgraph would essentially be a superset of the JSON-RPC API with more advanced filtering, sorting, and pagination.
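Since the Ethereum Network Subgraph is still a plan, its schema is unknown; the sketch below only illustrates the kind of filtered, sorted, cursor-based pagination a client might perform against such an API. The `fetch_page` callable, the `id` cursor, and the in-memory `BLOCKS` data are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def paginate(
    fetch_page: Callable[[int, str], List[Row]], page_size: int = 100
) -> Iterator[Row]:
    """Yield every row by repeatedly asking for rows whose id is past the last one seen."""
    last_id = ""
    while True:
        page = fetch_page(page_size, last_id)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left
        last_id = str(page[-1]["id"])


# In-memory stand-in for a remote API, already sorted by id.
BLOCKS: List[Row] = [{"id": f"{n:08d}", "number": n} for n in range(1, 8)]


def fake_fetch(first: int, id_gt: str) -> List[Row]:
    """Filter (number >= 3), apply the cursor (id > id_gt), and limit to `first` rows."""
    rows = [b for b in BLOCKS if str(b["id"]) > id_gt and int(b["number"]) >= 3]
    return rows[:first]


rows = list(paginate(fake_fetch, page_size=2))
```

Cursor-based paging on a sorted key keeps each request cheap regardless of how deep into history the client has read, which matters when the dataset is the entire chain.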
Next, we need to support higher levels of Verifiable Indexing and Verifiable Queries on this raw blockchain data. Using validity proofs on the raw data gives us the same security guarantees as Merkle proofs. And fraud proofs for Indexing would reduce trust to the same 1-of-N model that light clients have to rely on for an assumption of liveness in Ethereum. With these improvements, The Graph Network will be able to ensure that verifiable access to historical Ethereum data is always available through an open marketplace.
We look forward to the exciting improvements coming in Ethereum 2.0 and to supporting the Ethereum roadmap. With the Ethereum Network Subgraph, dapp developers will have access to the complete set of verifiable data made available by the web3 JSON-RPC APIs under the same trust model that they rely on today.
About The Graph
The Graph is the indexing and query layer of the decentralized web (Web3). Developers build and publish open APIs, called subgraphs, that applications can query using GraphQL. The Graph currently supports indexing data from 22 different networks including Ethereum, Arbitrum, Avalanche, Celo, Fantom, Moonbeam, IPFS, and PoA with more networks coming soon. To date, over 31,000 subgraphs have been deployed on the hosted service and now subgraphs can be deployed directly on the network! Over 24,000 developers have built subgraphs for applications, such as Uniswap, Synthetix, Foundation, Zora, KnownOrigin, Gnosis, Balancer, Livepeer, DAOstack, Audius, Decentraland, and many others.
If you are a developer building a dapp or Web3 application, you can use subgraphs for indexing and querying data from blockchains. The Graph allows applications to present data efficiently in a UI and allows other developers to use your subgraph too! You can deploy a subgraph to the network using the newly launched Subgraph Studio or query existing subgraphs that are in the Graph Explorer. The Graph would love to welcome you as an Indexer, Curator, and/or Delegator on The Graph’s mainnet. Join The Graph community by introducing yourself in The Graph Discord for technical discussions, join The Graph’s Telegram chat, or follow The Graph on Twitter! The Graph’s developers and members of the community are always eager to chat with you, and The Graph ecosystem has a growing community of developers who support each other.
The Graph Foundation oversees The Graph Network. The Graph Foundation is overseen by the Technical Council. Edge & Node, StreamingFast and Figment are three of the many organizations within The Graph ecosystem.