File Data Sources Tutorial: Using Subgraphs to Index Off-Chain Data
TLDR: By combining IPFS and Arweave files with on-chain information, subgraphs are able to provide new data solutions to dapps. Dive into this blog post to work through problems a subgraph developer may encounter and to build a File Data Sources subgraph that indexes Lens Protocol posts.
The completed code for this tutorial’s File Data Sources subgraph.
The published subgraph, live on The Graph Network.
Technology And Terminology in This Tutorial
File Data Sources - A feature designed to index off-chain files and their contents. Currently, File Data Sources can index off-chain files stored on Arweave and IPFS.
Arweave - A decentralized storage network that focuses on providing permanent, unchangeable data storage. In contrast to other storage solutions, it offers a pay-once model, ensuring that data is stored forever without recurring fees.
IPFS (InterPlanetary File System) - A peer-to-peer network that enables the decentralized sharing and storing of files, aiming to make the web faster, safer, and more open. Unlike Arweave's permanent storage model, IPFS allows for more dynamic data updating and retrieval but doesn't inherently guarantee perpetual storage of data.
Lens Protocol - A decentralized social graph designed to give users control over their content and connections in a web3-native space. Lens Protocol stores post content on both Arweave and IPFS, which is considerably more cost-effective than storing this data on-chain.
ABI - An ABI (Application Binary Interface) is a standardized method for interacting with smart contracts in a blockchain, defining how to call functions, how data is structured, and how results are interpreted. Subgraphs use ABIs to decode and index smart contract events and function calls, enabling the querying of blockchain data in an efficient and structured manner.
Data Source Templates - Data Source Templates are used for indexing dynamic data sources. There are two ways Data Source Templates are used in subgraphs: to index off-chain files, and to programmatically add new contracts to index at runtime, such as proxy smart contracts or factories.
Proxy pattern smart contracts - This smart contract architecture consists of a single implementation smart contract and possibly many proxy smart contracts. Learn more about proxy pattern smart contracts by watching a video by Patrick Collins and reading the documentation from OpenZeppelin.
A File Data Sources Use-Case
In the past, indexing off-chain data in a subgraph was linear: if the subgraph was triggered to index off-chain data, on-chain indexing would pause until the file was retrieved. Now, File Data Sources allows on-chain and off-chain data to be indexed in parallel, improving sync speed and reliability. Due to these improvements, now is a great time to learn to build with File Data Sources!
Let’s start with a use case. Wouldn’t it be powerful to gather all of Lens Protocol’s posts, including the usernames of those who posted?
From this subgraph, we could perform numerous analyses such as trend analysis, influencer identification, and content recommendation, or even create no-code dashboards.
Confirming Data Architecture and our Subgraph’s Specs
Before we start building, we need to confirm that there are both on-chain and off-chain data sources to trigger the indexing of our File Data Sources subgraph.
Most NFTs, including some Lens Protocol V1 posts, store their content and metadata in JSON files off-chain while storing their state on-chain. After confirming that Lens Protocol stores post content off-chain and has its proxy and implementation smart contracts on Polygon, we are ready to move on to creating our subgraph’s specs.
Here are the two key operations our subgraph will perform to properly index off-chain data:
- Gather file IDs from an on-chain event
- Index off-chain files from Arweave and IPFS with File Data Sources
  - We will be gathering the post’s content from the JSON file stored on Arweave or IPFS and storing the file’s metadata in a PostContent entity in our subgraph.
Off-chain File IDs That Trigger File Data Sources
Arweave
- As of version 0.33.0, Graph Node can fetch files stored on Arweave based on their transaction ID from an Arweave gateway. Arweave supports transactions uploaded via Irys (formerly Bundlr), and Graph Node can also fetch files based on Irys manifests.
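Before handing an ID to File Data Sources, it can help to check that a string at least has the shape of an Arweave transaction ID. The snippet below is a minimal sketch in plain TypeScript (not AssemblyScript mapping code); the function name isArweaveTxId is hypothetical, and the check only validates the format, not that the file exists.

```typescript
// Arweave transaction IDs are 43-character base64url strings
// (a 32-byte hash encoded without padding).
const ARWEAVE_TX_ID = /^[A-Za-z0-9_-]{43}$/;

// Hypothetical helper: returns true when the candidate has the shape of an
// Arweave transaction ID. It does not confirm that the file is retrievable.
function isArweaveTxId(candidate: string): boolean {
  return ARWEAVE_TX_ID.test(candidate);
}
```

A check like this is a cheap way to skip obviously malformed IDs before triggering off-chain indexing.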
IPFS
- Graph Node supports content identifiers (CIDs), and CIDs with directories such as Qm.../metadata.json.
  - For example, if the URI emitted from the chain looks like this: https://ipfs.infura.io/ipfs/QmR7baNsHXNXEThcZNSw1SpRu1ZvKjaCnakEemT94Ur9Pn, we need to extract this file ID to trigger IPFS File Data Sources: QmR7baNsHXNXEThcZNSw1SpRu1ZvKjaCnakEemT94Ur9Pn
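The extraction step described above can be sketched as a small helper. This is an illustrative plain-TypeScript version, not the tutorial's mapping code: the name extractIpfsCid is hypothetical, and it only handles CIDv0 values (the Qm... form) that follow an /ipfs/ path segment or an ipfs:// scheme.

```typescript
// Matches a CIDv0 ("Qm" followed by 44 base58 characters) that appears after
// an "/ipfs/" path segment or an "ipfs://" scheme. CIDv1 and directory paths
// are deliberately out of scope for this sketch.
const IPFS_CID_V0 = /(?:\/ipfs\/|^ipfs:\/\/)(Qm[1-9A-HJ-NP-Za-km-z]{44})/;

// Hypothetical helper: returns the CID found in the URI, or null if none.
function extractIpfsCid(uri: string): string | null {
  return IPFS_CID_V0.exec(uri)?.[1] ?? null;
}
```

Returning null for unrecognized URIs lets the caller skip posts whose content lives somewhere other than IPFS.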
Now that we’ve confirmed our data’s architecture and understand our subgraph’s specs, let’s work on our first subgraph spec: gathering file IDs from an on-chain event.
Gathering File IDs from On-chain Events
If we were building our own smart contract, it would be quite easy to ensure it emits an event with a file’s ID for File Data Sources. As we are not building our own smart contract, we will need to dig into Lens’ smart contracts to see if we can find an event that emits a file’s ID for our File Data Sources subgraph.
Let’s explore events.sol to find an event that may contain a file’s ID that we can use to trigger File Data Sources off-chain indexing.
If we look at the PostCreated event, we find that it is emitted every time a post is created. We can also see it has the contentURI field; this field may contain the file IDs that we need to trigger File Data Sources.
Let’s deploy a subgraph that indexes this PostCreated event and its contentURI field to see if it contains the file IDs we need to trigger File Data Sources indexing. Once we confirm that contentURI holds the ID of the file we need, we can extend this subgraph further to trigger File Data Sources indexing of that file.
Spin up our Subgraph to Index the PostCreated Event
Start at Subgraph Studio and create a new subgraph that indexes Polygon, as Polygon is the chain where the Lens smart contracts are deployed.
Using graph-cli, initialize our subgraph with the graph init command listed in the bottom right of Subgraph Studio.
When the CLI asks for a smart contract address, point it to the Lens proxy smart contract (0xDb46d1Dc155634FbC732f92E853b10B288AD5a1d) and confirm that we want to index the events as entities. We want to target the proxy smart contract as this is the smart contract that Lens participants interact with, and it emits events from both the implementation smart contract and the proxy smart contract.
If the CLI asks us for the start block and ABI, we can go to a Polygon block explorer and input the proxy smart contract address to gather that data.
After authorizing, deploy the subgraph!
This subgraph is now indexing the proxy smart contract, but as we will soon see, it does not yet have the capacity to index events from both the implementation and the proxy smart contract. We need our subgraph to capture the PostCreated event, which is defined on the implementation smart contract and emitted from the proxy smart contract.
Let’s move onto exploring this problem:
Problem #1: Indexing a Proxy Pattern Smart Contract
Proxy smart contracts emit both their events and their implementation’s events.
When setting up a subgraph in graph-cli, the CLI scaffolds event handlers and entities using the provided contract's ABI. This ABI is obtained from the block explorer where the smart contract is deployed. If the CLI is pointed to a proxy smart contract, it only gathers this information from the proxy and not from its implementation.
This means our current subgraph is unaware of the implementation smart contract’s PostCreated event! It needs the ABI, event handlers, and entities for this PostCreated event!
Potential Solution A: Provide CLI with both Proxy and Implementation Smart Contracts
As we are indexing just a singular proxy and a singular implementation smart contract, we could provide the CLI with the addresses of both the implementation and the proxy smart contracts. Consequently, the subgraph will index any event that is emitted from both the proxy and the implementation smart contract.
When the CLI asks “Add another contract?” we could say yes, and we would be able to simply index both smart contracts and see all events emitted from both of them.
Unfortunately, Potential Solution A will not work due to an edge case! In rare cases, the ABI in the block explorer does not accurately reflect the smart contract.
If we look into the code of Lens Protocol’s implementation smart contract, we can see that the PostCreated event is in the events.sol file; however, it is nowhere to be found in the block explorer’s ABI.
This means using the CLI to gather the implementation ABI will not solve this problem, as the ABI gathered will not have the all-important PostCreated event data.
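One way to confirm this kind of gap is to check a downloaded ABI for the event by name. The snippet below is a minimal sketch in plain TypeScript; abiHasEvent and the simplified AbiEntry shape are hypothetical names introduced for illustration, and real ABI entries carry more fields than shown.

```typescript
// Simplified shape of an ABI entry; real entries also carry inputs, outputs, etc.
interface AbiEntry {
  type: string;
  name?: string;
}

// Hypothetical helper: returns true when the ABI contains an event entry
// with the given name.
function abiHasEvent(abi: AbiEntry[], eventName: string): boolean {
  return abi.some((entry) => entry.type === "event" && entry.name === eventName);
}
```

Running a check like this against the ABI that the block explorer returns would report that PostCreated is missing, which is the situation described above.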
Let’s try another solution:
Potential Solution B: Manually Creating the PostCreated ABI Entry
Let’s manually build the implementation’s ABI so that our subgraph correctly captures the PostCreated event.
To get the ABI entry that properly reflects PostCreated, we will compile events.sol in Remix IDE (any other smart contract framework works as well), then copy/paste the PostCreated event definition into our subgraph’s ABI. This should solve our problem!
Start by copying the events.sol smart contract from the Lens Protocol source code and placing it in the contracts folder of a new Remix IDE session.
You should see a few lines of code erroring out. We can comment those lines out.
The reason we can just comment these parts out is that we don’t need the entire file to compile, just the PostCreated event.
Let’s go ahead and compile events.sol:
Once compiled, look in the artifacts folder for your PostCreated ABI in Events.json:
Here’s the ABI entry we are missing!
Take this ABI entry and copy/paste it into our subgraph’s abi.json. Now our subgraph’s ABI is ready, as it can see the PostCreated event emitted from the implementation smart contract that passes through the proxy smart contract.
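The copy/paste step amounts to appending one entry to a JSON array. As a rough sketch in plain TypeScript, the merge could look like the following; addAbiEntry is a hypothetical helper, and the postCreatedEntry shown here is abbreviated for illustration (only contentURI is listed), so the complete entry from the compiler output should be used in practice.

```typescript
interface AbiInput {
  name: string;
  type: string;
  indexed?: boolean;
}

interface AbiEntry {
  type: string;
  name?: string;
  anonymous?: boolean;
  inputs?: AbiInput[];
}

// Hypothetical helper: appends an entry unless one with the same type and
// name is already present, so re-running the merge does not create duplicates.
function addAbiEntry(abi: AbiEntry[], entry: AbiEntry): AbiEntry[] {
  const alreadyPresent = abi.some(
    (existing) => existing.type === entry.type && existing.name === entry.name
  );
  return alreadyPresent ? abi : [...abi, entry];
}

// ASSUMPTION: abbreviated, illustrative entry. The real PostCreated event has
// more inputs; copy the full entry from the compiled artifact instead.
const postCreatedEntry: AbiEntry = {
  type: "event",
  name: "PostCreated",
  anonymous: false,
  inputs: [{ name: "contentURI", type: "string", indexed: false }],
};
```

An abbreviated entry like this one would not decode real logs, because the event signature depends on every input; it is shown only to make the structure of the merge visible.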
Problem #1 has been solved!
Now that we have a subgraph that can accurately see the PostCreated event passing through the proxy smart contract we’re indexing, we can move on to our next problem to solve.
As we recall, the initial intent of spinning up our subgraph was to see if we could find a file ID to trigger File Data Sources. However, as we had to manually generate our ABI, we encounter another interesting problem to solve.
Problem #2: Handling the PostCreated Event
Now that our subgraph can see the PostCreated event, we must extend our subgraph to handle the PostCreated event data. Usually the CLI scaffolds this out automatically, but as we had to update the ABI manually, we must also extend our subgraph manually. This will be a good exercise to learn how data is handled in a subgraph.
Solution: Extend our Subgraph to Index the PostCreated Event
Starting with our subgraph.yaml file, let’s add the PostCreated event handler, as well as define its PostCreated entity:
Let’s move on to schema.graphql.
Here, we will include the PostCreated entity we just defined in subgraph.yaml. We won’t include all the fields on the PostCreated event as we don’t need that data. We just need to see if the contentURI field contains a file ID to trigger File Data Sources.
With our subgraph.yaml manifest and the schema.graphql ready to accept the PostCreated on-chain event, we are ready to build the handlePostCreated handler in our mappings.ts.
Before we do so, we should run graph codegen in our terminal.
Tip: any time we alter schema.graphql, we should run graph codegen to update our types, as we import those autogenerated files at the top of our mappings.ts.
Read the comments for a step-by-step explanation of our mappings.ts logic.
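To make the handler's job concrete, here is a minimal model of it in plain TypeScript rather than AssemblyScript. Everything in it is a stand-in: the PostCreatedEvent and PostCreatedEntity interfaces approximate the types graph codegen would generate, the in-memory Map replaces the subgraph store, and the ID scheme (transaction hash plus log index) is an assumption chosen for illustration.

```typescript
// Stand-ins for generated types; a real mapping imports these instead.
interface PostCreatedEvent {
  transactionHash: string;
  logIndex: number;
  params: { contentURI: string };
}

interface PostCreatedEntity {
  id: string;
  contentURI: string;
}

// In-memory replacement for the subgraph store, used only in this sketch.
const postCreatedStore = new Map<string, PostCreatedEntity>();

// Models the handler: build a unique ID for the event, copy the field we
// care about (contentURI), and save the entity.
function handlePostCreated(event: PostCreatedEvent): void {
  const id = `${event.transactionHash}-${event.logIndex}`;
  postCreatedStore.set(id, { id, contentURI: event.params.contentURI });
}
```

Combining the transaction hash with the log index keeps IDs unique even when one transaction emits several PostCreated events.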
We’ve updated our subgraph.yaml, our schema.graphql, and our mappings.ts files to properly reflect our updated ABI.
Problem #2 has been solved!
We’ve faced two problems and solved them! Now is the time to re-deploy our subgraph.
Go back to Subgraph Studio and run the graph deploy command to re-deploy.
Once it is deployed, run these two queries in your subgraph’s Playground to see the various URIs passed through the contentURI field:
Example Arweave query and response
Example IPFS query and response
We have found file IDs within the URIs!
- Arweave example file ID: s9qinED7mYvrNrYTEbRlNb60bVT754LkVpQoW7Ffi24
- IPFS example file ID: QmTWJEzcxxcPjnB8Xj8S4EUJ7RxHGkfcDfMZyYRcQz7eYN
Let’s move on to our subgraph’s second spec: indexing these off-chain files from Arweave and IPFS with File Data Sources.
Indexing Off-chain Files from Arweave and IPFS with File Data Sources
In this section we will focus on programmatically triggering File Data Sources with the file IDs we have found, as well as handling this off-chain data.
Let’s start this process by designing two subgraph templates for Arweave and IPFS data:
File Data Sources Templates Specs
We will create two templates. One template will be for Arweave and the other will be for IPFS. Each template will:
- Have a unique name.
- Name the PostContent entity as the entity that will receive both Arweave and IPFS data.
- Name the handlePostContent handler as the handler for both Arweave and IPFS data that will pass data to the PostContent entity.
- Maintain off-chain and on-chain data separation, which is one of several File Data Sources limitations.
Adding Two Templates to subgraph.yaml
We will add the ArweaveContent and IpfsContent templates below the proxy smart contract we are already indexing:
Extending our Subgraph to Reflect our new Templates
Our templates refer to the PostContent entity and handlePostContent handler. Let’s extend our subgraph to reflect these changes. We will:
- Create PostContent in schema.graphql
  - This is where the off-chain data will be stored.
- Update handlePostCreated in mappings.ts
  - This is where we will pass the file IDs gathered from the on-chain event into the template using DataSourceTemplate.createWithContext().
- Create handlePostContent in mappings.ts
  - Once File Data Sources is triggered, we need to pass that data into the PostContent entity we previously created in schema.graphql.
Let’s get to work!
Create PostContent Entity in schema.graphql
This entity will store the content from the Arweave and IPFS files, along with the file ID that triggered File Data Sources indexing.
Please note: the entity’s id is specific to the entity itself and is not the file ID of the off-chain file. The file ID of the off-chain file will be stored as hash.
Tip: Any time we are creating an entity that is an event log of either on-chain or off-chain data with no alterations in mappings.ts, it’s best to include (immutable: true) as seen in the snippet below, as this greatly improves indexing speed.
Now our subgraph.yaml has the File Data Sources templates ready, and our schema.graphql is ready to accept data from both on-chain and off-chain sources.
Let’s move on to extending our handlers in mappings.ts.
Update handlePostCreated in mappings.ts
For the sake of this tutorial, we’ll keep it simple by gathering IDs from just one targeted URI structure each for Arweave and IPFS. If we wanted to gather all the Arweave and IPFS files, we’d need to build out more strategies to handle the various URI structures that are returned.
Read the comments to get a step-by-step explanation.
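The "one targeted URI structure per source" idea can be sketched as a small classifier. This is plain TypeScript written for illustration, not the tutorial's AssemblyScript handler. The two URI shapes are assumptions: the IPFS shape follows the Infura gateway example shown earlier, and the Arweave shape (https://arweave.net/ followed by the ID) is assumed. The names classifyContentUri and FileSource are hypothetical.

```typescript
type FileSource = { kind: "arweave" | "ipfs"; fileId: string };

// ASSUMPTION: one targeted URI structure per source, for illustration only.
const ARWEAVE_URI = /^https:\/\/arweave\.net\/([A-Za-z0-9_-]{43})$/;
const IPFS_URI = /^https:\/\/ipfs\.infura\.io\/ipfs\/(Qm[1-9A-HJ-NP-Za-km-z]{44})$/;

// Hypothetical helper: decides which template a contentURI belongs to and
// extracts the file ID, or returns null when the URI matches neither shape.
function classifyContentUri(contentURI: string): FileSource | null {
  const arweaveId = ARWEAVE_URI.exec(contentURI)?.[1];
  if (arweaveId) return { kind: "arweave", fileId: arweaveId };

  const ipfsCid = IPFS_URI.exec(contentURI)?.[1];
  if (ipfsCid) return { kind: "ipfs", fileId: ipfsCid };

  return null;
}
```

In the actual handler, the kind would select the ArweaveContent or IpfsContent template and the fileId would be passed along when the template is created; URIs that return null are simply skipped.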
Now that we’ve triggered our templates, they are looking for a handlePostContent handler that will handle the off-chain data that they are being passed.
Create handlePostContent in mappings.ts
Read the comments to get a step-by-step explanation.
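As a rough model of what this handler does, the plain-TypeScript sketch below decodes the file's bytes and stores them alongside the file ID. It is not AssemblyScript and does not use the subgraph store: PostContentEntity mirrors the hash and content fields described earlier, the Map stands in for the store, and the entityId parameter is a hypothetical stand-in for however the mapping supplies the entity's own id.

```typescript
// Mirrors the PostContent entity: its own id, the file ID as hash, and the
// decoded file content.
interface PostContentEntity {
  id: string;
  hash: string;
  content: string;
}

// In-memory replacement for the subgraph store, used only in this sketch.
const postContentStore = new Map<string, PostContentEntity>();

// Models the file handler: decode the raw bytes as UTF-8 text and save them
// together with the file ID that triggered indexing.
function handlePostContent(entityId: string, fileId: string, fileBytes: Uint8Array): void {
  const content = new TextDecoder("utf-8").decode(fileBytes);
  postContentStore.set(entityId, { id: entityId, hash: fileId, content });
}
```

Keeping id and hash separate matches the note above that the entity's id is not the off-chain file's ID.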
Go ahead and redeploy the subgraph to start indexing with File Data Sources!
Send this query through Subgraph Studio’s Playground to search through and see if we’re getting both Arweave and IPFS data.
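Outside the Playground, the same kind of query can be sent as a JSON request body. The sketch below only builds that body in plain TypeScript. It assumes the collection is exposed as postContents with id, hash, and content fields (based on the PostContent entity described earlier), and buildPostContentRequest is a hypothetical helper name.

```typescript
// Hypothetical helper: builds the JSON body for a GraphQL request that
// fetches the first N PostContent entities.
// ASSUMPTION: the collection name postContents and its fields.
function buildPostContentRequest(first: number): string {
  if (!Number.isInteger(first) || first <= 0) {
    throw new RangeError("first must be a positive integer");
  }
  const query = `{ postContents(first: ${first}) { id hash content } }`;
  return JSON.stringify({ query });
}
```

The resulting string can be sent in an HTTP POST with a JSON content type to the subgraph's query URL shown in Subgraph Studio.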
We are now indexing with File Data Sources!
Our subgraph is published on The Graph Network, and the final code is available as well.
From here, there are so many more things that we could do!
- Extend the subgraph further to parse the JSON data and populate entities.
- What if we wanted to build a dapp with this data? We could query this subgraph from the dapp and continue building!
- Use Python to query this subgraph and perform data analysis.
- Create no-code dashboards from the subgraph’s data.
- Trigger File Data Sources indexing of another off-chain file from within another off-chain handler.
Thank you for joining me on this journey into indexing using File Data Sources. I’m looking forward to seeing what you build!
–
Marcus Rein
Developer Relations and Success
Edge & Node - Working on The Graph