Two Simple Subgraph Performance Improvements

This blog was originally published in April 2022 by Edge & Node, a core developer of The Graph.

Edge & Node and the rest of The Graph community are focused on improving subgraph indexing and querying performance. Edge & Node has worked on two features that help in these areas: immutable entities and using Bytes as the id of entities. The two features are independent of each other: immutable entities require only a small change to the subgraph schema, while using byte strings as the id also requires small changes to the mappings. Both are explained in more detail below.

Both of these enhancements improve indexing speed, reduce the amount of data that needs to be stored for a subgraph, and speed up some queries. The indexing speedup depends on how much of its time the subgraph spends writing data to the database compared to other operations, like making contract calls or reading entities from the database.

These new features are currently deployed on the hosted service, and are part of graph-node versions 0.26.0 and beyond. Subgraph authors also need graph-cli version 0.28 or later and graph-ts version 0.26 or later to use these features.

Performance measurements

The Edge & Node team did some performance measurements in a controlled environment where we deployed four variants of the same subgraph: a base variant that uses ID as the type of id and no immutable entities, one variant that uses Bytes as the type of id, one variant that uses immutable entities, and one that uses both. The performance gains from these two features are striking.

We measured the average time it took to process a block, and the amount of storage that the subgraph required. The faster average block time directly translates to faster syncing while the reduced storage requirement is not only more storage efficient but will also have a positive impact on query speeds:

| Variant   | Block avg | Speedup | Sync time | Storage | Storage reduction |
|-----------|-----------|---------|-----------|---------|-------------------|
| base      | 268 ms    | -       | 352 hrs   | 143 GB  | -                 |
| immutable | 217 ms    | 19%     | 292 hrs   | 89 GB   | 37%               |
| bytes     | 225 ms    | 16%     | 299 hrs   | 115 GB  | 20%               |
| both      | 194 ms    | 28%     | 259 hrs   | 74 GB   | 48%               |
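
The speedup and reduction columns read as relative savings against the base variant. The following TypeScript sketch recomputes them from the block averages and storage sizes in the table. The helper name savingPercent is ours, and the interpretation of the columns is an assumption; the recomputed values agree with the published ones to within about one percentage point, presumably because the published figures were rounded.

```typescript
// Figures copied from the table above; names are illustrative only.
interface Variant {
  name: string;
  blockAvgMs: number;
  storageGb: number;
}

const variants: Variant[] = [
  { name: "base", blockAvgMs: 268, storageGb: 143 },
  { name: "immutable", blockAvgMs: 217, storageGb: 89 },
  { name: "bytes", blockAvgMs: 225, storageGb: 115 },
  { name: "both", blockAvgMs: 194, storageGb: 74 },
];

// Relative saving of `value` compared to `baseline`, in percent.
function savingPercent(baseline: number, value: number): number {
  return ((baseline - value) / baseline) * 100;
}

const baseVariant = variants[0];
for (const v of variants.slice(1)) {
  const speedup = savingPercent(baseVariant.blockAvgMs, v.blockAvgMs);
  const reduction = savingPercent(baseVariant.storageGb, v.storageGb);
  console.log(
    v.name + ": speedup " + speedup.toFixed(1) + "%, storage reduction " + reduction.toFixed(1) + "%"
  );
}
```

Note that the two features combine less than additively: the savings of the combined variant are smaller than the sum of the individual savings.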

Using immutable entities

Many entities represent on-chain data and are therefore immutable. Subgraph authors can indicate to the system that these entities will never be changed once they have been created by changing the @entity annotation in the subgraph's GraphQL schema to @entity(immutable: true). For example, for a Transfer entity, the schema would read:

type Transfer @entity(immutable: true) {
  id: ID!
  from: Bytes!
  to: Bytes!
  amount: BigDecimal
}

When entity types are marked as immutable, graph-node can use database indexes that are much cheaper to build and maintain than the ones needed for normal mutable entity types. Of course, any attempt to modify an immutable entity will result in an indexing error.
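
To make the create-once rule concrete, here is a toy in-memory model in TypeScript. It is not how graph-node is implemented; ImmutableStore and TransferRecord are hypothetical names, and the thrown error merely stands in for the indexing error mentioned above.

```typescript
// Toy model of the create-once rule; NOT graph-node's implementation.
// `ImmutableStore` and `TransferRecord` are hypothetical names.
interface TransferRecord {
  id: string;
  from: string;
  to: string;
  amount: string;
}

class ImmutableStore {
  private rows = new Map<string, TransferRecord>();

  // The first write for an id succeeds; any later write to that id is rejected.
  save(record: TransferRecord): void {
    if (this.rows.has(record.id)) {
      throw new Error("immutable entity " + record.id + " cannot be modified");
    }
    this.rows.set(record.id, record);
  }

  get(id: string): TransferRecord | undefined {
    return this.rows.get(id);
  }
}

const store = new ImmutableStore();
store.save({ id: "0x01", from: "0xaa", to: "0xbb", amount: "1.5" });

let rejected = false;
try {
  // A second write to the same id models an attempted modification.
  store.save({ id: "0x01", from: "0xaa", to: "0xbb", amount: "2.0" });
} catch (e) {
  rejected = true;
}
console.log("second write rejected:", rejected);
```

In a real mapping this means a handler for an immutable entity type should create and save each entity exactly once, which is the natural pattern for event-like data such as transfers.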

Using Bytes as the id

Many subgraphs use binary data like addresses as the id of entities. Until now, graph-node only allowed ID (a synonym for String) as the type of the id field. Converting byte strings into character strings and using them as the id has several disadvantages: character strings take twice as much space as byte strings to store binary data, and comparisons of UTF-8 character strings must take the locale into account, which is much more expensive than the bytewise comparison used to compare byte strings.
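
The size difference is easy to check by hand. The following TypeScript sketch decodes a sample 20-byte address from its hex form and compares the two sizes; hexToBytes is a helper written for this example, and the address is arbitrary sample data.

```typescript
// Decode a 0x-prefixed hex string into raw bytes (helper written for this example).
function hexToBytes(hex: string): Uint8Array {
  const digits = hex.slice(0, 2) === "0x" ? hex.slice(2) : hex;
  if (digits.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(digits)) {
    throw new Error("expected an even number of hex digits");
  }
  const out = new Uint8Array(digits.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(digits.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

// An arbitrary 20-byte address used only as sample data.
const addressHex = "0x00112233445566778899aabbccddeeff00112233";
const addressBytes = hexToBytes(addressHex);

// Every character of a hex string is ASCII, so it occupies one byte in UTF-8.
const stringSize = addressHex.length; // "0x" prefix plus two hex digits per byte
const bytesSize = addressBytes.length; // one byte per byte

console.log("string form:", stringSize, "bytes; binary form:", bytesSize, "bytes");
```

Each byte needs two hex digits, so the string form is at least twice as large even before counting the 0x prefix.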

It is now possible to use Bytes as the type for the id field of entities. It is highly recommended to use Bytes wherever possible, and to use String only for attributes that truly contain human-readable text, like the name of a token. Subgraph authors can now simply change the type definition of the id attribute. Using the example from above:

type Transfer @entity(immutable: true) {
  id: Bytes!
  from: Bytes!
  to: Bytes!
  amount: BigDecimal
}

In addition, some code changes will be needed. The most obvious change is to remove a lot of calls to toHexString() so that code like

transfer.id = event.transaction.hash.toHexString()

becomes

transfer.id = event.transaction.hash

For entities whose id consists of the concatenation of a byte array with some counter, setting a string id to "${address}-${counter}" should be changed to simply appending the counter to the byte array, so that code like

let id = event.transaction.hash
  .toHexString()
  .concat('-')
  .concat(BigInt.fromI32(counter).toString())

becomes

let id = event.transaction.hash.concatI32(counter)
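
To show what such a concatenated id looks like at the byte level, here is a standalone TypeScript model. It assumes the counter is appended as four little-endian bytes; that encoding is our assumption and should be checked against the graph-ts source. The function appendI32 is a hypothetical stand-in, not the library call.

```typescript
// Standalone model of appending a 32-bit counter to a byte-string id.
// Assumption: the counter is encoded as 4 little-endian bytes; verify the
// exact encoding used by concatI32 against the graph-ts source.
function appendI32(prefix: Uint8Array, counter: number): Uint8Array {
  const out = new Uint8Array(prefix.length + 4);
  out.set(prefix, 0);
  // The final `true` selects little-endian byte order.
  new DataView(out.buffer).setInt32(prefix.length, counter, true);
  return out;
}

// A sample 32-byte value standing in for a transaction hash.
const txHash = new Uint8Array(32);
for (let i = 0; i < txHash.length; i++) {
  txHash[i] = 0xab;
}

const id0 = appendI32(txHash, 0);
const id1 = appendI32(txHash, 1);

console.log("id length in bytes:", id1.length);
```

Under this assumption the id has a fixed length of 36 bytes, whereas the string form needs 66 characters for the hash alone, plus the separator and the decimal digits of the counter.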

For entities that store aggregated data, for example, daily trade volumes and the like, the id usually contains the day number. Here, too, using a byte string as the id is beneficial. Determining the id would look like

let dayID = event.block.timestamp.toI32() / 86400
let id = Bytes.fromI32(dayID)
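
The day number is simply the block timestamp in seconds divided by the 86400 seconds in a day, rounded down. In AssemblyScript, dividing two i32 values already truncates; the plain TypeScript sketch below needs Math.floor to get the same result. The timestamps are arbitrary sample values and dayNumber is a name chosen for this example.

```typescript
const SECONDS_PER_DAY = 86400;

// Integer day number since the Unix epoch for a block timestamp in seconds.
function dayNumber(timestampSeconds: number): number {
  return Math.floor(timestampSeconds / SECONDS_PER_DAY);
}

// Arbitrary sample timestamps: two within the same day, one a full day later.
const firstTrade = dayNumber(1650000000);
const laterSameDay = dayNumber(1650000000 + 3600);
const nextDay = dayNumber(1650000000 + SECONDS_PER_DAY);

console.log(firstTrade, laterSameDay, nextDay);
```

All events within the same UTC day map to the same day number, so they update the same aggregate entity, and the next day starts a new one.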

Finally, some constants that represent special addresses might have to be turned into byte strings, too, so that a definition like

const BEEF_ADDRESS = '0xdead...beef'

becomes

const BEEF_ADDRESS = Bytes.fromHexString('0xdead...beef')


Edge & Node is a software development company and the initial team behind The Graph. We create and support protocols and dapps that are building the decentralized future. Learn more about the Edge & Node vision at edgeandnode.com and follow us on Twitter and LinkedIn.



Category: Developer Corner
Published: May 11, 2023

David Lutterkort
