Operating Graph Node

Graph Node is the component which indexes subgraphs, and makes the resulting data available to query via a GraphQL API. As such it is central to the indexer stack, and correct operation of Graph Node is crucial to running a successful indexer.

This page provides a contextual overview of Graph Node and some of the more advanced options available to indexers. Detailed documentation and instructions can be found in the Graph Node repository.

Graph Node

Graph Node is the reference implementation for indexing Subgraphs on The Graph Network, connecting to blockchain clients, indexing subgraphs and making indexed data available to query.

Graph Node (and the whole indexer stack) can be run on bare metal, or in a cloud environment. This flexibility of the central indexing component is crucial to the robustness of The Graph Protocol. Similarly, Graph Node can be built from source, or indexers can use one of the provided Docker Images.

PostgreSQL database

The main store for the Graph Node, this is where subgraph data is stored, as well as metadata about subgraphs, and subgraph-agnostic network data such as the block cache, and eth_call cache.

Network clients

In order to index a network, Graph Node needs access to a network client via an EVM-compatible JSON-RPC API. This RPC may connect to a single client, or it could be a more complex setup that load balances across multiple clients.

While some subgraphs may just require a full node, others use indexing features which require additional RPC functionality. Specifically, subgraphs which make eth_calls as part of indexing require an archive node which supports EIP-1898, and subgraphs with callHandlers, or blockHandlers with a call filter, require trace_filter support (see trace module documentation here).

Network Firehoses - a Firehose is a gRPC service providing an ordered, yet fork-aware, stream of blocks, developed by The Graph's core developers to better support performant indexing at scale. This is not currently an Indexer requirement, but Indexers are encouraged to familiarise themselves with the technology, ahead of full network support. Learn more about the Firehose here.

IPFS Nodes

Subgraph deployment metadata is stored on the IPFS network. The Graph Node primarily accesses the IPFS node during subgraph deployment to fetch the subgraph manifest and all linked files. Network indexers do not need to host their own IPFS node. An IPFS node for the network is hosted at https://ipfs.network.thegraph.com.

Prometheus metrics server

To enable monitoring and reporting, Graph Node can optionally log metrics to a Prometheus metrics server.

Getting started from source

Install prerequisites

Rust
PostgreSQL
IPFS
Additional Requirements for Ubuntu users - To run a Graph Node on Ubuntu a few additional packages may be needed.

sudo apt-get install -y clang libpq-dev libssl-dev pkg-config

Setup

Start a PostgreSQL database server

initdb -D .postgres
pg_ctl -D .postgres -l logfile start
createdb graph-node

Clone the Graph Node repo and build the source by running cargo build.
Now that all the dependencies are set up, start the Graph Node:

cargo run -p graph-node --release -- \
  --postgres-url postgresql://[USERNAME]:[PASSWORD]@localhost:5432/graph-node \
  --ethereum-rpc [NETWORK_NAME]:[URL] \
  --ipfs https://ipfs.network.thegraph.com

Getting started with Kubernetes

A complete Kubernetes example configuration can be found in the indexer repository.

Ports

When it is running, Graph Node exposes the following ports:

Port	Purpose	Routes	CLI Argument	Environment Variable
8000	GraphQL HTTP server (for subgraph queries)	/subgraphs/id/... /subgraphs/name/.../...	--http-port	-
8001	GraphQL WS (for subgraph subscriptions)	/subgraphs/id/... /subgraphs/name/.../...	--ws-port	-
8020	JSON-RPC (for managing deployments)	/	--admin-port	-
8030	Subgraph indexing status API	/graphql	--index-node-port	-
8040	Prometheus metrics	/metrics	--metrics-port	-

Important: Be careful about exposing ports publicly - administration ports should be kept locked down. This includes the Graph Node JSON-RPC endpoint.

Advanced Graph Node configuration

At its simplest, Graph Node can be operated with a single instance of Graph Node, a single PostgreSQL database, an IPFS node, and the network clients as required by the subgraphs to be indexed.

This setup can be scaled horizontally, by adding multiple Graph Nodes, and multiple databases to support those Graph Nodes. Advanced users may want to take advantage of some of the horizontal scaling capabilities of Graph Node, as well as some of the more advanced configuration options, via the config.toml file and Graph Node's environment variables.

`config.toml`

A TOML configuration file can be used to set more complex configurations than those exposed in the CLI. The location of the file is passed with the --config command line switch.

When using a configuration file, it is not possible to use the options --postgres-url, --postgres-secondary-hosts, and --postgres-host-weights.

A minimal config.toml file can be provided; the following file is equivalent to using the --postgres-url command line option:

[store]
[store.primary]
connection="<.. postgres-url argument ..>"
[deployment]
[[deployment.rule]]
indexers = [ "<.. list of all indexing nodes ..>" ]

Full documentation of config.toml can be found in the Graph Node docs.

Multiple Graph Nodes

Graph Node indexing can scale horizontally, running multiple instances of Graph Node to split indexing and querying across different nodes. This can be done simply by running Graph Nodes configured with a different node_id on startup (e.g. in the Docker Compose file), which can then be used in the config.toml file to specify dedicated query nodes and block ingestors, and to split subgraphs across nodes with deployment rules.

Note that multiple Graph Nodes can all be configured to use the same database, which itself can be horizontally scaled via sharding.

Deployment rules

Given multiple Graph Nodes, it is necessary to manage deployment of new subgraphs so that the same subgraph isn't being indexed by two different nodes, which would lead to collisions. This can be done by using deployment rules, which can also specify which shard a subgraph's data should be stored in, if database sharding is being used. Deployment rules can match on the subgraph name and the network that the deployment is indexing in order to make a decision.

Example deployment rule configuration:

[deployment]
[[deployment.rule]]
match = { name = "(vip|important)/.*" }
shard = "vip"
indexers = [ "index_node_vip_0", "index_node_vip_1" ]
[[deployment.rule]]
match = { network = "kovan" }
# No shard, so we use the default shard called 'primary'
indexers = [ "index_node_kovan_0" ]
[[deployment.rule]]
match = { network = [ "xdai", "poa-core" ] }
indexers = [ "index_node_other_0" ]
[[deployment.rule]]
# There's no 'match', so any subgraph matches
shards = [ "sharda", "shardb" ]
indexers = [
    "index_node_community_0",
    "index_node_community_1",
    "index_node_community_2",
    "index_node_community_3",
    "index_node_community_4",
    "index_node_community_5"
  ]

Read more about deployment rules here.
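
To make the matching behaviour of the example configuration more concrete, here is a minimal Python sketch. It assumes that rules are checked in order and the first matching rule wins, and that the name expression must match the whole subgraph name; both are assumptions for illustration, and Python's regular expression syntax is not identical to the one Graph Node uses. The function names and the dict layout are hypothetical and not part of Graph Node.

```python
import re
from typing import Optional

# The rules from the example configuration, written as plain dicts.
RULES = [
    {"match": {"name": "(vip|important)/.*"}, "shard": "vip",
     "indexers": ["index_node_vip_0", "index_node_vip_1"]},
    {"match": {"network": "kovan"}, "indexers": ["index_node_kovan_0"]},
    {"match": {"network": ["xdai", "poa-core"]}, "indexers": ["index_node_other_0"]},
    # No 'match', so any subgraph matches.
    {"shards": ["sharda", "shardb"],
     "indexers": [f"index_node_community_{i}" for i in range(6)]},
]


def rule_matches(rule: dict, name: str, network: str) -> bool:
    """Return True if the rule's 'match' clause accepts this deployment."""
    match = rule.get("match", {})
    # Assumption: the name expression has to match the whole subgraph name.
    if "name" in match and not re.fullmatch(match["name"], name):
        return False
    if "network" in match:
        wanted = match["network"]
        networks = [wanted] if isinstance(wanted, str) else wanted
        if network not in networks:
            return False
    return True


def place_deployment(name: str, network: str) -> Optional[dict]:
    """Return the first rule that matches (assumed first-match-wins)."""
    for rule in RULES:
        if rule_matches(rule, name, network):
            return rule
    return None


# Hypothetical subgraph names, for illustration only.
for name, network in [("vip/exchange", "mainnet"), ("alice/token", "kovan"),
                      ("alice/token", "mainnet")]:
    rule = place_deployment(name, network)
    print(name, network, "->", rule["indexers"])
```

Under these assumptions, a subgraph named vip/exchange lands on the vip indexers regardless of network, while anything that matches no earlier rule falls through to the community nodes.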

Dedicated query nodes

Nodes can be configured to explicitly be query nodes by including the following in the configuration file:

[general]
query = "<regular expression>"

Any node whose --node-id matches the regular expression will be set up to only respond to queries.
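
As a rough illustration, the following Python sketch checks a few node IDs against a query expression. It assumes the expression has to match the entire node ID, which should be verified against the Graph Node docs; the node IDs and the pattern are hypothetical, and Python's regular expression syntax may differ in details from the one Graph Node uses.

```python
import re


def is_query_node(node_id: str, query_pattern: str) -> bool:
    """True if the whole node ID matches the [general] query expression."""
    return re.fullmatch(query_pattern, node_id) is not None


# Hypothetical pattern and node IDs, chosen only for illustration.
pattern = "query_node_.*"
for node_id in ["query_node_0", "index_node_0", "block_ingestor_node"]:
    print(f"{node_id}: query only = {is_query_node(node_id, pattern)}")
```

Trying a pattern against the full list of node IDs before deploying it is a cheap way to avoid accidentally turning an index node into a query-only node.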

Database scaling via sharding

For most use cases, a single Postgres database is sufficient to support a graph-node instance. When a graph-node instance outgrows a single Postgres database, it is possible to split the storage of graph-node's data across multiple Postgres databases. All databases together form the store of the graph-node instance. Each individual database is called a shard.

Shards can be used to split subgraph deployments across multiple databases, and can also be used to use replicas to spread query load across databases. This includes configuring the number of available database connections each graph-node should keep in its connection pool for each database, which becomes increasingly important as more subgraphs are being indexed.

Sharding becomes useful when your existing database can't keep up with the load that Graph Node puts on it, and when it's not possible to increase the database size anymore.

It is generally better to make a single database as big as possible before starting with shards. One exception is where query traffic is split very unevenly between subgraphs; in those situations it can help dramatically if the high-volume subgraphs are kept in one shard and everything else in another, because that setup makes it more likely that the data for the high-volume subgraphs stays in the db-internal cache and doesn't get replaced by data that's not needed as much from low-volume subgraphs.

In terms of configuring connections, start with max_connections in postgresql.conf set to 400 (or maybe even 200) and look at the store_connection_wait_time_ms and store_connection_checkout_count Prometheus metrics. Noticeable wait times (anything above 5ms) are an indication that there are too few connections available; high wait times can also be caused by the database being very busy (like high CPU load). However, if the database seems otherwise stable, high wait times indicate a need to increase the number of connections. In the configuration, the number of connections each graph-node instance can use is an upper limit, and Graph Node will not keep connections open if it doesn't need them.

Read more about store configuration here.
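
The 5ms guideline for connection wait times can be checked mechanically. The sketch below is a minimal Python example that scans Prometheus text-format output for store_connection_wait_time_ms samples above a threshold. The label names (pool, shard) and the sample values are assumptions made up for illustration; the real labels should be taken from the node's own /metrics output.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')

# Hypothetical metrics output; label names and values are illustrative only.
SAMPLE_METRICS = """
store_connection_wait_time_ms{pool="main",shard="primary"} 2
store_connection_wait_time_ms{pool="main",shard="vip"} 12.5
store_connection_checkout_count{pool="main",shard="primary"} 7
"""


def slow_connection_pools(metrics_text: str, threshold_ms: float = 5.0) -> list:
    """Return (labels, wait_ms) for every wait-time sample above the threshold."""
    slow = []
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = SAMPLE_LINE.match(line)
        if m and m.group(1) == "store_connection_wait_time_ms":
            wait_ms = float(m.group(3))
            if wait_ms > threshold_ms:
                slow.append((m.group(2) or "", wait_ms))
    return slow


for labels, wait_ms in slow_connection_pools(SAMPLE_METRICS):
    print(f"connection wait of {wait_ms}ms for {{{labels}}}")
```

A flagged pool is only a hint: as noted above, a busy database produces the same symptom, so database load should be checked before raising the connection limit.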

Dedicated block ingestion

If there are multiple nodes configured, it will be necessary to specify one node which is responsible for ingestion of new blocks, so that all configured index nodes aren't polling the chain head. This is done as part of the chains namespace, specifying the node_id to be used for block ingestion:

[chains]
ingestor = "block_ingestor_node"

Supporting multiple networks

The Graph Protocol is increasing the number of networks supported for indexing rewards, and there exist many subgraphs indexing unsupported networks which an indexer would like to process. The config.toml file allows for expressive and flexible configuration of:

Multiple networks
Multiple providers per network (this can allow splitting of load across providers, and can also allow for configuration of full nodes as well as archive nodes, with Graph Node preferring cheaper providers if a given workload allows).
Additional provider details, such as features, authentication and the type of provider (for experimental Firehose support)

The [chains] section controls the Ethereum providers that graph-node connects to, and where blocks and other metadata for each chain are stored. The following example configures two chains, mainnet and kovan, where blocks for mainnet are stored in the vip shard and blocks for kovan are stored in the primary shard. The mainnet chain can use two different providers, whereas kovan only has one provider.

[chains]
ingestor = "block_ingestor_node"
[chains.mainnet]
shard = "vip"
provider = [
  { label = "mainnet1", url = "http://..", features = [], headers = { Authorization = "Bearer foo" } },
  { label = "mainnet2", url = "http://..", features = [ "archive", "traces" ] }
]
[chains.kovan]
shard = "primary"
provider = [ { label = "kovan", url = "http://..", features = [] } ]

Read more about provider configuration here.

Environment variables

Graph Node supports a range of environment variables which can enable features, or change Graph Node behaviour. These are documented here.

Continuous deployment

Users who are operating a scaled indexing setup with advanced configuration may benefit from managing their Graph Nodes with Kubernetes.

The indexer repository has an example Kubernetes reference
Launchpad is a toolkit for running a Graph Protocol Indexer on Kubernetes maintained by GraphOps. It provides a set of Helm charts and a CLI to manage a Graph Node deployment.

Managing Graph Node

Given a running Graph Node (or Graph Nodes!), the challenge is then to manage deployed subgraphs across those nodes. Graph Node surfaces a range of tools to help with managing subgraphs.

Logging

Graph Node's logs can provide useful information for debugging and optimisation of Graph Node and specific subgraphs. Graph Node supports different log levels via the GRAPH_LOG environment variable, with the following levels: error, warn, info, debug or trace.

In addition, setting GRAPH_LOG_QUERY_TIMING to gql provides more details about how GraphQL queries are running (though this will generate a large volume of logs).

Monitoring & alerting

Graph Node exposes metrics via a Prometheus endpoint on port 8040 by default. Grafana can then be used to visualise these metrics.

The indexer repository provides an example Grafana configuration.

Graphman

graphman is a maintenance tool for Graph Node, helping with diagnosis and resolution of different day-to-day and exceptional tasks.

The graphman command is included in the official containers, and you can docker exec into your graph-node container to run it. It requires a config.toml file.

Full documentation of graphman commands is available in the Graph Node repository; see /docs/graphman.md (https://github.com/graphprotocol/graph-node/blob/master/docs/graphman.md).

Working with subgraphs

Indexing status API

Available on port 8030/graphql by default, the indexing status API exposes a range of methods for checking indexing status for different subgraphs, checking proofs of indexing, inspecting subgraph features and more.

The full schema is available here.
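
As a starting point, the Python sketch below builds a request against the indexing status API and computes how far a deployment is behind the chain head. The URL assumes the default port on localhost; the field names (indexingStatuses, chains, chainHeadBlock, latestBlock and so on) reflect the schema as commonly used and should be checked against the full schema; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed field names; verify against the full schema before relying on them.
STATUS_QUERY = """
{
  indexingStatuses {
    subgraph
    synced
    health
    fatalError { message }
    chains {
      network
      chainHeadBlock { number }
      latestBlock { number }
    }
  }
}
"""


def build_status_request(url: str = "http://localhost:8030/graphql") -> urllib.request.Request:
    """Build a POST request carrying the status query as JSON."""
    body = json.dumps({"query": STATUS_QUERY}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"})


def fetch_statuses(url: str = "http://localhost:8030/graphql") -> list:
    """Send the query to a running Graph Node and return the status list."""
    with urllib.request.urlopen(build_status_request(url), timeout=10) as resp:
        return json.load(resp)["data"]["indexingStatuses"]


def blocks_behind(status: dict) -> Optional[int]:
    """Chain head minus latest indexed block for the first chain, if both are known."""
    chain = status["chains"][0]
    if chain["chainHeadBlock"] is None or chain["latestBlock"] is None:
        return None
    return int(chain["chainHeadBlock"]["number"]) - int(chain["latestBlock"]["number"])
```

With a node running, something like `for s in fetch_statuses(): print(s["subgraph"], s["health"], blocks_behind(s))` gives a quick overview of every deployment on that node.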

Indexing performance

There are three separate parts of the indexing process:

Fetching events of interest from the provider
Processing events in order with the appropriate handlers (this can involve calling the chain for state, and fetching data from the store)
Writing the resulting data to the store

These stages are pipelined (i.e. they can be executed in parallel), but they are dependent on one another. Where subgraphs are slow to index, the underlying cause will depend on the specific subgraph.

Common causes of indexing slowness:

Time taken to find relevant events from the chain (call handlers in particular can be slow, given the reliance on trace_filter)
Making large numbers of eth_calls as part of handlers
A large amount of store interaction during execution
A large amount of data to save to the store
A large number of events to process
Slow database connection time, for crowded nodes
The provider itself falling behind the chain head
Slowness in fetching new receipts at the chain head from the provider

Subgraph indexing metrics can help diagnose the root cause of indexing slowness. In some cases, the problem lies with the subgraph itself, but in others, improved network providers, reduced database contention and other configuration improvements can markedly improve indexing performance.

Failed subgraphs

Subgraphs might fail during indexing if they encounter unexpected data, if some component is not working as expected, or if there is a bug in the event handlers or configuration. There are two general types of failure:

Deterministic failures: these are failures which will not be resolved with retries
Non-deterministic failures: these might be down to issues with the provider, or some unexpected Graph Node error. When a non-deterministic failure occurs, Graph Node will retry the failing handlers, backing off over time.

In some cases a failure might be resolvable by the indexer (for example if the error is a result of not having the right kind of provider, adding the required provider will allow indexing to continue). However in others, a change in the subgraph code is required.

Deterministic failures are considered "final", with a Proof of Indexing generated for the failing block, while non-deterministic failures are not, as the subgraph may manage to "unfail" and continue indexing. In some cases, the non-deterministic label is incorrect, and the subgraph will never overcome the error; such failures should be reported as issues on the Graph Node repository.
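
To give a feel for what "backing off over time" means for non-deterministic failures, here is a generic capped exponential backoff in Python. This is purely illustrative: the base delay, growth factor and ceiling are hypothetical values, and the sketch is not Graph Node's actual retry schedule.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base_s: float = 1.0, factor: float = 2.0,
                   ceiling_s: float = 60.0) -> Iterator[float]:
    """Yield retry delays that grow by `factor` until they reach `ceiling_s`."""
    delay = min(base_s, ceiling_s)
    while True:
        yield delay
        delay = min(delay * factor, ceiling_s)


# First eight delays with the hypothetical defaults.
print(list(islice(backoff_delays(), 8)))
```

The practical consequence for indexers is that a subgraph hit by a transient provider problem may take a while to resume after the provider recovers, since the next retry can be some time away.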

Block and call cache

Graph Node caches certain data in the store in order to save refetching from the provider. Blocks are cached, as are the results of eth_calls (the latter being cached as of a specific block). This caching can dramatically increase indexing speed during "resyncing" of a slightly altered subgraph.

However, in some instances, if an Ethereum node has provided incorrect data for some period, that can make its way into the cache, leading to incorrect data or failed subgraphs. In this case indexers can use graphman to clear the poisoned cache, and then rewind the affected subgraphs, which will then fetch fresh data from the (hopefully) healthy provider.

If a block cache inconsistency is suspected, such as a transaction receipt missing an event:

Run graphman chain list to find the chain name.
Run graphman chain check-blocks <CHAIN> by-number <NUMBER> to check whether the cached block matches the provider; the block is deleted from the cache if it doesn't.
    1. If there is a difference, it may be safer to truncate the whole cache with graphman chain truncate <CHAIN>.
    2. If the block matches the provider, then the issue can be debugged directly against the provider.
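
For indexers who script this check, the commands above can be assembled programmatically. The Python sketch below only builds the argument lists and does not execute anything. It assumes graphman is on the PATH and accepts its configuration file via --config; the helper names, the config path, the chain name and the block number are hypothetical.

```python
from typing import List


def graphman_cmd(config_path: str, *args: str) -> List[str]:
    """Build a graphman invocation (assumes the config is passed via --config)."""
    return ["graphman", "--config", config_path, *args]


def list_chains(config_path: str) -> List[str]:
    return graphman_cmd(config_path, "chain", "list")


def check_block(config_path: str, chain: str, number: int) -> List[str]:
    return graphman_cmd(config_path, "chain", "check-blocks", chain,
                        "by-number", str(number))


def truncate_cache(config_path: str, chain: str) -> List[str]:
    return graphman_cmd(config_path, "chain", "truncate", chain)


# Hypothetical values, for illustration only.
for cmd in (list_chains("config.toml"),
            check_block("config.toml", "mainnet", 15000000),
            truncate_cache("config.toml", "mainnet")):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) inside the graph-node container. Truncating the cache is destructive, so it is worth keeping that step manual.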

Querying issues and errors

Once a subgraph has been indexed, indexers can expect to serve queries via the subgraph's dedicated query endpoint. If the indexer is hoping to serve significant query volume, a dedicated query node is recommended, and in case of very high query volumes, indexers may want to configure replica shards so that queries don't impact the indexing process.

However, even with a dedicated query node and replicas, certain queries can take a long time to execute, and in some cases increase memory usage and negatively impact the query time for other users.

There is not one "silver bullet", but a range of tools for preventing, diagnosing and dealing with slow queries.

Query caching

Graph Node caches GraphQL queries by default, which can significantly reduce database load. This can be further configured with the GRAPH_QUERY_CACHE_BLOCKS and GRAPH_QUERY_CACHE_MAX_MEM settings - read more here.

Analysing queries

Problematic queries most often surface in one of two ways. In some cases, users themselves report that a given query is slow. In that case the challenge is to diagnose the reason for the slowness, whether it is a general issue or specific to that subgraph or query, and then to resolve it if possible.

In other cases, the trigger might be high memory usage on a query node, in which case the challenge is first to identify the query causing the issue.

Indexers can use qlog to process and summarize Graph Node's query logs. GRAPH_LOG_QUERY_TIMING can also be enabled to help identify and debug slow queries.

Given a slow query, indexers have a few options. Of course they can alter their cost model, to significantly increase the cost of sending the problematic query. This may result in a reduction in the frequency of that query. However this often doesn't resolve the root cause of the issue.

Account-like optimisation

Database tables that store entities seem to generally come in two varieties: 'transaction-like', where entities, once created, are never updated, i.e., they store something akin to a list of financial transactions, and 'account-like', where entities are updated very often, i.e., they store something like financial accounts that get modified every time a transaction is recorded. Account-like tables are characterized by the fact that they contain a large number of entity versions, but relatively few distinct entities. Often, in such tables the number of distinct entities is 1% of the total number of rows (entity versions).

For account-like tables, graph-node can generate queries that take advantage of details of how Postgres ends up storing data with such a high rate of change, namely that all of the versions for recent blocks are in a small subsection of the overall storage for such a table.

The command graphman stats show <sgdNNN> shows, for each entity type/table in a deployment, how many distinct entities and how many entity versions each table contains. That data is based on Postgres-internal estimates, and is therefore necessarily imprecise, and can be off by an order of magnitude. A -1 in the entities column means that Postgres believes that all rows contain a distinct entity.

In general, tables where the number of distinct entities is less than 1% of the total number of rows/entity versions are good candidates for the account-like optimization. When the output of graphman stats show indicates that a table might benefit from this optimization, running graphman stats show <sgdNNN> <table> will perform a full count of the table - that can be slow, but gives a precise measure of the ratio of distinct entities to overall entity versions.
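
The 1% rule of thumb is easy to apply to the numbers reported by graphman stats show. The following Python sketch is a minimal illustration; the table names and counts are made up, and the function names are hypothetical.

```python
from typing import Optional


def distinct_ratio(entities: int, versions: int) -> Optional[float]:
    """Ratio of distinct entities to entity versions, or None for an empty table."""
    if versions <= 0:
        return None
    if entities == -1:
        # -1 means Postgres believes every row holds a distinct entity.
        return 1.0
    return entities / versions


def is_account_like_candidate(entities: int, versions: int,
                              threshold: float = 0.01) -> bool:
    """True if distinct entities make up less than `threshold` of all versions."""
    ratio = distinct_ratio(entities, versions)
    return ratio is not None and ratio < threshold


# Hypothetical (entities, versions) estimates per table.
stats = {"pair": (5_000, 2_000_000), "token": (50_000, 1_000_000), "swap": (-1, 3_000_000)}
for table, (entities, versions) in stats.items():
    print(table, is_account_like_candidate(entities, versions))
```

Because the estimates can be off by an order of magnitude, a table close to the threshold is worth confirming with the full count before turning the optimization on.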

Once a table has been determined to be account-like, running graphman stats account-like <sgdNNN>.<table> will turn on the account-like optimization for queries against that table. The optimization can be turned off again with graphman stats account-like --clear <sgdNNN>.<table>. It takes up to 5 minutes for query nodes to notice that the optimization has been turned on or off. After turning the optimization on, it is necessary to verify that the change does not in fact make queries slower for that table. If you have configured Grafana to monitor Postgres, slow queries would show up in pg_stat_activity in large numbers, taking several seconds. In that case, the optimization needs to be turned off again.

For Uniswap-like subgraphs, the pair and token tables are prime candidates for this optimization, and can have a dramatic effect on database load.

Removing subgraphs

This is new functionality, which will be available in Graph Node 0.29.x

At some point an indexer might want to remove a given subgraph. This can be easily done via graphman drop, which deletes a deployment and all its indexed data. The deployment can be specified as either a subgraph name, an IPFS hash Qm.., or the database namespace sgdNNN. Further documentation is available here.
