Subgraph Best Practice 5 - Simplify and Optimize with Timeseries and Aggregations
TLDR
Leveraging the new timeseries and aggregations feature in Subgraphs can significantly enhance both indexing speed and query performance.
Overview
Timeseries and aggregations reduce data processing overhead and accelerate queries by offloading aggregation computations to the database and simplifying mapping code. This approach is particularly effective when handling large volumes of time-based data.
Benefits of Timeseries and Aggregations
- Improved Indexing Time
  - Less Data to Load: Mappings handle less data since raw data points are stored as immutable timeseries entities.
  - Database-Managed Aggregations: Aggregations are automatically computed by the database, reducing the workload on the mappings.
- Simplified Mapping Code
  - No Manual Calculations: Developers no longer need to write complex aggregation logic in mappings.
  - Reduced Complexity: Simplifies code maintenance and minimizes the potential for errors.
- Dramatically Faster Queries
  - Immutable Data: All timeseries data is immutable, enabling efficient storage and retrieval.
  - Efficient Data Separation: Aggregates are stored separately from raw timeseries data, allowing queries to process significantly less data, often several orders of magnitude less.
Important Considerations
- Immutable Data: Timeseries data cannot be altered once written, ensuring data integrity and simplifying indexing.
- Automatic ID and Timestamp Management: id and timestamp fields are automatically managed by graph-node, reducing potential errors.
- Efficient Data Storage: By separating raw data from aggregates, storage is optimized, and queries run faster.
How to Implement Timeseries and Aggregations
Prerequisites
You need spec version 1.1.0 for this feature.
Defining Timeseries Entities
A timeseries entity represents raw data points collected over time. It is defined with the @entity(timeseries: true) annotation. Key requirements:
- Immutable: Timeseries entities are always immutable.
- Mandatory Fields:
  - id: Must be of type Int8! and is auto-incremented.
  - timestamp: Must be of type Timestamp! and is automatically set to the block timestamp.
Example:
```graphql
type Data @entity(timeseries: true) {
  id: Int8!
  timestamp: Timestamp!
  amount: BigDecimal!
}
```
Defining Aggregation Entities
An aggregation entity computes aggregated values from a timeseries source. It is defined with the @aggregation annotation. Key components:
- Annotation Arguments:
  - intervals: Specifies the time intervals to compute (e.g., ["hour", "day"]).
  - source: Names the timeseries entity whose data points are aggregated (e.g., "Data").
- Aggregate Fields: Fields annotated with @aggregate(fn: ..., arg: ...) define the values to compute (see Aggregation Functions and Expressions below).
Example:
```graphql
type Stats @aggregation(intervals: ["hour", "day"], source: "Data") {
  id: Int8!
  timestamp: Timestamp!
  sum: BigDecimal! @aggregate(fn: "sum", arg: "amount")
}
```
In this example, Stats aggregates the amount field from Data over hourly and daily intervals, computing the sum.
Querying Aggregated Data
Aggregations are exposed via query fields that allow filtering and retrieval based on dimensions and time intervals.
Example (this query targets the TokenStats aggregation defined in the next section):
```graphql
{
  tokenStats(
    interval: "hour"
    where: {
      token: "0x1234567890abcdef",
      timestamp_gte: "1704164640000000",
      timestamp_lt: "1704251040000000"
    }
  ) {
    id
    timestamp
    token {
      id
    }
    totalVolume
    priceUSD
    count
  }
}
```
Using Dimensions in Aggregations
Dimensions are non-aggregated fields used to group data points. They enable aggregations based on specific criteria, such as a token in a financial application.
Example:
Timeseries Entity
```graphql
type TokenData @entity(timeseries: true) {
  id: Int8!
  timestamp: Timestamp!
  token: Token!
  amount: BigDecimal!
  priceUSD: BigDecimal!
}
```
Aggregation Entity with Dimension
```graphql
type TokenStats @aggregation(intervals: ["hour", "day"], source: "TokenData") {
  id: Int8!
  timestamp: Timestamp!
  token: Token!
  totalVolume: BigDecimal! @aggregate(fn: "sum", arg: "amount")
  priceUSD: BigDecimal! @aggregate(fn: "last", arg: "priceUSD")
  count: Int8! @aggregate(fn: "count", cumulative: true)
}
```
- Dimension Field: token groups the data, so aggregates are computed per token.
- Aggregates:
  - totalVolume: Sum of amount.
  - priceUSD: Last recorded priceUSD.
  - count: Cumulative count of records.
Aggregation Functions and Expressions
Supported aggregation functions:
- sum
- count
- min
- max
- first
- last
The arg in @aggregate can be:
- A field name from the timeseries entity.
- An expression using fields and constants.
Examples of Aggregation Expressions
- Sum Token Value: @aggregate(fn: "sum", arg: "priceUSD * amount")
- Maximum Positive Amount: @aggregate(fn: "max", arg: "greatest(amount0, amount1, 0)")
- Conditional Sum: @aggregate(fn: "sum", arg: "case when amount0 > amount1 then amount0 else 0 end")
Supported operators and functions include basic arithmetic (+, -, *, /), comparison operators, logical operators (and, or, not), and SQL functions such as greatest, least, and coalesce.
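To make the expression syntax concrete, here is a hedged sketch of a schema that uses expression arguments. The SwapData timeseries and its amount0, amount1, and priceUSD fields are hypothetical, introduced only to give the expressions above a home; adapt the names to your own schema.

```graphql
# Hypothetical timeseries entity with two amounts and a price (illustrative names)
type SwapData @entity(timeseries: true) {
  id: Int8!
  timestamp: Timestamp!
  amount0: BigDecimal!
  amount1: BigDecimal!
  priceUSD: BigDecimal!
}

# Hypothetical aggregation whose arg values are expressions rather than plain field names
type SwapStats @aggregation(intervals: ["hour", "day"], source: "SwapData") {
  id: Int8!
  timestamp: Timestamp!
  # Sum of token value: price multiplied by amount0
  totalValueUSD: BigDecimal! @aggregate(fn: "sum", arg: "priceUSD * amount0")
  # Largest positive amount seen in the interval (never less than 0)
  maxAmount: BigDecimal! @aggregate(fn: "max", arg: "greatest(amount0, amount1, 0)")
  # Sum amount0 only when it exceeds amount1
  conditionalVolume: BigDecimal! @aggregate(fn: "sum", arg: "case when amount0 > amount1 then amount0 else 0 end")
}
```

Each expression is evaluated against the fields of every raw data point before the aggregation function combines the results for the interval.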
Query Parameters
- interval: Specifies the time interval (e.g., "hour").
- where: Filters based on dimensions and timestamp ranges.
- timestamp_gte / timestamp_lt: Filters for start and end times (microseconds since epoch).
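As a variation on the earlier example, here is a hedged sketch of a daily query against the tokenStats field that filters only by time range (the timestamp values are illustrative and mark 2024-01-01 to 2024-02-01 UTC in microseconds):

```graphql
{
  tokenStats(
    interval: "day"
    where: {
      timestamp_gte: "1704067200000000",
      timestamp_lt: "1706745600000000"
    }
  ) {
    id
    timestamp
    token {
      id
    }
    totalVolume
  }
}
```

Because no dimension filter is applied, this returns the daily buckets for every token in that range.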
Notes
- Sorting: Results are automatically sorted by timestamp and id in descending order.
- Current Data: An optional current argument can include the current, partially filled interval.
Conclusion
Implementing timeseries and aggregations in Subgraphs is a best practice for projects dealing with time-based data. This approach:
- Enhances Performance: Speeds up indexing and querying by reducing data processing overhead.
- Simplifies Development: Eliminates the need for manual aggregation logic in mappings.
- Scales Efficiently: Handles large volumes of data without compromising on speed or responsiveness.
By adopting this pattern, developers can build more efficient and scalable Subgraphs, providing faster and more reliable data access to end-users. To learn more about implementing timeseries and aggregations, refer to the Timeseries and Aggregations Readme and consider experimenting with this feature in your Subgraphs.