Delta vs. Cumulative Metrics: Key Differences and System Preferences

When you collect and analyze metrics, one important decision is whether to report them as delta or cumulative values. The two approaches define how measurements are reported over time, and metrics systems prefer one or the other depending on their design and use cases. In this blog post, we’ll explore the difference between delta and cumulative metrics, examine their trade-offs, and highlight the systems that prefer one over the other.

Understanding Temporality

Temporality refers to the way metrics are recorded and reported over time. Choosing the right temporality depends on your monitoring backend and analytical needs.

  • Cumulative Temporality: Reports the running total of a metric from the beginning of a measurement period. Suitable for tracking trends and totals. Example: Total bytes transferred over time.

  • Delta Temporality: Reports the difference in value for a metric since the last measurement. Ideal for real-time insights into changes over intervals. Example: Number of requests processed per interval.

OpenTelemetry provides flexibility by supporting both delta and cumulative metrics. This allows observability platforms to choose a model that aligns with their architecture.

Cumulative Metrics:

  • These represent the total value accumulated from the start of the measurement up to the current point.
  • Every reported value includes all previous measurements, providing a running total.
  • Example:
    • At time t0: 5
    • At time t0+10s: 10
    • At time t0+20s: 20
  • Prometheus documentation explains how cumulative counters are processed.
  • Use case: Cumulative metrics are ideal when you need a holistic view of total occurrences, such as the total number of requests or errors since the application started. They also help mitigate data loss: if a report is missed, the next report still carries the full running total, so only the resolution for the missed interval is lost.
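The data-loss point can be illustrated with a minimal Python sketch. It assumes the per-interval increments behind the example values above (5, then 5, then 10) and simulates losing the second report; the variable names are hypothetical and only serve the illustration.

```python
# Per-interval increments behind the example values (assumed for illustration).
increments = [5, 5, 10]

# Cumulative reports are running totals; delta reports are the increments themselves.
cumulative_reports = []
running_total = 0
for inc in increments:
    running_total += inc
    cumulative_reports.append(running_total)
delta_reports = list(increments)

# Simulate losing the second report (index 1) in transit.
received_cumulative = [v for i, v in enumerate(cumulative_reports) if i != 1]
received_delta = [v for i, v in enumerate(delta_reports) if i != 1]

# With cumulative data, the latest report still carries the full total.
total_from_cumulative = received_cumulative[-1]
# With delta data, the receiver can only sum what actually arrived.
total_from_delta = sum(received_delta)

print(total_from_cumulative)  # 20
print(total_from_delta)       # 15
```

The cumulative receiver no longer knows how the 20 was spread across the first two intervals, but the total is intact. The delta receiver has permanently lost 5.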

Delta Metrics:

  • Delta metrics report only the change in value since the last measurement.
  • Each value reflects the increment or decrement during a specific reporting interval.
  • Example (the same measurements as in the cumulative example, reported as changes):
    • At time t0: 5
    • At time t0+10s: 5
    • At time t0+20s: 10
  • Learn more about delta metrics and how they’re handled in OpenTelemetry.
  • Use case: Delta metrics are great for analyzing the rate of change over specific intervals. They’re especially useful in systems designed for high-frequency reporting where capturing incremental updates is more efficient.
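The two example series describe the same measurements, so each can be derived from the other. The sketch below shows one way to convert between them in Python. The function names are hypothetical, and it assumes the first cumulative value is measured from zero and that the counter never resets.

```python
from itertools import accumulate


def cumulative_to_delta(totals):
    """Turn running totals into per-interval changes.

    Assumes the series starts from zero and never resets.
    """
    deltas = []
    previous = 0
    for total in totals:
        deltas.append(total - previous)
        previous = total
    return deltas


def delta_to_cumulative(deltas):
    """Turn per-interval changes back into running totals."""
    return list(accumulate(deltas))


cumulative = [5, 10, 20]
deltas = cumulative_to_delta(cumulative)
print(deltas)                       # [5, 5, 10]
print(delta_to_cumulative(deltas))  # [5, 10, 20]
```

Going from delta to cumulative only works if no delta report was lost, which is why the direction of conversion matters when a pipeline sits between the application and the backend.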

Key Differences Between Delta and Cumulative Metrics

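| Feature | Cumulative Metrics | Delta Metrics |
|---|---|---|
| Representation | Running total since the start | Change since the last report |
| Data Loss Handling | Resilient to missed intervals | Data may be lost if intervals are missed |
| Rate Calculation | Backend derives the rate from the difference between successive totals | Each value is already the change per interval; divide by the interval length for a per-second rate |
| Overhead | Sender keeps a running total for every series for the life of the process | Sender can reset its state after each report, which is more efficient for short intervals |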
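To make the rate calculation row concrete, here is a small Python sketch of both paths. It assumes 10-second intervals as in the earlier examples, and the function names are hypothetical rather than taken from any particular backend.

```python
def rates_from_deltas(deltas, interval_seconds):
    """Per-second rate for each interval, given the change in that interval."""
    return [delta / interval_seconds for delta in deltas]


def rates_from_cumulative(samples):
    """Per-second rate between consecutive (timestamp_seconds, running_total) samples.

    Assumes samples are sorted by time and the total never decreases.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


# Changes of 5 and 10 over two 10-second intervals.
print(rates_from_deltas([5, 10], 10))                       # [0.5, 1.0]
# The same data as running totals at t0, t0+10s, t0+20s.
print(rates_from_cumulative([(0, 5), (10, 10), (20, 20)]))  # [0.5, 1.0]
```

Both paths give the same rates. The difference is where the subtraction happens: in the sender for delta metrics, in the backend for cumulative ones.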

Metrics Systems and Their Preferences

Metrics Systems Using Cumulative Metrics

  1. Prometheus: Cumulative metrics are the default. Counters track the total value over time, and functions like rate() derive per-second rates.

  2. Google Cloud Monitoring: Uses cumulative metrics to represent continuous totals, like the number of requests served.

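  3. Amazon CloudWatch: Stores the data points published for each period rather than enforcing a counter model. Running totals can be published as-is, and metric math functions such as RATE() and DIFF() derive rates and deltas from them.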
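One thing a backend has to handle when it derives increases from cumulative counters is a process restart, which sends the running total back to zero. The Python sketch below shows the general idea of treating any drop as a reset. It is a simplified illustration with a hypothetical function name, not Prometheus's actual rate() implementation, which also extrapolates to the edges of the query window.

```python
def increase_with_resets(totals):
    """Total increase across running-total samples.

    Any drop in value is treated as a counter reset (restart from zero),
    so the value after the drop counts entirely as new increase.
    """
    increase = 0
    for previous, current in zip(totals, totals[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter restarted, so everything in `current` is new.
            increase += current
    return increase


# The counter reaches 20, the process restarts, then it counts 3 and 8.
print(increase_with_resets([5, 10, 20, 3, 8]))  # 23
```

Any increments that happened between the last sample before the restart and the restart itself are still lost, so reset handling reduces the error rather than eliminating it.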

Metrics Systems Using Delta Metrics

  1. OpenTelemetry (OTLP): Delta temporality is supported for sums and histograms, allowing granular reporting over intervals when the backend prefers it.

  2. StatsD: Metrics like counters are reported as deltas, focusing on changes during the last interval for lightweight and efficient ingestion.

  3. Datadog: Prefers delta metrics for some integrations to provide detailed visibility into per-second rates.
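The StatsD model in the list above can be sketched as a tiny aggregator that sums counter increments during an interval and resets on every flush. This is a minimal Python illustration, not the StatsD server's code: the class name is hypothetical, and it only parses the basic `name:value|c` counter line, ignoring sample rates and other metric types.

```python
from collections import defaultdict


class DeltaCounterAggregator:
    """Sums counter increments per name and resets them on each flush."""

    def __init__(self):
        self._counts = defaultdict(int)

    def ingest(self, line):
        """Parse a 'name:value|c' line and add it to the current interval."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type != "c":
            raise ValueError(f"only plain counters are handled in this sketch: {line!r}")
        self._counts[name] += int(value)

    def flush(self):
        """Return the deltas for the interval that just ended and start a new one."""
        snapshot = dict(self._counts)
        self._counts.clear()
        return snapshot


agg = DeltaCounterAggregator()
for line in ["requests:1|c", "requests:4|c"]:
    agg.ingest(line)
print(agg.flush())  # {'requests': 5}

agg.ingest("requests:5|c")
print(agg.flush())  # {'requests': 5}
print(agg.flush())  # {}
```

Because the aggregator forgets everything after each flush, its memory use depends only on the series seen in the current interval, which is the efficiency argument for delta reporting.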

Systems Supporting Both Delta and Cumulative Metrics

  1. OpenTelemetry: Provides flexibility by allowing users to configure metrics as either cumulative or delta.

  2. Grafana Mimir and Cortex: While aligned with Prometheus’ cumulative approach, these systems can process both formats.

  3. Elastic Observability: Supports both, though cumulative metrics are more effective for long-term storage and trend analysis.

  4. Splunk Observability Cloud: Handles both delta and cumulative metrics, adapting to the data source requirements.

Choosing the Right Temporality

When deciding between delta and cumulative metrics, consider the following:

  • System Compatibility: Some systems (e.g., Prometheus) work better with cumulative metrics, while others (e.g., StatsD) prefer deltas.
  • Data Loss Resilience: Cumulative metrics are more forgiving when intervals are missed.
  • Analysis Needs: Use delta metrics for fine-grained interval analysis and cumulative metrics for overall trends and totals.
