Designing a metrics system notes

Requirements for metrics system

  • Multidimensional data model that can be sliced and diced along different dimensions as defined by the service (for example: instance, service, endpoint, method)
  • Operational simplicity
  • Scalable data collection and a decentralized architecture, so that independent teams can set up independent monitoring servers
  • A powerful query language that leverages the data model for alerting and graphing
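To make "sliced and diced" concrete, here is a minimal sketch of aggregating labeled samples along chosen dimensions. The sample data and the `sum_by` helper are hypothetical and only illustrate the idea; a real query language (for example PromQL's `sum by (...)` aggregation) does this over time series on the server.

```python
from collections import defaultdict

# Hypothetical samples: (labels, value). Every label is one dimension.
samples = [
    ({"instance": "a:8080", "service": "api", "endpoint": "/items", "method": "GET"}, 120.0),
    ({"instance": "b:8080", "service": "api", "endpoint": "/items", "method": "GET"}, 80.0),
    ({"instance": "a:8080", "service": "api", "endpoint": "/items", "method": "POST"}, 15.0),
    ({"instance": "a:8080", "service": "api", "endpoint": "/users", "method": "GET"}, 40.0),
]


def sum_by(samples, *dimensions):
    """Sum sample values, grouping by the given label names."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels[d] for d in dimensions)
        totals[key] += value
    return dict(totals)


per_endpoint = sum_by(samples, "endpoint")
per_instance_method = sum_by(samples, "instance", "method")
```

The same data answers different questions depending on which labels you group by, which is what alerting and graphing queries rely on.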

Client libraries

  • Client libraries take care of the nitty-gritty details such as thread safety, bookkeeping, and producing the Prometheus text exposition format in response to HTTP requests.
  • Because metrics-based monitoring does not track individual events, client library memory usage does not grow with the number of events. Instead, memory is proportional to the number of metrics you have.
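The two points above can be sketched with a toy counter. This is not the API of any real Prometheus client library; the `Counter` class, its methods, and the metric name are assumptions made for illustration, and label-value escaping is deliberately left out.

```python
import threading
from collections import defaultdict


class Counter:
    """Toy labeled counter: one stored value per distinct label combination."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)
        self._lock = threading.Lock()  # thread safety handled by the library

    def inc(self, amount=1.0, **labels):
        key = tuple(labels[n] for n in self.label_names)
        with self._lock:
            self._values[key] += amount  # no per-event record is kept

    def series_count(self):
        with self._lock:
            return len(self._values)

    def expose(self):
        """Render the counter in a text exposition style (no escaping)."""
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} counter"]
        with self._lock:
            items = sorted(self._values.items())
        for key, value in items:
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines) + "\n"


requests_total = Counter("http_requests_total", "Total HTTP requests.",
                         ["method", "endpoint"])
for _ in range(1000):
    requests_total.inc(method="GET", endpoint="/items")
requests_total.inc(method="POST", endpoint="/items")
exposition = requests_total.expose()
```

After 1001 events the counter still holds only two values, one per label combination, which is why memory follows the number of metrics rather than the number of events.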

Instrumentation

There are three types of services:

Online-serving systems:

Use the RED method: the count of Requests, the count of Errors, and Duration (latency). The same metrics apply to other synchronous calls: “… synchronous function calls, and benefit from the same metrics of requests, latency, and errors. For a cache you would want these metrics both for the cache overall and the cache misses that then need to calculate the result or request it from a backend.”
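A minimal sketch of RED instrumentation for the cache case follows. `RedStats`, `instrument`, `cached_get`, and `load_from_backend` are hypothetical names, the backend is faked, and the counters are not thread-safe; a real client library would handle that for you.

```python
import time
from functools import wraps


class RedStats:
    """Requests, errors, and total duration for one instrumented call site."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.duration_seconds_sum = 0.0


def instrument(stats):
    """Decorator that records RED metrics for a synchronous function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            stats.requests += 1
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                stats.errors += 1
                raise
            finally:
                stats.duration_seconds_sum += time.perf_counter() - start
        return wrapper
    return decorator


cache_stats = RedStats()  # the cache overall
miss_stats = RedStats()   # only the misses that go to the backend
_cache = {}


@instrument(miss_stats)
def load_from_backend(key):
    if key == "bad":  # simulated backend failure
        raise KeyError(key)
    return key.upper()


@instrument(cache_stats)
def cached_get(key):
    if key not in _cache:
        _cache[key] = load_from_backend(key)
    return _cache[key]


for key in ["a", "a", "b"]:
    cached_get(key)
try:
    cached_get("bad")
except KeyError:
    pass
```

Keeping separate stats for the overall cache and for misses lets you compare hit latency against backend latency.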

Offline-serving systems:

For example, a log processor. These systems usually batch up work and have multiple stages in a pipeline with queues between them. They run continuously, which differentiates them from batch jobs. Use the USE method:

  • Utilization: how full your service is, that is, how much work is in progress and how fast you are processing items.
  • Saturation: the amount of queued work and how much work is in progress.
  • Errors: self-explanatory.
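As a rough sketch of the USE measurements for one pipeline stage, the example below tracks queued work, in-progress work, processed items, and errors. `StageMetrics` and `run_stage` are hypothetical, and the stage drains its queue once so the example terminates; a real offline-serving stage would loop continuously.

```python
import queue


class StageMetrics:
    def __init__(self):
        self.in_progress = 0      # work currently being handled
        self.processed_total = 0  # attempted items, including failures
        self.errors_total = 0     # items whose handler raised


def run_stage(work_queue, handler, metrics):
    """Drain the queue once, updating the stage metrics per item."""
    while True:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            return
        metrics.in_progress += 1
        try:
            handler(item)
        except Exception:
            metrics.errors_total += 1
        finally:
            metrics.in_progress -= 1
            metrics.processed_total += 1


work_queue = queue.Queue()
for line in ["1", "2", "oops", "4"]:
    work_queue.put(line)

queued_before = work_queue.qsize()  # amount of queued work (saturation)
metrics = StageMetrics()
run_stage(work_queue, int, metrics)  # int() fails on "oops"
queued_after = work_queue.qsize()
```

Dividing the change in `processed_total` by the elapsed time gives the processing speed, and the queue size sampled at collection time shows how much work is waiting.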

Batch jobs:

Track how long the job took to run, how long each stage took, and the time at which the job last succeeded. Alert when the job has not succeeded recently enough.
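A minimal sketch of those batch job measurements, assuming a plain dictionary as the metrics store: `run_batch_job`, `is_stale`, and the metric names are hypothetical. The last-success timestamp is written only when every stage completes, so a failed run leaves it untouched.

```python
import time


def run_batch_job(stages, metrics, clock=time.time):
    """Run stages in order, recording per-stage and total duration."""
    job_start = clock()
    for name, stage in stages:
        stage_start = clock()
        stage()  # an exception here skips the success timestamp below
        metrics.setdefault("stage_duration_seconds", {})[name] = clock() - stage_start
    metrics["job_duration_seconds"] = clock() - job_start
    metrics["job_last_success_timestamp_seconds"] = clock()


def is_stale(metrics, max_age_seconds, now):
    """Alert condition: the job has not succeeded recently enough."""
    last = metrics.get("job_last_success_timestamp_seconds")
    return last is None or now - last > max_age_seconds


metrics = {}
run_batch_job([("extract", lambda: None), ("load", lambda: None)], metrics)
last_success = metrics["job_last_success_timestamp_seconds"]
```

With an hourly job you might alert on something like `is_stale(metrics, 2 * 3600, time.time())`, leaving room for one missed run before anyone is paged.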

Idempotency for batch jobs: Idempotency is the property that if you do something more than once, it has the same result as if it was only done once.
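To illustrate the definition, here is a small sketch contrasting an idempotent load with a non-idempotent one. The in-memory `store` and both functions are hypothetical stand-ins for whatever a batch job writes to.

```python
def load_day(store, day, rows):
    """Idempotent: overwrites the day's partition instead of appending."""
    store[day] = list(rows)


def append_day(store, day, rows):
    """Not idempotent: every rerun adds the same rows again."""
    store.setdefault(day, []).extend(rows)


rows = ["r1", "r2"]

once = {}
load_day(once, "2024-01-01", rows)

twice = {}
load_day(twice, "2024-01-01", rows)
load_day(twice, "2024-01-01", rows)  # rerun after a suspected failure

appended = {}
append_day(appended, "2024-01-01", rows)
append_day(appended, "2024-01-01", rows)  # rerun duplicates the data
```

Running `load_day` twice leaves the store identical to running it once, so a failed or uncertain run can simply be retried.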
