Issue: #1080
An implementation of the LANraragi metrics exporter, where library, API and process-level metrics are conditionally collected and served through the `/api/metrics` endpoint in the Prometheus exposition format, with Redis as the shared metrics state. Metrics data is stored in Redis at db4, but we can change to an existing db if we want.
Most of the code was written by AI, then reviewed/rewritten by me. Architectural decisions were made by me.
Already tested this on personal prod environments for about a week, will continue doing so 👌
Demo screenshots and pretty pictures
Things you can do in Prometheus/Grafana, with the metrics provided
Configuring the metrics exporter settings
3rd-party implementations and shared state
There are at least two Perl implementations of a metrics exporter, Net::Prometheus and mojolicious-plugin-prometheus. The main issue with just using them is that we need shared state across workers. Net::Prometheus is lower-level but has no shared state, while mojolicious-plugin-prometheus has shared state but is higher-level and uses IPC.
On the other hand, with LRR I want to collect all the metrics, and we might as well use Redis for the shared state since that's what it's good at anyway, so I just decided to rawdog it.
Opt-in
Metrics collection and endpoint exposure are optional and opt-in (via the `enablemetrics` setting flag from config), and instructions for enabling metrics have been documented.
OS dependent
Each OS (e.g. macOS, Windows, Linux*) needs its own implementation of the process-level metrics collection. Currently only Linux process-level metrics are supported, but people are welcome to contribute for other OSes :)
OpenMetrics and general spec stuff
This implementation was written to be compliant with the OpenMetrics 1.0 specification, with minor adjustments to conform to Prometheus server capabilities (turns out that Prometheus doesn't actually support the OpenMetrics spec, despite the spec being the first thing that shows up when you research Prom exporter specifications...!), so there are a couple of deviations from the OpenMetrics spec. Still, OpenMetrics has some good practices, so most of its rules were followed. There's also OpenTelemetry, which is another thing entirely, but we're sticking mostly to the Prometheus exposition format.
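For reference, a scrape of `/api/metrics` returns plain text in roughly this shape (the metric names and values here are purely illustrative, not the exact ones exported by this PR):

```
# HELP lanraragi_archive_count Total number of archives in the library.
# TYPE lanraragi_archive_count gauge
lanraragi_archive_count 1337
# HELP lanraragi_http_requests_total Total HTTP requests handled, by route template.
# TYPE lanraragi_http_requests_total counter
lanraragi_http_requests_total{route="/api/archives/:id",method="GET"} 42
```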
Metric collection types
There are 3 broad categories of metrics being collected: API/HTTP, library, and process.
API
API/HTTP refers to metrics collected by a mojolicious worker handling a single HTTP request/endpoint. Metrics collected include duration and bytes sent/received.
The natural way to handle this passive collection is via mojo hooks.
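As a rough sketch of the hook wiring (not a copy of the PR code; `record_http_metrics` is a hypothetical helper standing in for whatever writes to the shared Redis state):

```perl
# Sketch: passive HTTP metrics collection via Mojolicious dispatch hooks.
use Time::HiRes qw(gettimeofday tv_interval);

$app->hook(before_dispatch => sub {
    my $c = shift;
    # Remember when this request started
    $c->stash('metrics.start_time' => [gettimeofday]);
});

$app->hook(after_dispatch => sub {
    my $c = shift;
    my $start = $c->stash('metrics.start_time') or return;

    # Group by the route template (e.g. "/archives/:id") rather than the raw
    # request path, to keep label cardinality bounded.
    my $endpoint = $c->match->endpoint;
    my $route    = $endpoint ? $endpoint->pattern->unparsed : 'unknown';

    # Hypothetical helper that persists the sample to the shared Redis state
    record_http_metrics(
        route          => $route,
        method         => $c->req->method,
        duration       => tv_interval($start),
        bytes_sent     => $c->res->headers->content_length // 0,
        bytes_received => $c->req->headers->content_length // 0,
    );
});
```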
Also, API metrics group requests by endpoint type. I.e., instead of the full `/api/archives/123456...` endpoint, we use the `/api/archives/:id` route template from `Routing.pm`. This is to avoid cardinality explosion.
Library
Library/stats refers to the stats mentioned in the initial issue: how many archives, how many pages, etc. These are usually values already aggregated by another worker process during a file monitor event, so the metrics can get this data for free and we don't need a separate periodic hook. (archive byte size, on the other hand...)
And since Prometheus servers periodically scrape metrics from LRR, it doesn't make sense for a metrics scrape to also trigger expensive calls that drag the whole server down, so it's best for the metrics API to be as lean as possible.
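A sketch of what "lean" means here, assuming the aggregated counts already sit in Redis under hypothetical key names (the real keys live elsewhere in the codebase): a scrape only does cheap point reads and string formatting, never a library scan.

```perl
# Sketch: render library gauges from values another worker already aggregated.
# The key names ("archive_count"/"page_count") are illustrative placeholders.
use Redis;

sub render_library_metrics {
    my $redis = Redis->new;    # connect to the shared Redis instance
    $redis->select(4);         # metrics state lives in db4 in this PR

    my $archives = $redis->get('archive_count') // 0;   # cheap O(1) reads only,
    my $pages    = $redis->get('page_count')    // 0;   # no library scans on scrape

    return join "\n",
        '# TYPE lanraragi_archive_count gauge',
        "lanraragi_archive_count $archives",
        '# TYPE lanraragi_page_count gauge',
        "lanraragi_page_count $pages",
        '';
}
```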
Process (Minion/Shinobu)
These are the CPU/memory/FD/IO metrics that one may find in node exporter. Process metrics collection is a passive "process", so it's done as a 30s recurring task.
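For the Linux implementation, the general idea is reading `/proc` for each process of interest; a hedged sketch (field positions per proc(5), not a copy of the PR code):

```perl
# Sketch: collect a few process-level metrics for a given PID from /proc (Linux only).
sub collect_proc_metrics {
    my ($pid) = @_;
    my %m;

    # Resident memory, from /proc/<pid>/status
    if (open my $fh, '<', "/proc/$pid/status") {
        while (<$fh>) {
            $m{rss_kb} = $1 if /^VmRSS:\s+(\d+)\s+kB/;
        }
        close $fh;
    }

    # Open file descriptor count, by listing /proc/<pid>/fd
    if (opendir my $dh, "/proc/$pid/fd") {
        $m{open_fds} = grep { !/^\.\.?$/ } readdir $dh;
        closedir $dh;
    }

    # CPU time: utime + stime from /proc/<pid>/stat, in clock ticks
    if (open my $fh, '<', "/proc/$pid/stat") {
        my $line = <$fh>;
        close $fh;
        # Strip "pid (comm) " first, since comm may itself contain spaces
        $line =~ s/^\d+\s+\(.*\)\s+//;
        my @f = split ' ', $line;
        $m{cpu_ticks} = $f[11] + $f[12];   # utime + stime after the strip
    }

    return \%m;
}
```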
DB Cleanups
There are 3 ways of doing cleanups for metrics: shutdown cleanup, startup cleanup, and TTL (continuous cleanup).
TTL isn't exactly a valid approach, because it violates the OpenMetrics specification's expectation that metrics should generally exist for the lifetime of the process (and it causes metrics to disappear). That leaves startup and shutdown cleanup, but startup is generally more reliable.
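A sketch of the startup-cleanup approach, assuming the metrics keep their own dedicated Redis DB (db4, as described above) so a single FLUSHDB at boot wipes whatever state a previous run left behind; whether the PR uses FLUSHDB or per-key deletes is an implementation detail not shown here.

```perl
# Sketch: wipe stale metrics state at application startup.
# Assumes metrics have a dedicated Redis DB (db4), so nothing else is affected.
use Redis;

sub clear_metrics_on_startup {
    my $redis = Redis->new;
    $redis->select(4);      # switch to the metrics DB
    $redis->flushdb;        # drop all leftover metrics keys from the previous run
}
```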