
Conversation

@andsel
Contributor

@andsel andsel commented Dec 4, 2025

Release notes

Exposes batch size metrics for last 1, 5 and 15 minutes.

What does this PR do?

Updates the stats API response to also expose 1m, 5m, and 15m average batch metrics.

Changes the response map returned by the refine_batch_metrics method (the result of an API query to _node/stats) so that it contains the average values over the last 1, 5, and 15 minutes for event_count and byte_size. These values are published once they become available from the metric collector.
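
For illustration, the refined map could look like the following sketch. The window key names (1_minute, 5_minutes, 15_minutes) follow the spec as reviewed below, and all values are made up:

# Hypothetical shape of the map returned by refine_batch_metrics
# once the windowed averages become available (values illustrative).
{
  :event_count => {
    :current => 78,
    :average => {
      :lifetime => 115,
      :"1_minute" => 120,
      :"5_minutes" => 117,
      :"15_minutes" => 114
    }
  },
  :byte_size => {
    :current => 9830,
    :average => {
      :lifetime => 14233,
      :"1_minute" => 15002,
      :"5_minutes" => 14640,
      :"15_minutes" => 14310
    }
  }
}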

Why is it important/What is the impact to the user?

This feature lets Logstash users meter average batch values over recent time windows.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works. This feature relies on ExtendedFlowMetric, whose time-window management is already extensively tested. A test at the API level would have to generate load for at least the duration of a time window and then check the API response; tests that run for minutes are not feasible.

Author's Checklist

  • [ ]

How to test this PR locally

Use the same test harness proposed in #18000, switch pipeline.batch.metrics.sampling_mode to full, and monitor the result of _node/stats for 1, 5, and 15 minutes with:

curl http://localhost:9600/_node/stats | jq .pipelines.main.batch
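
Once the windows have elapsed, the command should print something along these lines (a sketch; key names follow the spec under review, and all numbers are illustrative):

{
  "event_count": {
    "current": 78,
    "average": {
      "lifetime": 115,
      "1_minute": 120,
      "5_minutes": 117,
      "15_minutes": 114
    }
  },
  "byte_size": {
    "current": 9830,
    "average": {
      "lifetime": 14233,
      "1_minute": 15002,
      "5_minutes": 14640,
      "15_minutes": 14310
    }
  }
}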

Related issues

Use cases

Screenshots

Logs

@andsel andsel self-assigned this Dec 4, 2025
@github-actions
Contributor

github-actions bot commented Dec 4, 2025

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Contributor

mergify bot commented Dec 4, 2025

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8.\d is the label to automatically backport to the 8.\d branch. \d is the digit.
  • If no backport is necessary, please add the backport-skip label

@andsel andsel changed the title Updates stats API response to expose also 1m, 5m and 15m average batc… Exposes average batch metrics at 1, 5 and 15 minutes time window. Dec 4, 2025
@andsel
Contributor Author

andsel commented Dec 4, 2025

run exhaustive test

…can't be created (1 minute and more intervals)
@elasticmachine
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @andsel

@andsel andsel marked this pull request as ready for review December 5, 2025 10:53
current: 78
average:
  lifetime: 115
  1_minute: 120
Member

Any reason why in the api spec we are dropping the last_ prefix? Everywhere in the code these are last_n_minute(s). Seems like consistency would be nice unless there is a good reason not to?

}
}
}
# Enrich byte_size and event_count averages with the last 1, 5, 15 minutes averages if available
Member

Instead of helper methods and multiple calls we could do something like:

[:last_1_minute, :last_5_minutes, :last_15_minutes].each do |window|
  key = window.to_s
  result[:event_count][:average][window] = event_count_average_flow_metric[key]&.round if event_count_average_flow_metric[key]
  result[:byte_size][:average][window] = byte_size_average_flow_metric[key]&.round if byte_size_average_flow_metric[key]
end

Member

@donoghuc donoghuc left a comment

Should we add a few cases in the integration test? Seems like we have a place it would be easy to add

Stud.try(max_retry.times, [StandardError, RSpec::Expectations::ExpectationNotMetError]) do
  # node_stats can fail if the stats subsystem isn't ready
  result = logstash_service.monitoring_api.node_stats rescue nil
  expect(result).not_to be_nil
  # we use fetch here since we want failed fetches to raise an exception
  # and trigger the retry block
  batch_stats = result.fetch("pipelines").fetch(pipeline_id).fetch("batch")
  expect(batch_stats).not_to be_nil
  expect(batch_stats["event_count"]).not_to be_nil
  expect(batch_stats["event_count"]["average"]).not_to be_nil
  expect(batch_stats["event_count"]["average"]["lifetime"]).not_to be_nil
  expect(batch_stats["event_count"]["average"]["lifetime"]).to be_a_kind_of(Numeric)
  expect(batch_stats["event_count"]["average"]["lifetime"]).to be > 0
  expect(batch_stats["event_count"]["current"]).not_to be_nil
  expect(batch_stats["event_count"]["current"]).to be >= 0
  expect(batch_stats["byte_size"]).not_to be_nil
  expect(batch_stats["byte_size"]["average"]).not_to be_nil
  expect(batch_stats["byte_size"]["average"]["lifetime"]).not_to be_nil
  expect(batch_stats["byte_size"]["average"]["lifetime"]).to be_a_kind_of(Numeric)
  expect(batch_stats["byte_size"]["average"]["lifetime"]).to be > 0
  expect(batch_stats["byte_size"]["current"]).not_to be_nil
  expect(batch_stats["byte_size"]["current"]).to be >= 0
end
end
end
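
A possible sketch of those extra cases, assuming the window key names from the spec under review; a window's entry is absent until it has data, so the assertions are conditional:

%w(1_minute 5_minutes 15_minutes).each do |window|
  %w(event_count byte_size).each do |metric|
    value = batch_stats.dig(metric, "average", window)
    next if value.nil? # window not yet populated
    expect(value).to be_a_kind_of(Numeric)
    expect(value).to be > 0
  end
end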



Development

Successfully merging this pull request may close these issues.

Expose the meter of average value of batch's byte size and event count for 1m, 5m 15m windows
