Skip to content

Conversation

@srebhan
Copy link
Member

@srebhan srebhan commented Nov 5, 2025

Summary

The current code fails to handle serialization errors, e.g. due to NaN values or invalid field keys, as batching returns the serialization error straight away. Subsequently, the error triggers an early exit in the http client code because the used variable is zero.

This PR fixes two issues:

Do not return early in the http client if one of the batches does not contain any payload. Instead check if the size of the limited serializer is exceeded and if not continue to process the remaining batches. This prevents early exits and thus prevent the plugin to get stuck on serialization errors.

Reject metrics causing serialization errors to prevent infinite resend-loops and remove rejected metrics from the list of metrics acceptable later. Removing the metrics (or rather the indices) is required to not double-account the metric.

Checklist

Related issues

resolves #17205
resolves #17235

@telegraf-tiger telegraf-tiger bot added area/influxdb fix pr to fix corresponding bug plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins labels Nov 5, 2025
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Nov 5, 2025

Copy link
Contributor

@lowjoel lowjoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from some clarifications. I can't run this right now since I don't seem to be getting NaNs from my data source right now.

@senpro-ingwersenk
Copy link
Contributor

Getting to it. Thanks for the ping; will come back with results in a bit!

@srebhan
Copy link
Member Author

srebhan commented Nov 6, 2025

@lowjoel any chance you test the binary and report back in the issue?

@lowjoel
Copy link
Contributor

lowjoel commented Nov 6, 2025

@lowjoel any chance you test the binary and report back in the issue?

I can test the binary but I don't have any NaNs that can trip the bug now - the source was sporadically causing the metric rejection because it was sporadically writing NaNs. I couldn't figure out how but it hasn't in a few weeks.

@srebhan
Copy link
Member Author

srebhan commented Nov 6, 2025

Ok, as @senpro-ingwersenk gave a go and my unit-test reproduces the NaN case I think we are good for review...

@srebhan srebhan assigned skartikey and mstrandboge and unassigned srebhan Nov 6, 2025
Copy link
Contributor

@skartikey skartikey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srebhan The implementation is solid!
This correctly fixes the critical data loss bug where serialization errors would cause valid metrics to be dropped.

@skartikey skartikey removed their assignment Nov 6, 2025
@skartikey skartikey added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Nov 6, 2025
@IngwiePhoenix
Copy link

Oops, should've replied here instead of in #17235
Either way; LGTM! Seems to have resolved the issue and surfaced a new one - but none that seems to be too problematic.

Thank you for the PR!

@srebhan
Copy link
Member Author

srebhan commented Nov 10, 2025

@IngwiePhoenix no worries, I always check both the issue and the PR, replying to both is fine... ;-)

@mstrandboge mstrandboge merged commit e0fc4bf into influxdata:master Nov 10, 2025
27 checks passed
@github-actions github-actions bot added this to the v1.36.4 milestone Nov 10, 2025
srebhan added a commit that referenced this pull request Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/influxdb fix pr to fix corresponding bug plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.

Projects

None yet

6 participants