Conversation

@twthorn
Contributor

@twthorn twthorn commented Feb 24, 2025

Description

Add more metrics for vstreams on vtgates.

Inspired partially by the tablet vstream metrics. Let me know if you think I should include any others. Once we have decided on the metrics list, I will put out a docs PR.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation: Document new metrics for vtgate vstream website#1948

@vitess-bot
Contributor

vitess-bot bot commented Feb 24, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Feb 24, 2025
@github-actions github-actions bot added this to the v22.0.0 milestone Feb 24, 2025
@twthorn twthorn force-pushed the vtgate-add-more-vstream-metrics branch from 6359a99 to f6f37d0 Compare February 24, 2025 23:00
@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication Component: Observability Pull requests that touch tracing/metrics/monitoring and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Feb 24, 2025
@codecov

codecov bot commented Feb 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.47%. Comparing base (81ce29c) to head (239367b).
Report is 20 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main   #17858    +/-   ##
========================================
  Coverage   67.46%   67.47%            
========================================
  Files        1593     1594     +1     
  Lines      258885   259089   +204     
========================================
+ Hits       174652   174814   +162     
- Misses      84233    84275    +42     

Member

@mattlord mattlord left a comment

Thank you for working on this, @twthorn ! ❤️ I think that we should work out what we want these metrics to represent, and then we can refine from there. Hopefully all of my comments make sense?

Comment on lines 397 to 398
labels := []string{sgtid.Keyspace, sgtid.Shard, vs.tabletType.String()}
vs.vsm.vstreamsEndedWithErrors.Add(labels, 0)
Member

I had assumed that this was a count of vtgate vstreams rather than tablet streams. Is this really supposed to be a count of tablet streams (one per shard, per vtgate vstream)?

Contributor Author

@twthorn twthorn Feb 26, 2025

I believe the most useful granularity for observability is per shard, per vtgate vstream. For example, with the lag metric, if the vstream has multiple shards, we cannot accurately report a single lag value (do we take the max, min, avg, etc.?).

I believe the same is true for:

  1. errors - it saves the operator the time of searching the logs to find which stream had an error
  2. active streams - if the vtgate is running slowly, the vstream count alone is not granular enough when those vstreams span hundreds of shards; we want to know how many shards this vtgate is handling
  3. events streamed - helps the operator detect which shard has a disproportionately high load

I agree per tablet is maybe too granular, but per shard would be very useful. I am going to update the PR based on this.
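
To make the per-shard argument concrete, here is a minimal sketch of a counter keyed by keyspace, shard, and tablet type. The `streamLabels` and `shardCounters` types and the keyspace and shard names are hypothetical stand-ins for illustration only; they are not the Vitess `stats` API used in this PR.

```go
package main

import "fmt"

// streamLabels is a hypothetical label set mirroring the
// keyspace/shard/tablet-type labels discussed in this thread.
type streamLabels struct {
	Keyspace   string
	Shard      string
	TabletType string
}

// shardCounters is a toy multi-label counter; it is an assumption made for
// this example and not the Vitess stats implementation.
type shardCounters struct {
	counts map[streamLabels]int64
}

func newShardCounters() *shardCounters {
	return &shardCounters{counts: make(map[streamLabels]int64)}
}

// Add increments the counter for one keyspace/shard/tablet-type combination.
func (c *shardCounters) Add(l streamLabels, n int64) {
	c.counts[l] += n
}

// TotalForKeyspace rolls the per-shard values up into a coarser view.
func (c *shardCounters) TotalForKeyspace(keyspace string) int64 {
	var total int64
	for l, n := range c.counts {
		if l.Keyspace == keyspace {
			total += n
		}
	}
	return total
}

// HottestShard returns the shard with the highest count in the keyspace.
// Ties are broken by the lexicographically smaller shard name.
func (c *shardCounters) HottestShard(keyspace string) string {
	best, bestCount, found := "", int64(0), false
	for l, n := range c.counts {
		if l.Keyspace != keyspace {
			continue
		}
		if !found || n > bestCount || (n == bestCount && l.Shard < best) {
			best, bestCount, found = l.Shard, n, true
		}
	}
	return best
}

func main() {
	events := newShardCounters()
	events.Add(streamLabels{"commerce", "-80", "REPLICA"}, 120)
	events.Add(streamLabels{"commerce", "80-", "REPLICA"}, 9500)

	fmt.Println("total events:", events.TotalForKeyspace("commerce"))
	fmt.Println("hottest shard:", events.HottestShard("commerce"))
}
```

Per-shard values can always be summed into a per-keyspace or per-vstream total, but a single aggregated value cannot be split back into shards, which is the reason for choosing the finer granularity.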

Member

OK, makes sense. Thanks!

@twthorn twthorn requested a review from mattlord February 26, 2025 18:59
Member

@mattlord mattlord left a comment

LGTM! I only had some minor comments and suggestions. Please let me know what you think.

Thanks again, @twthorn !

defer vs.wg.Done()

labelValues := []string{sgtid.Keyspace, sgtid.Shard, vs.tabletType.String()}
vs.vsm.vstreamsEndedWithErrors.Add(labelValues, 0)
Member

I don't see any value in adding 0. Am I missing something?

Contributor Author

This is to initialize the counter to zero for this keyspace/shard/tablet type, which is a best practice: metrics are hard to work with if they only sometimes exist. This also shows up in the unit test for the metrics. Without the initialization, we can't assert that there were zero errors for a vstream that worked fine (the metric is simply missing).
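
A minimal sketch of why the zero-valued Add matters, using a toy map-backed counter. The `counters` type and the `commerce.-80.replica` key are assumptions made for this example, not the Vitess `stats` package:

```go
package main

import "fmt"

// counters is a toy label-to-value counter map used only for illustration.
type counters map[string]int64

// Add increments the counter for key, creating the entry if needed.
// Adding 0 therefore registers the series without changing its value.
func (c counters) Add(key string, n int64) {
	c[key] += n
}

// Get reports the value and whether the series exists at all.
func (c counters) Get(key string) (int64, bool) {
	v, ok := c[key]
	return v, ok
}

func main() {
	errs := counters{}

	// Before initialization the series is absent, so "zero errors"
	// cannot be told apart from "never reported".
	_, ok := errs.Get("commerce.-80.replica")
	fmt.Println("before init, series exists:", ok)

	// Adding zero creates the series with an explicit value of 0.
	errs.Add("commerce.-80.replica", 0)
	v, ok := errs.Get("commerce.-80.replica")
	fmt.Println("after init, series exists:", ok, "value:", v)
}
```

With the series present from the start, a test or a dashboard query can assert on an explicit 0 instead of handling a missing metric.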

Contributor Author

I changed it to Reset to hopefully make this clearer, since Add zero does look like it is not useful.

Member

Is Reset what you really want? It's a counter and not a gauge. I assumed that it was meant to be a counter that spanned the life of the vtgate as the description is "Number of vstreams that ended with errors".

Contributor Author

Ah yes, good point: this could be called multiple times, and Reset would wipe the accumulated count each time. So we will just stick with Add zero.
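
The difference can be sketched with a toy lifetime counter. Everything here (`errorCounter`, `runStream`, `countAfterRetries`, and the label key) is hypothetical and only models the behavior discussed above, assuming the initialization runs at the start of every stream attempt:

```go
package main

import "fmt"

// errorCounter is a toy lifetime counter keyed by a label string.
type errorCounter struct {
	values map[string]int64
}

func (e *errorCounter) Add(key string, n int64) { e.values[key] += n }
func (e *errorCounter) Reset(key string)        { e.values[key] = 0 }

// runStream simulates one attempt at streaming a shard. prepare is the
// initialization strategy applied at the start of every attempt.
func runStream(e *errorCounter, key string, prepare func(), fails bool) {
	prepare()
	if fails {
		e.Add(key, 1)
	}
}

// countAfterRetries runs two failed attempts followed by a successful one
// and returns the final error count for the shard.
func countAfterRetries(useReset bool) int64 {
	e := &errorCounter{values: map[string]int64{}}
	key := "commerce.-80.replica"

	prepare := func() { e.Add(key, 0) }
	if useReset {
		prepare = func() { e.Reset(key) }
	}

	runStream(e, key, prepare, true)
	runStream(e, key, prepare, true)
	runStream(e, key, prepare, false)
	return e.values[key]
}

func main() {
	fmt.Println("with Add(0):", countAfterRetries(false))
	fmt.Println("with Reset: ", countAfterRetries(true))
}
```

In this model, Add zero preserves both recorded failures, while Reset discards them on the next attempt, which does not fit a counter meant to span the life of the vtgate.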

@twthorn twthorn requested a review from mattlord February 27, 2025 20:22
Member

@mattlord mattlord left a comment

Nice work on this, @twthorn ! ❤️ Much appreciated.

For docs, we should add these new metrics to the v22 docs here: https://vitess.io/docs/22.0/reference/vreplication/metrics/#vtgate-metrics

Do you mind opening a website/docs PR for that as well? I can help as needed.

@twthorn
Contributor Author

twthorn commented Feb 27, 2025

Sure thing, opened here: vitessio/website#1948

@twthorn twthorn requested a review from mattlord February 27, 2025 22:46
@mattlord mattlord removed the NeedsWebsiteDocsUpdate What it says label Feb 27, 2025
twthorn added a commit to twthorn/vitess that referenced this pull request Feb 28, 2025
twthorn added a commit to slackhq/vitess that referenced this pull request Feb 28, 2025
@notfelineit notfelineit merged commit ea9ea39 into vitessio:main Mar 3, 2025
103 of 104 checks passed
tanjinx pushed a commit to slackhq/vitess that referenced this pull request Mar 12, 2025
* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <[email protected]>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <[email protected]>
makinje16 pushed a commit to slackhq/vitess that referenced this pull request Mar 13, 2025
tanjinx pushed a commit to slackhq/vitess that referenced this pull request Mar 13, 2025
…#16593) (#620)

* VStream API: allow keyspace-level heartbeats to be streamed (vitessio#16593)

Signed-off-by: Malcolm Akinje <[email protected]>

* `slack-19.0` backport v22 `vtorc` optimizations + stats, part 3 (#618)

* Remove unused code in discovery queue creation (vitessio#17515)

Signed-off-by: Manan Gupta <[email protected]>

* vtorc: Cleanup unused code (vitessio#15508)

Signed-off-by: Dirkjan Bussink <[email protected]>

* `vtorc`: cleanup discover queue, add concurrency flag (vitessio#17825)

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: add tablets watched stats

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix missing merge conflict update

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: skip unnecessary `inst.ReadTablet` in `logic.LockShard(...)`

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: use `errgroup` in keyspace/shard discovery

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix import

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <[email protected]>

* missing import

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: add stats for discovery workers

Signed-off-by: Tim Vaillancourt <[email protected]>

* get count from backend

Signed-off-by: Tim Vaillancourt <[email protected]>

* rm unused map

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]>

* Bp pr 17558 pr 17858.slack19.0 (#615)

* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <[email protected]>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <[email protected]>

* `slack-19.0`: re-backport tweaks from vitessio#17911 (#621)

* fix bug in reverse `if`

Signed-off-by: Tim Vaillancourt <[email protected]>

* simplify

Signed-off-by: Tim Vaillancourt <[email protected]>

* add `ReadTabletCountsByShard` test

Signed-off-by: Tim Vaillancourt <[email protected]>

* use map of map

Signed-off-by: Tim Vaillancourt <[email protected]>

* capitalize Cell

Signed-off-by: Tim Vaillancourt <[email protected]>

* gofmt lint

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix plural in names

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tom Thornton <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Co-authored-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]>
Co-authored-by: Tom Thornton <[email protected]>
makinje16 pushed a commit to slackhq/vitess that referenced this pull request Mar 20, 2025
makinje16 added a commit to slackhq/vitess that referenced this pull request Mar 20, 2025
tanjinx added a commit to slackhq/vitess that referenced this pull request Mar 24, 2025
…d Journal Events (#585)

* VTGate VStream: Ensure reasonable delivery time for reshard journal event  (vitessio#16639)

Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>

* Backport sqlparser patch for v15->v19 upgrade: 14763 Fix accepting bind variables in time related function calls (#590)

* Fix accepting bind variables in time related function calls. (vitessio#14763)

Signed-off-by: Manan Gupta <[email protected]>

* fix test

---------

Signed-off-by: Manan Gupta <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>

* Upgrade vitess addons to 0.19.8 (#591)

This upgrade allows us to control whether vtorc raises problems or not
via an environment variable.

Signed-off-by: Eduardo J. Ortega U. <[email protected]>

* Use prefix in all vtorc check and recover logs (vitessio#17526) (#592)

This is a backport of vitessio#17526 . Original PR description below:

Description
This is meant to make recovery actions more easily identified from the logs. See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <[email protected]>

* `slack-19.0`: various backports for `vtorc`, part 2 (#596)

* Ensure all topo read calls consider `--topo_read_concurrency` (vitessio#17276)

Signed-off-by: Tim Vaillancourt <[email protected]>

* Revert "add keyrange support for vtorc clusters_to_watch (#457)"

This reverts commit 45c2199.

* [release-19.0] `vtorc`: require topo for `Healthy: true` in `/debug/health` (vitessio#17129) (vitessio#17351)

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>

* `vtorc`: fetch all tablets from cells once + filter during refresh (vitessio#17388)

Signed-off-by: Tim Vaillancourt <[email protected]>

* Support KeyRange in `--clusters_to_watch` flag (vitessio#17604)

Signed-off-by: Manan Gupta <[email protected]>

* missing func

Signed-off-by: Tim Vaillancourt <[email protected]>

* Add api end point to print the current database state in VTOrc (vitessio#15485)

Signed-off-by: Manan Gupta <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>

* `slack-19.0`: `vtorc`: improve handling of partial cell topo results (#599)

* `vtorc`: improve handling of partial cell topo results

Signed-off-by: Tim Vaillancourt <[email protected]>

* add unit test

Signed-off-by: Tim Vaillancourt <[email protected]>

* improve test

Signed-off-by: Tim Vaillancourt <[email protected]>

* add comments

Signed-off-by: Tim Vaillancourt <[email protected]>

* move sort to test

Signed-off-by: Tim Vaillancourt <[email protected]>

* goimports

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>

* `slack-19.0`: skip tests that will fail on v15 downgrade testing (#605)

Signed-off-by: Tim Vaillancourt <[email protected]>

* `slack-19.0`: Add stats for shards watched by VTOrc (#606)

* Add stats for shards watched by VTOrc

Signed-off-by: Tim Vaillancourt <[email protected]>

* Use len() in make

---------

Signed-off-by: Tim Vaillancourt <[email protected]>

* Add `GetServerStatus` RPC to use in PRS (vitessio#16022) (#607)

Signed-off-by: Manan Gupta <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>

* backport/patch connection pool bug/perf fixes (#604)

* [release-19.0] smartconnpool: do not allow connections to starve (vitessio#17675) (vitessio#17683)

Signed-off-by: Dirkjan Bussink <[email protected]>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>

* smartconnpool: Better handling for idle expiration (vitessio#17756)

Signed-off-by: Vicent Marti <[email protected]>

---------

Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Vicent Martí <[email protected]>
Co-authored-by: Tim Vaillancourt <[email protected]>

* pool: reopen connection closed by idle timeout (vitessio#17818) (#609)

Signed-off-by: Harshit Gangal <[email protected]>
Signed-off-by: Vicent Martí <[email protected]>
Co-authored-by: Harshit Gangal <[email protected]>
Co-authored-by: Vicent Martí <[email protected]>

* VReplication: Support excluding lagging tablets and use this in vstream manager (vitessio#17835) (#612)

* `slack-19.0`: backport v22 VTOrc optimizations, part 2 (#613)

* `vtorc`: remove duplicate instance read from backend (vitessio#17834)

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: add index for `inst.ReadInstanceClusterAttributes` table scan

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>

* Add stats for shards watched by VTOrc, purge stale shards (vitessio#17815) (#616)

* --consolidator-query-waiter-cap to set the max number of waiter for consolidated query (vitessio#17244) (#614)

Signed-off-by: Jun Wang <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: jwang <[email protected]>
Co-authored-by: Jun Wang <[email protected]>

* `slack-19.0` backport v22 `vtorc` optimizations + stats, part 3 (#618)

* Remove unused code in discovery queue creation (vitessio#17515)

Signed-off-by: Manan Gupta <[email protected]>

* vtorc: Cleanup unused code (vitessio#15508)

Signed-off-by: Dirkjan Bussink <[email protected]>

* `vtorc`: cleanup discover queue, add concurrency flag (vitessio#17825)

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: add tablets watched stats

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix missing merge conflict update

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: skip unnecessary `inst.ReadTablet` in `logic.LockShard(...)`

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: use `errgroup` in keyspace/shard discovery

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix import

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <[email protected]>

* missing import

Signed-off-by: Tim Vaillancourt <[email protected]>

* `vtorc`: add stats for discovery workers

Signed-off-by: Tim Vaillancourt <[email protected]>

* get count from backend

Signed-off-by: Tim Vaillancourt <[email protected]>

* rm unused map

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]>

* Bp pr 17558 pr 17858.slack19.0 (#615)

* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <[email protected]>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <[email protected]>

* `slack-19.0`: re-backport tweaks from vitessio#17911 (#621)

* fix bug in reverse `if`

Signed-off-by: Tim Vaillancourt <[email protected]>

* simplify

Signed-off-by: Tim Vaillancourt <[email protected]>

* add `ReadTabletCountsByShard` test

Signed-off-by: Tim Vaillancourt <[email protected]>

* use map of map

Signed-off-by: Tim Vaillancourt <[email protected]>

* capitalize Cell

Signed-off-by: Tim Vaillancourt <[email protected]>

* gofmt lint

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix plural in names

Signed-off-by: Tim Vaillancourt <[email protected]>

---------

Signed-off-by: Tim Vaillancourt <[email protected]>

* fix releasing the global read lock when mysqlshell backup fails (vitessio#17000) (#623)

Signed-off-by: Renan Rangel <[email protected]>

* VStream API: allow keyspace-level heartbeats to be streamed (vitessio#16593) (#620)

* Increase health check channel buffer (vitessio#17821) (#625)

Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>

* VStream: Allow for automatic resume after Reshard across VStreams (vitessio#15393) (#627)

Signed-off-by: Tanjin Xu <[email protected]>
Co-authored-by: Matt Lord <[email protected]>

---------

Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Malcolm Akinje <[email protected]>
Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Eduardo J. Ortega U. <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Vicent Marti <[email protected]>
Signed-off-by: Harshit Gangal <[email protected]>
Signed-off-by: Vicent Martí <[email protected]>
Signed-off-by: Jun Wang <[email protected]>
Signed-off-by: Tom Thornton <[email protected]>
Signed-off-by: Renan Rangel <[email protected]>
Signed-off-by: Tanjin Xu <[email protected]>
Co-authored-by: Tanjin Xu <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Eduardo J. Ortega U. <[email protected]>
Co-authored-by: Tim Vaillancourt <[email protected]>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Vicent Martí <[email protected]>
Co-authored-by: Harshit Gangal <[email protected]>
Co-authored-by: Tom Thornton <[email protected]>
Co-authored-by: jwang <[email protected]>
Co-authored-by: Jun Wang <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]>
Co-authored-by: Renan Rangel <[email protected]>
Co-authored-by: Matt Lord <[email protected]>
twthorn added a commit to slackhq/vitess that referenced this pull request May 13, 2025
Labels

Component: Observability Pull requests that touch tracing/metrics/monitoring Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)

Development

Successfully merging this pull request may close these issues.

Feature Request: VTGates should report more metrics on vstreams

3 participants