-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Online DDL: resume vreplication after cut-over/RENAME failure #18428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Online DDL: resume vreplication after cut-over/RENAME failure #18428
Conversation
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
| cancellable = append(cancellable, newCancellableMigration(uuid, s.message)) | ||
| } | ||
| if !s.isRunning() { | ||
| log.Infof("migration %s in 'running' state but vreplication state is '%s'", uuid, s.state.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some added visibility.
|
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18428 +/- ##
==========================================
+ Coverage 67.49% 67.50% +0.01%
==========================================
Files 1607 1607
Lines 262741 262791 +50
==========================================
+ Hits 177339 177407 +68
+ Misses 85402 85384 -18 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…entry can exist on RETRYing a failed migration) Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
|
Add the backport labels |
| if err := e.startVreplication(ctx, tablet.Tablet, s.workflow); err != nil { | ||
| log.Errorf("cutOverVReplMigration %v: failed restarting vreplication after cutover failure: %v", s.workflow, err) | ||
| } | ||
| go log.Infof("cutOverVReplMigration %v: started vreplication after cutover failure", s.workflow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why to start a goroutine for calling in log function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because in this critical section I don't want to block on writing a log line, as silly as it may be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shlomi-noach Is it then essential to log at all? Starting a goroutine also has some cost 😄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbussink , yeah, I'd like this entry to appear in the logs. If it appears, it means vreplication has definitely started, which is an important info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not be the only go log.Infof call in the cutOverVReplMigration function. There's a bunch of other such calls.
harshit-gangal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one comment. rest looks good
Co-authored-by: Noble Mittal <[email protected]> Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: Noble Mittal <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: Noble Mittal <[email protected]>
…failure (#18428) (#18437) Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Noble Mittal <[email protected]>
…failure (#18428) (#18436) Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Noble Mittal <[email protected]>
* origin/master: bugfix: Fix impossible query for UNION (vitessio#18463) fix topo use in local_example (vitessio#18357) fix: update go-upgrade tool to check patch number (vitessio#18252) (vitessio#18402) Update MAINTAINERS.md and CODEOWNERS (vitessio#18462) Add logging to binlog watcher actions (vitessio#18264) `schemadiff`: `RelatedForeignKeyTables()` (vitessio#18195) `vtorc`: allow recoveries to be disabled from startup (vitessio#18005) Fix `vttablet` not being marked as not serving when MySQL stalls (vitessio#17883) make xtrabackup ShouldDrainForBackup configurable (vitessio#18431) Reset in-memory sequence info on vttablet on UpdateSequenceTables request (vitessio#18415) Fix watcher storm during topo outages (vitessio#18434) Online DDL: resume vreplication after cut-over/RENAME failure (vitessio#18428) Online DDL cutover enhancements (vitessio#18423) VStreamer: change in filter logic (vitessio#18319) Online DDL metrics: `OnlineDDLStaleMigrationMinutes` (vitessio#18417) Signed-off-by: Morgan Tocker <[email protected]>
…failure (vitessio#18428) (vitessio#18437) Signed-off-by: Shlomi Noach <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com> Co-authored-by: Noble Mittal <[email protected]>
Description
In Online DDL cut-over, and once vreplication is stopped just before renaming the tables, ensure to restart vreplication in case the cut-over ends up failing (specifically there is one remaining operation, the
RENAME).Another thing in this PR: when canceling a vreplication migration (can be due to user's explicit
CANCEL, or due to failure), do not delete the vreplication entry, as it is useful for debugging/analysis purposes.Backport
This is a bug that needs to be backported to supported versions.
Related Issue(s)
Fixes #18427
Checklist
Deployment Notes