-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Feature(onlineddl): Add shard-specific completion to online ddl #18331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature(onlineddl): Add shard-specific completion to online ddl #18331
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
90ed323 to
d79d7ac
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18331 +/- ##
==========================================
- Coverage 67.51% 67.50% -0.02%
==========================================
Files 1607 1607
Lines 262684 262688 +4
==========================================
- Hits 177343 177315 -28
- Misses 85341 85373 +32 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Doesn't this get much more complex when planning scatter queries that use |
|
I guess if we limit the class of changes that are allowed it could work out. Your use case for changing onyl indexes would fit into this. |
|
Thanks for the note, @GrahamCampbell You're right that scatter queries using SELECT * can run into issues if the schema is not consistent across shards. That said, even today Vitess already allows schema changes to be launched only on specific shards using ApplySQLSchema with --postpone-launch and then selectively launching on those shard. So this PR doesn’t introduce that inconsistency — it just extends support to complete migrations at the shard level, which currently isn't possible. At our organisation, we have an internal mandate to do safe, staged schema changes. This means:
In terms of correctness, i believe it's the responsibility of the operator or database admin to plan and execute shard-by-shard rollouts safely. Enforcing consistency or rollout strategy checks at the Vitess code level would add unnecessary complexity and may not fit all use cases. This PR gives operators the control and flexibility, while assuming they will apply it responsibly. |
ec39edc to
f1f1e11
Compare
shlomi-noach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! This is the exact same way I would have implemented this. A very minor suggestion for the endtoend test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an artificial sleep here just to ensire we're not missing a would-be completion
| // Migration should still be in running state | |
| time.Sleep(2 * time.Second) | |
| // Migration should still be in running state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would drop this file's changes. The correct place is in onlineddl_scheduler_test.go. I see the test is also failing, but in all honesty, I think we should just remove these changes rather than debug this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with respect to correct place for this test, i can place it in onlineddl_scheduler_test.go
functionality - Enhance `CompleteMigration` to support shard-specific arguments - Update test cases to validate shard-based online DDL migrations - Modify query execution in `tabletserver` to include shard details - Adjust SQL parser mappings to incorporate shard-specific changes - Refactor OnlineDDL executor logic for better shard-based migration handling This update improves the granularity of migration completion, allowing shard-specific operations for postponed migrations. Signed-off-by: Siddharth Singh <siddharth16396@@gmail.com> Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>
…pletion Signed-off-by: siddharth16396 <[email protected]>
4a5da2c to
bd9e5d0
Compare
|
WRT the website documentation, i'll create a PR soon. |
|
@shlomi-noach : i have incoporated your reviews. also i think by mistake (or because i pushed another commit) it requires a re-stamp from you. |
|
Thank you, looking again. Please do not |
|
@shlomi-noach : sorry for the force push, i messed a few things up in my local branch and had no option. only change i made from last review was :: c87e90f Which was to incorp your this comment:
Nothing else is changed from previous stamp. |
|
I can see why the test is failing, as the |
Signed-off-by: Shlomi Noach <[email protected]>
|
I pushed a fix and |
|
Thanks a lot for the fix, and like you suggested i will remove the test from Updated with new commit to remove that test entirely. Really appreciate all the help 🙇 |
Signed-off-by: siddharth16396 <[email protected]>
…tests * origin/master: (32 commits) test: Fix race condition in TestStreamRowsHeartbeat (vitessio#18414) VReplication: Improve permission check logic on external tablets on SwitchTraffic (vitessio#18348) Perform post copy actions in atomic copy (vitessio#18411) Update `operator.yaml` (vitessio#18364) Feature(onlineddl): Add shard-specific completion to online ddl (vitessio#18331) Set parsed comments in operator for subqueries (vitessio#18369) `vtorc`: move shard primary timestamp to time type (vitessio#18401) `vtorc`: rename `isClusterWideRecovery` -> `isShardWideRecovery` (vitessio#18351) `vtorc`: remove dupe keyspace/shard in replication analysis (vitessio#18395) Topo: Add NamedLock test for zk2 and consul and get them passing (vitessio#18407) Handle MySQL 9.x as New Flavor in getFlavor() (vitessio#18399) Add support for sending grpc server backend metrics via ORCA (vitessio#18282) asthelpergen: add design documentation (vitessio#18403) `vtorc`: add keyspace/shard labels to recoveries stats (vitessio#18304) `vtorc`: cleanup `database_instance` location fields (vitessio#18339) avoid derived tables for UNION when possible (vitessio#18393) [Bugfix] Broken Heartbeat system in Row Streamer (vitessio#18390) Update MAINTAINERS.md (vitessio#18394) move vmg to emeritus (vitessio#18388) Make sure to check if the server is closed in etcd2topo (vitessio#18352) ...


Description
This PR implements the feature request in #18335
TL;DR: While Vitess already supports launching schema changes shard by shard, there is currently no way to complete those migrations shard by shard. This PR addresses that gap.
This enhancement facilitates scenarios where teams aim to safely test the impact of schema changes, such as adding an index, on a specific shard before deploying it across the entire keyspace.
Example:
Consider a keyspace with multiple shards (e.g., X shards). The objective is to:
To limit the blast radius, the schema change should be:
This feature enables shard-level control and observability, allowing for targeted testing and validation.
The code changes in this PR add shard-specific completion to online alter
CompleteMigrationto support shard-specific argumentstabletserverto include shard detailsThis update improves the granularity of migration completion, allowing
shard-specific operations for postponed migrations.
Vitess #general slack discussion with shlomi:: https://vitess.slack.com/archives/C0PQY0PTK/p1737524865276269
Related Issue(s)
Checklist
Deployment Notes
testing Notes