-
Notifications
You must be signed in to change notification settings - Fork 479
TopologyAssignment v1beta2 #7544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @olekzabl. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
|
/ok-to-test |
| return autoConvert_v1beta1_WorkloadStatus_To_v1beta2_WorkloadStatus(in, out, s) | ||
| } | ||
|
|
||
| func Convert_v1beta1_TopologyAssignment_To_v1beta2_TopologyAssignment(in *TopologyAssignment, out *v1beta2.TopologyAssignment, s conversionapi.Scope) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the conversion is non-trivial it would be great to add a unit test in workload_conversion_test.go
To make diff initial you could draft some base tests as part of #7394, which we could extend in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Specifically, I added:
- a richer set of unit test cases in a dedicated file,
tas_assigment_test.go - a smaller set of test cases in
workload_conversion_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new conversion tests lgtm
|
/retest |
3 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/test |
|
@olekzabl: The Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
1 similar comment
|
/retest |
|
/retests |
|
/retest |
|
/retest |
mimowo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, mostly nits, thank you for making the diff so easy to read thanks to the helper functions
One hesitation I have is still about handling the punctuation: "-" and ".". If we don't use punctuation now, then using it in the future to optimize performance may be breaking for some clusters on prem which don't use punctuation for node naming. OTOH, I'm not sure it makes sense to prematurely complicate the code - we will be able to evolve the code using feature gates anyway.
As mentioned in the description of this PR, I prefer not to deal with punctuation at this point - for 2 reasons:
but also
Then, to address your specific point:
Once we enter the level of sophistication where punctuation would be used to aid performance - we can also think of some heuristic like "drop sticking to punctuation for those inputs where it's very rare". WDYT? |
Maybe, or we just use the new algorithm if there are any punctuations, and they are at consistent positions for all node names. I think if we, in the future, rollout a new algorithm we will need to use a feature-gate anyway. Then, we can enable that new feature gate as default, and basically wait for 2-3 releases to see if we get feedback from users. These users will be able to disable the feature gate if needed. If we don't hear feedback we will be free to roll the new algorithm. |
|
Thank you 👍 |
|
LGTM label has been added. Git tree hash: fc9998269a0307f6383f42742513ebfae64a227b
|
| // It must be either nil pointer or a non-empty string. | ||
| // +optional | ||
| // +kubebuilder:validation:MaxLength=63 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
- Wording.
| // It must be either nil pointer or a non-empty string. | |
| // +optional | |
| // +kubebuilder:validation:MaxLength=63 | |
| // It must be either a nil pointer or a non-empty string. | |
| // +optional | |
| // +kubebuilder:validation:MaxLength=63 |
Also I suppose the wording could be changed to sth like:
| // It must be either nil pointer or a non-empty string. | |
| // +optional | |
| // +kubebuilder:validation:MaxLength=63 | |
| // If the values do not share a common prefix, it must be nil. | |
| // +optional | |
| // +kubebuilder:validation:MaxLength=63 |
You can also say ...it must be nil rather than an empty string.
- If the string has to be non-empty, can we add a
MinLength=1validation? Or does it somehow not play nicely with thenilvalue?
| ) { | ||
| var prefix, suffix string | ||
| var maxLen, minLen, count int | ||
| for s := range inputProvider() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to add an early exit if both the prefix and suffix become empty inside this loop? In infrastructures using a naming scheme where the nodes don't usually share a prefix/suffix (whatever they may be) it would save some iterations.
| if count == 1 { | ||
| prefix = s | ||
| suffix = s | ||
| maxLen = len(s) | ||
| minLen = len(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can consider moving the initialization outside of the loop by converting the iterator to a "pull operator". But this would change the code in other ways as well.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kshalot, mimowo, olekzabl The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Edited v1beta2 types * (empty commit by mistake) * TAS assignment v1beta2 * Ran 'make manifest'; some fixes * Fix integration tests * Another fix of integration tests * Updated the API comment on v1beta2 TopologyAssignment * Fixes after sync * Adding generated files; Prow tests seem to fail without them * Omit empty slices in JSON * Regenerating a file * Fix comment typo * Fixed duplicated import * Add licensing header in tas_assignment.go * Formatting fixes * Fix imports arrangement * More fixes * Regenerated files * Ran `make update-helm` * Use ptr.Deref * Adding tas_assignment_test.go; fixing a bug :) * Reorganized unit tests - testing 3 functions with same test cases * Complete unit tests for tas_assignment.go * Fix test * Fix after sync * Fix after sync * Fix imports * Set MinLength=1 for prefix & suffix (following KEP discussion) * Apply data format changes suggested in KEP PR * Added unit tests for conversion webhook * Add comments on fuctions (where non-trivial) * Adjust after sync * Fix wrong JSON field name spec * Regenerated files * Another test update * Please a linter * Regenerated files * Fixes in magic comments * More fixes in magic comments * Addressing comments
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
This PR:
TopologyAssignmentin v1beta2, as proposed in Support larger workloads in TAS #7220 and planned in KEP KEP changes for v1beta2 TopologyAssignment #7419universalPodCount&universalValuewhenever possibleuniversalValue, always extracts the longest possibleprefix&suffix.The main motivation for this change is to support more nodes for a single workload.
For some real-life node naming schemes which we considered:
Which issue(s) this PR fixes:
Fixes #7220
Special notes for your reviewer:
Currently WIP. A summary of what's already done vs. postponed:
tas_assignment.go(99.1%, the only line not tested is this one - which is not testable but probably still good to keep)
And some optional points (I think they could be postponed to follow-up PRs, or dropped altogether - feedback is welcome):
I will open an issue for this once I hear voices preferring sth else than A.
Tracked in Add validation tests for v1beta2 TopologyAssignment #7639.
The encoding always extracts the longest possible prefix & suffix.
We once considered an alternative to extract only up to a punctuation character (
-or.).For now, I chose against this, for the following reasons:
{nodePoolId}-000023etc.For a single node pool, we would keep the
0000outside of the prefix, for no good reason.0000outside of the prefix could also confuse the users.My general viewpoint is that the old format is better for reasoning about. The new format is only needed "at the surface" of Kueue, to be used in what we send to etcd (and then, obviously, what we read from it), to ensure scalability there.
Consequently, I stick to the old format in internal Kueue code wherever I could.
(To avoid depending on
v1beta1, I introduced new internal type,tas.TopologyAssignment).This had the benefit of reducing the size of this refactor.
However, my main motivation is permanent, so I intend this
tas.TopologyAssignmentto stay permanently.As for how the tests were migrated, this was a quick attempt which we may choose to improve.
My main tactic so far was to replace
with
i.e. switch old to internal format, then just convert it to the new one (on the "expected" side).
This may be considered ugly, in 2 ways: (i) it makes all tests depend on the conversion logic, and (ii) it asserts a very specific encoding, one of many possible variants, of a TopologyAssignment.
I see 3 general approaches to this, and I'll be curious on feedback here
(where, besides direction feedback, we could also discuss if changes should happen in this PR or a follow-up one):
A. Keep this as is, it's good enough.
B. Replace the calls to
tas.V1Beta2From()with their results, so that we explicitly say what we expect.I don't like it, as it would make tests depend on the implementation details of internal -> v1beta2 conversion.
C. Switch to converting on the other side:
Here, the benefit is that we abstract from potential future changes of our encoding strategy.
We just assess what assignment is encoded, no matter how exactly.
Does this PR introduce a user-facing change?