Skip to content

Conversation

@olekzabl
Copy link
Contributor

@olekzabl olekzabl commented Nov 5, 2025

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR:

The main motivation for this change is to support more nodes for a single workload.
For some real-life node naming schemes which we considered:

  • the current limit (without this change) is ~25k nodes
  • the new limit (with this change) is ~60k nodes
  • this change also leaves a path for future enhancements, of two kinds:
    • by changing the underlying encoding strategy (but not the format), so that multiple Slices are used, we might grow beyond 100k
    • by exporting some Slices to separate CRD objects (a larger move, but based on this PR), we can practically get rid of any limit.

Which issue(s) this PR fixes:

Fixes #7220

Special notes for your reviewer:

  1. Currently WIP. A summary of what's already done vs. postponed:

    • API types
      • Comments on those
      • Kubebuilder (cross-)validation rules
    • Conversion between old & new format
    • Existing code builds
    • Existing unit tests migrated to pass (minimal effort so far)
    • Existing integration tests migrated to pass (minimal effort so far)
    • Existing E2E tests migrated to pass (minimal effort so far)
    • Test coverage for tas_assignment.go
      (99.1%, the only line not tested is this one - which is not testable but probably still good to keep)
    • Test coverage for the conversion webhook
    • Comments for the new implementation code (minimalistic, where non-trivial)
    • Release note

    And some optional points (I think they could be postponed to follow-up PRs, or dropped altogether - feedback is welcome):

    • Ensuring that the existing tests are adjusted "nicely" (see # 4 below)
      • Current approach: A is good enough for this PR, C may be done as follow-up.
        I will open an issue for this once I hear voices preferring sth else than A.
    • Tests for validation rules (likely integration tests)
    • "Performance" tests, tracked in Add performance tests for v1beta2 TopologyAssignment encoding #7640:
      • Tests for limits of how many nodes can fit in 1 etcd entry, assuming most popular node naming schemes
      • Benchmarks of "encoding" performance for those naming schemes
  2. The encoding always extracts the longest possible prefix & suffix.
    We once considered an alternative to extract only up to a punctuation character (- or .).
    For now, I chose against this, for the following reasons:

    • This would degrade "compression efficiency" e.g. for AKS node naming scheme: {nodePoolId}-000023 etc.
      For a single node pool, we would keep the 0000 outside of the prefix, for no good reason.
    • In my feeling, leaving 0000 outside of the prefix could also confuse the users.
    • Last, the code is slightly simpler without this.
  3. My general viewpoint is that the old format is better for reasoning about. The new format is only needed "at the surface" of Kueue, to be used in what we send to etcd (and then, obviously, what we read from it), to ensure scalability there.
    Consequently, I stick to the old format in internal Kueue code wherever I could.
    (To avoid depending on v1beta1, I introduced new internal type, tas.TopologyAssignment).
    This had the benefit of reducing the size of this refactor.
    However, my main motivation is permanent, so I intend this tas.TopologyAssignment to stay permanently.

  4. As for how the tests were migrated, this was a quick attempt which we may choose to improve.
    My main tactic so far was to replace

    Expect(X).Should(BeComparableTo(kueue.TopologyAssignment{ ... }))

    with

    Expect(X).Should(BeComparableTo(tas.V1Beta2From(tas.TopologyAssignment{ ... })))

    i.e. switch old to internal format, then just convert it to the new one (on the "expected" side).

    This may be considered ugly, in 2 ways: (i) it makes all tests depend on the conversion logic, and (ii) it asserts a very specific encoding, one of many possible variants, of a TopologyAssignment.

    I see 3 general approaches to this, and I'll be curious on feedback here
    (where, besides direction feedback, we could also discuss if changes should happen in this PR or a follow-up one):

    A. Keep this as is, it's good enough.

    B. Replace the calls to tas.V1Beta2From() with their results, so that we explicitly say what we expect.
    I don't like it, as it would make tests depend on the implementation details of internal -> v1beta2 conversion.

    C. Switch to converting on the other side:

    Expect(tas.InternalFrom(X)).Should(BeComparableTo(tas.TopologyAssignment{ ... }))

    Here, the benefit is that we abstract from potential future changes of our encoding strategy.
    We just assess what assignment is encoded, no matter how exactly.

Does this PR introduce a user-facing change?

v1beta2: The internal representation of TopologyAssignment (in WorkloadStatus) has been reorganized to allow using TAS for larger workloads. (More specifically, under the assumptions described in issue #7220, it allows to increase the maximal workload size from approx. 20k to approx. 60k nodes).

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 5, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 5, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @olekzabl. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@netlify
Copy link

netlify bot commented Nov 5, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 46b74af
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/69164580f0ffdc00088554ba

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 5, 2025
@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 5, 2025

@mimowo @mwielgus @PBundyra FYI
(I guess you're busy these days, so I'm willing to wait - but any early feedback will be appreciated)

@olekzabl olekzabl changed the title TopologyAssignment v1beta2 WIP: TopologyAssignment v1beta2 Nov 5, 2025
@mimowo
Copy link
Contributor

mimowo commented Nov 5, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 5, 2025
return autoConvert_v1beta1_WorkloadStatus_To_v1beta2_WorkloadStatus(in, out, s)
}

func Convert_v1beta1_TopologyAssignment_To_v1beta2_TopologyAssignment(in *TopologyAssignment, out *v1beta2.TopologyAssignment, s conversionapi.Scope) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the conversion is non-trivial it would be great to add a unit test in workload_conversion_test.go

To make diff initial you could draft some base tests as part of #7394, which we could extend in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Specifically, I added:

  • a richer set of unit test cases in a dedicated file, tas_assigment_test.go
  • a smaller set of test cases in workload_conversion_test.go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new conversion tests lgtm

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2025
@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retest

3 similar comments
@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retest

@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retest

@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retest

@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/test

@k8s-ci-robot
Copy link
Contributor

@olekzabl: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test pull-kueue-build-image-main
/test pull-kueue-test-e2e-certmanager-main
/test pull-kueue-test-e2e-customconfigs-main
/test pull-kueue-test-e2e-kueueviz-main
/test pull-kueue-test-e2e-main-1-31
/test pull-kueue-test-e2e-main-1-32
/test pull-kueue-test-e2e-main-1-33
/test pull-kueue-test-e2e-main-1-34
/test pull-kueue-test-e2e-multikueue-main
/test pull-kueue-test-e2e-tas-main
/test pull-kueue-test-integration-baseline-main
/test pull-kueue-test-integration-extended-main
/test pull-kueue-test-integration-multikueue-main
/test pull-kueue-test-scheduling-perf-main
/test pull-kueue-test-unit-main
/test pull-kueue-verify-main

Use /test all to run all jobs.

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retest

1 similar comment
@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retest

@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 6, 2025

/retests

@olekzabl
Copy link
Contributor Author

/retest

@olekzabl
Copy link
Contributor Author

/retest

@olekzabl olekzabl marked this pull request as ready for review November 13, 2025 15:33
@k8s-ci-robot k8s-ci-robot requested a review from mimowo November 13, 2025 15:33
@olekzabl olekzabl requested a review from kshalot November 13, 2025 15:33
@olekzabl olekzabl changed the title WIP: TopologyAssignment v1beta2 TopologyAssignment v1beta2 Nov 13, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2025
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly nits, thank you for making the diff so easy to read thanks to the helper functions

One hesitation I have is still about handling the punctuation: "-" and ".". If we don't use punctuation now, then using it in the future to optimize performance may be breaking for some clusters on prem which don't use punctuation for node naming. OTOH, I'm not sure it makes sense to prematurely complicate the code - we will be able to evolve the code using feature gates anyway.

@olekzabl
Copy link
Contributor Author

olekzabl commented Nov 13, 2025

One hesitation I have is still about handling the punctuation: "-" and ".".

As mentioned in the description of this PR, I prefer not to deal with punctuation at this point - for 2 reasons:

  • as you mentioned, for not complicating the code

but also

  • for not discriminating against naming schemes like X-00000Y.

Then, to address your specific point:

If we don't use punctuation now, then using it in the future to optimize performance may be breaking for some clusters on prem which don't use punctuation for node naming.

Once we enter the level of sophistication where punctuation would be used to aid performance - we can also think of some heuristic like "drop sticking to punctuation for those inputs where it's very rare". WDYT?

@olekzabl olekzabl requested a review from mimowo November 13, 2025 21:07
@mimowo
Copy link
Contributor

mimowo commented Nov 14, 2025

Once we enter the level of sophistication where punctuation would be used to aid performance - we can also think of some heuristic like "drop sticking to punctuation for those inputs where it's very rare". WDYT?

Maybe, or we just use the new algorithm if there are any punctuations, and they are at consistent positions for all node names.

I think if we, in the future, rollout a new algorithm we will need to use a feature-gate anyway. Then, we can enable that new feature gate as default, and basically wait for 2-3 releases to see if we get feedback from users. These users will be able to disable the feature gate if needed. If we don't hear feedback we will be free to roll the new algorithm.

@mimowo
Copy link
Contributor

mimowo commented Nov 14, 2025

Thank you 👍
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: fc9998269a0307f6383f42742513ebfae64a227b

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2025
@k8s-ci-robot k8s-ci-robot merged commit 7126635 into kubernetes-sigs:main Nov 14, 2025
22 checks passed
Comment on lines +477 to +479
// It must be either nil pointer or a non-empty string.
// +optional
// +kubebuilder:validation:MaxLength=63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

  1. Wording.
Suggested change
// It must be either nil pointer or a non-empty string.
// +optional
// +kubebuilder:validation:MaxLength=63
// It must be either a nil pointer or a non-empty string.
// +optional
// +kubebuilder:validation:MaxLength=63

Also I suppose the wording could be changed to sth like:

Suggested change
// It must be either nil pointer or a non-empty string.
// +optional
// +kubebuilder:validation:MaxLength=63
// If the values do not share a common prefix, it must be nil.
// +optional
// +kubebuilder:validation:MaxLength=63

You can also say ...it must be nil rather than an empty string.

  1. If the string has to be non-empty, can we add a MinLength=1 validation? Or does it somehow not play nicely with the nil value?

) {
var prefix, suffix string
var maxLen, minLen, count int
for s := range inputProvider() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth to add an early exit if both the prefix and suffix become empty inside this loop? In infrastructures using a naming scheme where the nodes don't usually share a prefix/suffix (whatever they may be) it would save some iterations.

Comment on lines +141 to +145
if count == 1 {
prefix = s
suffix = s
maxLen = len(s)
minLen = len(s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can consider moving the initialization outside of the loop by converting the iterator to a "pull operator". But this would change the code in other ways as well.

@k8s-ci-robot k8s-ci-robot added this to the v0.15 milestone Nov 14, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kshalot, mimowo, olekzabl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Singularity23x0 pushed a commit to Singularity23x0/kueue that referenced this pull request Nov 17, 2025
* Edited v1beta2 types

* (empty commit by mistake)

* TAS assignment v1beta2

* Ran 'make manifest'; some fixes

* Fix integration tests

* Another fix of integration tests

* Updated the API comment on v1beta2 TopologyAssignment

* Fixes after sync

* Adding generated files; Prow tests seem to fail without them

* Omit empty slices in JSON

* Regenerating a file

* Fix comment typo

* Fixed duplicated import

* Add licensing header in tas_assignment.go

* Formatting fixes

* Fix imports arrangement

* More fixes

* Regenerated files

* Ran `make update-helm`

* Use ptr.Deref

* Adding tas_assignment_test.go; fixing a bug :)

* Reorganized unit tests - testing 3 functions with same test cases

* Complete unit tests for tas_assignment.go

* Fix test

* Fix after sync

* Fix after sync

* Fix imports

* Set MinLength=1 for prefix & suffix (following KEP discussion)

* Apply data format changes suggested in KEP PR

* Added unit tests for conversion webhook

* Add comments on fuctions (where non-trivial)

* Adjust after sync

* Fix wrong JSON field name spec

* Regenerated files

* Another test update

* Please a linter

* Regenerated files

* Fixes in magic comments

* More fixes in magic comments

* Addressing comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support larger workloads in TAS

4 participants