Skip to content

Conversation

@olekzabl
Copy link
Contributor

@olekzabl olekzabl commented Nov 26, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

It fixes #6711 - analogously as in the discontinued #6737 - but adding 2 more bits:

  • an integration test which passes now and fails without the fix
  • assertions deeper down the stack to ensure no future degradation of this kind

Which issue(s) this PR fixes:

Fixes #6711

Special notes for your reviewer:

Please see the discussion in #6711, starting from this comment.

Does this PR introduce a user-facing change?

Fix issue #6711 where an inactive workload could transiently get admitted into a queue.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Nov 26, 2025
@netlify
Copy link

netlify bot commented Nov 26, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 856f764
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/69271d5cb03fe20008fea29b

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 26, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 26, 2025
@olekzabl
Copy link
Contributor Author

/test all

if err = h.r.queues.AddOrUpdateWorkload(wlCopy); err != nil {
log.V(2).Info("ignored an error for now", "error", err)
if workload.IsActive(wlCopy) && !workload.HasQuotaReservation(wlCopy) {
if err = h.r.queues.AddOrUpdateWorkload(wlCopy); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see AddOrUpdateWorkload calls AddOrUpdateWorkloadWithoutLock under the hood and you added the checks to that function as well.

Is it purely "just in case"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you highlighted is the actual check.

What I added deeper in AddOrUpdateWorkloadWithoutLock is an assertion.

The story is that there are 6 paths to reach AddOrUpdateWorkloadWithoutLock altogether, and the other 5 were already covered with this check. (BTW not easily migratable down the call stack). So I added the 6-th check on the same level, to be "little invasive" into the code structure.

However, I also chose to add an assertion deeper down, so that tests can catch any change which adds a 7-th path but forgets to pre-check.

(Discussed in more detail in the issue, #6711).

@olekzabl olekzabl requested a review from kshalot November 26, 2025 15:13
@olekzabl
Copy link
Contributor Author

/retest

@olekzabl olekzabl marked this pull request as ready for review November 26, 2025 15:31
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2025
@mimowo
Copy link
Contributor

mimowo commented Nov 26, 2025

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.13, release-0.14 in new PRs and assign them to you.

In response to this:

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 643fc0ce1864fe42115770cb12246fb93f29182e

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, olekzabl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2025
@k8s-ci-robot k8s-ci-robot merged commit 9486673 into kubernetes-sigs:main Nov 26, 2025
28 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.15 milestone Nov 26, 2025
@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: new pull request created: #7918

In response to this:

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot
Copy link
Contributor

@mimowo: #7913 failed to apply on top of branch "release-0.13":

Applying: Prevent admitting inactive workloads
Using index info to reconstruct a base tree...
A	pkg/cache/queue/manager.go
M	pkg/controller/core/workload_controller.go
M	test/integration/singlecluster/controller/core/suite_test.go
M	test/integration/singlecluster/controller/core/workload_controller_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/integration/singlecluster/controller/core/workload_controller_test.go
Auto-merging test/integration/singlecluster/controller/core/suite_test.go
CONFLICT (content): Merge conflict in test/integration/singlecluster/controller/core/suite_test.go
Auto-merging pkg/queue/manager.go
CONFLICT (content): Merge conflict in pkg/queue/manager.go
Auto-merging pkg/controller/core/workload_controller.go
CONFLICT (content): Merge conflict in pkg/controller/core/workload_controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Prevent admitting inactive workloads

In response to this:

Thanks 👍
/lgtm
/approve
/cherrypick release-0.14
/cherrypick release-0.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot pushed a commit that referenced this pull request Nov 27, 2025
…7939)

* Prevent admitting inactive workloads

* Adress lint finding & review comment

* Yet another linter fix

* Manually fix imports
k8s-ci-robot pushed a commit that referenced this pull request Nov 27, 2025
…7944)

* Prevent admitting inactive workloads

* Adress lint finding & review comment

* Yet another linter fix

* Manually fix imports

* Remove an empty line
Singularity23x0 added a commit to Singularity23x0/kueue that referenced this pull request Nov 28, 2025
* Initital implementation.

* added assumed workloads verification to tests

* [Cleanup] Restrict controller-manager access of ClusterProfiles to the Kueue namespace (kubernetes-sigs#7843)

* Restrict access to ClusterProfiles

to those existing in kueue's namespace

* Use fake client instead of stub

* Log when skipping ClusterProfile

* feat: display flavor assignment attempts in events (kubernetes-sigs#7646)

* Prevent admitting inactive workloads (kubernetes-sigs#7913)

* Prevent admitting inactive workloads

* Adress lint finding & review comment

* Yet another linter fix

* Add integration tests for excludeResourcePrefixes scheduler configuration (kubernetes-sigs#7492)

This adds comprehensive integration tests for the excludeResourcePrefixes
feature in a dedicated test directory following the pattern of other
scheduler tests (fairsharing, podsready).

The tests verify:
- Basic resource exclusion from quota calculations
- Multiple excluded resource prefixes
- Quota enforcement for non-excluded resources
- Exact prefix matching (not substring matching)

All tests create Workload objects directly and use the scheduler test
utilities for robust assertions.

* Add performance tests for v1beta2 TopologyAssignment encoding (kubernetes-sigs#7821)

* Add performance tests for v1beta2 TAS assignment

* Various fixes

* Address linter findings

* Add some logging to debug Prow timeout

* Round node limits to thousands (to speed up the test)

* Remove logging; speed up the test

* Change node pool counts

* TAS: Respect `requiredDuringSchedulingIgnoredDuringExecution` affinity (kubernetes-sigs#7899)

* Respect `requiredDuringSchedulingIgnoredDuringExecution` in TAS

* Remove redundant information from `leafDomain`

* Reduce the amount of `nil` checks when handling affinity in TAS

* Adjust comment describing node selector check

* Add unit test for TAS required affinity

* Make label keys less ambiguous in test

* Reword the TAS affinity check test cases

* Add wait for ClusterQueue Active status in visibility test (kubernetes-sigs#7922)

* Unauthorized error when running renovate (kubernetes-sigs#7926)

* Fix the availability check after rolling restart (kubernetes-sigs#7925)

* test: fix pod groups E2E test flake by checking workload admission before gates (kubernetes-sigs#7917)

Signed-off-by: Sohan Kunkerkar <[email protected]>

* Use main context for CertBootstrap (kubernetes-sigs#7930)

* Replace kueue-populator references for release (kubernetes-sigs#7932)

* add replaces to prepare-release-branch target

* prepare READMEs

* make DES_CHART_DIR overridable

* cleanup

* cleanup

* cleanup

* add makefile comment

* replace registry in readmes

* Revert "replace registry in readmes"

This reverts commit 0b066cc.

* Fix the flaky test for resources (kubernetes-sigs#7941)

* Cleanup.

* Applied review comments.

* Applied review comments 2.

* Restore log.

* Using wrong qKey source in heads fix.

* Fixed pernding workloads reporting bug.

* cq test TestFIFOClusterQueue cleanup.

* Appwraper test cleanup.

---------

Signed-off-by: Sohan Kunkerkar <[email protected]>
Co-authored-by: Michał Szadkowski <[email protected]>
Co-authored-by: Mykyta Derhunov <[email protected]>
Co-authored-by: Olek Zabłocki <[email protected]>
Co-authored-by: Kevin Hannon <[email protected]>
Co-authored-by: Karol Szuster <[email protected]>
Co-authored-by: Irving Mondragón <[email protected]>
Co-authored-by: vladikkuzn <[email protected]>
Co-authored-by: Michał Woźniak <[email protected]>
Co-authored-by: Sohan Kunkerkar <[email protected]>
Co-authored-by: Jakub Skiba <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inactive workloads are occasionally admitted

5 participants