[TAS] Balanced placement #6851

pajakd · 2025-09-16T10:51:34Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR introduces an alternative mode of pod (or pod slices) placement in TAS: balanced placement. In this placement, Kueue TAS still tries to use the minimum possible number of topology domains but also maximizes the minimum number of pods (or pod slices) placed on a domain. This balancing happens when feature gate TASBalancedPlacement is enabled on a level indicated by the user with the Preferred flag.

Which issue(s) this PR fixes:

Fixes #6554

Special notes for your reviewer:

Does this PR introduce a user-facing change?

TAS: The balanced placement is introduced with the TASBalancedPlacement feature gate.

k8s-ci-robot · 2025-09-16T10:51:37Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

netlify · 2025-09-16T10:51:41Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

Name	Link
🔨 Latest commit	`de3d9c5`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/690daa3d4a60580008cee22e

pajakd · 2025-09-16T10:52:08Z

/test all

pajakd · 2025-09-16T13:33:23Z

/test all

keps/2724-topology-aware-scheduling/README.md

pajakd · 2025-10-17T12:23:11Z

/test all

pajakd · 2025-10-17T13:04:32Z

/test all

keps/2724-topology-aware-scheduling/README.md

pkg/cache/scheduler/tas_flavor_snapshot.go

pkg/features/kube_features.go

keps/2724-topology-aware-scheduling/README.md

pkg/cache/scheduler/tas_balanced_placement.go

pkg/cache/scheduler/tas_flavor_snapshot.go

keps/2724-topology-aware-scheduling/README.md

pkg/cache/scheduler/tas_flavor_snapshot.go

olekzabl · 2025-10-29T11:20:45Z

pkg/cache/scheduler/tas_balanced_placement.go

+		sortDomainsByCapacityAndEntropy(domains)
+	}
+
+	// domain_placements[i][j][k] stores a list of domains that uses 'i' domains with


Nit: this is explanation is generally nice - but in 1 way ambiguous:

A. "use (i domains with j leaders) and (k pods left to fit)
B. "use i domains with ((j leaders and k pods) left to fit)

It's not "abstract mean creativity"; I originally understood A (while I think you meant B).
I think I fell into A due to reading left-to-right too impatiently.

I'm not sure if it can be phrased more clearly and still nicely.
How about just placing a pair of ( ) around "j leaders and k pods"?

Updated the description. Also updated k pods to k workers. Bc leaders are pods too. But this might actually cause more confusion as LeaderWorkerSet is only one of the types of jobs we support.

pkg/cache/scheduler/tas_balanced_placement.go

keps/2724-topology-aware-scheduling/README.md

pkg/cache/scheduler/tas_flavor_snapshot.go

PBundyra

Will proceed with the review when you address olekzabl's comments

pkg/cache/scheduler/tas_balanced_placement.go

pkg/features/kube_features.go

pajakd · 2025-11-06T09:40:42Z

@mwysokin we can definitely think about how to incorporate required and unconstrained into the balanced TAS.

The idea of using the preferred is that we need a way to mark a specific level of topology on which we perform the balancing. I might be wrong but I think that the preferred should cover all the use-cases we had so far.

mwysokin · 2025-11-06T14:12:21Z

I wouldn't block this PR from being merged just because there's no support for required and unconstrained. I do think though that it'd be quite beneficial long-term so I'd do a quick follow-up. Maybe we could have someone from AWS and Azure chime-in what the default behavior should be for unconstrained? We'll need to make assumptions about what is the level where the balancing happens, we know that for GKE the topology is block/rack/host/node. If AWS, Azure, Oracle and OpenShift have similar structures maybe we could always assume that we're balancing at 3rd from the bottom (racks) if not level is provided like in unconstrained.

mimowo · 2025-11-12T12:46:09Z

What is the algorithmic complexity of the balanced placement in relation to BestFit, at least in some simplified cases. Do we expect increased performance might be a drawback?

Knowing that could influence the decision if this should be default mode or the drawbacks are so big that it should be explicitly opted in

EDIT: it is not a blocker for Alpha. just would like to know going forward

pajakd · 2025-11-12T14:49:31Z

What is the algorithmic complexity of the balanced placement in relation to BestFit, at least in some simplified cases. Do we expect increased performance might be a drawback?

Knowing that could influence the decision if this should be default mode or the drawbacks are so big that it should be explicitly opted in

EDIT: it is not a blocker for Alpha. just would like to know going forward

The main component in the complexity of Balanced placement is something like

(max # racks in a block)^2 * (# slices to place)

if we balance on "rack" level.

The complexity of BestFit is (unless I'm missing something) linear everywhere. So, there might be some increase in complexity but I don't think it will be too much -- especially that this feature is designed for a special hardware.

mimowo · 2025-11-12T15:40:29Z

pkg/cache/scheduler/tas_flavor_snapshot.go

+		// if unconstrained is set, we'll only do it once
+		currFitDomain = s.updateCountsToMinimumGeneric(currFitDomain, count, leaderCount, sliceSize, unconstrained, true)
+
+		for levelIdx := fitLevelIdx; levelIdx+1 < len(s.domainsPerLevel); levelIdx++ {


I'm wondering about the proper injection point. The current approach is to split at the very top, but I think conceptually balanced placement is very similar to phase 2b. I can see similar traverse down code. So, the natural idea is to inject as we are stepping down and reuse the iteration. I would expect what balanced placement changes is the algorithm for sorting the domains inside sortedDomains, but to reuse calls to updateCountsToMinimumGeneric.

I'm not totally sure if this is a good idea, this is an exploratory question to check your thoughts.

mimowo · 2025-11-12T15:52:38Z

pkg/cache/scheduler/tas_flavor_snapshot.go

+	var currFitDomain []*domain
+	var fitLevelIdx int


I think it is better to reduce the scope of the helper variables between the algorithms

mimowo · 2025-11-12T15:59:02Z

pkg/cache/scheduler/tas_flavor_snapshot.go

+	if features.Enabled(features.TASBalancedPlacement) && !required && !unconstrained {
+		var bestThreshold int32
+		currFitDomain, bestThreshold, useBalancedPlacement = findBestDomainsForBalancedPlacement(s, levelIdx, sliceLevelIdx, count, leaderCount, sliceSize)


I would prefer to encapsulate the information, like:

type balancedPlacementInput struct{
...
}
then

initializationInfo := initializeAlgorithm() if initializationInfo.useBalancedPlacement { // applyBalancedPlacementAlgorithm } else { // applyBestFit }

but I also left an exploratory question to consider commonizing the code more #6851 (comment)

Assuming the overall structure of the code stays, it feels it would read better in this form:

var result var reason if tryUsingBalancedPlacement() { result, reason = useBalancedPlacement() if len(reason) > 0 { return nil, reason } } result, reason := useDefaultPlacement() if len(reason) > 0 { return nil, reason } return result

As to make the dispatching logic more clear

I think what I'm trying to say is that the current injection method seems like a weird mix of two approaches:

generalize the current algorithm flow for 2b and inject the logic when stepping down

generalize the top-level idea of a placement algorithm and introduce some structs for results, using structure similar to [TAS] Balanced placement #6851 (comment)

So, currently we don't benefit from code-reuse in (1.), but also there is no clear place for algorithm dispatching (2.) , because some internal structures are leaking into the dispatching logic. If we go with (2.) I would like to have some structs representing the result in a generic way.

I think going with (1.) feels more natural to me, and is more consistent with the direction we chose for LeastFreeCapacity, but I don't have a strong view between them.

IF the directions were considered but determined as "too hard" I would like to know the reasons. We may then consider if this goes in the PR, or we plan some refactoring for future, or just keep as is long term.

mimowo · 2025-11-12T16:01:56Z

pkg/cache/scheduler/tas_flavor_snapshot.go

+func clearState(d *domain) {
+	d.state = int32(0)
+	d.sliceState = int32(0)
+	d.stateWithLeader = int32(0)
+	d.sliceStateWithLeader = int32(0)
+	d.leaderState = int32(0)
+	for _, child := range d.children {
+		clearState(child)
+	}
+}
+
+func (s *TASFlavorSnapshot) pruneDomainsBelowThreshold(domains []*domain, threshold int32, sliceSize int32, sliceLevelIdx int, level int) {
+	for _, d := range domains {
+		for _, c := range d.children {
+			if c.sliceStateWithLeader < threshold {
+				clearState(c)
+			}
+		}
+	}
+	for _, d := range domains {
+		d.state, d.sliceState, d.stateWithLeader, d.sliceStateWithLeader, d.leaderState = s.fillInCountsHelper(d, sliceSize, sliceLevelIdx, level)
+		if d.sliceStateWithLeader < threshold {
+			clearState(d)
+		}
+	}
+}


I would move out these functions from tas_flavor_snapshot, because they are not used here, so we have a cyclic dependency between files. While this works, it is a bit of code smell. If the functions have a potential for re-usability (they may have), then move them to a helper file like tas_helpers.go

mimowo · 2025-11-13T06:47:54Z

Thank you 👍
/lgtm
/approve

I think the remaining comments are non-blocking:

the potential refactoring [TAS] Balanced placement #6851 (comment) is nice-to-have. In any case the impact will be easier to judge in a new diff (PR).
[TAS] Balanced placement #6851 (comment) is just a small follow up.

We also need to improve the release-note, so that a user get a feeling of how it works, and why to enable it.

k8s-ci-robot · 2025-11-13T06:48:01Z

LGTM label has been added.

Git tree hash: 3175763111f4e891762e487df15487006f3d3264

k8s-ci-robot · 2025-11-13T06:48:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, olekzabl, pajakd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mimowo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mimowo · 2025-11-13T08:33:44Z

pkg/cache/scheduler/tas_balanced_placement.go

+
+// findBestDomainsForBalancedPlacement evaluates domains for balanced placement.
+// It returns the best set of domains, the balance threshold, and whether a balanced placement is possible.
+func findBestDomainsForBalancedPlacement(s *TASFlavorSnapshot, levelIdx, sliceLevelIdx int, count, leaderCount, sliceSize int32) ([]*domain, int32, bool) {


The last returned bool is redundant, I think we could just check if threshold > 0

* Balanced placement for TAS * Update after merge * Sort keys * Address comments * Fix pruning * Fix * Address comments * Fix verify * Comment on the future of the FG * Update doc * Update site/content/en/docs/concepts/topology_aware_scheduling.md Co-authored-by: Olek Zabłocki <[email protected]> * Update site/content/en/docs/concepts/topology_aware_scheduling.md Co-authored-by: Olek Zabłocki <[email protected]> * Update keps/2724-topology-aware-scheduling/README.md Co-authored-by: Olek Zabłocki <[email protected]> * Add comment * Add link to cleanup issue * Address comments * Align after merge * Additional testcases to increase coverage --------- Co-authored-by: Olek Zabłocki <[email protected]>

tenzen-y · 2025-11-28T09:14:36Z

/release-note-edit

TAS: The balanced placement is introduced with the TASBalancedPlacement feature gate.

pajakd added 3 commits September 16, 2025 06:50

Balanced placement for TAS

8266648

Merge branch 'main' into balanced_tas

4c26383

Update after merge

7358d91

k8s-ci-robot requested review from PBundyra and kannon92 September 16, 2025 10:51

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 16, 2025

Sort keys

e482315

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 17, 2025

olekzabl reviewed Oct 2, 2025

View reviewed changes

keps/2724-topology-aware-scheduling/README.md Show resolved Hide resolved

olekzabl reviewed Oct 2, 2025

View reviewed changes

keps/2724-topology-aware-scheduling/README.md Outdated Show resolved Hide resolved

Merge branch 'main' into balanced_tas

da9e7d1

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 17, 2025

pajakd marked this pull request as ready for review October 20, 2025 06:26

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 20, 2025

k8s-ci-robot requested a review from tenzen-y October 20, 2025 06:26

PBundyra reviewed Oct 27, 2025

View reviewed changes

keps/2724-topology-aware-scheduling/README.md Outdated Show resolved Hide resolved

pkg/cache/scheduler/tas_flavor_snapshot.go Outdated Show resolved Hide resolved

PBundyra reviewed Oct 27, 2025

View reviewed changes

pkg/features/kube_features.go Show resolved Hide resolved

olekzabl reviewed Oct 29, 2025

View reviewed changes

PBundyra reviewed Oct 29, 2025

View reviewed changes

pkg/cache/scheduler/tas_balanced_placement.go Outdated Show resolved Hide resolved

pkg/cache/scheduler/tas_balanced_placement.go Outdated Show resolved Hide resolved

kannon92 reviewed Oct 29, 2025

View reviewed changes

pkg/features/kube_features.go Outdated Show resolved Hide resolved

Address comments

069a314

pajakd force-pushed the balanced_tas branch from 24f92b7 to 069a314 Compare November 6, 2025 15:07

pajakd added 3 commits November 6, 2025 15:08

Merge branch 'main' into balanced_tas

98e197c

Align after merge

cdbc55c

Additional testcases to increase coverage

de3d9c5

mimowo reviewed Nov 12, 2025

View reviewed changes

olekzabl approved these changes Nov 12, 2025

View reviewed changes

olekzabl mentioned this pull request Nov 12, 2025

REQUEST: New membership for olekzabl kubernetes/org#5977

Closed

11 tasks

k8s-ci-robot assigned mimowo Nov 13, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2025

k8s-ci-robot merged commit 79cef95 into kubernetes-sigs:main Nov 13, 2025
23 checks passed

k8s-ci-robot added this to the v0.15 milestone Nov 13, 2025

mimowo reviewed Nov 13, 2025

View reviewed changes

mimowo mentioned this pull request Nov 13, 2025

Balanced placement cleanup: consider injecting the code inside the stepping down loop #7627

Open

This was referenced Nov 13, 2025

Cleanup of Balanced TAS #7645

Merged

Balanced placement refactor #7700

Merged

pajakd mentioned this pull request Nov 27, 2025

Self nominate pajakd as reviewer #7946

Merged

3 tasks

[TAS] Balanced placement #6851

[TAS] Balanced placement #6851

Uh oh!

Conversation

pajakd commented Sep 16, 2025 • edited by k8s-ci-robot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Uh oh!

k8s-ci-robot commented Sep 16, 2025

Uh oh!

netlify bot commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

Uh oh!

pajakd commented Sep 16, 2025

Uh oh!

pajakd commented Sep 16, 2025

Uh oh!

Uh oh!

Uh oh!

pajakd commented Oct 17, 2025

Uh oh!

pajakd commented Oct 17, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PBundyra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pajakd commented Nov 6, 2025

Uh oh!

mwysokin commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mimowo commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pajakd commented Nov 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mimowo Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mimowo commented Nov 13, 2025

Uh oh!

k8s-ci-robot commented Nov 13, 2025

Uh oh!

k8s-ci-robot commented Nov 13, 2025

Uh oh!

pajakd commented Sep 16, 2025 •

edited by k8s-ci-robot

Loading

netlify bot commented Sep 16, 2025 •

edited

Loading

mwysokin commented Nov 6, 2025 •

edited

Loading

mimowo commented Nov 12, 2025 •

edited

Loading

mimowo Nov 12, 2025 •

edited

Loading