
Conversation

@sbgla-sas
Contributor

@sbgla-sas sbgla-sas commented Oct 7, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

The current Kueue Helm chart grants cluster-wide access to Secrets. Kueue, however, interacts only with Secret resources in its own namespace to:

  • Generate/rotate certs if using internal certificate management.
  • Obtain kubeconfig(s) for worker clusters if using MultiKueue.

This PR aligns Kueue controller manager RBAC with the principle of least privilege, restricting access to Secrets in the same namespace as the Kueue controller manager.

This mirrors the approach already taken with the existing leader-election-role Role and RoleBinding. Specifically:

  • Remove the Secrets RBAC marker from pkg/util/cert/cert.go, in turn removing Secrets permissions from the generated ClusterRole (see the illustrative marker sketch after this list).
  • Add Role manager-secrets-role and RoleBinding manager-secrets-rolebinding to config/components/rbac, using the system namespace to allow replacement by yaml-processor.
  • Add yaml-processor-generated Helm chart templates for the above.
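
For orientation, the cluster-scoped kubebuilder marker removed from pkg/util/cert/cert.go has roughly the following shape (the exact verbs may differ). controller-gen also accepts a namespace= option that scopes a marker's rules to a Role, shown second for comparison, although this PR adds the namespaced Role/RoleBinding as static manifests instead. This is an illustrative sketch, not the PR's exact diff:

// Cluster-scoped: controller-gen emits these rules into the manager ClusterRole,
// granting access to Secrets in every namespace. (Verbs are illustrative.)
//+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch;update

// Namespace-scoped alternative: with namespace= set, the rules land in a Role
// for that namespace rather than in the ClusterRole.
//+kubebuilder:rbac:groups="",resources=secrets,verbs=get;list;watch;update,namespace=kueue-system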

Which issue(s) this PR fixes:

Fixes #6968

Special notes for your reviewer:

N/A

Does this PR introduce a user-facing change?

RBAC: Restrict access to secrets for the Kueue controller manager only to secrets in the Kueue system namespace, i.e. kueue-system by default, or the one specified during installation with Helm.

@k8s-ci-robot added the release-note-none (Denotes a PR that doesn't merit a release note) and kind/feature (Categorizes issue or PR as related to a new feature) labels on Oct 7, 2025.
@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 7, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @sbgla-sas!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @sbgla-sas. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test), cncf-cla: no (Indicates the PR's author has not signed the CNCF CLA), and size/M (Denotes a PR that changes 30-99 lines, ignoring generated files) labels on Oct 7, 2025.
@netlify

netlify bot commented Oct 7, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: d42a80e
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68efb2518801a60008a54345

@mimowo
Contributor

mimowo commented Oct 7, 2025

@sbgla-sas thank you for working on the issue. Is this a problem just for Helm, though? What if Kueue is installed from the manifests directly? Basically, we try to keep both ways of installing Kueue in sync.

@sbgla-sas
Contributor Author

Good catch and mea culpa, @mimowo.

I'd missed adding the new role manifests to the RBAC kustomization.yaml - fix inbound.

@mimowo
Contributor

mimowo commented Oct 7, 2025

/ok-to-test

@k8s-ci-robot added the ok-to-test label (Indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label on Oct 7, 2025.
@mimowo
Contributor

mimowo commented Oct 7, 2025

please also sign CLA

@k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA) and removed the cncf-cla: no label on Oct 7, 2025.
@sbgla-sas
Contributor Author

CLA signed, though I appear to have stepped directly onto a rake...

While Kueue does consume Secrets only within its own namespace, the MultiKueueCluster controller watches them cluster-wide.

One approach would be to narrow the scope at the manager cache level via cache.Options.ByObject, but as this isn't the manifest-only change I'd originally thought, I think input/discussion may be useful.
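
For illustration, scoping the manager's Secret cache to a single namespace via cache.Options.ByObject could look roughly like the following. This is a minimal sketch assuming controller-runtime v0.16+ per-object cache options; the managerOptions helper and kueueNamespace parameter are placeholders, not the PR's actual code:

import (
    corev1 "k8s.io/api/core/v1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/cache"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// managerOptions restricts the shared cache's Secret informer to the Kueue
// namespace while leaving every other watched resource cluster-scoped.
func managerOptions(kueueNamespace string) ctrl.Options {
    return ctrl.Options{
        Cache: cache.Options{
            ByObject: map[client.Object]cache.ByObject{
                &corev1.Secret{}: {
                    Namespaces: map[string]cache.Config{kueueNamespace: {}},
                },
            },
        },
    }
}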

@mimowo
Contributor

mimowo commented Oct 7, 2025

In that case I'm leaning towards restricting secrets for MultiKueue to the Kueue namespace only.

However, the transition period might indeed be bumpy. If we go this way, I think we need to warn users in the release notes to copy the secrets.

Let's see if others can contribute opinions.

@mimowo
Contributor

mimowo commented Oct 8, 2025

However, is MultiKueue the only issue? I also see tests failing for CI jobs that do not use MultiKueue, e.g. pull-kueue-test-e2e-tas-main.

@sbgla-sas
Contributor Author

The MultiKueue feature gate is enabled for that job from what I can see, and the MultiKueueCluster controller appears in the pod logs for node kind-worker5. The same forbidden cluster-scope Secret access error seen in other failing test runs is also present.

It looks like the cluster is being set up as a would-be MultiKueue manager cluster, even though it won't be used that way, and is therefore hitting the same issue?

@mimowo
Contributor

mimowo commented Oct 8, 2025

Possible, worth checking the logs. If this is the case, I think we should consider it a bug and fix it separately, so that we don't break users of Kueue who are not using MultiKueue, and restrict the bumpy upgrade only to the MultiKueue users who actually configure their clusters to use secrets.

@sbgla-sas
Contributor Author

I think the issue may be resolvable with minimal bumpiness - notes below, most of which are intended to help surface any misunderstanding on my part.

The MultiKueueCluster controller secret watch is registered, with the cache configuration left to the manager, here:

return ctrl.NewControllerManagedBy(mgr).
    For(&kueue.MultiKueueCluster{}).
    Watches(&corev1.Secret{}, &secretHandler{client: c.localClient}).

The secretHandler handles all secret events to determine whether a given secret is linked to any MultiKueueClusters:

users := &kueue.MultiKueueClusterList{}
if err := s.client.List(ctx, users, client.MatchingFields{UsingKubeConfigs: strings.Join([]string{secret.Namespace, secret.Name}, "/")}); err != nil {
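
For context on how that List call can filter by secret, the UsingKubeConfigs field index would typically be registered with the manager's field indexer along these lines. This is a hedged sketch rather than Kueue's actual registration code; the setupKubeConfigIndex name, the spec accessors, and the kueueNamespace parameter are assumptions for illustration:

func setupKubeConfigIndex(ctx context.Context, mgr ctrl.Manager, kueueNamespace string) error {
    return mgr.GetFieldIndexer().IndexField(ctx, &kueue.MultiKueueCluster{}, UsingKubeConfigs,
        func(obj client.Object) []string {
            cluster, ok := obj.(*kueue.MultiKueueCluster)
            if !ok || cluster.Spec.KubeConfig.Location == "" {
                return nil
            }
            // Key format mirrors strings.Join([]string{secret.Namespace, secret.Name}, "/") above.
            return []string{kueueNamespace + "/" + cluster.Spec.KubeConfig.Location}
        })
}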

The clustersReconciler, however, only uses the Kueue namespace when consuming kubeconfigs from secrets:

func (c *clustersReconciler) getKubeConfigFromSecret(ctx context.Context, secretName string) ([]byte, bool, error) {
    sec := corev1.Secret{}
    secretObjKey := types.NamespacedName{
        Namespace: c.configNamespace,
        Name:      secretName,
    }

Additionally, the MultiKueueCluster spec explicitly notes that KubeConfig.Location refers to a secret in the Kueue namespace where KubeConfig.LocationType is LocationTypeSecret:

type KubeConfig struct {
    // location of the KubeConfig.
    //
    // If LocationType is Secret then Location is the name of the secret inside the namespace in
    // which the kueue controller manager is running. The config should be stored in the "kubeconfig" key.
    Location string `json:"location"`

With the above as context I'm not sure that any secrets outwith the Kueue namespace could actually be used for MultiKueue functionality. Unless I've missed something, of course.
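
Putting the doc comment above into practice, a MultiKueueCluster that consumes a kubeconfig from a Secret references it by name only, with the namespace implied to be the one the Kueue controller manager runs in. The following is an illustrative example only; the spec field layout and values are assumptions based on the quoted doc comment, not copied from Kueue:

worker := &kueue.MultiKueueCluster{
    ObjectMeta: metav1.ObjectMeta{Name: "worker1"},
    Spec: kueue.MultiKueueClusterSpec{
        KubeConfig: kueue.KubeConfig{
            LocationType: "Secret",
            // Name of a Secret in the Kueue controller manager namespace,
            // expected to hold the kubeconfig under the "kubeconfig" key.
            Location: "worker1-kubeconfig",
        },
    },
}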

Interestingly enough, restricting the manager secret access to the Kueue namespace doesn't seem to break internal certificate management in the otherwise-failing tests. The rotator is added here:

// ManageCerts creates all certs for webhooks. This function is called from main.go.
func ManageCerts(mgr ctrl.Manager, cfg config.Configuration, setupFinished chan struct{}) error {
    // DNSName is <service name>.<namespace>.svc
    var dnsName = fmt.Sprintf("%s.%s.svc", *cfg.InternalCertManagement.WebhookServiceName, *cfg.Namespace)
    // Derive webhook configuration names from the webhook service name
    webhookBaseName := deriveWebhookBaseName(*cfg.InternalCertManagement.WebhookServiceName)
    mutatingWebhookName := buildWebhookConfigurationName(webhookBaseName, "mutating")
    validatingWebhookName := buildWebhookConfigurationName(webhookBaseName, "validating")
    return cert.AddRotator(mgr, &cert.CertRotator{

Which, with a brief detour to the vendor dir, registers a new namespaced cache:

// AddRotator adds the CertRotator and ReconcileWH to the manager.
func AddRotator(mgr manager.Manager, cr *CertRotator) error {
    if mgr == nil || cr == nil {
        return fmt.Errorf("nil arguments")
    }
    ns := cr.SecretKey.Namespace
    if ns == "" {
        return fmt.Errorf("invalid namespace for secret")
    }
    cache, err := addNamespacedCache(mgr, cr, ns)
    if err != nil {
        return fmt.Errorf("creating namespaced cache: %w", err)
    }

As I see it the main approaches here would be either:

  • Create a separate, namespaced cache for MultiKueue (a rough sketch follows this list). This should work, but might add unnecessary complexity depending on what might be planned going forward. Any future controller that might watch single-namespace secrets alongside cluster-wide namespaced resources would have to use different caches explicitly.
  • Configure the cache at the manager level, leaving everything else as-is but restricting secrets to the Kueue namespace. This seems a little cleaner to me, as it restricts access to secrets without adding another cache to keep track of.
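
As referenced in the first option above, a dedicated namespaced cache would be created and handed to the manager roughly as follows, similar in spirit to the cert rotator's addNamespacedCache. This is a hedged sketch; addSecretsCache and kueueNamespace are placeholder names, and controller-runtime v0.16+ cache options are assumed:

func addSecretsCache(mgr ctrl.Manager, kueueNamespace string) (cache.Cache, error) {
    // Build an additional cache restricted to the Kueue namespace.
    nsCache, err := cache.New(mgr.GetConfig(), cache.Options{
        Scheme:            mgr.GetScheme(),
        DefaultNamespaces: map[string]cache.Config{kueueNamespace: {}},
    })
    if err != nil {
        return nil, err
    }
    // The manager starts the cache and waits for it to sync; controllers that
    // want namespaced Secrets would have to watch through this cache explicitly.
    if err := mgr.Add(nsCache); err != nil {
        return nil, err
    }
    return nsCache, nil
}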

I'd defer to the experts here for confirmation, but think the implementation risk is low for both approaches. Internal cert management shouldn't be impacted as it uses a different cache, and MultiKueue should continue to work as expected as only what's watched (not what's acted upon) would change.

There would likely be additional benefits in terms of reduced footprint and Kubernetes API calls, though the extent would depend on the number/churn of secrets for a given cluster.

@mimowo
Contributor

mimowo commented Oct 10, 2025

Thank you for the analysis 👍

The clustersReconciler, however, only uses the Kueue namespace when consuming kubeconfigs from secrets:
With the above as context I'm not sure that any secrets outwith the Kueue namespace could actually be used for MultiKueue functionality. Unless I've missed something, of course.

Yes, this is great news! We only support secrets in the kueue-system namespace, and it really simplifies things.

It reminded me of some past context: we experimented with a broader scope for the secrets so that we could have multiple MultiKueue instances on a single cluster to make writing e2e tests easier (without creating many clusters). Later we decided to restrict to the kueue-system namespace for simplicity, but maybe some code supporting secrets in multiple namespaces is still left behind.

Create a separate, namespaced cache for MultiKueue. This should work, but might add unnecessary complexity depending on what might be planned going forward. Any future controller that might watch single-namespace secrets alongside cluster-wide namespaced resources would have to use different caches explicitly.
Configure the cache at the manager level, leaving everything else as-is but restricting secrets to the Kueue namespace. This seems a little cleaner to me, as it restricts access to secrets without adding another cache to keep track of.

I'm wondering if, instead of maintaining our own cache, we could rely on the cache built into controller-runtime and just make sure all watches are scoped to the kueue namespace?

If we need to have our own cache, then both options sound reasonable. I don't have a strong preference TBH, maybe slightly towards (1), assuming YAGNI, but that is minimal.

@sbgla-sas
Contributor Author

I'm wondering if, instead of maintaining our own cache, we could rely on the cache built into controller-runtime and just make sure all watches are scoped to the kueue namespace?

Both approaches would use a controller-runtime cache, with the difference being where it's created and what (directly) interacts with it.

That being said, I think the second option better aligns with the above. It would scope all Secret watches that use the shared, manager-provided client/cache to the Kueue namespace, while allowing the manager to deal with starting caches, waiting for sync, etc.
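
Concretely, and again only as a sketch, the second option amounts to passing cache options like the managerOptions helper sketched earlier in this thread when the manager is constructed, so that every controller using the manager-provided client/cache inherits the namespace restriction and the manager owns cache start-up and sync:

// kueueNamespace and managerOptions are the placeholders from the earlier sketch.
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), managerOptions(kueueNamespace))
if err != nil {
    return err
}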

@k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label on Oct 15, 2025.
@mimowo
Contributor

mimowo commented Oct 15, 2025

Feel free to propose a solution

@sbgla-sas
Contributor Author

The failing checks seem to have been a kind, rather than Kueue, issue.

/retest

}
}

func addCacheByObjectTo(o *ctrl.Options, cfg *configapi.Configuration) {
Contributor

@mimowo mimowo Oct 15, 2025

What if there is a pre-existing cluster with that configuration? IIUC we should also update kueue/apis/config/v1beta1/defaults.go to make sure it would default on those clusters without manual intervention?

Ah, it seems this would also apply "by default", but it looks like an obsolete mechanism. IIUC we should migrate to using defaults.go, but maybe this place just hasn't been migrated yet. Please investigate moving the defaulting there.

Contributor Author

Moving manager cache options to the Kueue defaulter would (I think?) require a Kueue configuration API change to expose all/some of them. There would also still need to be a similar translation between types to provide default/derived values to the controller-runtime manager.

In this case the delegated cache is fully determined by the Kueue namespace, which is covered by the existing defaulter; are there additional cache options you'd like to see exposed more directly?

Contributor

Ah, OK, no, I don't think we need to expose this to users. Let me double-check what the difference is here.

@mimowo
Contributor

mimowo commented Oct 16, 2025

I think this is worth a release note, even though this is a cleanup:

/release-note-edit

Restrict access to secrets for the Kueue controller manager only to secrets in the Kueue system namespace, i.e. kueue-system by default, or the one specified during installation with Helm.

/remove-kind feature
/kind cleanup

@k8s-ci-robot added the kind/cleanup (Categorizes issue or PR as related to cleaning up code, process, or technical debt) and release-note (Denotes a PR that will be considered when it comes time to generate release notes) labels, and removed the kind/feature and release-note-none labels, on Oct 16, 2025.
@mimowo
Contributor

mimowo commented Oct 16, 2025

Thank you 👍 Nice to see strengthening of permissions.

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Oct 16, 2025.
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ea6b99e0f946628828eeba01ed191c2589786bfb

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, sbgla-sas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mimowo
Contributor

mimowo commented Nov 28, 2025

/release-note-edit

RBAC: Restrict access to secrets for the Kueue controller manager only to secrets in the Kueue system namespace, i.e. kueue-system by default, or the one specified during installation with Helm.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Namespaced Secret access

3 participants