Restrict controller-manager Secrets access to Kueue namespace #7188
Conversation
Welcome @sbgla-sas!
Hi @sbgla-sas. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
@sbgla-sas thank you for working on the issue. Is this, though, a problem just for Helm? What if Kueue is installed from the manifests directly? Basically, we try to keep both ways of installing Kueue in sync.
Good catch and mea culpa, @mimowo. I'd missed adding the new role manifests to the RBAC config.
/ok-to-test
Please also sign the CLA.
CLA signed, though I appear to have stepped directly onto a rake... While Kueue does consume Secrets only within its own namespace, the MultiKueue cluster controller registers its Secret watch through the manager's shared, cluster-wide cache. One approach would be to narrow the scope at the manager cache level so that Secret watches cover only the Kueue namespace, roughly along the lines of the sketch below.
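A minimal sketch of what that could look like, assuming controller-runtime's per-object cache options; the namespace literal and wiring below are illustrative rather than taken from the Kueue codebase:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Illustrative namespace; Kueue's real namespace comes from its configuration.
	const kueueNamespace = "kueue-system"

	// Scope only the Secret informer to the Kueue namespace; every other
	// resource keeps the default cluster-wide cache.
	opts := ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&corev1.Secret{}: {
					Namespaces: map[string]cache.Config{kueueNamespace: {}},
				},
			},
		},
	}

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), opts)
	if err != nil {
		panic(err)
	}
	_ = mgr // controllers and webhooks would be registered here
}
```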
In that case I'm leaning towards restricting Secrets for MultiKueue to the Kueue namespace only. However, indeed, the transition period might be bumpy. If we go this way we need to warn users in the release notes to copy the secrets, I think. Let's see if others can contribute opinions.
However, is MultiKueue the only issue? I see tests failing also for CI jobs not using MultiKueue, e.g. pull-kueue-test-e2e-tas-main.
The MultiKueue feature gate is enabled for that job from what I can see. It looks like the cluster is being set up as a would-be MultiKueue manager cluster, though it won't be used that way, and is therefore hitting the same issue?
Possible, worth checking the logs. If this is the case I think we should consider it a bug and fix it separately, so that we don't break users of Kueue who are not using MultiKueue, and restrict the bumpy upgrade only to the users of MultiKueue who indeed configure the clusters to use secrets.
I think the issue may be resolvable with minimal bumpiness - notes below, most of it intended to help surface any misunderstanding on my part. The MultiKueueCluster controller Secret watch is registered, with the cache configuration left to the manager, here: kueue/pkg/controller/admissionchecks/multikueue/multikueuecluster.go, lines 599-601 (at 63a5681).
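To make that concrete: a registration of roughly the following shape (a simplified, hypothetical stand-in, not the actual Kueue code) sources its Secret events from whatever cache the manager was built with, so narrowing the manager cache narrows what the controller can see.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// setupClusterController is a hypothetical stand-in for the real controller setup:
// the Secret watch is served by the manager's shared cache, so the manager's cache
// configuration decides which Secrets the controller ever observes.
func setupClusterController(mgr manager.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kueue.MultiKueueCluster{}).
		Watches(&corev1.Secret{}, &handler.EnqueueRequestForObject{}).
		Complete(r)
}
```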
The corresponding Secret event handler is here: kueue/pkg/controller/admissionchecks/multikueue/multikueuecluster.go, lines 659-660 (at 63a5681).
The kubeconfig retrieval, in turn, looks the referenced Secret up only in the Kueue namespace: kueue/pkg/controller/admissionchecks/multikueue/multikueuecluster.go, lines 437-442 (at 63a5681).
Additionally, the MultiKueueCluster spec explicitly notes that a Secret-type kubeconfig location refers to a Secret in the namespace the Kueue controller manager runs in: kueue/apis/kueue/v1beta1/multikueue_types.go, lines 47-52 (at 63a5681).
With the above as context I'm not sure that any secrets outwith the Kueue namespace could actually be used for MultiKueue functionality. Unless I've missed something, of course. Interestingly enough, restricting the manager Secret access to the Kueue namespace doesn't seem to break internal certificate management in the otherwise-failing tests. The rotator is added here: lines 41-51 (at 63a5681).
Which, with a brief detour into the vendored cert-controller rotator (kueue/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go, lines 112-124 at 63a5681), appears to set up its own watch/cache for the certificate Secret rather than going through the manager cache.
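Roughly, the rotator is handed one specific Secret by namespace/name and manages it with its own watch machinery; the sketch below only illustrates that wiring, and the field values, secret and webhook names are assumptions rather than the actual Kueue configuration.

```go
package example

import (
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/manager"

	"github.com/open-policy-agent/cert-controller/pkg/rotator"
)

// addCertRotator illustrates (with assumed values) how a cert-controller rotator
// is pointed at a single Secret; per the vendored code referenced above it does
// not rely on the manager cache, so narrowing that cache does not affect it.
func addCertRotator(mgr manager.Manager, setupFinished chan struct{}) error {
	return rotator.AddRotator(mgr, &rotator.CertRotator{
		SecretKey: types.NamespacedName{
			Namespace: "kueue-system",              // assumed
			Name:      "kueue-webhook-server-cert", // assumed
		},
		CertDir: "/tmp/k8s-webhook-server/serving-certs",  // assumed
		DNSName: "kueue-webhook-service.kueue-system.svc", // assumed
		IsReady: setupFinished,
		Webhooks: []rotator.WebhookInfo{
			{Type: rotator.Validating, Name: "kueue-validating-webhook-configuration"}, // assumed
		},
	})
}
```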
As I see it the main approaches here would be either:
1. keeping a dedicated, Kueue-namespace-scoped cache for Secrets that we create and manage ourselves, or
2. scoping the Secret watches in the manager-provided shared cache to the Kueue namespace.
I'd defer to the experts here for confirmation, but I think the implementation risk is low for both approaches. Internal cert management shouldn't be impacted as it uses a different cache, and MultiKueue should continue to work as expected as only what's watched (not what's acted upon) would change. There would likely be additional benefits in terms of reduced footprint and Kubernetes API calls, though the extent would depend on the number/churn of secrets for a given cluster.
Thank you for the analysis 👍
Yes, this is great news! We only support secrets in the kueue-system namespace, and it really simplifies things. It reminded me of the past context: we experimented with a broader scope for the secrets so that we could have multiple MultiKueue instances on a single cluster, to make e2e tests easier to write (without creating many clusters). Later we decided to restrict to the kueue-system namespace for simplicity, but maybe some code supporting secrets in multiple namespaces is still left behind.
I'm wondering if, instead of maintaining our own cache, we could rely on the cache built into controller-runtime and just make sure all watches are scoped to the Kueue namespace? If we need to have our own cache, then both options sound reasonable. I don't have a preference TBH, maybe slightly towards (1.) assuming YAGNI, but that is minimal.
Both approaches would use a controller-runtime cache, with the difference being where it's created and what (directly) interacts with it. That being said, I think the second option better aligns with the above. It would scope all Secret watches using the shared, manager-provided client/cache to the Kueue namespace while allowing the manager to deal with starting caches, waiting for sync, etc.
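For contrast, a rough sketch of the first option, where we would build and register a Secret-only cache ourselves; the helper and names are illustrative, not from the codebase.

```go
package example

import (
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// addSecretCache is a hypothetical helper: it creates a cache limited to the given
// namespace and registers it with the manager, which starts it alongside the other
// runnables. Anything reading from it before the manager starts would still need
// to wait for it to sync on its own.
func addSecretCache(mgr manager.Manager, namespace string) (cache.Cache, error) {
	c, err := cache.New(mgr.GetConfig(), cache.Options{
		DefaultNamespaces: map[string]cache.Config{namespace: {}},
	})
	if err != nil {
		return nil, err
	}
	if err := mgr.Add(c); err != nil {
		return nil, err
	}
	return c, nil
}
```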
Feel free to propose a solution.
The failing checks seem to have been a kind, rather than Kueue, issue.
/retest
func addCacheByObjectTo(o *ctrl.Options, cfg *configapi.Configuration) {
What if there is a pre-existing cluster with that configuration? IIUC we should also update kueue/apis/config/v1beta1/defaults.go to make sure it would default correctly on such clusters without manual intervention?
Ah, it seems this would also apply "by default", but it looks like an obsolete mechanism. IIUC we should migrate to using defaults.go, but maybe this place was just not migrated. Please investigate moving the defaulting there.
Moving manager cache options to the Kueue defaulter would (I think?) require a Kueue configuration API change to expose all/some of them. There would also still need to be a similar translation between types to provide default/derived values to the controller-runtime manager.
In this case the delegated cache is fully determined by the Kueue namespace, which is covered by the existing defaulter; are there additional cache options you'd like to see exposed more directly?
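For reference, the existing namespace defaulting I have in mind looks roughly like the sketch below; the constant and field names are from memory, so treat them as assumptions rather than a copy of defaults.go.

```go
package v1beta1

import "k8s.io/utils/ptr"

// Sketch only (Configuration is the package's config type, names assumed):
// ensuring Namespace always has a value means namespace-scoped wiring, including
// the Secret cache scope, can rely on it without new configuration surface.
const defaultNamespace = "kueue-system"

func SetDefaults_Configuration(cfg *Configuration) {
	if cfg.Namespace == nil || *cfg.Namespace == "" {
		cfg.Namespace = ptr.To(defaultNamespace)
	}
}
```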
Ah, ok, no I don't think we need to expose this to users. Let me double check what is the difference here.
I think this is worth a release note, even though this is a cleanup:
/release-note-edit
/remove-kind feature
Thank you 👍 Nice to see strengthening of permissions.
/lgtm
LGTM label has been added. Git tree hash: ea6b99e0f946628828eeba01ed191c2589786bfb
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mimowo, sbgla-sas
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/release-note-edit |
What type of PR is this?
/kind feature
What this PR does / why we need it:
The current Kueue Helm chart grants cluster-wide access to Secrets. Kueue, however, interacts only with Secret resources in its own namespace, to:
- read MultiKueue kubeconfig Secrets, and
- manage its internally generated (webhook) certificates.
This PR aligns Kueue controller manager RBAC with the principle of least privilege, restricting access to Secrets in the same namespace as the Kueue controller manager.
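For illustration only, the rough shape of the namespaced permissions involved, expressed as Go objects; the actual manifests live under config/components/rbac and the generated Helm chart, and the verbs and ServiceAccount name below are assumptions rather than copied from the change. The specific files touched are listed below.

```go
package example

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Illustrative Role/RoleBinding granting Secrets access only inside the Kueue
// namespace. Object names follow the PR description; verbs and the ServiceAccount
// name are assumptions, and the generated manifests remain the source of truth.
var secretsRole = rbacv1.Role{
	ObjectMeta: metav1.ObjectMeta{Name: "manager-secrets-role", Namespace: "kueue-system"},
	Rules: []rbacv1.PolicyRule{{
		APIGroups: []string{""},
		Resources: []string{"secrets"},
		Verbs:     []string{"get", "list", "watch"},
	}},
}

var secretsRoleBinding = rbacv1.RoleBinding{
	ObjectMeta: metav1.ObjectMeta{Name: "manager-secrets-rolebinding", Namespace: "kueue-system"},
	RoleRef: rbacv1.RoleRef{
		APIGroup: rbacv1.GroupName,
		Kind:     "Role",
		Name:     "manager-secrets-role",
	},
	Subjects: []rbacv1.Subject{{
		Kind:      rbacv1.ServiceAccountKind,
		Name:      "kueue-controller-manager", // assumed ServiceAccount name
		Namespace: "kueue-system",
	}},
}
```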
A similar approach is taken to the existing leader-election-role Role and RoleBinding. Specifically:
- Updated the RBAC markers in pkg/util/cert/cert.go, in turn removing Secrets permissions from the generated ClusterRole.
- Added Role manager-secrets-role and RoleBinding manager-secrets-rolebinding to config/components/rbac, using the system namespace to allow replacement by yaml-processor.
- Added the yaml-processor-generated Helm chart templates for the above.
Which issue(s) this PR fixes:
Fixes #6968
Special notes for your reviewer:
N/A
Does this PR introduce a user-facing change?