-
-
Notifications
You must be signed in to change notification settings - Fork 98
Connect to remote SSH nodes using SSH CLI #152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connect to remote SSH nodes using SSH CLI #152
Conversation
|
Hola! Took a quick stab at this as it was a bit painful for me having to unlock 1Password every single time when Tailscale manages it transparently and keyless 😅 I didn't remove the current SSH connector, but I replaced it with SSHCLI one for the CLI to use it. This is a very naive implementation. Blindly stole the logic from Docker's dial stdio implementation, but is not exactly the same as I didn't fully understand the code (I'm not versed in all the dark incantations of Go) 😬 Since there was no SSH test, I wasn't sure if we should be testing it or not, so I added a kinda pointless test (actually was Claude Code who did it, but I take responsibility for that) 😁 I've tested it against my fleet of machines connected to my tailnet and my finger is much better that I don't have to unlock 1Password on every deployment 🤣 Let me know what do you think! Cheers |
psviderski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on the new SSH connector! 👏
Looks much simpler than what I expected to see. While it's on the right track, please see a few comments and questions we need to resolve before we can merge it.
|
Hello @psviderski, thank you for your patience and looking into this! In the case of Tailscale, it does support both port forwarding and IO forwarding transiently, so the Dialer should work.
Ideally the connector should replace all the flows, including that to avoid discrepancies.
That is a great idea, just for testing I did I went ahead and completed the Dialer so Adding responses do your other comments now 😊 Important note: I run out of brain for some stuff so decided to use Claude Code to assist in some cleanup and address some of the feedback. Note that it was heavily guided as the generated code was not verbatim as it generated it. Anyhow, wanted to disclose that as I'm not sure what would be your position towards AI-assisted contributions. Thank you. |
psviderski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Dialer looks good to me! 👍 Great work! ❤️
So it looks like we need to address half-closing in dial-stdio and we can merge this?
I think the Executor could be addressed in the next one. What do you think?
Replace Go-native SSH implementation with SSH CLI execution to support diverse SSH configurations and agents. Implements 'uncloudd dial-stdio' subcommand that proxies gRPC connections over stdin/stdout, similar to Docker's approach. This change addresses compatibility issues with: - SSH agents exposing many keys (1Password, causing "too many authentication failures") - Tailscale SSH (which doesn't support advanced SSH channel types like direct-streamlocal) - Custom SSH configurations in ~/.ssh/config The dial-stdio approach reduces SSH feature requirements by streaming the unix socket connection over stdin/stdout instead of using SSH channel forwarding. Changes: - Add 'uncloudd dial-stdio' hidden subcommand for socket proxy - Add SSHCLIConnector using ssh command + dial-stdio - Update connection logic to use SSH CLI connector - Maintain backward compatibility with SSHKeyFile config Resolves psviderski#131
Introduced short connection timeout on the first change but forgot to update tests to match.
- Add SSHCLI field to support ssh_cli YAML configuration - Update String() to use URI-like format (ssh://, ssh+cli://, tcp://)
- Add Validate() to ensure connection methods are mutually exclusive - Add tests for validation and String() method
Revert provisionOrConnectRemoteMachine to use Go SSH connector: - Root users: reuse SSH connection from provisioning - Non-root users: establish new connection for group membership - Remove SSH CLI as default connector SSH CLI connector remains available via ssh_cli config field.
Implement proxy.ContextDialer for SSHCLIConnector using SSH -W flag. Each dial spawns a new SSH process for TCP forwarding, enabling independent connections separate from the gRPC dial-stdio connection.
Return sshCLIDialer instead of error, enabling uc image push functionality with SSHCLIConnector. Validates connector is configured before returning dialer.
This way I can skip the configuration file while testing things out,
and confirm it works correctly:
$ unset SSH_AUTH_SOCK
$ ./uncloud --connect ssh://provision@blatta11 machine ls
Error: connect to cluster: connect to machine: SSH login to
provision@blatta11:22: connect using SSH agent: connect to SSH
agent: dial unix: missing address
$ ./uncloud --connect ssh+cli://provision@blatta11 machine ls
NAME STATE ADDRESS PUBLIC IP WIREGUARD ENDPOINTS MACHINE ID
blatta11 Up 10.210.0.1/24 - 100.64.0.22:51820, ...
No longer attempt to validate the connection when instantiating a new client. Later on we could validate it in different places.
There were some serious slop in those tests, so took the time to clean them up and kept only the relevant ones. There is some repetition between buildSSHArgs and buildDialArgs but can be tackled at a later stage.
f25b320 to
1129a43
Compare
|
Hello @psviderski, thank you for your patience. Finally I was able to take a look to the feedback and address it.
re: sshexec, interestingly, I didn't change anything and it works out of the box with Tailscale SSH (no need for SSH keys or anything) but then either Note that I would love to tackle the sshexec/Executor aspect in another PR, for the time being, I can rely on 1Password SSH Agent socket and the connection for the time being. Apologies for the force push, I wanted to ensure I was testing against the latest changes from Thank you in advance for looking into this and I hope this looks much better now 😊 Cheers, |
Prefix connection with ssh ssh+cli respectively.
| } | ||
| case err = <-socket2stdout: | ||
| // return immediately, matching Docker's approach | ||
| // (stdin is never closed when TTY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment mean that _, err := io.Copy(conn, stdin) may never finish if stdin is a TTY, hence err = <-stdin2socket never succeeds?
| case err = <-stdin2socket: | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand the intended behaviour correctly?
If we finished reading from stdin and there is an error, return it and complete the dial-stdio command.
If there is no error, complete the dial-stdio command, potentially not finishing reading everything from socket to stdout. Note that cases in the select statements don't fallback to the next one.
Ah, I checked the docker implementation, and it indeed contains the missing logic to wait for another goroutine before returning in this case:
https://github.com/docker/cli/blob/7414d73fc119a6b97ff240b7c7df91f1f642e323/cli/command/system/dial_stdio.go#L66-L67
psviderski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! 👏 👏 👏
I left a comment about a tiny edge cases we need to address in dial-stdio and we can merge this one 💪
re: sshexec, interestingly, I didn't change anything and it works out of the box with Tailscale SSH (no need for SSH keys or anything) but then either init or add fail as it tries to use the SSH connector to initialize the node:
Hm, this is indeed interesting as sshexec.Connect is what is directly used in the SSHConnector that doesn't seem to work:
uncloud/pkg/client/connector/ssh.go
Line 49 in 2969b40
| c.client, err = sshexec.Connect(c.config.User, c.config.Host, c.config.Port, c.config.KeyPath) |
As you said, tackling sshexec/Executor aspect in another PR would make a lot of sense.
Thanks again for a lot of effort you put into this work! ❤️
Missed copy & pasta from Docker dial-stdio implementation (this happens when you stare at the code for too long that it burns your eyes).
|
Hello @psviderski, addressed the comment about stdio, I completely missed it! 🤦 Ready for last review! I've already worked a bit on a SSH CLI connector, but I will explain that more in detail in a separate PR. Thank you! |
Replace Go-native SSH implementation with SSH CLI execution to support diverse SSH configurations and agents. Implements 'uncloudd dial-stdio' subcommand that proxies gRPC connections over stdin/stdout, similar to Docker's approach.
This change addresses compatibility issues with:
The dial-stdio approach reduces SSH feature requirements by streaming the unix socket connection over stdin/stdout instead of using SSH channel forwarding.
Changes:
Resolves #131