Use azcopy for uploading large artifacts #1679
Testing with https://github.com/microsoft/vcpkg/pull/45461/files. Would azcopy upload 6GB in 3.7 min? Taking a heavy reference from nightly CI (curl): So curl uploads can take much longer. (But it is a different workload.) FTR download works (curl): Would curl download 6GB in 1.3 min? IDK. |
|
Verified that So with just curl, uploading |
|
It seems that First is curl 3 GB, second is azcopy 6 GB. (Assuming tools are selected as intended.) OTOH the remote may have had a weak moment: |
|
It might be necessary to look at azcopy environment variables. There are knobs for tuning concurrency and memory usage: vcpkg would transfer one file per call. It is not clear to me whether a single-file transfer benefits from the concurrency capabilities, but there might be parallel transfers of blocks. Would vcpkg run multiple instances of azcopy at the same time? This doesn't seem to be the pattern preferred by azcopy. |
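To make the tuning idea concrete, here is a minimal sketch of how per-call overrides could be assembled before spawning azcopy. `AZCOPY_CONCURRENCY_VALUE` and `AZCOPY_BUFFER_GB` are environment variables that azcopy reads; the helper name `azcopy_tuning_env` and the idea of passing a map to the process launcher are assumptions for illustration, not vcpkg's actual API.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of vcpkg): builds environment overrides
// for one azcopy invocation. The caller would merge these into the child
// process environment before launching azcopy.
std::map<std::string, std::string> azcopy_tuning_env(unsigned concurrency, unsigned buffer_gb)
{
    return {
        // Number of concurrent requests azcopy may issue.
        {"AZCOPY_CONCURRENCY_VALUE", std::to_string(concurrency)},
        // Upper bound on memory azcopy uses for buffering, in GB.
        {"AZCOPY_BUFFER_GB", std::to_string(buffer_gb)},
    };
}
```

Suitable values would have to be measured; nothing in this thread establishes good defaults.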
|
I would like this to receive review before I trigger another 1024 rebuilds of llvm and friends. |
vicroms
left a comment
@AugP @BillyONeal @JavierMatosD @ras0219-msft and myself discussed this PR today.
Overall, we prefer this approach using azcopy and have the following requested change:
azcopy needs to be added to vcpkg-tools.json so it can be acquired automatically; we should also investigate what dependencies it requires on Linux.
Not a request for this PR, but as a follow up, we would like to see a new provider that always uses azcopy for its increased efficiency. Said provider can also take advantage of the alternate authentication mechanisms (Microsoft Entra ID for example, via az login or env vars). Then we can use that in our own CI instead.
BTW this was developed and tested on Ubuntu 22.04. |
|
We should move to test this on the registry, trial by fire. |
|
Given the testing in microsoft/vcpkg#45461, I think we are ready to move on with this PR. We can leave the provisioning of Azurite in CI machines to a follow up PR that also adds an |
    const SanitizedUrl& sanitized_url,
    const Path& file)
    {
        auto azcopy_cmd = Command{"azcopy"};
Note: We can't call fetch_tool naïvely here to get azcopy because this is happening on the background / binary cache submission thread.
@ras0219-msft points out that a potential way to reduce complexity here would be adding an explicit azcopy provider which moves the tool fetch up front like we do for the other binary caching backends e.g. awscli.
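A rough sketch of the "fetch up front" idea, under assumptions: the class name `AzCopyProvider`, the `fetch_tool` callback, and `upload_args` are hypothetical and do not mirror vcpkg's real interfaces. The point is only that the tool path is resolved once on the constructing thread, so the background submission thread never has to fetch anything.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical provider (not vcpkg's actual class): the tool lookup runs in
// the constructor, i.e. on the thread that sets up binary caching.
class AzCopyProvider
{
public:
    explicit AzCopyProvider(const std::function<std::string()>& fetch_tool)
        : m_azcopy_path(fetch_tool())
    {
    }

    // Only reads the stored path, so it can be called from the background
    // submission thread. Returns an argument vector: `azcopy copy <src> <dst>`.
    std::vector<std::string> upload_args(const std::string& file, const std::string& url) const
    {
        return {m_azcopy_path, "copy", file, url};
    }

private:
    const std::string m_azcopy_path;
};
```

Returning an argument vector instead of a single command string avoids having to quote paths or URLs in this sketch.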
IIUC I'm asked to make changes now.
vicroms
left a comment
We believe this is OK as-is. The fact that we cannot prefetch azcopy reinforces the need for an azcopy-exclusive provider, but that is its own follow-up task.
I can look at such a provider ("x-azcopy") soon. I just couldn't do anything while it wasn't clear whether this would be merged or not. |
Alternative to #1658.
Curl is still used for artifacts below the max single write limit (5 GB) and for all downloads. Fetching and uploading azcopy via vcpkg-tools.json (microsoft/vcpkg@c178e54) doesn't need azcopy.
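The selection rule described above can be sketched as follows. All names here (`TransferTool`, `Direction`, `choose_transfer_tool`) are hypothetical, and the byte value used for the "5 GB" limit is an assumption; the exact threshold in the real implementation may differ.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not the identifiers used in vcpkg.
enum class TransferTool { Curl, AzCopy };
enum class Direction { Download, Upload };

// Assumption: "5 GB" is taken as 5 * 10^9 bytes. The service's precise
// single-write limit may be expressed differently.
constexpr std::uint64_t assumed_max_single_write = 5000000000ULL;

// Downloads always use curl; uploads use curl only below the limit.
TransferTool choose_transfer_tool(Direction direction,
                                  std::uint64_t file_size,
                                  std::uint64_t max_single_write = assumed_max_single_write)
{
    if (direction == Direction::Download) return TransferTool::Curl;
    return file_size < max_single_write ? TransferTool::Curl : TransferTool::AzCopy;
}
```

With these assumptions, the 3 GB artifact from the tests above would go through curl and the 6 GB one through azcopy.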