Skip to content

Conversation

@winglian
Copy link
Collaborator

@winglian winglian commented Aug 6, 2025

What does this PR do?

This is a fix for how Trainer by default resets the PartialState shared state, which means we can't read the ParallelismConfig from PartialState anymore. Since we can infer the ParallelismConfig from the env vars like we do all the other ACCELERATE_* options (e.g. FSDP, Deepspeed,etc), we can repopulate this inside the constructor.

Addtionally, there is a fix to the indent level of setting the fsdp_plugin, as if it the conditional it currently is in would bypass setting self.fsdp_plugin if the env var was set to allow standalone CP, which is independent of FSDP.

We'll need to add some fixes down the line in Trainer, but this will get us to a working release.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@winglian winglian requested a review from S1ro1 August 6, 2025 03:17
@winglian winglian changed the title Parallel config env Set parallelism_config from env due to Trainer reset Aug 6, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@winglian winglian changed the title Set parallelism_config from env due to Trainer reset Set parallelism_config in constructor due to Trainer reset of State Aug 6, 2025
@S1ro1 S1ro1 merged commit 24c8157 into huggingface:main Aug 6, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants