Conversation

@IlyasMoutawwakil (Member)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member) left a comment:

LGTM, just a nit

@IlyasMoutawwakil IlyasMoutawwakil changed the title Use the state's mixed precision which has undergone all processing better handle FP8 with and without deepspeed Jun 9, 2025
"Tried to train with `fp8` and auto-detect backend, but no FP8-compatible backend was installed. "
"Valid backends are: `torchao`, `transformer-engine`, and `msamp`."
)
self.has_fp8_handler = True
@IlyasMoutawwakil (Member, Author):

idk how it worked before but this was missing 😅

@IlyasMoutawwakil (Member, Author) commented on Jun 9, 2025:

If `has_fp8_handler` stays false, the `fp8_backend` property becomes erroneous, since it can then only return `MSAMP` or `None`.
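
To make the failure mode concrete, here is a minimal sketch (placeholder logic and names, not the actual accelerate source) of how a missing `has_fp8_handler = True` skews the property:

```python
# Simplified illustration of the bug under discussion; the class and the
# returned strings are placeholders, not the real accelerate implementation.
class State:
    def __init__(self, mixed_precision, fp8_recipe_handler=None):
        self.mixed_precision = mixed_precision
        self.has_fp8_handler = False
        if fp8_recipe_handler is not None:
            self.fp8_recipe_handler = fp8_recipe_handler
            self.has_fp8_handler = True  # the assignment that was missing

    @property
    def fp8_backend(self):
        # If has_fp8_handler is wrongly left False, this branch is never
        # taken and the property can only ever report "MSAMP" or None.
        if self.has_fp8_handler:
            return type(self.fp8_recipe_handler).__name__
        if self.mixed_precision == "fp8":
            return "MSAMP"
        return None
```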

Member:

Yeah indeed ;)

@IlyasMoutawwakil (Member, Author):

Should I add some tests for the different expected values of `fp8_backend`?
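
For illustration, such a test could be parametrized roughly as follows (`make_state` is an assumed helper that builds a state with the given settings, not an existing fixture):

```python
# Hypothetical pytest sketch of tests over the expected fp8_backend values.
import pytest

@pytest.mark.parametrize(
    "mixed_precision, handler, expected",
    [
        ("fp8", "TE", "TE"),      # transformer-engine recipe configured
        ("fp8", "AO", "AO"),      # torchao recipe configured
        ("fp8", None, "MSAMP"),   # fp8 requested but no recipe handler
        ("bf16", None, None),     # fp8 not requested at all
    ],
)
def test_fp8_backend(make_state, mixed_precision, handler, expected):
    state = make_state(mixed_precision=mixed_precision, handler=handler)
    assert state.fp8_backend == expected
```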

@S1ro1 (Contributor) left a comment:

Not too sure of the DeepSpeed-related changes; the rest looks fine. Thanks! Should probably wait for @SunMarc as well.

@IlyasMoutawwakil (Member, Author)

> Not too sure of the DeepSpeed-related changes; the rest looks fine.

Looking at the structure of the accelerator state, I guess moving its mixed-precision logic there makes more sense and avoids re-reading the env var (see the FSDP logic, which does the same).

Initial DeepSpeed tests are passing; I'm currently running the slow ones.

@S1ro1 (Contributor) commented Jun 9, 2025:

> Not too sure of the DeepSpeed-related changes; the rest looks fine.
>
> Looking at the structure of the accelerator state, I guess moving its mixed-precision logic there makes more sense and avoids re-reading the env var (see the FSDP logic, which does the same).
>
> Initial DeepSpeed tests are passing; I'm currently running the slow ones.

Yes, I think we want to slowly move toward having logic local to the plugins. This centralised handling caused many issues, e.g. with FSDP2 (see #3585, which did basically the same). It introduces a bit of copy-paste, but gives us a lot more freedom to make things work properly. Just noticed that Marc already reviewed, so just lmk when it's ready to merge.
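
In simplified form, the approach discussed above looks like this (`ACCELERATE_MIXED_PRECISION` is the env var the launcher sets; the class below is a stand-in, not the real AcceleratorState):

```python
# Stand-in sketch: resolve mixed precision once when the state is built,
# instead of re-reading the environment variable later in the Accelerator.
import os

class State:
    def __init__(self, mixed_precision=None, deepspeed_plugin=None):
        # Read the env var a single time, at construction.
        self._mixed_precision = mixed_precision or os.environ.get(
            "ACCELERATE_MIXED_PRECISION", "no"
        )
        # Backend-specific processing also happens here, so downstream
        # code (with or without DeepSpeed) can trust self.mixed_precision
        # without touching os.environ again.
        if deepspeed_plugin is not None:
            deepspeed_plugin.set_mixed_precision(self._mixed_precision)

    @property
    def mixed_precision(self):
        return self._mixed_precision
```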

@SunMarc (Member) left a comment:

Thanks for making this better! Just a nit.

"Tried to train with `fp8` and auto-detect backend, but no FP8-compatible backend was installed. "
"Valid backends are: `torchao`, `transformer-engine`, and `msamp`."
)
self.has_fp8_handler = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah indeed ;)

@SunMarc SunMarc merged commit b9fee48 into main Jun 10, 2025
28 of 29 checks passed
@SunMarc SunMarc deleted the mixed-precision-from-env branch June 10, 2025 12:24
S1ro1 added a commit that referenced this pull request Jun 10, 2025
commit 2f8fd72
Author: Simon <[email protected]>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <[email protected]>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 3a82b05
Author: Marc Sun <[email protected]>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <[email protected]>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit ab3c604
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    * fix quality

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 273799c
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * add deterministic

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 43526c5
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <[email protected]>

    * fix bug

    Signed-off-by: Matrix YAO <[email protected]>

    * fix review comments

    Signed-off-by: Matrix YAO <[email protected]>

    * fix

    Signed-off-by: Matrix YAO <[email protected]>

    * format

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <[email protected]>

commit ee2f48c
Author: Fanli Lin <[email protected]>
Date:   Tue May 27 17:16:42 2025 +0800

    [docs] no hard-coded cuda in the ddp documentation (#3589)

    * make device-agnostic

    * refactor

commit 4f3abb7
Author: jiqing-feng <[email protected]>
Date:   Mon May 26 21:55:10 2025 +0800

    Set ccl and KMP param in simple launch (#3575)

    * Even 1 CPU mechine can also run multi process

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl and kml param setting

    Signed-off-by: jiqing-feng <[email protected]>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <[email protected]>

    * fix num process check

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl args check

    Signed-off-by: jiqing-feng <[email protected]>

    ---------

    Signed-off-by: jiqing-feng <[email protected]>

commit db536cb
Author: Yuanzhou Cai <[email protected]>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <[email protected]>

    * fix style

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 4e9d0de
Author: Yao Matrix <[email protected]>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <[email protected]>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifacts and figure tracking at MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initiation

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <[email protected]>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <[email protected]>

commit 33967d4
Author: Francesco Laiti <[email protected]>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * remove print

    Signed-off-by: Matrix Yao <[email protected]>

    * fix ci issue

    Signed-off-by: YAO Matrix <[email protected]>

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
    Signed-off-by: YAO Matrix <[email protected]>

commit f55f053
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <[email protected]>

commit 1ec99f0
Author: Yao Matrix <[email protected]>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
S1ro1 added a commit that referenced this pull request Jun 10, 2025
commit 2f8fd72
Author: Simon <[email protected]>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <[email protected]>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 3a82b05
Author: Marc Sun <[email protected]>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <[email protected]>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit ab3c604
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    * fix quality

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 273799c
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * add deterministic

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 43526c5
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <[email protected]>

    * fix bug

    Signed-off-by: Matrix YAO <[email protected]>

    * fix review comments

    Signed-off-by: Matrix YAO <[email protected]>

    * fix

    Signed-off-by: Matrix YAO <[email protected]>

    * format

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <[email protected]>

commit ee2f48c
Author: Fanli Lin <[email protected]>
Date:   Tue May 27 17:16:42 2025 +0800

    [docs] no hard-coded cuda in the ddp documentation (#3589)

    * make device-agnostic

    * refactor

commit 4f3abb7
Author: jiqing-feng <[email protected]>
Date:   Mon May 26 21:55:10 2025 +0800

    Set ccl and KMP param in simple launch (#3575)

    * Even 1 CPU mechine can also run multi process

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl and kml param setting

    Signed-off-by: jiqing-feng <[email protected]>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <[email protected]>

    * fix num process check

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl args check

    Signed-off-by: jiqing-feng <[email protected]>

    ---------

    Signed-off-by: jiqing-feng <[email protected]>

commit db536cb
Author: Yuanzhou Cai <[email protected]>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <[email protected]>

    * fix style

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 4e9d0de
Author: Yao Matrix <[email protected]>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <[email protected]>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifacts and figure tracking at MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initiation

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <[email protected]>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <[email protected]>

commit 33967d4
Author: Francesco Laiti <[email protected]>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * remove print

    Signed-off-by: Matrix Yao <[email protected]>

    * fix ci issue

    Signed-off-by: YAO Matrix <[email protected]>

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
    Signed-off-by: YAO Matrix <[email protected]>

commit f55f053
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <[email protected]>

commit 1ec99f0
Author: Yao Matrix <[email protected]>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
S1ro1 added a commit that referenced this pull request Jul 9, 2025
commit 2f8fd72
Author: Simon <[email protected]>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <[email protected]>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 3a82b05
Author: Marc Sun <[email protected]>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <[email protected]>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit ab3c604
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    * fix quality

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 273799c
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * add deterministic

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 43526c5
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <[email protected]>

    * fix bug

    Signed-off-by: Matrix YAO <[email protected]>

    * fix review comments

    Signed-off-by: Matrix YAO <[email protected]>

    * fix

    Signed-off-by: Matrix YAO <[email protected]>

    * format

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <[email protected]>

commit ee2f48c
Author: Fanli Lin <[email protected]>
Date:   Tue May 27 17:16:42 2025 +0800

    [docs] no hard-coded cuda in the ddp documentation (#3589)

    * make device-agnostic

    * refactor

commit 4f3abb7
Author: jiqing-feng <[email protected]>
Date:   Mon May 26 21:55:10 2025 +0800

    Set ccl and KMP param in simple launch (#3575)

    * Even 1 CPU mechine can also run multi process

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl and kml param setting

    Signed-off-by: jiqing-feng <[email protected]>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <[email protected]>

    * fix num process check

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl args check

    Signed-off-by: jiqing-feng <[email protected]>

    ---------

    Signed-off-by: jiqing-feng <[email protected]>

commit db536cb
Author: Yuanzhou Cai <[email protected]>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <[email protected]>

    * fix style

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 4e9d0de
Author: Yao Matrix <[email protected]>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <[email protected]>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifacts and figure tracking at MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initiation

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <[email protected]>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <[email protected]>

commit 33967d4
Author: Francesco Laiti <[email protected]>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * remove print

    Signed-off-by: Matrix Yao <[email protected]>

    * fix ci issue

    Signed-off-by: YAO Matrix <[email protected]>

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
    Signed-off-by: YAO Matrix <[email protected]>

commit f55f053
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <[email protected]>

commit 1ec99f0
Author: Yao Matrix <[email protected]>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
S1ro1 added a commit that referenced this pull request Jul 9, 2025
commit 2f8fd72
Author: Simon <[email protected]>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <[email protected]>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 3a82b05
Author: Marc Sun <[email protected]>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <[email protected]>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <[email protected]>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <[email protected]>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit ab3c604
Author: Yao Matrix <[email protected]>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * fix style

    Signed-off-by: Matrix YAO <[email protected]>

    * fix quality

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 273799c
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <[email protected]>

    * add deterministic

    Signed-off-by: Matrix YAO <[email protected]>

    ---------

    Signed-off-by: Matrix YAO <[email protected]>

commit 43526c5
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <[email protected]>

    * fix bug

    Signed-off-by: Matrix YAO <[email protected]>

    * fix review comments

    Signed-off-by: Matrix YAO <[email protected]>

    * fix

    Signed-off-by: Matrix YAO <[email protected]>

    * format

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <[email protected]>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <[email protected]>

commit ee2f48c
Author: Fanli Lin <[email protected]>
Date:   Tue May 27 17:16:42 2025 +0800

    [docs] no hard-coded cuda in the ddp documentation (#3589)

    * make device-agnostic

    * refactor

commit 4f3abb7
Author: jiqing-feng <[email protected]>
Date:   Mon May 26 21:55:10 2025 +0800

    Set ccl and KMP param in simple launch (#3575)

    * Even 1 CPU mechine can also run multi process

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl and kml param setting

    Signed-off-by: jiqing-feng <[email protected]>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <[email protected]>

    * fix num process check

    Signed-off-by: jiqing-feng <[email protected]>

    * fix ccl args check

    Signed-off-by: jiqing-feng <[email protected]>

    ---------

    Signed-off-by: jiqing-feng <[email protected]>

commit db536cb
Author: Yuanzhou Cai <[email protected]>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <[email protected]>

    * fix style

    ---------

    Co-authored-by: Marc Sun <[email protected]>

commit 4e9d0de
Author: Yao Matrix <[email protected]>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <[email protected]>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <[email protected]>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <[email protected]>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifacts and figure tracking at MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initiation

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <[email protected]>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <[email protected]>

commit 33967d4
Author: Francesco Laiti <[email protected]>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * remove print

    Signed-off-by: Matrix Yao <[email protected]>

    * fix ci issue

    Signed-off-by: YAO Matrix <[email protected]>

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
    Signed-off-by: YAO Matrix <[email protected]>

commit f55f053
Author: Yao Matrix <[email protected]>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <[email protected]>

commit 1ec99f0
Author: Yao Matrix <[email protected]>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <[email protected]>

    * fix style

    Signed-off-by: Matrix Yao <[email protected]>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
