
Conversation


@pstjohn pstjohn commented Jul 21, 2025

What does this PR do?

Adds support for the MXFP8 format in TE. See the TE docs pages for more background:
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#MXFP8-and-block-scaling

This adds an additional FP8 recipe argument, use_mxfp8_block_scaling, that switches the TE recipe from DelayedScaling to MXFP8BlockScaling.
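For reference, here's a minimal sketch of how the new flag might be used from the Accelerate side (assuming it is exposed through TERecipeKwargs as discussed below; exact names may differ from the merged code):

# Hedged sketch, not the authoritative API: assumes use_mxfp8_block_scaling
# lands on TERecipeKwargs and that Accelerate forwards it to TE's recipe.
from accelerate import Accelerator
from accelerate.utils import TERecipeKwargs

# With the flag set, TE should build an MXFP8BlockScaling recipe instead of
# the default DelayedScaling recipe when wrapping fp8-enabled layers.
fp8_kwargs = TERecipeKwargs(use_mxfp8_block_scaling=True)
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_kwargs])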

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?


pstjohn commented Jul 22, 2025

This is outside the initial scope of this PR, but there's some odd behavior when using DeepSpeed + FP8 + the HF Trainer.

If you set bf16: True in your TrainingArguments, the trainer will override the fp8 parameter you pass to Accelerate (manually setting ACCELERATE_MIXED_PRECISION=bf16) here:

And if you omit it, ACCELERATE_MIXED_PRECISION stays as FP8 but you then get an error raised here about a config mismatch:

ValueError: Please correct the following DeepSpeed config values that mismatch TrainingArguments values:
- ds bf16.enabled=True vs hf bf16|bf16_full_eval=False
The easiest method is to set these DeepSpeed config values to 'auto'.
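The workaround the error message itself suggests is to defer these values to TrainingArguments via the standard DeepSpeed/Trainer "auto" substitution (not specific to this PR), roughly:

# Minimal sketch of a DeepSpeed config that lets the HF Trainer fill in bf16
# from TrainingArguments instead of hard-coding it in the config.
ds_config = {
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {"stage": 2},
}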

Interestingly, it's still possible to use FP8 with DeepSpeed at the moment, but it seems like a bug. This check:
https://github.com/huggingface/accelerate/blame/2f075c724ccb4e38fade64db3b0627ca167b5fd2/src/accelerate/accelerator.py#L2046-L2047
will trigger when you pass an fp8 backend with DeepSpeed, because it only checks for fp8_backend, not fp8_enabled. But you won't create the TERecipeKwargs object, so it will just create the fp8 autowrap context with the default recipe.
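Purely to illustrate the distinction being drawn (illustrative pseudologic with hypothetical stand-in variables, not the actual accelerate source):

# fp8_backend / fp8_enabled are stand-ins for the state being discussed above.
fp8_backend = "TE"    # a backend is configured...
fp8_enabled = False   # ...but fp8 was never actually requested

if fp8_backend is not None:                  # backend-only check: fires anyway
    print("fp8 autowrap path taken")
if fp8_enabled and fp8_backend is not None:  # stricter check: only when enabled
    print("fp8 autowrap path taken")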

There have been a number of "FP8 + DeepSpeed" PRs here in the past; I'm wondering if the cleanest option is to separate "mixed_precision" from fp8. FP8 typically uses bf16 for the model weights and for the activations passed between FP8-enabled layers anyway.

@pstjohn pstjohn changed the title from "Add support for MXFP8 recipe in accelerate" to "Add support for TE MXFP8 recipe in accelerate" on Jul 22, 2025
Comment on lines 332 to 340
if (
    AcceleratorState._shared_state != {}
    and AcceleratorState().distributed_type == DistributedType.DEEPSPEED
):

Formatting-only change; not sure why it shows as changed from main.

@pstjohn pstjohn force-pushed the pstjohn/te-mxfp8-recipe branch from edbe9d5 to 46ebf27 on July 22, 2025 19:37
@pstjohn pstjohn marked this pull request as ready for review July 30, 2025 22:11
@pstjohn pstjohn force-pushed the pstjohn/te-mxfp8-recipe branch from 46ebf27 to 1fb8f76 on July 30, 2025 22:13

S1ro1 commented Aug 2, 2025

Do I understand correctly that this only covers DeepSpeed?


pstjohn commented Aug 14, 2025

No, this lets you pass use_mxfp8_block_scaling in your FP8 config with TE to use MXFP8 dtypes.

I think ultimately the complication with DeepSpeed is that there's a single mixed_precision accelerator state, and when we use DeepSpeed, we want to let DeepSpeed control that variable. But fp8 and bf16 are almost orthogonal concepts in their current implementations -- you can "use" both bf16 and fp8; see some of the benchmark scripts here where we use both autocast contexts:

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        ...

Since fp8 is largely confined to the internals of the layers, they still return fp32 / bf16 activations between layers.
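To make that concrete, here's a minimal self-contained sketch of the pattern above, following the TE fp8 primer (MXFP8BlockScaling needs a recent TE release and Blackwell-class hardware; on other GPUs you'd use DelayedScaling instead):

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(1024, 1024)               # TE layer with fp8-capable GEMMs
x = torch.randn(32, 1024, device="cuda")

# The recipe this PR exposes via use_mxfp8_block_scaling; swap in
# recipe.DelayedScaling() on pre-Blackwell hardware.
fp8_recipe = recipe.MXFP8BlockScaling()

# fp8 only governs the GEMMs inside the TE layer; bf16 autocast governs the
# activations flowing between layers, so the two contexts compose.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        out = layer(x)

print(out.dtype)  # bf16 -- the activations between layers are not fp8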


pstjohn commented Aug 14, 2025

@akakakakakaa, I'm not sure I've fully fixed the FP8 + DeepSpeed bugs in this PR; this is mainly focused on getting MXFP8 support with TE enabled.

@SunMarc SunMarc left a comment

Thanks for this clean PR. Indeed, for DeepSpeed and fp8, some cleaning is still required. I think it could make sense to separate `fp8` from `mixed_precision`. As you said, those should be orthogonal.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit 3b67c21 into huggingface:main Aug 27, 2025
25 checks passed