switch XPU ccl backend to torch-builtin xccl in test_zero3_integration #3773

yao-matrix · 2025-09-11T16:58:38Z

Switch to the torch built-in xccl backend on XPU in test_zero3_integration, so that pytest -rA tests/deepspeed/test_deepspeed.py::DeepSpeedIntegrationTest::test_zero3_integration passes with PyTorch 2.8 and later.
Remove the XPU workaround in RegressionModel; it is no longer needed.
Rename test_multigpu to test_multidevice to reflect the fact that they are MultiDeviceTester tests.
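
For readers unfamiliar with the backend switch, here is a minimal sketch of the selection logic it implies. It is not the code in this PR. The helper name `pick_ccl_backend` is hypothetical, and the PyTorch 2.8 threshold is taken from the description above. The sketch assumes that older XPU setups use the `ccl` backend name registered by the oneCCL bindings for PyTorch.

```python
import re


def pick_ccl_backend(device_type: str, torch_version: str) -> str:
    """Return a torch.distributed backend name for a device type.

    Hypothetical helper for illustration only; the version threshold
    (2.8) is an assumption based on the PR description.
    """
    # Parse "major.minor" from strings such as "2.8.0+xpu" or "2.9.0.dev1".
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))

    if device_type == "xpu":
        # Built-in xccl from 2.8 on; otherwise fall back to the
        # external oneCCL bindings backend name ("ccl").
        return "xccl" if major_minor >= (2, 8) else "ccl"
    if device_type == "cuda":
        return "nccl"
    # CPU and anything else: gloo is the portable default.
    return "gloo"
```

In a test, the returned string could be written into the distributed configuration in place of a hard-coded backend name, which keeps the same test usable across PyTorch versions.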

@S1ro1 , pls help review, thx very much.

yao-matrix · 2025-09-15T18:14:57Z

@SunMarc @S1ro1 , pls help review, thx very much

S1ro1

LGTM

S1ro1 · 2025-09-16T10:28:57Z

Can you fix tests + quality please? Seems to be breaking.

HuggingFaceDocBuilderDev · 2025-09-16T10:29:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SunMarc

Nice, thanks! Please fix the CI.

yao-matrix · 2025-09-17T17:53:21Z

@S1ro1 , @SunMarc done, thx very much.

SunMarc

Nice

switch XPU ccl backend to torch-builtin xccl in test_zero3_integration

484ffc9

remove xpu workaround in RegressionModel, we are OK now
rename test_multigpu to test_multidevice to reflect the fact
Signed-off-by: Yao, Matrix <[email protected]>

S1ro1 approved these changes Sep 16, 2025

SunMarc approved these changes Sep 17, 2025

fix ci issues

246b72c

Signed-off-by: Yao, Matrix <[email protected]>

xx

2d1ffec

Signed-off-by: Yao, Matrix <[email protected]>

SunMarc approved these changes Sep 18, 2025

SunMarc merged commit fe795fd into huggingface:main Sep 18, 2025
23 of 25 checks passed

yao-matrix deleted the issue-515 branch September 18, 2025 18:06
