add support for tensor learning rate (vs scalar) #7633

NirSonnenschein · 2025-10-16T11:41:22Z

This change is intended to help enable support for using a tensor learning rate (LR) value instead of a scalar one.
We found this helpful when the optimizer is compiled with torch.compile: in such cases, changing a scalar LR value can trigger recompilation and degrade performance.
The implementation lets the model script determine which type of LR value is used by setting the initial value.
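
To illustrate the idea, here is a minimal sketch of a type-preserving LR update. It is not the code from this PR: `set_lr` is a hypothetical helper, and the plain dicts stand in for `optimizer.param_groups`. The assumption is that a tensor LR should be overwritten in place, so the tensor object seen by a compiled optimizer step stays the same, while a scalar LR is reassigned as usual.

```python
import torch


def set_lr(param_groups, new_lrs):
    """Hypothetical helper: write new LR values, keeping each group's LR type."""
    for group, new_lr in zip(param_groups, new_lrs):
        current = group["lr"]
        if torch.is_tensor(current):
            # Tensor LR: overwrite the value in place so the tensor object
            # itself (and anything holding a reference to it) is unchanged.
            current.fill_(float(new_lr))
        else:
            # Scalar LR: plain Python float assignment.
            group["lr"] = float(new_lr)


# The script chooses the LR type through the initial value.
lr_tensor = torch.tensor(1e-3)
scalar_groups = [{"lr": 1e-3}]       # stand-in for optimizer.param_groups
tensor_groups = [{"lr": lr_tensor}]  # same, but with a tensor LR

set_lr(scalar_groups, [5e-4])
set_lr(tensor_groups, [5e-4])
```

After the update, the scalar group still holds a Python float, and the tensor group still holds the original tensor object with a new value.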

NirSonnenschein · 2025-10-19T14:13:02Z

Thanks @sfc-gh-truwase
small question: the CI failure doesn't seem to be related to the commit:
FAILED tests/unit/v1/zero/test_zero.py::TestZero3RepeatForwardLoop::test[True] - AttributeError: 'int' object has no attribute 'pt_reserved_cores_perc'
is this a known issue?

eternalNight · 2025-10-20T05:07:28Z

> Thanks @sfc-gh-truwase
> small question: the CI failure doesn't seem to be related to the commit:
> FAILED tests/unit/v1/zero/test_zero.py::TestZero3RepeatForwardLoop::test[True] - AttributeError: 'int' object has no attribute 'pt_reserved_cores_perc'
> is this a known issue?

#7634 attempts to fix that, but it is blocked because the CI does not seem to be testing the right branch (yet).

NirSonnenschein requested review from tjruwase and tohtana as code owners October 16, 2025 11:41

sfc-gh-truwase approved these changes Oct 17, 2025

Merge branch 'master' into add_tensor_LR_support

3486331

tohtana enabled auto-merge (squash) October 20, 2025 05:08

tohtana merged commit 407708c into deepspeedai:master Oct 20, 2025
11 of 12 checks passed

4 participants
