DeepSpeed Ulysses/ALST integration
DeepSpeed Ulysses/ALST is an efficient way of training on long sequences that employs sequence parallelism and attention head parallelism. You can learn more about this technology in the paper https://arxiv.org/abs/2506.13996 or in the DeepSpeed tutorial https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/.
To enable DeepSpeed Ulysses, you first need to create a `ParallelismConfig` and set the SP-related args:
```python
parallelism_config = ParallelismConfig(
    sp_backend="deepspeed",
    sp_size=2,
    sp_handler=DeepSpeedSequenceParallelConfig(...),
)
```
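Once created, the config is handed to the `Accelerator`. Here is a minimal sketch of that wiring, assuming the `parallelism_config` keyword argument of `Accelerator` and that your model, optimizer, and dataloader are built as usual:

```python
from accelerate import Accelerator

# Hand the parallelism config to the Accelerator so it can set up the
# sequence-parallel (SP) process groups before preparing the model.
accelerator = Accelerator(parallelism_config=parallelism_config)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```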
Then, you need to make sure to compute the correct loss, as described in our docs:

```python
...
# sp_group / sp_world_size: the sequence-parallel process group and its size
losses_per_rank = torch.distributed.nn.functional.all_gather(loss, group=sp_group)
good_tokens = (shift_labels != -100).view(-1).sum()
good_tokens_per_rank = torch.distributed.nn.functional.all_gather(good_tokens, group=sp_group)
total_loss = sum(
    losses_per_rank[rank] * good_tokens_per_rank[rank]
    for rank in range(sp_world_size)
    if good_tokens_per_rank[rank] > 0
)
total_good_tokens = sum(good_tokens_per_rank)
loss = total_loss / max(total_good_tokens, 1)
```
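To make the aggregation concrete, here is a tiny non-distributed sanity check of the same math with made-up numbers: the final loss is just the token-weighted average of the per-rank mean losses.

```python
# Two SP ranks, each with a mean loss over its local non-padded tokens.
losses_per_rank = [2.0, 4.0]      # mean loss on each SP rank
good_tokens_per_rank = [30, 10]   # tokens with labels != -100 on each SP rank

total_loss = sum(
    losses_per_rank[rank] * good_tokens_per_rank[rank]
    for rank in range(len(losses_per_rank))
    if good_tokens_per_rank[rank] > 0
)
total_good_tokens = sum(good_tokens_per_rank)
loss = total_loss / max(total_good_tokens, 1)
print(loss)  # (2.0 * 30 + 4.0 * 10) / 40 = 2.5
```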
Thanks to @S1ro1 for starting this work and to @stas00 for finishing it. Also thanks to @kashif for adding the docs and for reviewing/testing this PR!

This feature will also be available in the HF Trainer thanks to this PR from @stas00: huggingface/transformers#41832
Minor changes
- Remove warning for `cpu_ram_efficient_loading` by @SunMarc in #3816
- update typo in bnb quantisation 4bit flag docstring by @hbraith in #3828
- ArXiv -> HF Papers by @qgallouedec in #3834
- Fix typo in broadcast_object_list docstring by @wsntxxn in #3823
- [Bug] Update torch.optim.Optimizer parameter states after tensor parallelism by @naomili0924 in #3835
- use self hosted runner by @SunMarc in #3841
- device type helper by @kashif in #3843
New Contributors
- @hbraith made their first contribution in #3828
- @wsntxxn made their first contribution in #3823
- @naomili0924 made their first contribution in #3835
Full Changelog: v1.11.0...v1.12.0