[DeepSpeed-Chat] Fix OOM issue in dataloader #841
Open
youkaichao wants to merge 1 commit into deepspeedai:master from
Conversation
Author
@microsoft-github-policy-service agree

Author
Hi team, any feedback on this? 👀

Contributor
Hi @youkaichao - sorry we didn't get to this until now. Would you want to fix the merge conflicts?

Author
Feel free to take it over if you think this is still useful. I'm not working on this thread anymore. :)
Currently, DeepSpeed-Chat saves tokenized tensors directly to disk, which consumes hundreds of GB of storage. Each string is converted into `input_ids` and `attention_mask` tensors of length `max_seq_len`, stored as int32 or int64.
If we count about 2~3 characters per token, the tokenized tensors take on the order of hundreds of bytes of storage per token of input text (each token position costs two 4- or 8-byte integers, and every sample is padded out to `max_seq_len`). This is very problematic: when the prompt dataset grows (say to 1 GB), the on-disk dataset can reach hundreds of GB.
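A back-of-the-envelope sketch of the blow-up; the numbers below are illustrative assumptions, not measurements from DeepSpeed-Chat itself:

```python
# Illustrative storage math for one short prompt (all numbers are assumptions).
max_seq_len = 512        # padded sequence length
bytes_per_elem = 8       # int64 elements
tensors_per_sample = 2   # input_ids + attention_mask

prompt_chars = 200       # a short prompt, ~80 tokens at ~2.5 chars/token
text_bytes = prompt_chars                                         # raw UTF-8 text, ~1 B/char
tensor_bytes = max_seq_len * bytes_per_elem * tensors_per_sample  # 8192 B

print(f"raw text: {text_bytes} B, saved tensors: {tensor_bytes} B "
      f"({tensor_bytes / text_bytes:.0f}x blow-up)")              # ~41x in this example
```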
What's worse, DeepSpeed-Chat then loads all of this data into memory, which can require hundreds of GB of RAM.
In my experience, a 1.1 GB prompt dataset hits OOM on a 512 GB machine even with `max_seq_len` of just 512. Using 2048 as `max_seq_len` would need four times as much memory, i.e. about 2 TB :(
This PR saves only the raw strings and tokenizes them on the fly. The saved data is about the same size as the input dataset.
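A minimal sketch of the idea, with hypothetical names rather than the exact patch: keep the raw strings and move tokenization into `__getitem__`, so only text is ever stored on disk or held in memory.

```python
from torch.utils.data import Dataset

class OnTheFlyPromptDataset(Dataset):
    """Stores raw prompt strings and tokenizes lazily, one sample at a time."""

    def __init__(self, prompts, tokenizer, max_seq_len):
        self.prompts = prompts          # list[str]; ~same size as the input file
        self.tokenizer = tokenizer      # e.g. a HuggingFace tokenizer
        self.max_seq_len = max_seq_len

    def __len__(self):
        return len(self.prompts)

    def __getitem__(self, idx):
        # Tokenization happens here, per sample, instead of ahead of time
        # for the whole dataset, so memory stays proportional to the batch.
        enc = self.tokenizer(
            self.prompts[idx],
            max_length=self.max_seq_len,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
        }
```

The trade-off is a little CPU work per batch (usually hidden by DataLoader workers) in exchange for on-disk and in-memory footprints that match the raw text size.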