Add new example to fine tune llama-2 70b with lora #80
lu-wang-dl wants to merge 8 commits into master from
Conversation
es94129 left a comment
Thanks for adding this, looks very cool!
Wondering why deepspeed is required, is it for the memory optimization?
# MAGIC
# MAGIC # Fine tune llama-2-70b with deepspeed
# MAGIC
# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained with 2T tokens and supports context length window upto 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 7B pretrained model, converted for the Hugging Face Transformers format.
nit
Suggested change:
- # MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained with 2T tokens and supports context length window upto 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 7B pretrained model, converted for the Hugging Face Transformers format.
+ # MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained with 2T tokens and supports context length window upto 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 70B pretrained model, converted for the Hugging Face Transformers format.
DeepSpeed is used for multi-GPU training with LoRA.
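For readers following along, here is a rough sketch of how that combination usually looks with Hugging Face `transformers` + `peft`: the LoRA adapter keeps the trainable parameter count small, and the DeepSpeed ZeRO config shards optimizer state and gradients across the GPUs. This is an illustration only, not the contents of `scripts/fine_tune_lora.py`; the dataset handling, hyperparameters, and target modules below are assumptions.

```python
# Illustrative sketch only (not the actual scripts/fine_tune_lora.py in this PR):
# LoRA adapters via peft, combined with a DeepSpeed ZeRO config through the
# Hugging Face Trainer, so the 70B base model is trainable across multiple GPUs.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_PATH = "meta-llama/Llama-2-70b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
tokenizer.pad_token = tokenizer.eos_token          # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)

# Attach LoRA adapters: only the small low-rank matrices are trained,
# the 70B base weights stay frozen.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Tokenize the instruction dataset (column names follow mosaicml/dolly_hhrlhf).
dataset = load_dataset("mosaicml/dolly_hhrlhf", split="train")
dataset = dataset.map(lambda ex: tokenizer(ex["prompt"] + ex["response"],
                                           truncation=True, max_length=512))

# The deepspeed argument is where the memory optimization comes in: ZeRO stage 2
# shards optimizer state and gradients across all GPUs on the node.
training_args = TrainingArguments(
    output_dir="/local_disk0/output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed="../../config/a10_config_zero2.json",
)
trainer = Trainer(model=model, args=training_args, train_dataset=dataset,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
```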
Since 07 is already used for AI Gateway, maybe use other indices?
Sure. Let's design a proper ordering afterwards.
# MAGIC %sh
# MAGIC deepspeed \
# MAGIC --num_gpus 2 \
--num_gpus is probably not needed because deepspeed can use all the GPUs on the machine
Good point. Let me remove it.
MODEL_PATH = 'meta-llama/Llama-2-70b-hf'
TOKENIZER_PATH = 'meta-llama/Llama-2-70b-hf'
DEFAULT_TRAINING_DATASET = "mosaicml/dolly_hhrlhf"
CONFIG_PATH = "../../config/a10_config_zero2.json"
Maybe rename the file to a100_...?
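For context on the `CONFIG_PATH` above: a ZeRO stage-2 DeepSpeed config generally looks something like the sketch below. The actual contents of `a10_config_zero2.json` in this PR are not shown in the thread, so treat every value here as an assumption.

```python
import json

# Assumed sketch of a ZeRO stage-2 DeepSpeed config; the real
# a10_config_zero2.json in this repo may set different values.
zero2_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                                  # shard optimizer state + gradients
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" lets the Hugging Face Trainer fill these in from TrainingArguments.
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_clipping": "auto",
}

with open("../../config/a10_config_zero2.json", "w") as f:
    json.dump(zero2_config, f, indent=2)
```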
# MAGIC deepspeed \
# MAGIC --num_gpus 2 \
# MAGIC scripts/fine_tune_lora.py \
# MAGIC --output_dir="/local_disk0/output"
Q: What is the difference between --output_dir and /local_disk0/final_model? Is the latter just the LoRA weights?
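A common pattern in scripts like this (assumed here, not confirmed from `fine_tune_lora.py`) is that `--output_dir` holds the Trainer's intermediate checkpoints and training state, while a separate final directory receives only the trained LoRA adapter, roughly like this:

```python
# Assumed sketch: saving just the LoRA adapter at the end of training.
# Whether fine_tune_lora.py does exactly this is not confirmed in the PR.
from peft import PeftModel
from transformers import PreTrainedTokenizerBase


def save_final_adapter(model: PeftModel,
                       tokenizer: PreTrainedTokenizerBase,
                       final_dir: str = "/local_disk0/final_model") -> None:
    """Persist only the LoRA adapter weights and tokenizer, not the 70B base model."""
    model.save_pretrained(final_dir)       # writes adapter_config.json + adapter weights
    tokenizer.save_pretrained(final_dir)
```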
# COMMAND ----------

# MAGIC %sh
# MAGIC ls /local_disk0/final_model
Could you also add instructions or code for how to load this for inference?
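One possible way to do that, as a sketch: assuming `/local_disk0/final_model` contains a PEFT LoRA adapter trained against `meta-llama/Llama-2-70b-hf`, the adapter can be loaded on top of the base model with `peft`. The prompt and generation settings below are illustrative only.

```python
# Sketch of loading the saved LoRA adapter for inference; assumes
# /local_disk0/final_model holds a PEFT adapter for meta-llama/Llama-2-70b-hf.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Llama-2-70b-hf"
ADAPTER_DIR = "/local_disk0/final_model"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_DIR)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
# Wrap the frozen base model with the trained LoRA adapter.
model = PeftModel.from_pretrained(base_model, ADAPTER_DIR)
model.eval()

prompt = "What is machine learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```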
Co-authored-by: Ying Chen <ying.chen@databricks.com>
Tested on: https://adb-7064161269814046.2.staging.azuredatabricks.net/?o=7064161269814046#notebook/94670986903573/command/94670986903574