2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ table th:nth-of-type(4) {

| Use case | Quality-optimized | Balanced | Speed-optimized |
|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
-| Text generation following instructions | [MPT-30B-Instruct](llm-models/mpt/mpt-30b/) <br> <br> [Llama-2-70b-chat-hf](llm-models/llamav2/llamav2-70b) | [MPT-7B-Instruct](llm-models/mpt/mpt-7b) <br> [MPT-7B-8k-Instruct](llm-models/mpt/mpt-7b-8k) <br> <br> [Llama-2-7b-chat-hf](llm-models/llamav2/llamav2-7b) <br> [Llama-2-13b-chat-hf](llm-models/llamav2/llamav2-13b) | |
+| Text generation following instructions | [MPT-30B-Instruct](llm-models/mpt/mpt-30b/) <br> <br> [Llama-2-70b-chat-hf](llm-models/llamav2/llamav2-70b) | [mistral-7b](llm-models/mistral/mistral-7b) <br><br> [MPT-7B-Instruct](llm-models/mpt/mpt-7b) <br> [MPT-7B-8k-Instruct](llm-models/mpt/mpt-7b-8k) <br> <br> [Llama-2-7b-chat-hf](llm-models/llamav2/llamav2-7b) <br> [Llama-2-13b-chat-hf](llm-models/llamav2/llamav2-13b) | |
| Text embeddings (English only) | | [bge-large-en(0.3B)](llm-models/embedding/bge/bge-large) <br> [e5-large-v2 (0.3B)](llm-models/embedding/e5-v2) <br> [instructor-xl (1.3B)](llm-models/embedding/instructor-xl)* | [bge-base-en (0.1B)](llm-models/embedding/bge) <br> [e5-base-v2 (0.1B)](llm-models/embedding/e5-v2) |
| Transcription (speech to text) | | [whisper-large-v2](llm-models/transcription/whisper)(1.6B) <br> [whisper-medium](llm-models/transcription/whisper) (0.8B) | |
| Image generation | | [stable-diffusion-xl](llm-models/image_generation/stable_diffusion) | |
47 changes: 47 additions & 0 deletions llm-models/config/a10_config_zero2.json
@@ -0,0 +1,47 @@
{
"fp16": {
"enabled": false
},
"bf16": {
"enabled": true
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 2,
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 5e7,
"reduce_bucket_size": "auto",
"reduce_scatter": true,
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
}
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 50,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}
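
A note on the ZeRO block above, offered as a hedged sketch rather than as part of the PR: according to the DeepSpeed configuration documentation, `offload_param` and `sub_group_size` apply only at ZeRO stage 3, so with `"stage": 2` it is likely that only `offload_optimizer` takes effect here. The helper below flags such keys in a config dict. The function name `stage3_only_keys` and the key set are illustrative assumptions, not a DeepSpeed API, and the set is not exhaustive.

```python
import json

# Assumption: these zero_optimization keys are documented by DeepSpeed as
# stage-3-only. The set is illustrative, not exhaustive.
STAGE3_ONLY_KEYS = {"offload_param", "sub_group_size"}


def stage3_only_keys(config: dict) -> list:
    """Return stage-3-only keys present in a config whose ZeRO stage is below 3."""
    zero = config.get("zero_optimization", {})
    if zero.get("stage", 0) >= 3:
        return []
    return sorted(key for key in zero if key in STAGE3_ONLY_KEYS)


# Trimmed copy of the zero_optimization block from a10_config_zero2.json.
example = json.loads("""
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "sub_group_size": 5e7,
    "offload_param": {"device": "cpu", "pin_memory": true},
    "offload_optimizer": {"device": "cpu", "pin_memory": true}
  }
}
""")

print(stage3_only_keys(example))  # ['offload_param', 'sub_group_size']
```

If the intent is parameter offload, a stage 3 config would be the place for those two keys. Otherwise they could probably be dropped from the stage 2 file without changing behavior.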
74 changes: 74 additions & 0 deletions llm-models/llamav2/llamav2-70b/07_fine_tune_lora.py
Contributor

Since 07 is already used for the AI gateway notebook, maybe use a different index.

Collaborator Author

Sure. Let's settle on a proper ordering afterwards.

@@ -0,0 +1,74 @@
# Databricks notebook source
# MAGIC %md
# MAGIC
# MAGIC # Fine tune llama-2-70b with LoRA and deepspeed on a single node
# MAGIC
# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained with 2T tokens and supports context length window upto 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 7B pretrained model, converted for the Hugging Face Transformers format.
Contributor

nit

Suggested change
-# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained with 2T tokens and supports context length window upto 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 7B pretrained model, converted for the Hugging Face Transformers format.
+# MAGIC [Llama 2](https://huggingface.co/meta-llama) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is trained with 2T tokens and supports context length window upto 4K tokens. [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) is the 70B pretrained model, converted for the Hugging Face Transformers format.

Collaborator Author

DeepSpeed is used for multi-GPU training with LoRA.

# MAGIC
# MAGIC This notebook fine-tunes the [llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) model on the [dolly_hhrlhf](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) dataset.
# MAGIC
# MAGIC Environment for this notebook:
# MAGIC - Runtime: 14.0 GPU ML Runtime
# MAGIC - Instance: `Standard_NC48ads_A100_v4` on Azure with 2 A100-80GB GPUs, `p4d.24xlarge` on AWS with 8 A100-40GB GPUs
# MAGIC
# MAGIC Requirements:
# MAGIC - To get access to the model on Hugging Face, visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads) and accept the license terms and acceptable use policy before submitting the request form. Requests are processed in 1-2 days.
# MAGIC

# COMMAND ----------

# MAGIC %md
# MAGIC Install the missing libraries

# COMMAND ----------

# MAGIC %pip install deepspeed==0.9.5 xformers
# MAGIC %pip install git+https://github.com/huggingface/peft.git
# MAGIC %pip install bitsandbytes==0.40.1 einops==0.6.1 trl==0.4.7
# MAGIC %pip install -U torch==2.0.1 accelerate==0.21.0 transformers==4.31.0
# MAGIC dbutils.library.restartPython()

# COMMAND ----------

import os
os.environ["HF_HOME"] = "/local_disk0/hf"
os.environ["HF_DATASETS_CACHE"] = "/local_disk0/hf"
os.environ["TRANSFORMERS_CACHE"] = "/local_disk0/hf"

# COMMAND ----------

from huggingface_hub import notebook_login

# Login to Huggingface to get access to the model
notebook_login()

# COMMAND ----------

# MAGIC %md
# MAGIC ## Fine tune the model with `deepspeed`
# MAGIC
# MAGIC The fine-tuning logic is written in `scripts/fine_tune_lora.py`, the script invoked below. The dataset used for fine-tuning is the [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset.
# MAGIC
# MAGIC

# COMMAND ----------

# MAGIC %sh
# MAGIC deepspeed \
# MAGIC --num_gpus 2 \
Contributor

`--num_gpus` is probably not needed, because deepspeed can use all the GPUs on the machine.

Collaborator Author

Good point. Let me remove it.

# MAGIC scripts/fine_tune_lora.py \
# MAGIC --output_dir="/local_disk0/output"
Contributor

Q: What is the difference between `--output_dir` and `/local_disk0/final_model`? Is the latter just the LoRA weights?
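
For the launcher command in this cell, the effect of dropping `--num_gpus` can be sketched as follows. This is an illustrative helper, not code from the PR: `build_deepspeed_cmd` is a hypothetical name, and the sketch assumes the `deepspeed` launcher falls back to every GPU visible on the node when the flag is omitted, as the review comment on `--num_gpus` suggests.

```python
import shlex
from typing import List, Optional


def build_deepspeed_cmd(script: str, output_dir: str,
                        num_gpus: Optional[int] = None) -> List[str]:
    """Assemble the launcher argv; omit --num_gpus to leave GPU selection to the launcher."""
    cmd = ["deepspeed"]
    if num_gpus is not None:
        # Only pin the GPU count when the caller explicitly asks for it.
        cmd += ["--num_gpus", str(num_gpus)]
    cmd += [script, f"--output_dir={output_dir}"]
    return cmd


# Same script and output directory as the notebook cell, without --num_gpus.
cmd = build_deepspeed_cmd("scripts/fine_tune_lora.py", "/local_disk0/output")
print(shlex.join(cmd))
```

Leaving the count unset keeps the same cell usable on both the 2-GPU Azure instance and the 8-GPU AWS instance listed in the notebook header.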


# COMMAND ----------

# MAGIC %md
# MAGIC The model checkpoint is saved at `/local_disk0/final_model`.

# COMMAND ----------

# MAGIC %sh
# MAGIC ls /local_disk0/final_model
Contributor

Could you also add instructions or code for how to load this for inference?
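
One possible shape for such an inference cell is sketched below. It assumes that `/local_disk0/final_model` holds a PEFT LoRA adapter (the thread does not confirm this) and that the base weights come from `meta-llama/Llama-2-70b-hf`. The function names are hypothetical. The calls used are `PeftModel.from_pretrained` from `peft` and `AutoModelForCausalLM.from_pretrained` from `transformers`.

```python
import os


def looks_like_lora_adapter(path: str) -> bool:
    """True if `path` contains the adapter_config.json that PEFT's save_pretrained writes."""
    return os.path.isfile(os.path.join(path, "adapter_config.json"))


def load_for_inference(adapter_dir: str = "/local_disk0/final_model",
                       base_model: str = "meta-llama/Llama-2-70b-hf"):
    """Load the base model and attach the LoRA adapter.

    Assumption: adapter_dir contains adapter weights only, not a merged model.
    """
    # Imported lazily so the check above works without the GPU libraries installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_dir)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Run generation for a single prompt and return the decoded text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `looks_like_lora_adapter("/local_disk0/final_model")` first would also help answer the earlier question about whether that directory contains only the LoRA weights. If it returns `False`, the directory may hold a full or merged checkpoint, and the `PeftModel` step would not apply.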


# COMMAND ----------

