Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"
```bash
# Install
pip install habibi-tts

# Launch the GUI TTS interface
habibi-tts_infer-gradio
```

> [!IMPORTANT]
> Read the F5-TTS documentation for (1) detailed installation guidance; (2) best practices for inference; etc.
```bash
# Default: use the Unified model (recommended)
habibi-tts_infer-cli \
--ref_audio "assets/MSA.mp3" \
--ref_text "كان اللعيب حاضرًا في العديد من الأنشطة والفعاليات المرتبطة بكأس العالم، مما سمح للجماهير بالتفاعل معه والتقاط الصور التذكارية." \
--gen_text "أهلًا، يبدو أن هناك بعض التعقيدات، لكن لا تقلق، سأرشدك بطريقة سلسة وواضحة خطوة بخطوة."

# Assign the dialect ID explicitly, rather than inferring it from the given reference prompt (UNK by default)
# (best used with matched dialectal content; IDs: MSA, SAU, UAE, ALG, IRQ, EGY, MAR, OMN, TUN, LEV, SDN, LBY)
habibi-tts_infer-cli --dialect MSA

# Alternatively, use a `.toml` file for config; see `src/habibi_tts/infer/example.toml`
habibi-tts_infer-cli -c YOUR_CUSTOM.toml

# Check more CLI features with
habibi-tts_infer-cli --help
```

> [!NOTE]
> Some dialectal audio samples are provided under `src/habibi_tts/assets`; see the relevant README.md for usage and more details.
See #2.
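For orientation, a custom `.toml` config for the CLI might look like the sketch below. The field names are assumptions modeled on the CLI flags above, not a confirmed schema; the authoritative template is `src/habibi_tts/infer/example.toml`.

```toml
# Hypothetical config sketch — field names mirror the CLI flags above;
# check src/habibi_tts/infer/example.toml for the real schema.
ref_audio = "assets/MSA.mp3"
ref_text = "<transcript of the reference audio>"
gen_text = "<text to synthesize>"
dialect = "MSA"
```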
```bash
# Example template for benchmark use:
python src/habibi_tts/eval/0_benchmark.py -d MSA
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)

# Zero-shot TTS performance evaluation:
accelerate launch src/habibi_tts/eval/1_infer_habibi.py -m Unified -d MAR
# --model MODEL (Unified | Specialized)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
```
```bash
# Use a single prompt, to compare with the 11Labs model:
accelerate launch src/habibi_tts/eval/1_infer_habibi.py -m Specialized -d IRQ -s
# --single (<- add this flag)

# Use a single prompt, calling the ElevenLabs Eleven v3 (alpha) API:
pip install elevenlabs
python src/habibi_tts/eval/1_infer_11labs.py -a YOUR_API_KEY -d MSA
# --api-key API_KEY (your 11labs account API key)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)

# Evaluate WER-O with Meta Omnilingual-ASR-LLM-7B v1:
pip install omnilingual-asr
python src/habibi_tts/eval/2_cal_wer-o.py -w results/Habibi/IRQ_Specialized_single -d IRQ
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# --batch-size BATCH_SIZE (set smaller if OOM; default 64)
```
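Both WER scripts report word error rate on ASR transcripts of the generated audio. As a reminder of what that number means, here is a minimal sketch of the standard WER formula (word-level edit distance over reference length); the repo's actual scripts may apply extra text normalization (e.g. for diacritics) before scoring, and the strings here are made-up examples.

```python
# Standard WER: (substitutions + deletions + insertions) / reference length,
# computed with a word-level edit-distance DP table. Illustrative only —
# the repo's 2_cal_wer-*.py scripts may normalize text before scoring.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # → 0.0
print(wer("the cat sat", "the cat sit"))  # → 0.3333333333333333
```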
```bash
# Evaluate WER-S with dialect-specific ASR models:
python src/habibi_tts/eval/2_cal_wer-s.py -w results/Habibi/MAR_Unified -d MAR
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (EGY | MAR)
```

Download the WavLM model from Google Drive, then:
```bash
python src/habibi_tts/eval/3_cal_spksim.py -w results/Habibi/MAR_Unified -d MAR -c YOUR_WAVLM_PATH
# --wav-dir WAV_DIR (the folder of generated samples)
# --dialect DIALECT (MSA | SAU | UAE | ALG | IRQ | EGY | MAR)
# --ckpt CKPT (the path of the downloaded WavLM model)

python src/habibi_tts/eval/3_cal_spksim.py -w results/Habibi/IRQ_Specialized_single -d IRQ -c YOUR_WAVLM_PATH -s
# --single (if evaluating single-prompt or 11labs results)

# Evaluate UTMOS:
python src/habibi_tts/eval/4_cal_utmos.py -w results/11Labs_3a/MSA
# --wav-dir WAV_DIR (the folder of generated samples)
```

> [!NOTE]
> If dependency conflicts appear after installing omnilingual-asr (e.g. with flash-attn), try reinstalling:
>
> ```bash
> pip uninstall -y flash-attn && pip install flash-attn --no-build-isolation
> ```
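Speaker similarity of the kind `3_cal_spksim.py` reports is conventionally the cosine similarity between speaker embeddings extracted from the prompt and the generated audio (here, by the downloaded WavLM model). A minimal sketch of the metric itself, with made-up stand-in vectors rather than real embeddings:

```python
# Cosine similarity between two speaker-embedding vectors — the usual core
# of a speaker-similarity (SIM) score. The vectors below are hypothetical
# stand-ins; in practice they come from the WavLM speaker model.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

emb_prompt = [0.2, 0.8, -0.1]       # hypothetical embedding of the prompt
emb_generated = [0.25, 0.75, -0.05]  # hypothetical embedding of the output
print(round(cosine_similarity(emb_prompt, emb_generated), 3))  # → 0.995
```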
All code is released under the MIT License.
The Unified, SAU, and UAE models are licensed under CC-BY-NC-SA-4.0, due to restrictions inherited from the SADA and Mixat datasets.
The remaining specialized models (ALG, EGY, IRQ, MAR, MSA) are released under the Apache 2.0 license.
