LUMII-AILab/LATE

LATE: Toolkit for Private Speech Transcription

The LATE toolkit consists of three components:

  1. Front-end - a single-page web application built with vanilla JavaScript and several third-party libraries (Silero VAD, FFmpeg, ProseMirror, WaveSurfer).
  2. Back-end - a statically compiled and linked binary that includes a web server, the SQLite database engine, the whisper.cpp engine, and, optionally, the CUDA library.
  3. Speech models - a Whisper-based ASR (automatic speech recognition) model in the GGML format, compatible with the whisper.cpp engine, and a VAD (voice activity detection) model.

Running LATE locally

Currently, a precompiled LATE binary is available for macOS on Apple Silicon (M1, M2, M3, or M4).

A binary release for Linux/x86_64 systems is in progress.
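Before downloading anything, you may want to confirm that your machine matches the platform of the precompiled binary. The following is a minimal sketch, not part of LATE. It assumes that on an Apple Silicon Mac `uname -s` prints `Darwin` and `uname -m` prints `arm64`. The helper name `is_supported_platform` is hypothetical.

```bash
# Hypothetical helper (not shipped with LATE): succeeds only on macOS
# running on Apple Silicon, the platform of the current precompiled binary.
# Optional arguments override the detected OS and CPU architecture,
# which makes the check easy to test.
is_supported_platform() {
  local os="${1:-$(uname -s)}"     # e.g. "Darwin" on macOS
  local arch="${2:-$(uname -m)}"   # e.g. "arm64" on Apple Silicon
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}

if is_supported_platform; then
  echo "macOS on Apple Silicon detected: the precompiled binary applies."
else
  echo "No precompiled LATE binary is available for this platform yet." >&2
fi
```

Note that a Terminal session running under Rosetta reports `x86_64` from `uname -m`, so the check can report a false negative in that case.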

First, download the latest binary release (late-<date>-darwin-universal.tgz) from the releases page, then unpack it.
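The unpacking step can be done with `tar`. The sketch below is an illustration, not an official script. It assumes the downloaded archive sits in the given directory (the current directory by default) and matches the release naming pattern, with the `<date>` part left as a wildcard. It extracts in place because this README does not describe the archive's internal layout. The function name `unpack_latest_late` is hypothetical.

```bash
# Hypothetical helper: find the most recently modified
# late-*-darwin-universal.tgz in a directory and extract it there.
unpack_latest_late() {
  local dir="${1:-.}"
  local archive
  # ls -t sorts by modification time, newest first.
  archive="$(ls -t "$dir"/late-*-darwin-universal.tgz 2>/dev/null | head -n 1)"
  if [ -z "$archive" ]; then
    echo "No late-*-darwin-universal.tgz found in $dir" >&2
    return 1
  fi
  echo "Unpacking $archive"
  # -x extract, -z gunzip, -f archive file, -C target directory.
  tar -xzf "$archive" -C "$dir"
}
```

For example, run `unpack_latest_late ~/Downloads` after the download finishes, then change to that directory for the next step.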

Open the Terminal application and change to the directory where you unpacked LATE. Then download the ASR and VAD models by running the following command:

bash download_models.sh <lang>

Replace <lang> with lv or ltg to download the fine-tuned Latvian or Latgalian model (Q8-quantized), respectively. If the <lang> parameter is omitted, the original Whisper large-v3 model (Q5-quantized) will be downloaded.
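Because `download_models.sh` accepts only the values listed above (or no argument), a small wrapper can catch typos before a long download starts. This is a minimal sketch and not part of LATE; the function names `describe_model` and `download_late_models` are hypothetical. It only validates the argument and then calls `download_models.sh` exactly as documented here.

```bash
# Hypothetical helper: map the documented <lang> values to a description.
# Returns non-zero for any value this README does not list.
describe_model() {
  case "${1-}" in
    lv)  echo "fine-tuned Latvian model (Q8-quantized)" ;;
    ltg) echo "fine-tuned Latgalian model (Q8-quantized)" ;;
    "")  echo "original Whisper large-v3 model (Q5-quantized)" ;;
    *)   return 1 ;;
  esac
}

# Hypothetical wrapper around the documented download_models.sh call.
download_late_models() {
  local lang="${1-}"
  local desc
  if ! desc="$(describe_model "$lang")"; then
    echo "Unknown language '$lang'; expected lv, ltg, or no argument." >&2
    return 1
  fi
  echo "Downloading the $desc ..."
  if [ -n "$lang" ]; then
    bash download_models.sh "$lang"
  else
    bash download_models.sh
  fi
}
```

For example, `download_late_models ltg` fetches the Latgalian model, while `download_late_models en` fails immediately with an error instead of starting a download.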

Note that downloading the ASR model may take some time.

Finally, run the LATE front-end and back-end by executing:

bash run_late.sh

The front-end will open in your default web browser. Please wait a few seconds while the back-end loads; the front-end will then reload automatically.

Acknowledgements

This work was funded by the EU Recovery and Resilience Facility's project Language Technology Initiative (2.3.1.1.i.0/1/22/I/CFLA/002) in synergy with the State Research Programme's project Research on Modern Latvian Language and Development of Language Technology (VPP-LETONIKA-2021/1-0006).

About

Open Source Toolkit for Private Speech Transcription