ESpeechServer is a lightweight Flask-based backend server for speech-to-text conversion, built to work with the ESpeech Library. It accepts audio uploads and transcribes them using Google’s Speech Recognition engine. It deploys easily on platforms like Render and is well suited to adding voice interfaces to web and IoT applications.
- 🔊 Accepts raw audio data (.wav) via HTTP POST requests
- 🧠 Uses Google Speech Recognition for high-accuracy transcription
- ⚙️ Built with Flask and SpeechRecognition library
- ☁️ Easily deployable on Render or similar cloud platforms
- 🔁 Simple API endpoint for quick integration
- 🎧 Ready to integrate with ESpeech client libraries or custom ESP32 IoT devices
- Flask – Lightweight WSGI web application framework
- SpeechRecognition – Python library for performing speech recognition
- Pydub – Audio handling made easy
- Gunicorn – Production WSGI server for Python apps
git clone https://github.com/yourusername/ESpeechServer.git
cd ESpeechServer
pip install -r requirements.txt
python app.py

The server will start at http://0.0.0.0:8888.
POST /uploadAudio
Uploads an audio file (in .wav format) and returns a JSON response with the transcribed text.
Content-Type: audio/wav
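For callers that prefer Python over the command line, the request described above can be sketched with the standard library alone. This is a minimal sketch, not part of the project: the helper names `build_upload_request`, `parse_transcription`, and `transcribe_file` are hypothetical, the base URL assumes a server running locally on port 8888, and the response is assumed to be a JSON object with a `transcription` field.

```python
import json
import urllib.request


def build_upload_request(wav_bytes, base_url="http://localhost:8888"):
    """Build a POST request for /uploadAudio with raw WAV bytes as the body."""
    return urllib.request.Request(
        url=f"{base_url}/uploadAudio",
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def parse_transcription(response_body):
    """Extract the "transcription" field from the JSON response body."""
    return json.loads(response_body)["transcription"]


def transcribe_file(path, base_url="http://localhost:8888"):
    """Upload a .wav file and return the transcribed text (hypothetical helper)."""
    with open(path, "rb") as wav_file:
        request = build_upload_request(wav_file.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return parse_transcription(response.read())
```

With the server running, `transcribe_file("yourfile.wav")` would return the transcribed string. Building the request separately from sending it keeps the headers and body easy to inspect without a live server.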
curl -X POST http://localhost:8888/uploadAudio --data-binary "@yourfile.wav"

{
  "transcription": "Hello, how are you?"
}

- Create a new Web Service on Render.
- Link your GitHub repo.
- Set the environment:
  - Build Command: pip install -r requirements.txt
  - Start Command: gunicorn app:app
- Deploy and you're done!
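The start command above relies on Gunicorn's defaults. If you want the settings to be explicit, one option is a `gunicorn.conf.py` file in the repository root, which Gunicorn 20.0 and later load automatically. The sketch below is an assumption, not part of the project: it assumes the platform supplies the listening port in a `PORT` environment variable, and the worker count and timeout are illustrative values rather than tuned recommendations.

```python
# gunicorn.conf.py (hypothetical file, not shipped with the project)
import os

# Listen on the port supplied by the platform, falling back to the local
# development port when PORT is not set.
bind = f"0.0.0.0:{os.environ.get('PORT', '8888')}"

# Illustrative values: a small worker pool, and a timeout longer than
# Gunicorn's 30-second default because each request waits on an external
# speech recognition service.
workers = 2
timeout = 120
```

With this file in place the start command stays `gunicorn app:app`.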
You can modify the speech_to_text() function in app.py to use other engines like:
- Sphinx (Offline)
- Azure Speech
- IBM Speech to Text
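One way to make the engine switchable is a small dispatcher over the SpeechRecognition `Recognizer` methods. This is a sketch under assumptions, not the project's code: the actual signature of `speech_to_text()` in app.py is not shown here, so `transcribe_with_engine` and `ENGINE_METHODS` are hypothetical names. Only `recognize_google` and `recognize_sphinx` are mapped; the Sphinx method needs the separate `pocketsphinx` package installed. Azure and IBM are left out because they require credentials and their parameters differ between library versions.

```python
# Maps an engine name to the corresponding method on
# speech_recognition.Recognizer.
ENGINE_METHODS = {
    "google": "recognize_google",
    "sphinx": "recognize_sphinx",  # offline; requires pocketsphinx
}


def transcribe_with_engine(recognizer, audio, engine="google"):
    """Transcribe `audio` using the recognizer method chosen by `engine`.

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `audio` a speech_recognition.AudioData instance.
    """
    try:
        method_name = ENGINE_METHODS[engine]
    except KeyError:
        raise ValueError(f"Unsupported engine: {engine!r}") from None
    return getattr(recognizer, method_name)(audio)
```

In app.py this could be called as `transcribe_with_engine(sr.Recognizer(), audio, engine="sphinx")`, where `sr` is the imported `speech_recognition` module. Passing the recognizer in as an argument keeps the function testable without network access.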
Need a visual walkthrough?
🎥 Coming soon: Watch the Tutorial Video
Have suggestions or improvements? Feel free to open an issue or submit a PR!
This project is licensed under the MIT License. See the LICENSE file for details.