The whisperX API is a tool for enhancing and analyzing audio content. It provides a suite of services for processing audio and video files, including transcription, alignment, diarization, and combining transcripts with diarization results.
Swagger UI is available at /docs for all the services; a dump of the OpenAPI definition is also available in the folder app/docs. You can explore it directly in the Swagger Editor.
See the WhisperX Documentation for details on whisperX functions.
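As a quick illustration, the services can be called like any other FastAPI endpoints, e.g. by uploading an audio file in a multipart POST. The sketch below uses only the Python standard library; the `/speech-to-text` route and the `file` field name are assumptions for illustration — check the Swagger UI at /docs for the actual paths and parameters.

```python
import io
import mimetypes
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes):
    """Build a multipart/form-data body for a single file field."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    body.write(f"Content-Type: {ctype}\r\n\r\n".encode())
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return body.getvalue(), headers


def transcribe(path: str, url: str = "http://127.0.0.1:8000/speech-to-text"):
    """POST an audio file to the service (endpoint path is an assumption)."""
    with open(path, "rb") as f:
        body, headers = build_multipart("file", path, f.read())
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```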
- In `.env` you can define the default language with `DEFAULT_LANG`; if not defined, `en` is used (you can also set it in the request).
- `.env` contains the definition of the Whisper model via `WHISPER_MODEL` (you can also set it in the request).
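For example, a minimal `.env` combining these settings might look like this (values are illustrative; `large-v2` is just one of the available Whisper model sizes):

```env
DEFAULT_LANG=en
WHISPER_MODEL=large-v2
HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
```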
The status and result of each task are stored in a database using the SQLAlchemy ORM. The database connection is defined by the environment variable `DB_URL`; if no value is specified, `db.py` defaults to `sqlite:///records.db`.
See the SQLAlchemy Engine Configuration documentation for driver definitions if you want to connect to a database other than SQLite.
The structure of the database is described in DB Schema.
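The environment-variable fallbacks described above can be sketched as plain lookups. The variable names and defaults below come from this README; the resolution function itself is an illustrative assumption, not the app's actual code:

```python
import os


def resolve_settings(env=os.environ):
    """Resolve runtime settings with the fallbacks documented in the README."""
    return {
        # "en" is used when DEFAULT_LANG is not defined
        "lang": env.get("DEFAULT_LANG", "en"),
        # db.py falls back to a local SQLite file when DB_URL is unset
        "db_url": env.get("DB_URL", "sqlite:///records.db"),
    }
```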
To get started with the API, follow these steps:
- Create a virtual environment
- Install PyTorch; see the PyTorch documentation for more details
- Install whisperX:

```sh
pip install git+https://github.com/m-bain/whisperx.git
```
- Install the required dependencies:

```sh
pip install -r requirements.txt
```
- Create a `.env` file and define your Whisper model and your Hugging Face token:

```env
HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
```
- Run the FastAPI application:

```sh
uvicorn app.main:app --reload
```

The API will be accessible at http://127.0.0.1:8000.
- Create a `.env` file:

```env
HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
```
- Build the image using `docker-compose.yaml`:

```sh
# build and start the image using compose file
docker-compose up
```

Alternative approach:

```sh
# build image
docker build -t whisperx-service .

# run container
docker run -d --gpus all -p 8000:8000 --env-file .env whisperx-service
```

The API will be accessible at http://127.0.0.1:8000.
The models used by whisperX are stored in `/root/.cache`. If you want to avoid downloading the models each time the container starts, you can store this cache in persistent storage; `docker-compose.yaml` defines a volume `whisperx-models-cache` for this purpose.
- faster-whisper cache: `/root/.cache/huggingface/hub`
- pyannote and other models cache: `/root/.cache/torch`
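A minimal sketch of how such a volume mapping might look in a compose file (the service name and build settings are assumptions for illustration; `whisperx-models-cache` is the volume name mentioned above):

```yaml
services:
  whisperx-service:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    volumes:
      # persist downloaded models across container restarts
      - whisperx-models-cache:/root/.cache

volumes:
  whisperx-models-cache:
```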