Update alignment.py - added alignment for sk and sl languages
local vad model
move model to assets
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
Updated Norwegian Bokmål and Norwegian Nynorsk models Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
Force ctranslate2 to version 4.4.0 due to libcudnn_ops_infer.so.8: SYSTRAN/faster-whisper#729
Co-authored-by: Icaro Bombonato <ibombonatosites@gmail.com>
* Update faster-whisper to 1.0.2 to enable model distil-large-v3
* feat: add hotwords option to default_asr_options
---------
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
--------- Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
* chore: bump faster-whisper to 1.1.0
* chore: bump pyannote to 3.3.2
* feat: add multilingual option in load_model function
---------
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
--------- Co-authored-by: Abhishek Sharma <abhishek@zipteams.com> Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
…mode (#867)
Adds the parameter local_files_only (default False for consistency) to whisperx.load_model so that the user can avoid downloading the file and return the path to the local cached file if it exists.
---------
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
feat: restrict Python versions to 3.9 - 3.12
* docs: add troubleshooting guide for cuDNN loading errors
* docs: add cuDNN version incompatibility troubleshooting
The audio_path attribute that the __call__ method of the ResultWriter class takes is a str, not TextIO
* feat: add language-aware sentence tokenization
* feat: add missing punkt languages
---------
Co-authored-by: pulkit <129310466+p1kit@users.noreply.github.com>
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
* fix: pin huggingface-hub<1.0.0 for pyannote-audio compatibility
  pyannote-audio uses the deprecated `use_auth_token` parameter, which was removed in huggingface-hub v1.0.0
* fix: upgrade yanked dependencies
* chore: update version to 3.7.5
* chore: drop python 3.9 support
  - Update requires-python to >=3.10
  - Remove onnxruntime constraint (only needed for 3.9)
  - Simplify numpy (remove version markers and upper bound)
  - Remove pandas upper bound (<2.3.0 was for 3.9 compat)
  - Remove av direct dependency (transitive via faster-whisper)
* chore(ci): remove python 3.9 from workflows
  - Update build-and-release to use Python 3.10
  - Remove 3.9 from python-compatibility matrix
* chore: bump version to 3.7.6
Replace O(n*m) pandas operations with O(n log m) interval tree queries for speaker assignment, where n = words/segments and m = diarization segments.

Performance improvement:
- 7-minute video (1185 words, 147 segments): 73.9s -> 0.32s (228x faster)
- 3-hour podcast: Minutes of processing -> Seconds

Changes:
- Add IntervalTree class using sorted array + binary search
- Refactor assign_word_speakers to use interval tree for overlap queries
- Maintain backward compatibility with same function signature
- Identical output to original implementation

The interval tree uses numpy arrays for efficient storage and binary search (np.searchsorted) for O(log n) candidate finding, then filters candidates for actual overlaps.

Fixes #1335
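
For reviewers who want a feel for the sorted-array approach described above, here is a minimal sketch. It is not the code from this change: the class name `IntervalIndex`, the helper `dominant_speaker`, and the `(start, end, label)` tuple layout are all hypothetical, chosen only for illustration. It shows the two steps the description names: a binary search with `np.searchsorted` to bound the candidates, then a filter for actual overlaps.

```python
import numpy as np


class IntervalIndex:
    """Hypothetical sorted-array overlap index (illustration only)."""

    def __init__(self, intervals):
        # intervals: iterable of (start, end, label); sort once by start time.
        ordered = sorted(intervals, key=lambda iv: iv[0])
        self.starts = np.array([iv[0] for iv in ordered], dtype=float)
        self.ends = np.array([iv[1] for iv in ordered], dtype=float)
        self.labels = [iv[2] for iv in ordered]

    def query(self, start, end):
        """Return (label, overlap_duration) for intervals overlapping [start, end)."""
        # Binary search: only intervals that begin before `end` can overlap.
        hi = int(np.searchsorted(self.starts, end, side="left"))
        # Filter the candidates: keep those that finish after `start`.
        hits = np.nonzero(self.ends[:hi] > start)[0]
        results = []
        for i in hits:
            overlap = min(self.ends[i], end) - max(self.starts[i], start)
            results.append((self.labels[i], float(overlap)))
        return results


def dominant_speaker(index, start, end):
    """Pick the label with the largest total overlap, or None if nothing overlaps."""
    totals = {}
    for label, overlap in index.query(start, end):
        totals[label] = totals.get(label, 0.0) + overlap
    return max(totals, key=totals.get) if totals else None


# Toy diarization segments and one word spanning a speaker change.
index = IntervalIndex([(0.0, 2.0, "A"), (2.0, 5.0, "B"), (4.5, 6.0, "A")])
speaker = dominant_speaker(index, 1.5, 3.0)
```

The filter step is still linear in the number of candidates, so the gain in this sketch comes from building the sorted arrays once and bounding the candidate set per query rather than rescanning every segment.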
…ssignment Optimize assign_word_speakers with interval tree for 228x speedup
Fix: pass no_repeat_ngram_size and repetition_penalty to CTranslate2 generate()
[BugFix] The variable I removed was not being used anywhere.
[BugFix] Type hint fix in decode_batch: List[str], not str
* fix: derive SRT/VTT cue times from word-level timestamps (#1315)

  Subtitle cue start/end times were sourced from VAD segment boundaries instead of word-level timestamps from forced alignment. This caused cues to appear prematurely and could produce backwards chronological ordering when VAD segments overlap.

  Use min(word starts) / max(word ends) for cue timing, falling back to segment-level times only when all words are unalignable.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version to 3.7.7 in pyproject.toml
---------
Co-authored-by: Claude-Assistant <noreply@anthropic.com>
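
The cue-timing rule above can be sketched in a few lines. This is an illustration under assumptions, not the code from the fix: the function name `cue_times` is hypothetical, and the segment is assumed to be a dict with `start`, `end`, and a `words` list in which unalignable words simply lack `start`/`end` keys.

```python
def cue_times(segment):
    """Return (start, end) for a subtitle cue.

    Prefers word-level timestamps; falls back to the segment-level
    times only when no word carries a timestamp.
    """
    words = segment.get("words", [])
    starts = [w["start"] for w in words if "start" in w]
    ends = [w["end"] for w in words if "end" in w]
    if starts and ends:
        return min(starts), max(ends)
    return segment["start"], segment["end"]


# A segment whose VAD boundaries are wider than the aligned words.
aligned = {
    "start": 0.5,
    "end": 4.0,
    "words": [
        {"word": "hello", "start": 1.2, "end": 1.6},
        {"word": "123"},  # assumed shape of an unalignable word
        {"word": "world", "start": 2.4, "end": 2.9},
    ],
}
# A segment in which nothing could be aligned.
unaligned = {"start": 5.0, "end": 6.0, "words": [{"word": "42"}]}

aligned_cue = cue_times(aligned)
fallback_cue = cue_times(unaligned)
```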
…-1 (#1349)
* feat: upgrade pyannote-audio dependency to v4
* fix: rename use_auth_token to token for pyannote-audio v4 compatibility
* fix: add omegaconf dep
* fix: use structured output API for pyannote-audio v4 diarization

  pyannote-audio 4.x no longer returns a plain Annotation (or tuple when return_embeddings=True). It now returns a structured output with speaker_diarization and speaker_embeddings attributes.
* feat: switch default diarization model to speaker-diarization-community-1

  Update default from pyannote/speaker-diarization-3.1 to pyannote/speaker-diarization-community-1 (pyannote-audio v4), add CC-BY-4.0 attribution, and update README for v4 API changes.
* fix: correct markdown link formatting for silero-vad in README.md
* chore: update version to 3.8.0

Co-authored-by: Giorgio Azzinnaro <giorgio@azzinna.ro>
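
To make the output-shape change concrete, here is a small sketch of a helper that accepts either shape. The helper name `unwrap_diarization` is hypothetical and is not part of this change or of pyannote-audio; the only details taken from the description above are the `speaker_diarization` and `speaker_embeddings` attribute names and the older plain-annotation or tuple return. Stand-in objects are used so the example runs without pyannote installed.

```python
from types import SimpleNamespace


def unwrap_diarization(output):
    """Hypothetical helper: return (annotation, embeddings) from either output shape."""
    # Structured output with named attributes, as described for v4.
    if hasattr(output, "speaker_diarization"):
        return output.speaker_diarization, getattr(output, "speaker_embeddings", None)
    # Older shape: (annotation, embeddings) tuple when embeddings were requested.
    if isinstance(output, tuple):
        annotation, embeddings = output
        return annotation, embeddings
    # Older shape: a plain annotation object.
    return output, None


# Stand-ins for real pipeline results (illustration only).
structured = SimpleNamespace(speaker_diarization="annotation", speaker_embeddings=[0.1, 0.2])
legacy_tuple = ("annotation", [0.1, 0.2])
legacy_plain = "annotation"

results = [unwrap_diarization(o) for o in (structured, legacy_tuple, legacy_plain)]
```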
…g paths (#1285)
- Add `model_cache_only` param to `load_align_model()`, pass as `local_files_only` to HuggingFace `from_pretrained` calls
- Forward `model_dir` and `model_cache_only` to both `load_align_model` call sites (initial load and language-change reload)
- Add `cache_dir` param to `DiarizationPipeline.__init__`, forward to pyannote `Pipeline.from_pretrained`
- Pass `--model_dir` as `cache_dir` when constructing `DiarizationPipeline` in CLI

Previously only the ASR model respected these flags. Alignment and diarization models would always download from HuggingFace to the default cache, breaking offline and custom-cache workflows.
---------
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
Forward the existing --hf_token CLI argument to faster-whisper's WhisperModel via a new use_auth_token parameter on load_model(), enabling downloads of gated/private HuggingFace models.
It works with the initial prompt added. Ran pdb to make sure and check output. Long audio works. Existing Logic is correct without flag.
Added an `and` condition before streams; existing logic is not changed.
[New File] benchmark testing
Batch wrap
Revert "Batch wrap"
Pass through the average log probability (transcription confidence score) from ctranslate2 to the final segment output. The field is NotRequired so existing code constructing segments without it remains valid.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
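
As a brief illustration of why a `NotRequired` field keeps older call sites valid, here is a sketch. The `Segment` type and the field name `avg_logprob` are assumptions made for the example; the description above does not name the field or the type it was added to.

```python
try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older interpreters need the typing_extensions backport
    from typing_extensions import NotRequired, TypedDict


class Segment(TypedDict):
    """Hypothetical segment type (illustration only)."""

    start: float
    end: float
    text: str
    avg_logprob: NotRequired[float]  # assumed name for the confidence field


# Existing code that never sets the new field still type-checks.
old_style: Segment = {"start": 0.0, "end": 1.5, "text": "hello"}
# New code can attach the confidence score.
new_style: Segment = {"start": 0.0, "end": 1.5, "text": "hello", "avg_logprob": -0.21}

# Readers should treat the field as optional.
confidence = old_style.get("avg_logprob")
```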