speech-diarization-lab: live output

Speaker-attributed transcription from the clustered backend (silero VAD + ECAPA-TDNN + agglomerative clustering), aligned to word timestamps from faster-whisper. Click any word or any point on the timeline to seek; the transcript scrolls to follow playback. The speaker count is estimated by the clustering, not supplied in advance. Toggle "show reference" to outline the ground-truth turns under the predictions; wherever the filled (predicted) and outlined (reference) bands disagree, you are looking at the diarization errors the benchmark scores.
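To illustrate how a speaker count can be "estimated, not given", here is a minimal sketch of threshold-stopped agglomerative clustering over per-segment embeddings. It assumes average linkage on cosine distance and a hand-picked stopping threshold; the backend's actual linkage, distance, threshold and library are not stated here, and `agglomerative_cluster` is a hypothetical helper written only for this example. Toy 2-D vectors stand in for ECAPA-TDNN embeddings.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def agglomerative_cluster(embeddings, threshold):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Repeatedly merges the closest pair of clusters and stops once the
    closest pair is farther apart than `threshold`. The number of
    clusters left over is the estimated speaker count.
    Returns one integer label per embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too far apart: treat as distinct speakers
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Four segments: two near [1, 0] and two near [0, 1].
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = agglomerative_cluster(segments, threshold=0.5)
print(labels, "->", len(set(labels)), "speakers")
```

The threshold, not a fixed cluster count, decides how many speakers come out: a looser threshold (here, anything above roughly 0.89) merges all four segments into one speaker, while a threshold of 0 leaves every segment on its own.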

The audio is a synthetic conversation built from LibriSpeech dev-clean (read speech, no overlapping turns), using the same seeded mixtures the benchmark scores. If you are not viewing this on GitHub Pages, serve it locally with python -m http.server -d demo (the -d/--directory flag needs Python 3.7 or later).
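The point of a seeded mixture is that the same seed always rebuilds the same conversation, and therefore the same reference turns for scoring. The sketch below shows one way such a non-overlapping mixture could be laid out. It is an assumption, not the lab's generator: `build_mixture`, its parameters, the alternating-speaker rule and the uniform pause distribution are all hypothetical, and it works on utterance durations rather than real LibriSpeech audio.

```python
import random


def build_mixture(utterances, n_turns, seed, max_pause=0.8):
    """Lay out a non-overlapping synthetic conversation (hypothetical helper).

    utterances: dict mapping speaker id -> list of utterance durations (s).
    Returns a list of (start, end, speaker) reference turns.
    A private random.Random(seed) makes the layout reproducible.
    """
    rng = random.Random(seed)
    speakers = sorted(utterances)  # fixed order so the seed fully determines picks
    turns = []
    t = 0.0
    prev = None
    for _ in range(n_turns):
        # Prefer a different speaker than the last turn, like a dialogue.
        choices = [s for s in speakers if s != prev] or speakers
        spk = rng.choice(choices)
        dur = rng.choice(utterances[spk])
        t += rng.uniform(0.0, max_pause)  # silence before this turn, never negative
        turns.append((t, t + dur, spk))
        t += dur  # next turn starts after this one ends, so no overlap
        prev = spk
    return turns


durations = {"spk_a": [2.1, 3.4, 1.7], "spk_b": [2.8, 1.2], "spk_c": [4.0, 2.5]}
for start, end, spk in build_mixture(durations, n_turns=5, seed=7):
    print(f"{start:6.2f} {end:6.2f} {spk}")
```

Because each turn begins only after the previous one ends plus a non-negative pause, the reference has no overlapped speech, matching the read-speech, no-overlap setup described above.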