Whisper is an open-source automatic speech recognition (ASR) system developed by OpenAI, trained on a large, diverse dataset of multilingual and multitask supervised data collected from the web. It supports robust speech-to-text transcription, translation, and language identification across many languages and is designed to be particularly resilient to accents, background noise, and technical language.
Multilingual ASR and translation model optimized for NVIDIA GPUs and integrated with the Open ASR Leaderboard.
Self-supervised speech representation models often fine-tuned for specific languages or domains.