@Frando is working on this as well.
Kaldi/Vosk is usually a good starting point.
It gets tricky when you want:
- audio preprocessing
- post-processing (speaker diarisation, de-duplication, capitalization…)
- user interface for manual correction
- seamless integration with podcasting platforms
- easy installation (we probably need to provide a Docker image)
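
To give a feel for the post-processing item, here is a rough sketch of the de-duplication and capitalization steps in plain Python. It assumes each transcript segment is a dict with a `text` field and optional timing keys; that shape, and the function names `dedupe_segments`, `capitalize_sentences` and `postprocess`, are made up for illustration and are not what any particular recognizer emits. Speaker diarisation is left out because it needs the audio itself.

```python
import re


def dedupe_segments(segments):
    """Drop empty segments and ones that repeat the previously kept text."""
    kept = []
    for seg in segments:
        # Collapse runs of whitespace so near-identical repeats compare equal.
        text = " ".join(seg["text"].split())
        if not text:
            continue
        if kept and kept[-1]["text"].lower() == text.lower():
            continue
        kept.append({**seg, "text": text})
    return kept


def capitalize_sentences(text):
    """Uppercase the first letter of the text and of each sentence after . ! ?

    Only ASCII letters are handled; real transcripts would need more care.
    """
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


def postprocess(segments):
    """Hypothetical pipeline step: de-duplicate first, then capitalize."""
    return [
        {**seg, "text": capitalize_sentences(seg["text"])}
        for seg in dedupe_segments(segments)
    ]
```

Something like `postprocess([{"start": 0.0, "text": "hello world. this is a test"}])` would return the same segment with `"Hello world. This is a test"` as its text. Real output will need more than this (numbers, names, punctuation restoration), which is part of why this step gets tricky.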