I am currently building a voice bot in which entities are extracted
after transcription by STT models (I have tried the various models
in the agent settings: phone call, default, video, etc.). The STT is the most
important piece of the puzzle. I wante...