I am currently building a voice bot in which entities are extracted after transcription by STT models (I have tried the various models in the agent settings: phone call, default, video, etc.). STT is the most important piece of the puzzle.
I would like to try third-party STT APIs such as AssemblyAI or Deepgram, which would be possible if I could get a recording URL for each user input.
Is there a way to get that URL on the same page of the flow?
Hi @ritikUber,
Welcome to the community and we appreciate you reaching out for help.
There is no specific guide for your use case. However, one possible route is a Google Cloud Storage (GCS) URI, though it comes with limitations: the GCS URI only records and saves the Dialogflow response audio. You can check the Speech and IVR settings documentation for more details.
You can also look into the telephony integrations. Some of them have features that allow recording of end-user queries, but please note that such audio queries are converted to text before they are sent to Dialogflow.
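If you do end up with a GCS URI for a recording, a third-party STT API will usually want an HTTPS URL instead. Here is a minimal sketch of that conversion in Python. The helper name `gcs_uri_to_https` and the example bucket and object names are made up for illustration, and the resulting URL is only fetchable if the object is publicly readable; for a private bucket you would need a signed URL instead.

```python
from urllib.parse import quote


def gcs_uri_to_https(gcs_uri: str) -> str:
    """Turn gs://bucket/path/to/object into a storage.googleapis.com URL.

    Hypothetical helper: the returned URL only works for publicly
    readable objects (private objects need a signed URL).
    """
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {gcs_uri!r}")
    # Split "bucket/path/to/object" at the first slash.
    bucket, _, obj = gcs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got: {gcs_uri!r}")
    # Percent-encode the object name but keep "/" between path segments.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj, safe='/')}"


if __name__ == "__main__":
    # Example bucket and object names are placeholders.
    print(gcs_uri_to_https("gs://my-bucket/calls/turn 1.wav"))
```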
Here are some other related resources that can be of help:
Hi @lsolatorio
Thanks for your response!
Could you please share more insight into the STT models used? Does Dialogflow CX use the streaming API, or does it behave like running the models on pre-recorded audio?
Furthermore, are there any benchmark numbers for Dialogflow CX STT (with auto speech adaptation) against other systems such as OpenAI Whisper?
Additionally, will the results from a particular Dialogflow CX model (given live user voice input) be similar to offline transcription through an API call (passing the recorded user input as an mp3 file) with the same model?
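For context, this is roughly how I would compare the two myself once I have recordings: a minimal sketch of word error rate (WER) in Python, using word-level edit distance against a hand-written reference transcript. The function name `word_error_rate` and the sample sentences are my own placeholders, and the normalization is just lowercasing and splitting on whitespace, so punctuation and number formatting would still need handling for a fair benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder transcripts, not real model output.
    reference = "book a cab to the airport"
    live_result = "book a cab to airport"
    offline_result = "book a cab to the airport"
    print(word_error_rate(reference, live_result))
    print(word_error_rate(reference, offline_result))
```

Running the same set of recordings through both paths and comparing the two WER values would at least show whether the live and offline results diverge for my data.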