Speech-to-Text: .JSON to CONTINUOUS TEXT

Airen · 09-22-2023 11:14 AM

I used Google's Speech-to-Text AI to convert an mp3 file to a .json file. The .json file starts with one paragraph of continuous text, but then it gives me lots of code with only one word at a time:

{"results":[{"alternatives":[{"transcript":" nice to meet you nice to meet you too how's it going good yeah very good thank you uh we this interview is is for a german magazine","words":[{"startOffset":"0.120s","endOffset":"0.280s","word":"nice"},{"startOffset":"0.280s","endOffset":"0.360s","word":"to"},{"startOffset":"0.360s","endOffset":"0.560s","word":"meet"},{"startOffset":"0.560s","endOffset":"1.240s","word":"you"},{"startOffset":"1.240s","endOffset":"1.400s","word":"nice"},{"startOffset":"1.400s","endOffset":"1.480s","word":"to"},{"startOffset":"1.480s","endOffset":"1.640s","word":"meet"},

I would like to have the whole transcript in continuous text. How can I do this?

lsolatorio

Hi @Airen,

Welcome and thank you for reaching out to our community.

I understand that you want simpler transcription results, without the time offsets of each word. You might want to review your code and look for the "enable_word_time_offsets" setting. When it is set to true, the response includes the start and end time of every transcribed word.

If true, the top result includes a list of words and the start and end time offsets (timestamps) for those words. If false, no word-level time offset information is returned. The default is false.
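If you already have the .json file, you may not need to re-run the transcription. Judging from the snippet above, each entry under "results" already holds the continuous text in its "transcript" field, and the per-word "words" list can simply be ignored. Here is a minimal Python sketch, assuming that layout (results → alternatives → transcript) and that you want the first, most likely alternative of each result. The function name continuous_text and the script name used below are only illustrative:

```python
import json
import sys


def continuous_text(data: dict) -> str:
    """Join the top transcript of every result into one string.

    Assumes the layout shown in the question:
    {"results": [{"alternatives": [{"transcript": ..., "words": [...]}]}]}
    The "words" lists (with startOffset/endOffset) are ignored.
    """
    parts = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no recognized speech
        # Alternatives are ordered by likelihood; take the first one.
        transcript = alternatives[0].get("transcript", "").strip()
        if transcript:
            parts.append(transcript)
    return " ".join(parts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python join_transcript.py your_file.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(continuous_text(json.load(f)))
```

Saved as, for example, join_transcript.py, it prints the whole transcript as one line of text. Redirect the output (python join_transcript.py your_file.json > transcript.txt) if you want a plain text file.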

If you need to know more about these concepts, here are some useful references for you.

I hope this information is helpful.