Hello.
I'm someone who's trying to make speech-to-text work without being a coder in any way whatsoever. I have, let's say, hundreds of individual audio files, each between 30 seconds and a minute and a half long. The problem is that once I upload them to the bucket there are hundreds of individual files, and I need to create a transcription for each one individually. What do I do? Can I not just transcribe everything in one folder?
It requires coding: you load the files into a bucket (say, in an 'input' folder) and run a background job that uses the Speech-to-Text API to produce a "txt" file for each one (say, in an 'output' folder). If your files are in a format such as mp4, transcode them to a supported audio format first. Those are the rough steps.
Understood. But do you have any idea how I'd code that? Of course I am unfamiliar with how to code... but anyway no problem at all if you cannot help there.
Coding it will not be easy if you are not a programmer. It requires a number of technologies and tools. Roughly, the steps would be:
1. Using GCP's gsutil tool, upload the files to a bucket.
2. Write a program that reads each file from the bucket and invokes the Speech-to-Text API. You will need to acquire an access token (OAuth2) for this.
3. If the files are small, you could do step 2 without uploading them to the bucket.
You can find the example programs here: https://cloud.google.com/speech-to-text/docs/samples
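To give a rough idea of what step 2 could look like, here is a minimal Python sketch, not a finished tool. It assumes the google-cloud-speech and google-cloud-storage packages are installed, that credentials are already set up on the machine (the client libraries pick them up, so the code does not handle the access token itself), and that the files are mono WAV or FLAC so the API can read the audio format from the file header. The bucket name, the 'input/' and 'output/' folder names, the language code, and the helper function names are all placeholders I made up for illustration. It uses the long-running call because some of the clips are over a minute long.

```python
from pathlib import PurePosixPath

# Assumption: only WAV and FLAC, whose headers tell the API the audio format.
AUDIO_EXTENSIONS = {".wav", ".flac"}


def is_audio(blob_name):
    """Return True if the object name ends in one of the assumed extensions."""
    return PurePosixPath(blob_name).suffix.lower() in AUDIO_EXTENSIONS


def output_name_for(blob_name, input_prefix="input/", output_prefix="output/"):
    """Map e.g. 'input/clip01.wav' to 'output/clip01.txt'."""
    relative = blob_name[len(input_prefix):]
    return output_prefix + str(PurePosixPath(relative).with_suffix(".txt"))


def transcribe_bucket(bucket_name, input_prefix="input/",
                      output_prefix="output/", language_code="en-US"):
    """Transcribe every audio file under input_prefix, one .txt per file."""
    # Imported here so the helpers above work without the Google libraries.
    from google.cloud import speech, storage

    storage_client = storage.Client()
    speech_client = speech.SpeechClient()
    bucket = storage_client.bucket(bucket_name)
    config = speech.RecognitionConfig(language_code=language_code)

    for blob in storage_client.list_blobs(bucket_name, prefix=input_prefix):
        if not is_audio(blob.name):
            continue  # skip folder placeholders and non-audio files

        audio = speech.RecognitionAudio(uri=f"gs://{bucket_name}/{blob.name}")
        # Long-running recognition, since some clips exceed one minute.
        operation = speech_client.long_running_recognize(config=config, audio=audio)
        response = operation.result(timeout=600)

        # Join the best alternative of each recognized segment.
        text = "\n".join(
            result.alternatives[0].transcript
            for result in response.results
            if result.alternatives
        )

        target = output_name_for(blob.name, input_prefix, output_prefix)
        bucket.blob(target).upload_from_string(text)
        print(f"{blob.name} -> {target}")


if __name__ == "__main__":
    # Hypothetical bucket name; replace with your own.
    transcribe_bucket("my-audio-bucket")
```

Running it once goes through the whole 'input' folder, so you do not have to start hundreds of individual transcriptions by hand. If your files are stereo or in another format, the RecognitionConfig would need more settings than shown here.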