Error: Empty String in transcription Response.
Maybe the issue is with how I am using the credentials.json file (which I downloaded from the Google Cloud console for the Speech-to-Text API and load in the C# code, i.e. C# 6, .NET Framework 4.8), but I am not sure whether it is the cause, as it does not throw any error.
Here is the code that loads credentials.json:
    public LeadController()
    {
        // Get the absolute path to the root directory of the project
        var rootDirectory = Directory.GetParent(AppContext.BaseDirectory).FullName;
        var credentialPath = Path.Combine(rootDirectory, "credentials.json");
        Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credentialPath);
        _speechClient = SpeechClient.Create(); // Initialize the SpeechClient
    }
I created a method that sends the mic-recorded audio to the API and transcribes it to text. Unfortunately, the result is always an empty string.
To check that the recording itself was valid, I added a method that saves the mic-recorded audio as a .wav file in the project folder. For testing, I clicked the mic icon, recorded my voice, and the recording was saved successfully as a .wav file.
So the recording is correct, and the front end is successfully sending the audio to the backend as a byte array. But when I pass this byte array to the Google Speech-to-Text API, it returns a Success status and the result is always an empty string.
This is the front-end code that passes the mic audio to the backend; it works fine:
    async function startRecording() {
        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        mediaRecorder = new MediaRecorder(stream);

        mediaRecorder.ondataavailable = event => {
            audioChunks.push(event.data);
        };

        mediaRecorder.onstop = async () => {
            // Create a Blob from recorded chunks
            const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });

            // Add audio file to FormData
            const formData = new FormData();
            formData.append('audio', audioBlob, 'audio.wav');

            // Send to backend
            try {
                const response = await fetch('/api/speech-to-text/stream3', {
                    method: 'POST',
                    body: formData, // Content-Type is set automatically
                });
                if (response.ok) {
                    const result = await response.json();
                    console.log(result);
                    console.log('Transcription:', result.transcription);
                } else {
                    console.error('Error:', response.statusText);
                }
            } catch (error) {
                console.error('Error uploading audio:', error);
            }
        };

        mediaRecorder.start();
    }
This is the backend C# code that receives the audio from the front end. It receives the audio correctly, but when it sends the audio to the API, the response is empty:
    [HttpPost]
    [Route("api/speech-to-text/stream3")]
    public async Task<IHttpActionResult> StreamAudioToText3()
    {
        try
        {
            var httpRequest = HttpContext.Current.Request;

            // Convert the incoming audio stream into a Google Cloud Speech recognition request
            var audioFile = httpRequest.Files["audio"];
            byte[] audioData;
            using (var memoryStream = new MemoryStream())
            {
                await audioFile.InputStream.CopyToAsync(memoryStream);
                audioData = memoryStream.ToArray();
            }
            File.WriteAllBytes("D:\\Projects\\soldster GIT Google Speech to Text\\soldster GIT\\output.wav", audioData);

            // Create the recognition config
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Linear16, // Assuming linear 16-bit PCM encoding
                SampleRateHertz = 16000, // Modify based on the audio recording configuration
                LanguageCode = "en-US",
                EnableAutomaticPunctuation = true, // Optional: improves readability
                Model = "default"
            };

            // Create the streaming recognize request
            var streamingCall = _speechClient.StreamingRecognize();

            // Start the stream and send the initial request for config
            await streamingCall.WriteAsync(new StreamingRecognizeRequest
            {
                StreamingConfig = new StreamingRecognitionConfig
                {
                    Config = config,
                    InterimResults = true // Optionally set to true for interim results
                }
            });

            // Process each audio chunk
            foreach (var chunk in audioData)
            {
                var recognitionAudio = new RecognitionAudio
                {
                    Content = ByteString.CopyFrom(chunk)
                };

                // Send the audio content to the API
                await streamingCall.WriteAsync(new StreamingRecognizeRequest
                {
                    AudioContent = recognitionAudio.Content
                });
            }

            // Close the request stream
            await streamingCall.WriteCompleteAsync();

            // Process the responses from the stream
            string transcription = string.Empty;

            // Use MoveNext() to handle the response asynchronously
            var responseStream = streamingCall.GetResponseStream();
            while (await responseStream.MoveNextAsync()) // Use MoveNextAsync to check for next response
            {
                var response = responseStream.Current;

                // Check if there are results in the response
                if (response.Results.Count > 0)
                {
                    var result = response.Results[0];

                    // Check if there are alternatives and transcriptions available
                    if (result.Alternatives.Count > 0)
                    {
                        var alternative = result.Alternatives[0];
                        transcription += alternative.Transcript + " "; // Append the transcription
                    }
                }
            }

            // Return the transcription
            return Ok(new { transcription });
        }
        catch (Exception ex)
        {
            // Log any errors here (using a logging service, etc.)
            return InternalServerError(ex);
        }
    }
I researched this further and tried many versions of the C# code, but the issue remains.
If I go to the Google Cloud console, upload the recorded .wav file, and click the button to transcribe it, the file is transcribed successfully. But when I use the API, it returns an empty response. Here is the Google Cloud console response:
I would really appreciate any guidance. Thanks
Hi @TheAliHaider,
Welcome to Google Cloud Community!
The problem is that you're sending the audio data to the streaming API one byte at a time (foreach (var chunk in audioData)). The Google Cloud Speech-to-Text API expects audio to be sent in chunks, each containing a portion of the audio stream, not individual bytes.
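To make the chunking idea concrete, here is a minimal sketch written in JavaScript, the language of the front-end code in the question. The function name `splitIntoChunks` and the 32 KiB chunk size are illustrative assumptions, not values the API prescribes. In the C# handler the same slicing would apply to `audioData`, with one `WriteAsync` per slice (for example via `ByteString.CopyFrom(bytes, offset, count)`) instead of one per byte.

```javascript
// Split a byte buffer into fixed-size chunks (the last one may be shorter).
// The 32 KiB default is an illustrative assumption, not an API requirement.
function splitIntoChunks(bytes, chunkSize = 32 * 1024) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray returns a view over the same memory, so nothing is copied.
    chunks.push(bytes.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Example: 100,000 bytes become 4 chunks instead of 100,000 one-byte writes.
const audioData = new Uint8Array(100000);
const chunks = splitIntoChunks(audioData);
console.log(chunks.length, chunks[chunks.length - 1].length);
```

Each element of `chunks` would then be sent as the audio content of a single streaming request, in order.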
Here are some potential ways to address your issue:
In addition, after making the changes, test with different audio files, check the error messages and network logs, and use the File.WriteAllBytes call to save and verify the WAV file for debugging.
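One thing that may be worth verifying in the saved file is whether its bytes really are 16-bit PCM WAV at 16000 Hz, as the recognition config assumes. A browser `MediaRecorder` often records into a compressed container such as WebM, and labelling the Blob as `audio/wav` does not convert the data. The sketch below reads the RIFF/WAVE header from a byte buffer; `inspectWavHeader` is a hypothetical helper name, and whether your recording is affected is an assumption to check, not a confirmed diagnosis.

```javascript
// Read the RIFF/WAVE header of a byte buffer and report the PCM format.
// Returns null when the bytes are not a RIFF/WAVE container.
function inspectWavHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) =>
    String.fromCharCode(...bytes.subarray(offset, offset + 4));

  if (bytes.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    return null;
  }
  // Walk the chunk list until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    if (id === 'fmt ' && offset + 8 + 16 <= bytes.length) {
      return {
        audioFormat: view.getUint16(offset + 8, true), // 1 means PCM
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even sizes
  }
  return null;
}

// Example: bytes that begin with the EBML magic number used by WebM files.
const webmLike = new Uint8Array(16);
webmLike.set([0x1a, 0x45, 0xdf, 0xa3]);
console.log(inspectWavHeader(webmLike)); // null: not a WAV container
```

If the result is `null`, or the reported `sampleRate` and `bitsPerSample` differ from the `SampleRateHertz` and `Linear16` settings in the config, the config and the audio do not match, which is one possible reason for an empty transcription.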
You may refer to the following documentation, which can help you understand the required conceptual information and API specifications:
I hope the above information is helpful.