Google Speech to Text Empty Transcription Response

TheAliHaider · 11-28-2024 02:11 AM

Error: Empty String in transcription Response.

The issue may be with how I am using the credentials.json file (which I downloaded from the Google Cloud console for the Speech-to-Text API and load in my C# code; C# 6, .NET Framework 4.8), but I am not sure whether it is the cause, because no error is thrown.

Here is the code that loads credentials.json:

public LeadController()
{
    // Get the absolute path to the root directory of the project
    var rootDirectory = Directory.GetParent(AppContext.BaseDirectory).FullName;
    var credentialPath = Path.Combine(rootDirectory, "credentials.json");

    Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credentialPath);
    _speechClient = SpeechClient.Create(); // Initialize the SpeechClient
}

I created a method that sends the mic-recorded audio to the API and transcribes it to text. Unfortunately, the result is always an empty string.

Then I checked whether the recorded audio was valid. To do that, I created a method that saves the mic-recorded audio to a .wav file in the project folder. For testing, I clicked the mic icon and recorded my voice, and the recording was successfully saved as a .wav file, so the audio appears to be correct.

This means the recording works and the frontend successfully sends the audio to the backend as a byte array. But when I pass this byte array to the Google Speech-to-Text API, the call returns a success status and the result is always an empty string.
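
One thing the saved file does not prove is that its bytes are really WAV: the browser's MediaRecorder commonly produces a WebM or Ogg container, and labelling the Blob or the file as .wav does not convert it. Below is a minimal sketch of a header check, written in JavaScript so it can run in the browser or in Node. The helper names `sniffAudioContainer` and `readWavFormat` are hypothetical, and `readWavFormat` assumes the common layout where the `fmt ` chunk directly follows the RIFF header.

```javascript
// Hypothetical helper: guess the audio container from its leading "magic" bytes.
// `bytes` is expected to be a Uint8Array (a Node Buffer also works).
function sniffAudioContainer(bytes) {
  const ascii = (start, length) =>
    String.fromCharCode(...bytes.slice(start, start + length));

  // WAV: "RIFF" <4-byte size> "WAVE"
  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 4) === 'WAVE') {
    return 'wav';
  }
  // WebM/Matroska files start with the EBML signature 1A 45 DF A3
  if (bytes.length >= 4 && bytes[0] === 0x1a && bytes[1] === 0x45 &&
      bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return 'webm';
  }
  // Ogg pages start with "OggS"
  if (bytes.length >= 4 && ascii(0, 4) === 'OggS') {
    return 'ogg';
  }
  return 'unknown';
}

// Hypothetical helper: read the format fields of a WAV file.
// Assumption: the "fmt " chunk sits at byte offset 12, as in the usual 44-byte header.
function readWavFormat(bytes) {
  if (sniffAudioContainer(bytes) !== 'wav' || bytes.length < 36) {
    return null;
  }
  const tag = String.fromCharCode(...bytes.slice(12, 16));
  if (tag !== 'fmt ') {
    return null; // non-canonical layout; this sketch does not scan for the chunk
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    audioFormat: view.getUint16(20, true),   // 1 means uncompressed PCM
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}
```

In the browser the bytes could come from `new Uint8Array(await audioBlob.arrayBuffer())`. If the result is not 'wav', or the reported sample rate is not 16000, then the Linear16 / 16000 Hz settings in the recognition config would not describe the audio that is actually being sent.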

This is the frontend code that sends the mic audio to the backend; it works fine:

// Declared outside the function so other handlers (e.g. a stop button) can reach them
let mediaRecorder;
let audioChunks = [];

async function startRecording() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    mediaRecorder = new MediaRecorder(stream);

    // Collect the recorded chunks as they become available
    mediaRecorder.ondataavailable = event => {
        audioChunks.push(event.data);
    };

    mediaRecorder.onstop = async () => {
        // Create a Blob from recorded chunks
        const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });

        // Add audio file to FormData
        const formData = new FormData();
        formData.append('audio', audioBlob, 'audio.wav');

        // Send to backend
        try {
            const response = await fetch('/api/speech-to-text/stream3', {
                method: 'POST',
                body: formData, // Content-Type is set automatically
            });

            if (response.ok) {
                const result = await response.json();
                console.log(result);
                console.log('Transcription:', result.transcription);
            } else {
                console.error('Error:', response.statusText);
            }
        } catch (error) {
            console.error('Error uploading audio:', error);
        }
    };

    mediaRecorder.start();
}

And this is the backend C# code that receives the audio from the frontend. It receives the audio correctly, but when it sends the audio to the API, the response is empty:

[HttpPost]
[Route("api/speech-to-text/stream3")]
public async Task<IHttpActionResult> StreamAudioToText3()
{
  try
  {
    var httpRequest = HttpContext.Current.Request;
    // Convert the incoming audio stream into a Google Cloud Speech recognition request
    var audioFile = httpRequest.Files["audio"];
    byte[] audioData;
    using (var memoryStream = new MemoryStream())
    {
        await audioFile.InputStream.CopyToAsync(memoryStream);
        audioData = memoryStream.ToArray();
    }
    File.WriteAllBytes("D:\\Projects\\soldster GIT Google Speech to Text\\soldster GIT\\output.wav", audioData);

    // Create the recognition config
    var config = new RecognitionConfig
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16, // Assuming linear 16-bit PCM encoding
        SampleRateHertz = 16000, // Modify based on the audio recording configuration
        LanguageCode = "en-US",
        EnableAutomaticPunctuation = true, // Optional: improves readability
        Model = "default"
    };

    // Create the streaming recognize request
    var streamingCall = _speechClient.StreamingRecognize();

    // Start the stream and send the initial request for config
    await streamingCall.WriteAsync(new StreamingRecognizeRequest
    {
        StreamingConfig = new StreamingRecognitionConfig
        {
            Config = config,
            InterimResults = true // Optionally set to true for interim results
        }
    });

    // Send the audio in fixed-size chunks (iterating the byte[] directly
    // would send one request per byte)
    const int chunkSize = 32 * 1024;
    for (int offset = 0; offset < audioData.Length; offset += chunkSize)
    {
        int count = Math.Min(chunkSize, audioData.Length - offset);

        // Send the audio content to the API
        await streamingCall.WriteAsync(new StreamingRecognizeRequest
        {
            AudioContent = ByteString.CopyFrom(audioData, offset, count)
        });
    }

    // Close the request stream
    await streamingCall.WriteCompleteAsync();

    // Process the responses from the stream
    string transcription = string.Empty;

    // Read responses with MoveNextAsync() until the server closes the stream
    var responseStream = streamingCall.GetResponseStream();
    while (await responseStream.MoveNextAsync())
    {
        var response = responseStream.Current;

        foreach (var result in response.Results)
        {
            // InterimResults is enabled, so skip partial hypotheses and keep
            // only finalized results that have at least one alternative
            if (result.IsFinal && result.Alternatives.Count > 0)
            {
                transcription += result.Alternatives[0].Transcript + " "; // Append the transcription
            }
        }
    }

    // Return the transcription
    return Ok(new { transcription });
  }
  catch (Exception ex)
  {
    // Log any errors here (using a logging service, etc.)
    return InternalServerError(ex);
  }
}

I researched this further and tried many versions of the C# code, but the issue remains.

If I go to the Google Cloud console, upload the recorded .wav file, and click the transcript button, the file is transcribed successfully. But when I use the API, it returns an empty response. Here is the Google Cloud console response:

I would really appreciate any guidance. Thanks.