google-speech-to-text API transcription response gets numbers instead words

AdivHadar · 10-01-2023 02:03 AM


I use the code below, which works fine with a sound file in English. But when I update the code to recognize another language and pass it a sound file in that language, the transcript in the response comes back as numbers and not words. Can you please help? I am a new student in this subject. Thanks.

from google.cloud import speech

speech_client = speech.SpeechClient.from_service_account_file('key.json')
media_file_name = 't2.wav'
with open(media_file_name, 'rb') as f:
    byte_data_wav = f.read()
audio_wav = speech.RecognitionAudio(content=byte_data_wav)
config_wav = speech.RecognitionConfig(
    sample_rate_hertz=44100,
    enable_automatic_punctuation=True,
    language_code='iw-IL',
    audio_channel_count=1
)
response_wav = speech_client.recognize(
    config=config_wav,
    audio=audio_wav
)
print(response_wav)




---------------------------------------------------------------------------------------------------

results {
  alternatives {
    transcript: "\\327\\244\\327\\250\\327\\251\\327\\252 \\327\\221\\327\\250\\327\\220\\327\\251\\327\\231\\327\\252 \\327\\221\\327\\250\\327\\220\\327\\251\\327\\231\\327\\252 \\327\\221\\327\\250\\327\\220 \\327\\220\\327\\234\\327\\225\\327\\224\\327\\231\\327\\235 \\327\\220\\327\\252 \\327\\224\\327\\251\\327\\236\\327\\231\\327\\231\\327\\235 \\327\\225\\327\\220\\327\\252 \\327\\224\\327\\220\\327\\250\\327\\245 \\327\\225\\327\\224\\327\\220\\327\\250\\327\\245 \\327\\224\\327\\231\\327\\231\\327\\252\\327\\224 \\327\\252\\327\\225\\327\\224\\327\\225 \\327\\225\\327\\221\\327\\225\\327\\224\\327\\225 \\327\\225\\327\\227\\327\\225\\327\\251\\327\\232 \\327\\242\\327\\234 \\327\\244\\327\\240\\327\\231 \\327\\252\\327\\224\\327\\225\\327\\235"
    confidence: 0.9430100917816162
  }
  result_end_time {
    seconds: 24
    nanos: 540000000
  }
  language_code: "he-il"
}
total_billed_time {
  seconds: 25
}
request_id: 2840050288344987275

nceniza

This appears to be common behaviour for text in a non-Roman alphabet (possibly some classes needed to render the transcribed text properly are missing). Are you following any documentation for this task? That would help me try to replicate it.

If you are just starting to explore the API I would suggest visiting this article for Google's STT API: https://cloud.google.com/speech-to-text/docs/quickstart

nceniza

I was able to replicate your issue. I only changed the way the output is printed; please try this:

from google.cloud import speech

speech_client = speech.SpeechClient()

media_file_name_wav = '/home/ivyjoyce/output.flac'

with open(media_file_name_wav, 'rb') as f1:
    byte_data_wav = f1.read()
audio_wav = speech.RecognitionAudio(content=byte_data_wav)

config_wav = speech.RecognitionConfig(
    #encoding='WAV', 
    #sample_rate_hertz=44100,
    #enable_automatic_punctuation=True,
    language_code ='iw-IL',
    audio_channel_count=1
)

response_standard_wav = speech_client.recognize(
    config=config_wav,
    audio=audio_wav
)

# Printing the whole response object shows escaped bytes for non-ASCII text
#print(response_standard_wav)

# Print each transcript field on its own so the text is shown as words
for result in response_standard_wav.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
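
For anyone wondering what the "numbers" are: as far as I can tell, printing the whole response object uses the protobuf text format, which writes every non-ASCII byte as a backslash followed by three octal digits. The transcript itself is fine, it is just shown as escaped UTF-8 bytes (for example \327\244 is the two bytes 0xD7 0xA4, which is the Hebrew letter U+05E4). The sketch below is only an illustration of that idea and is not part of the Speech-to-Text library; decode_octal_escapes is a hypothetical helper name, and it assumes the pasted text contains nothing but plain ASCII and these octal escapes. It also accepts the doubled backslashes seen in the paste above.

```python
import re


def decode_octal_escapes(escaped: str) -> str:
    """Hypothetical helper: turn text such as '\\327\\244' back into readable text.

    Assumes each escaped byte is written as one or more backslashes followed by
    exactly three octal digits, and that the bytes are UTF-8.
    """
    raw = re.sub(
        rb"\\+([0-7]{3})",                      # backslash(es) + three octal digits
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> one raw byte
        escaped.encode("ascii"),
    )
    return raw.decode("utf-8")


# First two words of the transcript shown in the question
sample = r"\327\244\327\250\327\251\327\252 \327\221\327\250\327\220"
decoded = decode_octal_escapes(sample)
print(decoded)
```

In normal use you should not need this helper at all: reading result.alternatives[0].transcript directly gives you an ordinary Python string, which is why the loop in the reply prints words.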
