I use the code below, which works fine with a sound file in English. But when I update the code to recognize a sound file in another language, the transcript in the response comes back as numbers rather than words. Can you please help? I am new to this subject. Thanks.
from google.cloud import speech

speech_client = speech.SpeechClient.from_service_account_file('key.json')
media_file_name = 't2.wav'
with open(media_file_name, 'rb') as f:
    byte_data_wav = f.read()
audio_wav = speech.RecognitionAudio(content=byte_data_wav)
config_wav = speech.RecognitionConfig(
    sample_rate_hertz=44100,
    enable_automatic_punctuation=True,
    language_code='iw-IL',
    audio_channel_count=1
)
response_wav = speech_client.recognize(
    config=config_wav,
    audio=audio_wav
)
print(response_wav)
---------------------------------------------------------------------------------------------------
results {
  alternatives {
    transcript: "\\327\\244\\327\\250\\327\\251\\327\\252 \\327\\221\\327\\250\\327\\220\\327\\251\\327\\231\\327\\252 \\327\\221\\327\\250\\327\\220\\327\\251\\327\\231\\327\\252 \\327\\221\\327\\250\\327\\220 \\327\\220\\327\\234\\327\\225\\327\\224\\327\\231\\327\\235 \\327\\220\\327\\252 \\327\\224\\327\\251\\327\\236\\327\\231\\327\\231\\327\\235 \\327\\225\\327\\220\\327\\252 \\327\\224\\327\\220\\327\\250\\327\\245 \\327\\225\\327\\224\\327\\220\\327\\250\\327\\245 \\327\\224\\327\\231\\327\\231\\327\\252\\327\\224 \\327\\252\\327\\225\\327\\224\\327\\225 \\327\\225\\327\\221\\327\\225\\327\\224\\327\\225 \\327\\225\\327\\227\\327\\225\\327\\251\\327\\232 \\327\\242\\327\\234 \\327\\244\\327\\240\\327\\231 \\327\\252\\327\\224\\327\\225\\327\\235"
    confidence: 0.9430100917816162
  }
  result_end_time {
    seconds: 24
    nanos: 540000000
  }
  language_code: "he-il"
}
total_billed_time {
  seconds: 25
}
request_id: 2840050288344987275
This appears to be common behaviour for text in a non-Latin alphabet: when the whole response object is printed, the transcript is shown as escaped byte values rather than readable characters. Are you following any particular documentation for this task? Knowing that would help with replicating it.
If you are just starting to explore the API, I would suggest the quickstart for Google's Speech-to-Text API: https://cloud.google.com/speech-to-text/docs/quickstart
from google.cloud import speech

speech_client = speech.SpeechClient()
media_file_name_wav = '/home/ivyjoyce/output.flac'
with open(media_file_name_wav, 'rb') as f1:
    byte_data_wav = f1.read()
audio_wav = speech.RecognitionAudio(content=byte_data_wav)
config_wav = speech.RecognitionConfig(
    #encoding='WAV',
    #sample_rate_hertz=44100,
    #enable_automatic_punctuation=True,
    language_code='iw-IL',
    audio_channel_count=1
)
response_standard_wav = speech_client.recognize(
    config=config_wav,
    audio=audio_wav
)
#print(response_standard_wav)
# Print each transcript field directly instead of the whole response object,
# so the text is shown as readable characters rather than escaped bytes.
for result in response_standard_wav.results:
    print(f"Transcript: {result.alternatives[0].transcript}")
I was able to replicate your issue. The only thing I changed is how the output is printed, so please try the code above.
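For what it's worth, the "numbers" in the question look like the UTF-8 bytes of the Hebrew text written as three-digit octal escapes, which is how the printed response object renders non-ASCII strings. Below is a minimal sketch of turning such an escaped string back into readable text. It does not call the Speech API at all; `decode_octal_escapes` is a hypothetical helper name of my own, and the sample input is simply the first word of the transcript from the question.

```python
import re


def decode_octal_escapes(escaped: str) -> str:
    """Convert text such as r'\327\251' (octal-escaped UTF-8 bytes) to readable text.

    Hypothetical helper for illustration only; it is not part of google-cloud-speech.
    Assumes the input holds only ASCII characters plus three-digit octal escapes.
    """
    raw = re.sub(
        # One or more backslashes (copied output sometimes doubles them),
        # followed by an octal value in the range 000-377 (a single byte).
        rb"\\+([0-3][0-7]{2})",
        lambda match: bytes([int(match.group(1), 8)]),
        escaped.encode("ascii"),
    )
    # The recovered bytes are UTF-8, so decoding gives the original characters.
    return raw.decode("utf-8")


if __name__ == "__main__":
    # First word of the escaped transcript shown in the question.
    print(decode_octal_escapes(r"\327\244\327\250\327\251\327\252"))
```

In practice you should not need this helper: reading `result.alternatives[0].transcript` directly already gives you a normal Python string.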