Hello,
Can someone help me with this problem? I'm really struggling with it.
Hi @VitorBoldrin,
Welcome, and thank you for reaching out to our community for help.
I understand that your transcript is mixing up the dialogue of different speakers. I ran into a similar case and suggested trying speaker diarization, which detects the different speakers in the audio. The Python sample below tags each word with its speaker, which should give you a cleaner transcript.
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

# Read the local audio file into memory.
speech_file = "resources/commercial_mono.wav"
with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

# Enable diarization and give the expected range of speaker counts.
diarization_config = speech.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,
    max_speaker_count=10,
)

# Encoding and sample rate must match the audio file.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,
    language_code="en-US",
    diarization_config=diarization_config,
)

print("Waiting for operation to complete...")
response = client.recognize(config=config, audio=audio)

# The transcript within each result is separate and sequential per result.
# However, the words list within an alternative includes all the words
# from all the results thus far. Thus, to get all the words with speaker
# tags, you only have to take the words list from the last result:
result = response.results[-1]
words_info = result.alternatives[0].words

# Print each word with the speaker tag assigned to it.
for word_info in words_info:
    print(f"word: '{word_info.word}', speaker_tag: {word_info.speaker_tag}")
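Once every word carries a speaker tag, you will probably want to turn the word list into readable dialogue. Here is a minimal sketch of one way to do that: it merges consecutive words with the same tag into a single turn. The helper name `group_words_by_speaker` and the sample data are my own assumptions for illustration and are not part of the Speech-to-Text client library.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def group_words_by_speaker(words: Iterable[Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Merge consecutive (word, speaker_tag) pairs that share a tag into one turn."""
    turns = []
    # groupby only groups adjacent items, which is what we want here:
    # a speaker who talks again later starts a new turn.
    for speaker_tag, run in groupby(words, key=lambda pair: pair[1]):
        turns.append((speaker_tag, " ".join(word for word, _ in run)))
    return turns


# Hypothetical sample data standing in for (word_info.word, word_info.speaker_tag).
sample_words = [
    ("hi", 1), ("there", 1),
    ("hello", 2),
    ("how", 1), ("are", 1), ("you", 1),
]

turns = group_words_by_speaker(sample_words)
for speaker_tag, text in turns:
    print(f"Speaker {speaker_tag}: {text}")
```

With the real response you could pass `[(w.word, w.speaker_tag) for w in words_info]` instead of `sample_words`.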
Hope this helps.