Speech to Text is not showing for the complete audio file

Hi There,

I have an audio file (language: Malayalam, India). It is a drama with more than one speaker (never speaking at the same time). When I try to transcribe it (v1/v2), only a few of the dialogues are transcribed. I need to transcribe the content of the entire audio file. My file is short, only 60 seconds long. Can anyone please assist?

It is possible that the file has quality issues. I recommend following this documentation for best practices when sending files to the API (factors may vary from file types to encodings, etc.).

Thanks. But the words that are already transcribed have the same quality as the words that are not being transcribed. Is diarization (multiple speakers) not supported for the Malayalam language in the Google Speech-to-Text service?

It is incorporated, but at the moment it is in the preview stage (which currently has limited support). Can you try following the steps found here for enabling speaker diarization?

I tried to enable speaker diarization. I can see only "speaker_tag:1". When I compare against the actual audio file, words from the other speakers are missing. Following the documentation here, I changed only the file name, AudioEncoding.MP3 and language_code="ml-IN". Please see the response below. Kindly let me know if you need the audio I am testing with.



response: results {
alternatives {
transcript: "\340\264\241\340\265\215\340\264\260\340\265\210\340\264\265\340\265\274 \340\264\265\340\264\243\340\265\215\340\264\237\340\264\277 \340\264\250\340\264\277\340\265\274\340\264\244\340\265\215\340\264\244\340\264\277"
confidence: 0.8085001
words {
start_time {
seconds: 2
}
end_time {
seconds: 4
nanos: 300000000
}
word: "\340\264\241\340\265\215\340\264\260\340\265\210\340\264\265\340\265\274"
}
words {
start_time {
seconds: 4
nanos: 300000000
}
end_time {
seconds: 4
nanos: 600000000
}
word: "\340\264\265\340\264\243\340\265\215\340\264\237\340\264\277"
}
words {
start_time {
seconds: 4
nanos: 600000000
}
end_time {
seconds: 4
nanos: 800000000
}
word: "\340\264\250\340\264\277\340\265\274\340\264\244\340\265\215\340\264\244\340\264\277"
}
}
result_end_time {
seconds: 10
nanos: 200000000
}
language_code: "ml-in"
}
results {
alternatives {
transcript: " \340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276 \340\264\206\340\264\260\340\264\276\340\264\247\340\264\225\340\264\260\340\265\206 \340\264\265\340\264\260\340\265\201\340\264\250\340\265\215\340\264\250\340\264\244\340\265\215 \340\264\236\340\264\231\340\265\215\340\264\231\340\265\276 \340\264\254\340\265\213\340\264\202\340\264\254\340\265\206\340\264\257\340\264\277\340\264\262\340\264\276\340\264\257\340\264\277\340\264\260\340\265\201\340\264\250\340\265\215\340\264\250\340\265\201 \340\264\225\340\265\215\340\264\225\340\265\201\340\264\263\340\265\215\340\264\263 \340\264\265\340\264\264\340\264\277 \340\264\222\340\264\250\340\265\215\340\264\250\340\265\201 \340\264\252\340\264\261\340\264\236\340\265\215\340\264\236\340\265\201\340\264\244\340\264\260\340\265\201\340\264\256\340\265\213 \340\264\271\340\264\262\340\265\213 \340\264\222\340\264\250\340\265\215\340\264\250\340\265\215 \340\264\232\340\265\213\340\264\246\340\264\277\340\264\232\340\265\215\340\264\232\340\265\213\340\264\237\340\265\215\340\264\237\340\265\206 \340\264\265\340\264\264\340\264\277 \340\264\222\340\264\250\340\265\215\340\264\250\340\265\215 \340\264\252\340\264\261\340\264\236\340\265\215\340\264\236\340\265\201 \340\264\244\340\264\260\340\264\276\340\264\256\340\265\213 \340\264\225\340\265\212\340\264\237\340\265\215\340\264\237\340\264\276\340\264\260\340\264\244\340\265\215\340\264\244\340\264\277\340\264\262\340\265\207\340\264\225\340\265\215\340\264\225\340\265\215 \340\264\256\340\264\261\340\265\201\340\264\252\340\264\237\340\264\277 \340\264\225\340\264\277\340\264\237\340\265\215\340\264\237\340\265\201\340\264\250\340\265\215\340\264\250\340\264\277\340\264\262\340\265\215\340\264\262\340\264\262\340\265\215\340\264\262\340\265\213 \340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276\340\264\225\340\265\215\340\264\225\340\265\215 
\340\264\216\340\264\231\340\265\215\340\264\225\340\264\277\340\264\262\340\265\201\340\264\202 \340\264\265\340\264\264\340\264\277 \340\264\205\340\264\261\340\264\277\340\264\257\340\264\276\340\264\256\340\265\213 \340\264\205\340\264\261\340\264\277\340\264\257\340\264\244\340\265\215\340\264\244\340\264\277\340\264\262\340\265\215\340\264\262 \340\264\265\340\264\264\340\264\277 \340\264\205\340\264\261\340\264\277\340\264\257\340\264\276\340\264\250\340\265\201\340\264\263\340\265\215\340\264\263 \340\264\257\340\264\234\340\265\215\340\264\236\340\264\202"
confidence: 0.7363452
words {
start_time {
seconds: 13
}
end_time {
seconds: 13
nanos: 600000000
}
word: "\340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276"
}
words {
start_time {
seconds: 13
nanos: 600000000
}
end_time {
seconds: 14
nanos: 100000000
}
word: "\340\264\206\340\264\260\340\264\276\340\264\247\340\264\225\340\264\260\340\265\206"
}
words {
start_time {
seconds: 14
nanos: 100000000
}
end_time {
seconds: 14
nanos: 700000000
}
word: "\340\264\265\340\264\260\340\265\201\340\264\250\340\265\215\340\264\250\340\264\244\340\265\215"
}
words {
start_time {
seconds: 14
nanos: 700000000
}
end_time {
seconds: 15
nanos: 200000000
}
word: "\340\264\236\340\264\231\340\265\215\340\264\231\340\265\276"
}
words {
start_time {
seconds: 15
nanos: 200000000
}
end_time {
seconds: 15
nanos: 400000000
}
word: "\340\264\254\340\265\213\340\264\202\340\264\254\340\265\206\340\264\257\340\264\277\340\264\262\340\264\276\340\264\257\340\264\277\340\264\260\340\265\201\340\264\250\340\265\215\340\264\250\340\265\201"
}
words {
start_time {
seconds: 15
nanos: 400000000
}
end_time {
seconds: 17
}
word: "\340\264\225\340\265\215\340\264\225\340\265\201\340\264\263\340\265\215\340\264\263"
}
words {
start_time {
seconds: 17
}
end_time {
seconds: 17
nanos: 100000000
}
word: "\340\264\265\340\264\264\340\264\277"
}
words {
start_time {
seconds: 17
nanos: 100000000
}
end_time {
seconds: 17
nanos: 200000000
}
word: "\340\264\222\340\264\250\340\265\215\340\264\250\340\265\201"
}
words {
start_time {
seconds: 17
nanos: 200000000
}
end_time {
seconds: 17
nanos: 400000000
}
word: "\340\264\252\340\264\261\340\264\236\340\265\215\340\264\236\340\265\201\340\264\244\340\264\260\340\265\201\340\264\256\340\265\213"
}
words {
start_time {
seconds: 17
nanos: 400000000
}
end_time {
seconds: 30
nanos: 500000000
}
word: "\340\264\271\340\264\262\340\265\213"
}
words {
start_time {
seconds: 30
nanos: 500000000
}
end_time {
seconds: 30
nanos: 700000000
}
word: "\340\264\222\340\264\250\340\265\215\340\264\250\340\265\215"
}
words {
start_time {
seconds: 30
nanos: 700000000
}
end_time {
seconds: 31
}
word: "\340\264\232\340\265\213\340\264\246\340\264\277\340\264\232\340\265\215\340\264\232\340\265\213\340\264\237\340\265\215\340\264\237\340\265\206"
}
words {
start_time {
seconds: 31
}
end_time {
seconds: 33
nanos: 100000000
}
word: "\340\264\265\340\264\264\340\264\277"
}
words {
start_time {
seconds: 33
nanos: 100000000
}
end_time {
seconds: 33
nanos: 200000000
}
word: "\340\264\222\340\264\250\340\265\215\340\264\250\340\265\215"
}
words {
start_time {
seconds: 33
nanos: 200000000
}
end_time {
seconds: 33
nanos: 300000000
}
word: "\340\264\252\340\264\261\340\264\236\340\265\215\340\264\236\340\265\201"
}
words {
start_time {
seconds: 33
nanos: 300000000
}
end_time {
seconds: 33
nanos: 700000000
}
word: "\340\264\244\340\264\260\340\264\276\340\264\256\340\265\213"
}
words {
start_time {
seconds: 33
nanos: 700000000
}
end_time {
seconds: 34
nanos: 800000000
}
word: "\340\264\225\340\265\212\340\264\237\340\265\215\340\264\237\340\264\276\340\264\260\340\264\244\340\265\215\340\264\244\340\264\277\340\264\262\340\265\207\340\264\225\340\265\215\340\264\225\340\265\215"
}
words {
start_time {
seconds: 34
nanos: 800000000
}
end_time {
seconds: 37
nanos: 300000000
}
word: "\340\264\256\340\264\261\340\265\201\340\264\252\340\264\237\340\264\277"
}
words {
start_time {
seconds: 37
nanos: 300000000
}
end_time {
seconds: 38
nanos: 200000000
}
word: "\340\264\225\340\264\277\340\264\237\340\265\215\340\264\237\340\265\201\340\264\250\340\265\215\340\264\250\340\264\277\340\264\262\340\265\215\340\264\262\340\264\262\340\265\215\340\264\262\340\265\213"
}
words {
start_time {
seconds: 38
nanos: 200000000
}
end_time {
seconds: 38
nanos: 600000000
}
word: "\340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276\340\264\225\340\265\215\340\264\225\340\265\215"
}
words {
start_time {
seconds: 38
nanos: 600000000
}
end_time {
seconds: 38
nanos: 900000000
}
word: "\340\264\216\340\264\231\340\265\215\340\264\225\340\264\277\340\264\262\340\265\201\340\264\202"
}
words {
start_time {
seconds: 38
nanos: 900000000
}
end_time {
seconds: 39
}
word: "\340\264\265\340\264\264\340\264\277"
}
words {
start_time {
seconds: 39
}
end_time {
seconds: 39
nanos: 400000000
}
word: "\340\264\205\340\264\261\340\264\277\340\264\257\340\264\276\340\264\256\340\265\213"
}
words {
start_time {
seconds: 39
nanos: 400000000
}
end_time {
seconds: 40
nanos: 700000000
}
word: "\340\264\205\340\264\261\340\264\277\340\264\257\340\264\244\340\265\215\340\264\244\340\264\277\340\264\262\340\265\215\340\264\262"
}
words {
start_time {
seconds: 40
nanos: 700000000
}
end_time {
seconds: 41
nanos: 300000000
}
word: "\340\264\265\340\264\264\340\264\277"
}
words {
start_time {
seconds: 41
nanos: 300000000
}
end_time {
seconds: 41
nanos: 800000000
}
word: "\340\264\205\340\264\261\340\264\277\340\264\257\340\264\276\340\264\250\340\265\201\340\264\263\340\265\215\340\264\263"
}
words {
start_time {
seconds: 41
nanos: 800000000
}
end_time {
seconds: 56
nanos: 100000000
}
word: "\340\264\257\340\264\234\340\265\215\340\264\236\340\264\202"
}
}
result_end_time {
seconds: 60
nanos: 670000000
}
language_code: "ml-in"
}
results {
alternatives {
words {
start_time {
seconds: 2
}
end_time {
seconds: 4
nanos: 300000000
}
word: "\340\264\241\340\265\215\340\264\260\340\265\210\340\264\265\340\265\274"
speaker_tag: 1
}
words {
start_time {
seconds: 4
nanos: 300000000
}
end_time {
seconds: 4
nanos: 600000000
}
word: "\340\264\265\340\264\243\340\265\215\340\264\237\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 4
nanos: 600000000
}
end_time {
seconds: 4
nanos: 800000000
}
word: "\340\264\250\340\264\277\340\265\274\340\264\244\340\265\215\340\264\244\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 13
}
end_time {
seconds: 13
nanos: 600000000
}
word: "\340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276"
speaker_tag: 1
}
words {
start_time {
seconds: 13
nanos: 600000000
}
end_time {
seconds: 14
nanos: 100000000
}
word: "\340\264\206\340\264\260\340\264\276\340\264\247\340\264\225\340\264\260\340\265\206"
speaker_tag: 1
}
words {
start_time {
seconds: 14
nanos: 100000000
}
end_time {
seconds: 14
nanos: 700000000
}
word: "\340\264\265\340\264\260\340\265\201\340\264\250\340\265\215\340\264\250\340\264\244\340\265\215"
speaker_tag: 1
}
words {
start_time {
seconds: 14
nanos: 700000000
}
end_time {
seconds: 15
nanos: 200000000
}
word: "\340\264\236\340\264\231\340\265\215\340\264\231\340\265\276"
speaker_tag: 1
}
words {
start_time {
seconds: 15
nanos: 200000000
}
end_time {
seconds: 15
nanos: 400000000
}
word: "\340\264\254\340\265\213\340\264\202\340\264\254\340\265\206\340\264\257\340\264\277\340\264\262\340\264\276\340\264\257\340\264\277\340\264\260\340\265\201\340\264\250\340\265\215\340\264\250\340\265\201"
speaker_tag: 1
}
words {
start_time {
seconds: 15
nanos: 400000000
}
end_time {
seconds: 17
}
word: "\340\264\225\340\265\215\340\264\225\340\265\201\340\264\263\340\265\215\340\264\263"
speaker_tag: 1
}
words {
start_time {
seconds: 17
}
end_time {
seconds: 17
nanos: 100000000
}
word: "\340\264\265\340\264\264\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 17
nanos: 100000000
}
end_time {
seconds: 17
nanos: 200000000
}
word: "\340\264\222\340\264\250\340\265\215\340\264\250\340\265\201"
speaker_tag: 1
}
words {
start_time {
seconds: 17
nanos: 200000000
}
end_time {
seconds: 17
nanos: 400000000
}
word: "\340\264\252\340\264\261\340\264\236\340\265\215\340\264\236\340\265\201\340\264\244\340\264\260\340\265\201\340\264\256\340\265\213"
speaker_tag: 1
}
words {
start_time {
seconds: 17
nanos: 400000000
}
end_time {
seconds: 30
nanos: 500000000
}
word: "\340\264\271\340\264\262\340\265\213"
speaker_tag: 1
}
words {
start_time {
seconds: 30
nanos: 500000000
}
end_time {
seconds: 30
nanos: 700000000
}
word: "\340\264\222\340\264\250\340\265\215\340\264\250\340\265\215"
speaker_tag: 1
}
words {
start_time {
seconds: 30
nanos: 700000000
}
end_time {
seconds: 31
}
word: "\340\264\232\340\265\213\340\264\246\340\264\277\340\264\232\340\265\215\340\264\232\340\265\213\340\264\237\340\265\215\340\264\237\340\265\206"
speaker_tag: 1
}
words {
start_time {
seconds: 31
}
end_time {
seconds: 33
nanos: 100000000
}
word: "\340\264\265\340\264\264\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 33
nanos: 100000000
}
end_time {
seconds: 33
nanos: 200000000
}
word: "\340\264\222\340\264\250\340\265\215\340\264\250\340\265\215"
speaker_tag: 1
}
words {
start_time {
seconds: 33
nanos: 200000000
}
end_time {
seconds: 33
nanos: 300000000
}
word: "\340\264\252\340\264\261\340\264\236\340\265\215\340\264\236\340\265\201"
speaker_tag: 1
}
words {
start_time {
seconds: 33
nanos: 300000000
}
end_time {
seconds: 33
nanos: 700000000
}
word: "\340\264\244\340\264\260\340\264\276\340\264\256\340\265\213"
speaker_tag: 1
}
words {
start_time {
seconds: 33
nanos: 700000000
}
end_time {
seconds: 34
nanos: 800000000
}
word: "\340\264\225\340\265\212\340\264\237\340\265\215\340\264\237\340\264\276\340\264\260\340\264\244\340\265\215\340\264\244\340\264\277\340\264\262\340\265\207\340\264\225\340\265\215\340\264\225\340\265\215"
speaker_tag: 1
}
words {
start_time {
seconds: 34
nanos: 800000000
}
end_time {
seconds: 37
nanos: 300000000
}
word: "\340\264\256\340\264\261\340\265\201\340\264\252\340\264\237\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 37
nanos: 300000000
}
end_time {
seconds: 38
nanos: 200000000
}
word: "\340\264\225\340\264\277\340\264\237\340\265\215\340\264\237\340\265\201\340\264\250\340\265\215\340\264\250\340\264\277\340\264\262\340\265\215\340\264\262\340\264\262\340\265\215\340\264\262\340\265\213"
speaker_tag: 1
}
words {
start_time {
seconds: 38
nanos: 200000000
}
end_time {
seconds: 38
nanos: 600000000
}
word: "\340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276\340\264\225\340\265\215\340\264\225\340\265\215"
speaker_tag: 1
}
words {
start_time {
seconds: 38
nanos: 600000000
}
end_time {
seconds: 38
nanos: 900000000
}
word: "\340\264\216\340\264\231\340\265\215\340\264\225\340\264\277\340\264\262\340\265\201\340\264\202"
speaker_tag: 1
}
words {
start_time {
seconds: 38
nanos: 900000000
}
end_time {
seconds: 39
}
word: "\340\264\265\340\264\264\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 39
}
end_time {
seconds: 39
nanos: 400000000
}
word: "\340\264\205\340\264\261\340\264\277\340\264\257\340\264\276\340\264\256\340\265\213"
speaker_tag: 1
}
words {
start_time {
seconds: 39
nanos: 400000000
}
end_time {
seconds: 40
nanos: 700000000
}
word: "\340\264\205\340\264\261\340\264\277\340\264\257\340\264\244\340\265\215\340\264\244\340\264\277\340\264\262\340\265\215\340\264\262"
speaker_tag: 1
}
words {
start_time {
seconds: 40
nanos: 700000000
}
end_time {
seconds: 41
nanos: 300000000
}
word: "\340\264\265\340\264\264\340\264\277"
speaker_tag: 1
}
words {
start_time {
seconds: 41
nanos: 300000000
}
end_time {
seconds: 41
nanos: 800000000
}
word: "\340\264\205\340\264\261\340\264\277\340\264\257\340\264\276\340\264\250\340\265\201\340\264\263\340\265\215\340\264\263"
speaker_tag: 1
}
words {
start_time {
seconds: 41
nanos: 800000000
}
end_time {
seconds: 56
nanos: 100000000
}
word: "\340\264\257\340\264\234\340\265\215\340\264\236\340\264\202"
speaker_tag: 1
}
}
}
total_billed_time {
seconds: 61
}
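
For anyone reading the dump above: the transcript strings are UTF-8 bytes printed as octal escapes, and the word timings show where speech is probably being skipped (for example, one word spans 17.4 s to 30.5 s). Below is a minimal sketch, not part of the Speech-to-Text client library, that decodes the escapes and flags long words or long silences between words. The helper names `decode_octal_escapes` and `find_suspect_spans` and the 2-second thresholds are assumptions chosen for illustration; the sample timings are copied by hand from the first four words of the response.

```python
import re


def decode_octal_escapes(text: str) -> str:
    """Turn escapes such as \\340\\264\\265 back into readable UTF-8 text."""
    raw = re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),
        text.encode("ascii"),
    )
    return raw.decode("utf-8")


def find_suspect_spans(words, max_word_seconds=2.0, max_gap_seconds=2.0):
    """Return (start, end, reason) spans where speech may have been skipped.

    `words` is a list of (word, start_seconds, end_seconds) tuples in time order.
    The thresholds are arbitrary illustration values, not API settings.
    """
    spans = []
    previous_end = None
    for word, start, end in words:
        # A long silence between two recognized words.
        if previous_end is not None and start - previous_end > max_gap_seconds:
            spans.append((previous_end, start, "gap before " + word))
        # A single word stretched over an implausibly long interval.
        if end - start > max_word_seconds:
            spans.append((start, end, "overlong word " + word))
        previous_end = end
    return spans


# First four words of the response, with their start and end times in seconds.
sample = [
    (decode_octal_escapes(r"\340\264\241\340\265\215\340\264\260\340\265\210\340\264\265\340\265\274"), 2.0, 4.3),
    (decode_octal_escapes(r"\340\264\265\340\264\243\340\265\215\340\264\237\340\264\277"), 4.3, 4.6),
    (decode_octal_escapes(r"\340\264\250\340\264\277\340\265\274\340\264\244\340\265\215\340\264\244\340\264\277"), 4.6, 4.8),
    (decode_octal_escapes(r"\340\264\250\340\264\277\340\264\231\340\265\215\340\264\231\340\265\276"), 13.0, 13.6),
]

for start, end, reason in find_suspect_spans(sample):
    print(f"{start:5.1f}s to {end:5.1f}s: {reason}")
```

Listening to the flagged intervals in the original audio is a quick way to check whether the missing dialogue falls inside them.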

Hi There, any suggestion on my query?

@sukumarpm I had the same issue. @kvandres's suggestion helped me. It might also help to see my config:

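from google.cloud import speech

client = speech.SpeechClient()
# Placeholder URI: point this at your own audio file in Cloud Storage.
audio = speech.RecognitionAudio(uri="gs://your-bucket/your-audio.webm")

# Tell the recognizer to expect exactly two speakers.
diarization_config = speech.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,
    max_speaker_count=2,
)

# InteractionType(1) is DISCUSSION (several people in conversation).
metadata = speech.RecognitionMetadata(
    interaction_type=speech.RecognitionMetadata.InteractionType(1)
)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.WEBM_OPUS,
    enable_automatic_punctuation=True,
    language_code="en-US",
    diarization_config=diarization_config,
    metadata=metadata,
    model="video",
)

# Send async recognition request
operation = client.long_running_recognize(config=config, audio=audio)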
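
Once the operation finishes, the diarized words are in the last result, as in the response posted earlier in this thread. Here is a rough sketch of how the speaker tags could be turned into readable turns. `speaker_turns` is a hypothetical helper, and the demo objects are stand-ins with the same `word` and `speaker_tag` attributes as the words in the API response; with real output you would pass something like `operation.result(timeout=90).results[-1].alternatives[0].words` instead.

```python
from itertools import groupby
from types import SimpleNamespace


def speaker_turns(words):
    """Collapse consecutive words with the same speaker_tag into (tag, text) turns."""
    return [
        (tag, " ".join(w.word for w in group))
        for tag, group in groupby(words, key=lambda w: w.speaker_tag)
    ]


# Stand-in words for illustration only; real ones come from the API response.
demo_words = [
    SimpleNamespace(word="hello", speaker_tag=1),
    SimpleNamespace(word="there", speaker_tag=1),
    SimpleNamespace(word="hi", speaker_tag=2),
    SimpleNamespace(word="again", speaker_tag=1),
]

for tag, text in speaker_turns(demo_words):
    print(f"Speaker {tag}: {text}")
```

If every turn comes back with the same tag, as in the Malayalam response above, the recognizer did not separate the speakers, so it is worth checking the speaker counts and the model used.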

 
