Chirp Model with Multi Channel Recordings Seems Broken

I am investigating whether upgrading from the V1 Phone Call "Enhanced" model to Chirp Telephony would give more accurate transcriptions of phone calls. I used the Google Cloud Console UI to test the two models, and was quite surprised to find that the Chirp model appears to be broken.

I used a 52-second call recording with 2 channels to test this.

The V1 model results appear to be fine. They are split out by channel and timestamp as expected, and I can also enable punctuation.

image.png

Here is the output of the Chirp Telephony model with the same recording. You can see that it is remarkably worse in comparison. Beyond punctuation not being available, the model doesn't appear to split the results out by channel at all.

image (1).png

This is so bad that I have to wonder, am I doing something wrong? Am I not understanding the purpose of "Chirp" as a drop-in replacement for V1 Speech models?
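
For anyone who wants to reproduce this outside the Console, here is a minimal sketch of the kind of Speech-to-Text V2 request body I would expect to correspond to this test, plus a small helper to check whether the results actually come back split by channel. The field names follow the V2 REST API as I understand it. The language code, the helper names, and the example response are my own assumptions for illustration, and I have not confirmed that chirp_telephony honors the multi-channel or punctuation settings, which is really the question here.

```python
from collections import defaultdict


def build_recognize_body(audio_uri, model="chirp_telephony"):
    """Build a V2 recognize request body (hypothetical helper).

    Assumes en-US audio and asks for one transcript per channel.
    """
    return {
        "config": {
            # Let the service detect the audio encoding from the file.
            "autoDecodingConfig": {},
            "languageCodes": ["en-US"],  # assumption, not from my test
            "model": model,
            "features": {
                "multiChannelMode": "SEPARATE_RECOGNITION_PER_CHANNEL",
                "enableAutomaticPunctuation": True,
            },
        },
        "uri": audio_uri,
    }


def transcripts_by_channel(response):
    """Group the top transcript of each result by its channelTag.

    Results without a channelTag are collected under channel 0.
    """
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        channel = result.get("channelTag", 0)
        grouped[channel].append(alternatives[0].get("transcript", ""))
    return dict(grouped)


# Made-up response, shaped like what I would expect for a 2-channel call.
example_response = {
    "results": [
        {"channelTag": 1, "alternatives": [{"transcript": "Hello, how can I help?"}]},
        {"channelTag": 2, "alternatives": [{"transcript": "Hi, I have a question."}]},
        {"channelTag": 1, "alternatives": [{"transcript": "Sure, go ahead."}]},
    ]
}
by_channel = transcripts_by_channel(example_response)
```

If the response only ever contains a single channel key here, that would match what I am seeing in the Console for Chirp Telephony.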