Questions about the Speech-to-Text API

I am taking a look at the Speech-to-Text API and I had some questions:

1. What is the difference between v1 and v1p1?

2. Does the Chirp model in v2 support transcribing audio from a streaming input?

Hi @piratebay

Thank you for joining our community.

  1. Both v1 and v1p1 (v1p1beta1) are sub-versions of Cloud Speech-to-Text V1. v1 is the stable release, and the experimental features of this version are placed in v1p1. Although Cloud Speech-to-Text V1 is still supported, no further development is being carried out for it. All future enhancements are focused on Cloud Speech-to-Text V2, which provides the most up-to-date capabilities.
  2. Chirp processes speech in larger chunks than other models, so it is not suitable for real-time applications such as streaming.

According to the published documentation, Chirp is available through the following API methods:

  • v2 Speech.Recognize (good for short audio < 1 min)
  • v2 Speech.BatchRecognize (good for long audio 1 min to 8 hrs)

Chirp is not available on the following API methods:

  • v2 Speech.StreamingRecognize
  • v1 Speech.StreamingRecognize
  • v1 Speech.Recognize
  • v1 Speech.LongRunningRecognize
  • v1p1beta1 Speech.StreamingRecognize
  • v1p1beta1 Speech.Recognize
  • v1p1beta1 Speech.LongRunningRecognize
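
If it helps, the two lists above can be captured in a small lookup so your code can decide which method to call before sending audio. This is only a sketch: the table is copied by hand from this reply rather than queried from the API, and the helper names (supports_chirp, chirp_method_for) are hypothetical, not part of any Google client library.

```python
# Availability table transcribed by hand from the lists in this reply.
# It is an assumption that it stays accurate; check the current docs.
CHIRP_METHODS = {
    ("v2", "Recognize"),
    ("v2", "BatchRecognize"),
}

ONE_MINUTE = 60            # seconds
EIGHT_HOURS = 8 * 60 * 60  # seconds


def supports_chirp(version: str, method: str) -> bool:
    """Return True if this reply lists Chirp as available for version/method."""
    return (version, method) in CHIRP_METHODS


def chirp_method_for(duration_seconds: float) -> str:
    """Pick the v2 method suggested above for audio of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds < ONE_MINUTE:
        return "Recognize"        # short audio, under 1 min
    if duration_seconds <= EIGHT_HOURS:
        return "BatchRecognize"   # long audio, 1 min to 8 hrs
    raise ValueError("audio longer than 8 hours is outside the listed range")
```

For example, chirp_method_for(30) picks Recognize and chirp_method_for(600) picks BatchRecognize, while supports_chirp("v2", "StreamingRecognize") is False, which is the streaming case you asked about.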

I hope I was able to provide you with useful insights.