
Error 499 - The operation was cancelled on STTv2, one in 10 times

Hello,

After updating to v2 of the Speech-to-Text Python API, I get error 499 (The operation was cancelled) on roughly one in every 10 calls, but nothing on my end appears to be in error.

1) The implementation sends a StreamingRecognizeRequest every 20 ms, each carrying 20 ms of audio: 48 kHz signed linear (SLIN), mono.

2) There are no gaps in the requests I send; tested here on Ethernet with low jitter and no packet loss.

3) It isn't consistent: sometimes it errors out, sometimes it won't. When it does, it ALWAYS errors out at the same point, after END_OF_SINGLE_UTTERANCE and SPEECH_ACTIVITY_END, so there is no is_final result.
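
For reference, the numbers in point 1 work out as follows. This is only a back-of-the-envelope sketch; `chunk_bytes` is a hypothetical helper, not part of the code in this post, and it assumes 16-bit samples as implied by LINEAR16.

```python
SAMPLE_RATE_HZ = 48_000   # from the post
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed linear
CHANNELS = 1              # mono
CHUNK_MS = 20             # one request per 20 ms


def chunk_bytes(sample_rate_hz, chunk_ms, bytes_per_sample=2, channels=1):
    """Return the payload size in bytes for one chunk of raw PCM audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


payload = chunk_bytes(SAMPLE_RATE_HZ, CHUNK_MS, BYTES_PER_SAMPLE, CHANNELS)
requests_per_second = 1000 // CHUNK_MS
print(payload, requests_per_second)  # 1920 50
```

So each request carries 960 samples (1920 bytes), and the stream makes about 50 requests per second.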

# Imports this snippet relies on (DEBUG_WRITE_AUDIO, WAVE_HEADER and byte_swap are defined elsewhere)
from random import randint

from google.cloud import speech_v2
from google.protobuf.duration_pb2 import Duration

def request_generator(self, client):
    sample_rate_hertz = 48000

    config = speech_v2.RecognitionConfig(
        model='latest_short',
        language_codes=[self.interface.language_info.code_google,],
        explicit_decoding_config=speech_v2.ExplicitDecodingConfig(
            encoding=speech_v2.ExplicitDecodingConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate_hertz,
            audio_channel_count=1,
        ),
        features=speech_v2.RecognitionFeatures(
            max_alternatives=0,
            enable_word_time_offsets=True,
            enable_word_confidence=True,
            enable_spoken_punctuation=True,
            enable_automatic_punctuation=True,
        ),
    )

    streaming_config = speech_v2.StreamingRecognitionConfig(
        config=config,
        streaming_features=speech_v2.StreamingRecognitionFeatures(
            interim_results=True,
            enable_voice_activity_events=True,
            voice_activity_timeout=speech_v2.StreamingRecognitionFeatures.VoiceActivityTimeout(
                speech_start_timeout=Duration(seconds=3, nanos=0),
                speech_end_timeout=Duration(seconds=3, nanos=0),
            ),
        ),
    )

    if DEBUG_WRITE_AUDIO:
        dof = open('/tmp/%i.wav' % randint(0, 100000000000), 'wb')

    # Setup an empty recognizer
    yield speech_v2.StreamingRecognizeRequest(recognizer='projects/xyz/locations/global/recognizers/_', streaming_config=streaming_config)

    # Send the WAV header
    yield speech_v2.StreamingRecognizeRequest(audio=bytes(WAVE_HEADER))

    while self.active:
        rtp_payload = self.interface.rtp_transceiver.receive_queue.get()
        # if DEBUG_WRITE_AUDIO: print(f"--> Send to STT, len ", len(rtp_payload))

        # Debug, use this to encapsulate it first: ffmpeg -v debug -y -f s16be -ar 48000 -ac 1 -i 56918204745.wav file.wav
        if DEBUG_WRITE_AUDIO:
            dof.write(bytes(rtp_payload))

        # Content needs to be byte swapped
        byte_swap(rtp_payload)

        # sleep(0.010) # Seems to have a problem being hit so quickly?
        yield speech_v2.StreamingRecognizeRequest(audio=bytes(rtp_payload))
 
Anyone have any ideas? We're tempted to just take the last transcription result... But this error happens so often, it feels like a problem on Google's end.
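
A minimal sketch of that "take the last transcription result" fallback is below. It is not from the code above: `collect_transcript`, `StreamCancelled`, and the `(transcript, is_final)` tuples are hypothetical stand-ins so the snippet runs without the Google client. With the real library, the 499 would presumably surface as `google.api_core.exceptions.Cancelled`, and the transcript would be read from the streaming response objects instead.

```python
class StreamCancelled(Exception):
    """Hypothetical stand-in for the client's 499 / CANCELLED exception."""


def collect_transcript(responses):
    """Return (transcript, is_final), falling back to the last interim result.

    `responses` yields (transcript, is_final) tuples; a real stream would
    yield response objects instead.
    """
    last_interim = ""
    try:
        for transcript, is_final in responses:
            if is_final:
                return transcript, True
            last_interim = transcript
    except StreamCancelled:
        # Stream died before is_final arrived: keep what we already have.
        pass
    return last_interim, False


def fake_stream():
    """Simulate a stream that is cancelled before the final result."""
    yield "hello", False
    yield "hello world", False
    raise StreamCancelled("499 The operation was cancelled")


print(collect_transcript(fake_stream()))  # ('hello world', False)
```

The returned flag lets the caller tell a genuine final result from a fallback, which matters if interim results turn out to be less accurate.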
3 REPLIES

Hello,

You're right to be concerned about the frequent error 499 (The operation was cancelled) you're encountering with the Speech-to-Text v2 Python API. Here are some insights and potential solutions to investigate the issue:

Understanding Error 499:

In Google's API error model, 499 maps to the gRPC status CANCELLED, which is documented as the operation being cancelled, typically by the caller. Since you report nothing on your end cancelling the stream, it's worth exploring both client and server-side possibilities.

Here are some points to consider:
1. sleep(0.010): This might not be necessary and could be contributing to delays. The Speech-to-Text API should handle processing at its own pace.
2. Byte Swapping: Ensure your byte swapping logic is correct for converting the audio data to the expected format (LINEAR16 is 16-bit signed little-endian PCM).
3. yield Statements: Verify the order of yield statements. The initial setup and header should be sent before the audio data chunks.
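
The `byte_swap` helper itself is not shown in the question, so the following is only a sketch of what an in-place 16-bit swap might look like. It assumes the payload is a `bytearray` of big-endian 16-bit samples (the ffmpeg comment in the question uses `s16be`); the real helper may differ.

```python
def byte_swap(buf: bytearray) -> None:
    """Swap each pair of bytes in place (16-bit big-endian <-> little-endian)."""
    if len(buf) % 2:
        raise ValueError("expected an even number of bytes for 16-bit samples")
    # Extended slice assignment: even-index and odd-index bytes trade places.
    buf[0::2], buf[1::2] = buf[1::2], buf[0::2]


payload = bytearray(b"\x01\x02\x03\x04")  # two big-endian samples: 0x0102, 0x0304
byte_swap(payload)
print(payload.hex())  # 02010403
```

A quick sanity check is to decode a swapped sample with `int.from_bytes(payload[:2], "little")` and confirm it matches the original big-endian value.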

Troubleshooting Steps:

  • Reduce Verbosity: Temporarily remove the debug code (DEBUG_WRITE_AUDIO sections) to see if it affects the error frequency.
  • Simplify Requests: Try sending larger audio chunks (e.g., 100ms or 200ms) instead of 20ms packets. This might reduce the overall number of requests and potential for concurrency issues.
  • Check Client-Side Network: Double-check for low latency and no packet loss on your network connection. Use tools like ping or traceroute to verify.
  • Retry Logic: Implement a basic retry mechanism with exponential backoff for error 499. This can help handle transient server-side issues without impacting performance significantly.
  • Monitor API Logs: If possible, enable logging for the Speech-to-Text API calls to see any additional error messages or clues on the server-side.
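
The "larger audio chunks" suggestion above could be sketched roughly as follows. `coalesce` is a hypothetical helper, and the 1920-byte packet size assumes 20 ms of 48 kHz, 16-bit, mono audio as described in the question; whether larger chunks actually affect the 499 is untested here.

```python
def coalesce(packets, target_bytes):
    """Group small audio packets into chunks of at least `target_bytes`."""
    buf = bytearray()
    for packet in packets:
        buf.extend(packet)
        if len(buf) >= target_bytes:
            yield bytes(buf)
            buf.clear()
    if buf:
        # Flush whatever is left when the source ends.
        yield bytes(buf)


PACKET_BYTES = 1920              # 20 ms of 48 kHz, 16-bit, mono audio
TARGET_BYTES = 5 * PACKET_BYTES  # 100 ms

packets = [bytes(PACKET_BYTES)] * 12   # 240 ms of silence
chunks = list(coalesce(packets, TARGET_BYTES))
print([len(c) for c in chunks])  # [9600, 9600, 3840]
```

Each yielded chunk would then go into a single StreamingRecognizeRequest, at the cost of up to 100 ms of added latency per chunk.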

Refer to the official documentation on error handling for the Speech-to-Text v2 API: https://cloud.google.com/speech-to-text
Consider using a library like google-cloud-speech that simplifies working with the Speech-to-Text API and potentially handles retries internally.
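
As an illustration of the retry suggestion, a bare-bones exponential backoff loop might look like this. Everything here is a stand-in: `retry_with_backoff`, `StreamCancelled`, and `flaky` are hypothetical names, and the delays are arbitrary defaults rather than values recommended by Google.

```python
import random
import time


class StreamCancelled(Exception):
    """Hypothetical stand-in for the client's 499 / CANCELLED exception."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except StreamCancelled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt, capped, with jitter to avoid bursts.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


calls = {"count": 0}


def flaky():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise StreamCancelled("499 The operation was cancelled")
    return "ok"


print(retry_with_backoff(flaky, sleep=lambda s: None))  # ok
```

Note that for a streaming call, a retry means opening a new stream, so the audio from the failed stream would have to be buffered and resent for the retry to recover the missing final result.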

Regards,

Jai Ade

Hello,

Was that an AI-generated response?

1. The sleep is commented out, so the line has no effect.
2. It's byte swapped as required, because it's coming from a little-endian system.
3. Yields are in order.

Is there a Google engineer on here?

 

Hello,

Thank you for contacting the Google Cloud Community!

I have gone through your reported issue. However, it seems to be specific to your environment and would need more detailed debugging and analysis. To ensure a faster resolution and dedicated support, I kindly request that you file a support ticket here [1]. Our Google Cloud support team will prioritize your request and provide the assistance you need.

We appreciate your cooperation!

[1]: https://cloud.google.com/support/docs/manage-cases#creating_cases

Regards,

Jai Ade