Gemini request: Microphone input that isn't just "one shot"

This is one of those small UI details that becomes a big impediment to fluid use for anyone using Gemini through the web UI.

Currently, you can use speech-to-text in the Gemini web UI by clicking the microphone icon in the prompt bar:

danielrosehill_1-1736296066948.png

It works relatively well, although Google's speech-to-text performance still lags quite a bit behind that of Whisper.

danielrosehill_0-1736296035152.png

The issue, however, is that it's a one-shot microphone button! Once you've used it to capture your prompt by voice, there's no way to click it again, because the microphone icon turns into the submit button.

danielrosehill_2-1736296163613.png

There are lots of times when I dictate a prompt, stop the microphone capture because I think I've finished, and then decide that I want to add something more.

I think this workflow is pretty natural for people who use voice to write prompts, and the constraint that the microphone input can only be used once looks to me like an unintentional aspect of the UI.
