Is it possible to get a summary of an audio conversation with Gemini's
Multimodal Live API in the same session? Theoretically, you can configure
a session to support both audio and text: generation_config = {
"responseModalities": ["TEXT", "AUDIO"],... So...