Hi! I'm using the Gemini Python SDK to transcribe a long audio file. My system prompt includes instructions for a JSON output format.
I first upload the audio file using genai.upload_file (I get 429 Resource Exhausted if I simply base64-encode it inline), which returns a file reference I can pass in the message. I then start a new chat conversation and stream the response. If the output hits the token limit, I send another message with the prompt "continue". This *should* resume generating from where it left off, but instead it always starts over from the beginning! The same flow works fine from the cloud console, where I get the full output.
I'm using the google-generativeai Python SDK 0.7.0. What could be going wrong??
Here is my code:
def run(self) -> str:
    audio = self._upload_to_gemini(self._audio_file, mime_type="audio/wav")
    model = genai.GenerativeModel(
        model_name="gemini-1.5-flash",
        generation_config={
            "temperature": 0,
            "max_output_tokens": 8192,
            "response_mime_type": "text/plain",
        },
        system_instruction=self._system_prompt,
    )
    transcription = ""
    chat = model.start_chat(history=[])
    response = chat.send_message(["Transcribe this file", audio], stream=True)
    print("*** response:")
    for chunk in response:
        print(chunk.text, end="")
        transcription += chunk.text
    # Keep asking for more until the JSON output closes with "}" on its own line.
    while transcription.strip().splitlines()[-1] != "}":
        print("*** asking for next segment..")
        print(chat)
        response = chat.send_message("continue", stream=True)
        print("*** response:")
        for chunk in response:
            print(chunk.text, end="")
            transcription += chunk.text
    return transcription
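One thing I noticed while trimming this down: if the model ever streams back only whitespace, the `splitlines()[-1]` check in the `while` condition raises IndexError. A guarded version of that completeness check (the helper name `is_complete` is just my own, not from the SDK) would look like:

```python
def is_complete(transcription: str) -> bool:
    """True once the accumulated output ends with a bare closing brace."""
    lines = transcription.strip().splitlines()
    return bool(lines) and lines[-1] == "}"
```

So the loop would become `while not is_complete(transcription):`, but either way the continuation still restarts from the beginning.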