Hi Team, could you please share the formula for estimating how many tokens a request payload counts against the TPM (tokens per minute) rate limit? We would like to work out, at least approximately, how much of the TPM quota a given prompt will consume.
Suppose this is our sample payload for the gemini-pro model:
{
  "model": "gemini-pro",
  "contents": [
    {
      "role": "USER",
      "parts": { "text": "Who is the main lead in Marvel Comics?" }
    }
  ],
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.2,
    "topP": 0.8,
    "topK": 40
  }
}
Here, promptTokenCount is approximately inputLength / 4 (roughly 4 characters per token), as per the documentation.
Given that, what is the formula to approximate the total token count when maxOutputTokens is not provided in the payload?
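
To make the question concrete, here is a minimal Python sketch of the estimate we have in mind. It assumes only the 4-characters-per-token heuristic mentioned above. The helper names and the assumed_output_cap parameter are our own placeholders, not part of any API, and the value to pass for the cap when maxOutputTokens is absent is exactly what we are unsure about.

```python
import math

# Heuristic from the documentation: roughly 4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_prompt_tokens(payload: dict) -> int:
    """Approximate promptTokenCount as total text length / 4, rounded up."""
    total_chars = 0
    for content in payload.get("contents", []):
        parts = content.get("parts", [])
        # The sample payload uses a single object for "parts"; accept a list too.
        if isinstance(parts, dict):
            parts = [parts]
        for part in parts:
            total_chars += len(part.get("text", ""))
    return math.ceil(total_chars / CHARS_PER_TOKEN)


def estimate_total_tokens(payload: dict, assumed_output_cap: int) -> int:
    """Upper-bound estimate: prompt tokens plus the output cap in effect.

    assumed_output_cap is a hypothetical placeholder used only when
    maxOutputTokens is missing from generation_config.
    """
    config = payload.get("generation_config", {})
    output_cap = config.get("maxOutputTokens", assumed_output_cap)
    return estimate_prompt_tokens(payload) + output_cap
```

For example, an 8-character prompt with an assumed cap of 100 would give 2 + 100 = 102 tokens as the worst case. Is that the right way to reason about it, and what cap should be assumed?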
