Hi,
I've fine-tuned the gemini-1.5-flash model on a dataset and tried to use it, but inference is extremely slow. It takes about 10 seconds for even a trivial input: I just wrote "Hi" and only received the output message many seconds later.
Could you explain the reason and how to accelerate it to the speed of the base gemini-1.5-flash model?
If there is documentation covering this issue, please share it.
Hi @jsm_llm,
Welcome to Google Cloud Community!
The slow inference time you're seeing with your fine-tuned Gemini-1.5-flash model after inputting just "Hi" could be due to a few different factors. These issues aren't directly mentioned in the official documentation because they result from your specific fine-tuning process and setup, not from the base model itself. The main point is understanding the trade-offs you made during fine-tuning and how they affect the model's performance during inference. Here's why this happens and how to potentially speed things up:
Possible Reasons for Slow Inference:
Potential Strategies to Accelerate Inference:
I hope the above information is helpful.
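Before changing anything, it can help to quantify where the time actually goes: time-to-first-token (queuing/model startup cost) versus total generation time. The helper below is a minimal sketch; the commented Vertex AI usage assumes the `vertexai` Python SDK and a placeholder tuned-endpoint resource name, so adjust both to your project.

```python
import time

def measure_latency(stream_fn):
    """Measure time-to-first-chunk and total time for a streaming call.

    stream_fn: a zero-argument callable returning an iterable of response
    chunks (e.g. a lambda wrapping model.generate_content(..., stream=True)).
    Returns (time_to_first_chunk, total_time) in seconds.
    """
    start = time.perf_counter()
    first = None
    for _chunk in stream_fn():
        if first is None:
            # Record latency of the very first chunk only.
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return first, total

# Against a tuned Gemini endpoint this might look like (resource name and
# project/location are placeholders, not values from this thread):
#   import vertexai
#   from vertexai.generative_models import GenerativeModel
#   vertexai.init(project="my-project", location="us-central1")
#   model = GenerativeModel(
#       "projects/my-project/locations/us-central1/endpoints/1234567890")
#   ttft, total = measure_latency(
#       lambda: model.generate_content("Hi", stream=True))
```

If time-to-first-token dominates, the delay is mostly endpoint/serving overhead rather than generation itself; streaming the response at least lets users see output sooner.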
We're seeing the same thing: completions on fine-tuned models are much slower (2x or more) than on the base model.
Thank you for your answer. I understood the reason.