Fine tuning gemini in Vertex AI taking many hours

Riddum · 02-02-2025 12:07 AM

I'm fine tuning gemini-1.5-pro-002 to my whatsapp chats in Vertex and the training is taking several hours... 6 hours and counting. The status is stuck at "Running".

There are only 2 whatsapp chats in there (doing it as a test).

Is this normal?

MJane

Hi @Riddum,

Welcome to the Google Cloud Community!

The fine-tuning process for Gemini 1.5 Pro on Vertex AI has been running for over 6 hours with only two short WhatsApp chats as the dataset. This duration is not typical for such a small dataset. Below are potential troubleshooting steps to resolve the issue:

Cancel the Job and Restart - Consider stopping the training job and restarting it with adjusted parameters or resources.
Simplify the Dataset - Try creating an even simpler dataset to isolate the issue. Instead of using actual WhatsApp chats, create a basic text file containing just a few lines of simple question-and-answer pairs. This approach will help determine if the problem is specific to the WhatsApp data.
Check Configuration - Review and optimize the training parameters and settings. Make any necessary adjustments to ensure efficient training.
Check the logs - Review any logs generated during the job in case there is more information about the problem.

In addition, you can check this documentation to know more about best practices for supervised fine tuning for Gemini.

If the issue persists, I suggest contacting Google Cloud Support as they can provide more insights to see if the behavior you've encountered is a known issue or specific to your project

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.