I have been using Gemini 1.5 Pro to extract data from invoices. Prompting alone gets the data extracted for most documents, but there are certain documents where the model fails to generate the correct output. To improve on this, I am planning to fine-tune the model. The problem scenario is as follows:
The model generates an incorrect answer: it extracts a different value for the invoice number.
I read the training dataset documentation. Suppose I create a training dataset for the problematic documents.
Example:
{"contents":[{"role":"user","parts":[{"fileData":{"mimeType":"application/pdf","fileUri":"sample.pdf"}},{"text":"Extract invoice number"}]},{"role":"model","parts":[{"text":"Invoice number is 34941097."}]}]}
If I create a dataset like this and fine-tune the model, will the model be able to generate the correct values?
Hi @aldred,
Welcome to Google Cloud Community!
Yes, fine-tuning can improve the model's quality and efficiency. The more high-quality, representative data you have, the better the results will be. As best practice for training datasets:
For more information about fine-tuning, you may check out this documentation. To practice fine-tuning, you can run the Fine-Tuning Large Language Models: How Vertex AI Takes LLMs to the Next Level codelab to get hands-on experience.
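If it helps, here is a minimal sketch of how a dataset in the format from the question could be generated, one JSON object per line (JSONL). It only uses the Python standard library. The helper names `make_example` and `to_jsonl`, the bucket paths, and the second invoice number are hypothetical placeholders, and the `gs://` URIs assume the PDFs are stored in Cloud Storage; please check the dataset documentation for the exact requirements of your model version.

```python
import json


def make_example(file_uri, prompt, answer):
    """Build one training record in the same shape as the example in the question."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"fileData": {"mimeType": "application/pdf", "fileUri": file_uri}},
                    {"text": prompt},
                ],
            },
            {"role": "model", "parts": [{"text": answer}]},
        ]
    }


def to_jsonl(examples):
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(example) + "\n" for example in examples)


# Hypothetical labelled data: (PDF location, verified invoice number).
# The URIs are placeholders; the labels must be checked by hand.
labelled_invoices = [
    ("gs://my-bucket/invoices/sample.pdf", "34941097"),
    ("gs://my-bucket/invoices/other.pdf", "10020030"),
]

examples = [
    make_example(uri, "Extract invoice number", f"Invoice number is {number}.")
    for uri, number in labelled_invoices
]

jsonl_text = to_jsonl(examples)
```

You could then write `jsonl_text` to a file such as `train.jsonl` and upload it for tuning. Keeping the prompt and the answer format identical to what you use at inference time is a design choice in this sketch, so that the tuned model sees the same pattern in both places.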
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.