I’m currently working on a project where I need to fine-tune the LLaMA 3.1 model using Google Cloud’s Vertex AI. However, my training data is in unstructured formats like PDFs, and my goal is to create a chatbot for this custom data. I'm exploring two potential methods but facing some issues:
Agent Builder: This approach allows me to upload files to Cloud Storage and fine-tune through Agent Builder, possibly integrating with Dialogflow for chatbot creation.
Vertex AI Model Garden: Using the notebook interface, I can fine-tune models, but the approach requires data in structured formats like JSON. I do not have a data in conversation form.
My Questions:
I’d greatly appreciate any guidance, suggestions, or relevant documentation to help me effectively use these methods for my project.
Thank you!
Hi @rhirani,
Welcome to Google Cloud Community!
Here’s the answer to your questions:
LLaMA 3.1 model is not included in Vertex AI Agent option in Vertex AI Agent Builder. This approach might not be suitable unless Google Cloud updates its model options. Below image indicates the available model to choose from:
Additionally, you are right that in Vertex AI Model Garden, it requires structured data formats like JSON. This approach is more flexible for fine-tuning LLaMA 3.1, provided you preprocess your PDF data.
For PDF Data Preprocessing:
Extraction: Begin by extracting text from your PDFs.
Chunking & Structuring:
Chunking: Divide extracted text into smaller, contextually relevant chunks. This could be paragraphs, sections, or even sentences depending on the document structure and desired granularity.
Structuring: Convert each chunk into a format suitable for conversational training. A common approach is the "Question-Answer" format.
Here’s some documentations that you may check:
Fine-tuning LLaMA 3.1 with Vertex AI Model Garden:
Deployment & Integration:
I hope the above information is helpful.
I am using notebook that opens from model garden while enabling Llama 3.1
So, it fine if I fine tune here using my dataset ?
User | Count |
---|---|
2 | |
2 | |
1 | |
1 | |
1 |