How do I input my own dataset for generative AI?
@xylobrite I am searching for the same answer. What if we feed an input file from cloud storage and read it into the model's context using Python?
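A minimal sketch of that idea, using a local file as a stand-in for the cloud object (with Google Cloud Storage you would fetch the text via the `google-cloud-storage` client instead; the file name and contents here are made up for illustration):

```python
import pathlib
import tempfile

# Hypothetical document that would normally live in a storage bucket.
doc = pathlib.Path(tempfile.gettempdir()) / "policy.txt"
doc.write_text("Refunds are processed within 14 days of purchase.")

def build_prompt(question: str, context_path: pathlib.Path) -> str:
    """Prepend the document's text to the user question as grounding context."""
    context = context_path.read_text()
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long do refunds take?", doc)
print(prompt)
```

The model never "reads" the file directly; the file's text simply becomes part of the prompt you send.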
I think Gen App Builder could be a solution, but it's difficult for me to get access.
I had no idea about Gen App Builder, thanks. It is only for allowlisted customers and still in a development phase, I guess?
So the generally available fine-tuning for foundation models works on the prompt itself, right?
How do we send our own custom data, e.g. read from docs, PDFs, or SQL, as context?
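For the SQL case specifically, one sketch (using an in-memory SQLite table as a stand-in for a real database; the table and values are invented for illustration) is to query the rows and serialize them into the prompt as context:

```python
import sqlite3

# Stand-in database; in practice this would be your own SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("widget", 9.99), ("gadget", 19.99)])

# Serialize the query results into plain text the model can read as context.
rows = conn.execute("SELECT name, price FROM products").fetchall()
context = "\n".join(f"{name}: ${price:.2f}" for name, price in rows)
prompt = f"Answer using this data:\n{context}\n\nQ: Which product is cheaper?"
print(prompt)
```

The same pattern applies to docs and PDFs: extract the text first (with a parser of your choice), then inline it into the prompt.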
I'll give an example with the alpaca dataset from Hugging Face:
from datasets import load_dataset
import json

train_dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Convert the Hugging Face Dataset to a pandas DataFrame
df = train_dataset.to_pandas()
df["input_text"] = df.text.astype(str) + ": " + df.instruction.astype(str)
df["output_text"] = df.output.astype(str)
df = df[["input_text", "output_text"]]

# Write one JSON object per line (JSONL)
data_list = df.to_dict(orient="records")
with open("output_alpaca_k.jsonl", "w") as file:
    for example in data_list:
        file.write(json.dumps(example) + "\n")
The result is a JSONL file with one {"input_text": ..., "output_text": ...} example per line.