Does anyone know if there's a way to set up a parameter-efficient tuning (PET) job when your prompt uses chain-of-thought reasoning (i.e. the model outputs its working), but your ground-truth data only has the desired JSON output and no reasoning? My prompt is more complex than I'd like and chain of thought helps a lot, but I don't want to be prescriptive about what the LLM's working should look like.
Running the prompt normally, I just use a regex to pull the JSON part out of the response, so ideally I'd be able to insert a function into the PET pipeline that does the same thing before the output is evaluated against the ground truth. I've read that setting the model's MIME type to JSON doesn't affect PET.
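For context, this is roughly the post-processing I run today at inference time (a minimal sketch; it assumes the model writes its reasoning first and the JSON object last in the response):

```python
import json
import re


def extract_json(response_text: str):
    """Pull the trailing JSON object out of a chain-of-thought response.

    Greedily matches from the first opening brace to the last closing
    brace, so any reasoning text before the JSON is discarded.
    Returns None if no parseable JSON is found.
    """
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

Ideally I could register something like this as a hook in the tuning job so the loss is computed only against the extracted JSON, not the reasoning tokens.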
I'm hoping I'm not the first person to try this tuning setup.