Efficiency and price for fine-tuning PaLM

I'd like to fine-tune a chat-bison model on a fairly large dataset (5,008 lines of training data) over 10 epochs, for benchmarking purposes. The data and the run are large, but that can't be changed for what I'm trying to do.

I can successfully use either the Python SDK or Vertex AI Studio to initiate a pipeline, load the data, and start a large-language-model-tuner node. By specifying "us-central1" for "tuning_job_location" and "tuned_model_location", I assign the job to {"machine_type": "a2-ultragpu-8g"} and {"accelerator_type": "NVIDIA_A100_80GB"} with {"accelerator_count": 8}. I specify 6,260 training steps, which by my count works out to 5,008 examples x 10 epochs, divided across the 8 accelerators. This is all fine.
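The step count above can be checked with a short Python sketch. It assumes, as the count in this post does, that each training step consumes one example per accelerator (8 examples per step); the tuner's real batch size may differ, and `training_steps` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import math


def training_steps(num_examples: int, epochs: int, examples_per_step: int) -> int:
    """Steps needed to see every example `epochs` times.

    Assumption: each step consumes `examples_per_step` examples
    (here, one per accelerator). The actual tuner may batch differently.
    """
    return math.ceil(num_examples * epochs / examples_per_step)


steps = training_steps(num_examples=5008, epochs=10, examples_per_step=8)
print(steps)  # 6260
```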

My concern is that the job, once started, looks like it will take a very long time. Watching the logs for the large-language-model-tuner node, every 50 steps seems to take 20-odd minutes. Besides meaning the whole job will take something like 50 hours, this raises two worries:

  • Is this going to cost a lot?  Looking at the pricing for custom models, an a2-ultragpu-8g costs $46.25/hour, which then gets multiplied by $4.52/hour for the accelerator, which over 50 hours comes to $10,452.50!  I'm not completely clear that I'm building a "custom model" here, but I'd like to be sure of what I'm spending before going too far down this road.
  • If training is slow, then inference will probably also be slow, and model speed is a factor.  Will the model I end up with take a long time to process an input?
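To make the arithmetic behind the 50-hour and $10,452.50 figures explicit, here is a minimal sketch. The helper names are hypothetical, the rates are the ones quoted above, and the cost formula multiplies the two hourly rates exactly as this post reads the pricing page. Whether that reading matches how the job is actually billed is the open question, so treat the result as a restatement of the estimate, not a confirmed price.

```python
def estimated_hours(total_steps: int, steps_per_interval: int,
                    minutes_per_interval: float) -> float:
    """Extrapolate total runtime from the pace seen in the logs."""
    return total_steps / steps_per_interval * minutes_per_interval / 60


def cost_as_read_in_post(machine_rate: float, accelerator_rate: float,
                         hours: float) -> float:
    """Cost under this post's reading of the pricing page.

    Assumption: the machine rate is multiplied by the accelerator rate.
    This mirrors the estimate in the post and may not reflect real billing.
    """
    return machine_rate * accelerator_rate * hours


# "20-odd minutes" per 50 steps: bracket the estimate with 20 and 24 minutes.
low_hours = estimated_hours(6260, 50, 20)
high_hours = estimated_hours(6260, 50, 24)
print(round(low_hours, 1), round(high_hours, 1))

# The dollar figure in the post, using the rounded 50-hour runtime.
print(round(cost_as_read_in_post(46.25, 4.52, 50), 2))
```

At 24 minutes per 50 steps the extrapolation lands at roughly 50 hours, which is where the estimate above comes from.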

So what I'm wondering is, first of all, is there a way to use PEFT for a fine-tuning job through either the SDK or Vertex Studio?  This post suggests that PEFT is automatically applied through Studio, but what I'm seeing makes me think the entire model is being adjusted.

And then have I understood the pricing structure correctly?  Would running this job as-is really cost over ten thousand dollars?
