
text-bison@001 tuned model serving

Hi! I would like to tune a model based on text-bison@001 and run online inferences against it. The documentation on how to tune is very clear. However, I can't figure out how Vertex AI serves my tuned model for inference.

Do I need to deploy the tuned model to an endpoint and pay hourly? If so, what instance type is necessary to support the tuned model?

Alternatively, is the tuned model hosted "serverlessly" and I pay the same (or different) per-character rate as for regular requests to the base text-bison@001 model?
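To make the question concrete, here is roughly the flow I have in mind, sketched with the vertexai Python SDK (from google-cloud-aiplatform). The project ID, bucket path, and tuned-model resource name are placeholders, and I'm assuming `get_tuned_model` + `predict` is all that's needed on the serving side, which is exactly the part I'm unsure about:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-project", location="us-central1")

# Tuning the base model: this part is well documented.
base = TextGenerationModel.from_pretrained("text-bison@001")
base.tune_model(
    training_data="gs://my-bucket/tuning-data.jsonl",  # placeholder path
    train_steps=100,
    tuning_job_location="europe-west4",
    tuned_model_location="us-central1",
)

# Serving: is fetching the tuned model and calling predict() enough
# (billed per character like the base model), or does the tuned model
# first need to be deployed to an endpoint that is billed hourly?
tuned = TextGenerationModel.get_tuned_model(
    "projects/my-project/locations/us-central1/models/1234567890"  # placeholder
)
response = tuned.predict("Tell me about Vertex AI model tuning.")
print(response.text)
```

In other words: does the second half of this sketch work as-is, or is there a deploy-to-endpoint step (with a machine type and hourly cost) in between?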
