Hi! I'd like to tune a model based on text-bison@001 and run online inferences against it. The documentation on how to tune is very clear, but I can't figure out how Vertex AI serves my tuned model for inference.
Do I need to deploy the tuned model to an endpoint and pay hourly? If so, what instance type is necessary to support the tuned model?
Alternatively, is the tuned model hosted "serverlessly," with the same (or a different) per-character rate as regular requests to the base text-bison@001 model? For reference, the kind of client code I have in mind is sketched below.
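This is a minimal sketch assuming the vertexai Python SDK; the project ID, region, and tuned-model resource name are placeholders:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Placeholder project and region -- substitute your own.
vertexai.init(project="my-project", location="us-central1")

# Calling the base model is serverless and billed per character:
base_model = TextGenerationModel.from_pretrained("text-bison@001")
print(base_model.predict("Hello!", max_output_tokens=64).text)

# For the tuned model I assume something like this, but it's unclear
# whether the resource behind it is serverless or a deployed endpoint
# that bills hourly:
tuned_model = TextGenerationModel.get_tuned_model(
    "projects/my-project/locations/us-central1/models/1234567890"  # placeholder
)
print(tuned_model.predict("Hello!", max_output_tokens=64).text)
```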