We are trying to deploy a model to a Vertex AI Endpoint with GPU support and are facing two problems: the GPU's memory is fully reserved by a single model, yet GPU compute is underutilized.

Can we deploy multiple workers on the same node, and how can we make each worker reserve only the VRAM it actually needs?
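The "all VRAM reserved by one model" symptom is the default behavior of some frameworks (TensorFlow in particular grabs all visible GPU memory per process). Assuming the serving container runs TensorFlow, a minimal sketch of capping each worker to a fixed slice of VRAM (the 4096 MB figure is a hypothetical value, not something from the question):

```python
def limit_gpu_memory(memory_limit_mb: int = 4096) -> None:
    """Cap this process's TensorFlow VRAM usage so several workers
    can share one physical GPU. Sketch only; memory_limit_mb is a
    hypothetical value to tune for your model."""
    import tensorflow as tf  # assumes a TensorFlow serving container

    for gpu in tf.config.list_physical_devices("GPU"):
        # Replace the default allocate-everything behavior with a
        # logical device limited to memory_limit_mb megabytes.
        tf.config.set_logical_device_configuration(
            gpu,
            [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
        )

# Call this before any tensors are allocated, e.g. at worker startup:
# limit_gpu_memory(4096)
```

Each worker process then sees a logical GPU of the configured size instead of the whole device; the call must happen before the GPU is first initialized, or TensorFlow raises a RuntimeError.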
You can deploy more than one model to the same endpoint (see the documentation); however, compute resources are associated with each deployed model rather than with the endpoint itself, so each deployment specifies its own machine type and accelerators.
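A minimal sketch of that pattern with the `google-cloud-aiplatform` SDK, deploying two models to one endpoint with per-model resources and a traffic split (the model IDs, machine type, and 50/50 split are hypothetical placeholders, not values from the answer):

```python
def deploy_two_models(project: str, region: str, endpoint_name: str) -> None:
    """Sketch: deploy two models to a single Vertex AI endpoint.
    Resources (machine type, accelerator) are set per deployed model,
    not on the endpoint. All IDs below are hypothetical."""
    from google.cloud import aiplatform  # pip install google-cloud-aiplatform

    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.Endpoint.create(display_name=endpoint_name)

    # Each deployment carries its own machine spec and traffic share.
    for model_id, traffic in [("model-a-id", 50), ("model-b-id", 50)]:
        model = aiplatform.Model(model_name=model_id)
        model.deploy(
            endpoint=endpoint,
            machine_type="n1-standard-8",
            accelerator_type="NVIDIA_TESLA_T4",
            accelerator_count=1,
            traffic_percentage=traffic,
        )
```

Note that because resources attach to the deployed model, each deployment above provisions its own GPU node; packing multiple models onto one physical GPU has to happen inside a single serving container instead.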