Currently Vertex AI supports deploying embedding models like BGE and GTE using HuggingFace text-embedding-inference (example). However, we can't seem to use the same technology stack to deploy a reranker model, even though reranking is supported by text-embeddings-inference (TEI) itself -- https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#using-re-rankers-models
I was able to deploy a reranker model this way, but invoking the resulting endpoint returned errors. It seems Vertex AI always calls the embed URL exposed by TEI, when it should switch to the rerank URL. Is this a known issue? Has anybody had any luck getting a similar use case to work?
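To illustrate the distinction between the two routes, here is a minimal sketch of calling TEI directly, assuming a TEI container serving a reranker is reachable locally on port 8080 (the host, port, and example payloads are placeholders, not part of the Vertex AI setup):

```python
# Minimal sketch of the two TEI routes in question, assuming a TEI container
# serving a reranker model is running locally on port 8080 (assumed values).
import requests

BASE = "http://localhost:8080"  # placeholder for the local TEI deployment

# The embed route -- the one Vertex AI appears to call by default.
emb = requests.post(f"{BASE}/embed", json={"inputs": "What is Deep Learning?"})

# The rerank route -- the one a reranker model actually needs.
rr = requests.post(
    f"{BASE}/rerank",
    json={
        "query": "What is Deep Learning?",
        "texts": ["Deep Learning is not...", "Deep learning is..."],
    },
)
print(rr.json())  # e.g. [{"index": 1, "score": ...}, {"index": 0, "score": ...}]
```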
Hi,
Currently only the models specified in this documentation https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings#supported-model... are supported.
My question is with respect to HuggingFace models that are supported on Vertex AI. We can already run embedding models like GTE, BGE, and E5 on Vertex (with Text Embeddings Inference). Ideally we should be able to run reranking models with Text Embeddings Inference as well.
Hello,
Thank you for contacting the Google Cloud Community.
I have gone through your reported issue; however, it seems this is an issue observed specifically at your end and would need more specific debugging and analysis. To ensure a faster resolution and dedicated support for your issue, please file a support ticket using the link below[1]. Our support team will prioritize your request and provide you with the assistance you need.
For individual support issues, it is best to use the support ticketing system. We appreciate your cooperation!
[1]: https://cloud.google.com/support/docs/manage-cases#creating_cases
Hi,
Yes, you can deploy reranker models supported by the TEI container on Vertex AI. Follow the text embedding example notebook, change the model ID to the reranker model you want to deploy, and update the machine specs if necessary. You can then invoke the reranker endpoint by including `"type": "rerank"` in the request, as in the example below.
Example request:
{
  "instances": [{
    "query": "What is Deep Learning?",
    "texts": ["Deep Learning is not...", "Deep learning is..."],
    "raw_scores": false,
    "type": "rerank"
  }]
}
Example response:
{"predictions":[[{"index":1,"score":0.9976404},{"index":0,"score":0.12474516}]]}