Currently, Vertex AI supports deploying embedding models such as BGE and GTE using Hugging Face Text Embeddings Inference (TEI) (example). However, we can't seem to use the same stack to deploy a reranker model, even though TEI itself supports rerankers: https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#using-re-rankers-models
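For reference, outside Vertex AI the TEI README shows a reranker being called through its `/rerank` route with a JSON body holding a `query` and a list of `texts`. As far as I can tell from the README, the response is a list of objects, each with an `index` into `texts` and a relevance `score`. Below is a minimal stdlib-only client sketch that assumes a TEI container reachable at `base_url` and assumes that response shape. The function names `rerank` and `order_by_score` are my own, not part of any library.

```python
import json
import urllib.request


def order_by_score(results: list[dict], texts: list[str]) -> list[tuple[str, float]]:
    """Map TEI rerank results back to the input texts, highest score first."""
    # Assumption: each result looks like {"index": int, "score": float},
    # where "index" is the position of the passage in the submitted `texts`.
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(texts[r["index"]], r["score"]) for r in ranked]


def rerank(base_url: str, query: str, texts: list[str], timeout: float = 30.0) -> list[tuple[str, float]]:
    """POST a query and candidate passages to TEI's /rerank route."""
    body = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        results = json.loads(resp.read().decode("utf-8"))
    return order_by_score(results, texts)
```

Usage against a locally running container would look like `rerank("http://127.0.0.1:8080", "What is Deep Learning?", ["Deep Learning is not...", "Deep learning is..."])`. Sorting on the client keeps the sketch independent of whatever order the server happens to return.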
I was able to deploy a reranker model this way, but invoking the resulting endpoint returned errors. It seems Vertex AI always calls the `/embed` route exposed by TEI, when for a reranker it should call the `/rerank` route instead. Is this a known issue? Has anybody gotten a similar use case to work?
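To illustrate why the route matters: TEI's `/embed` and `/rerank` routes expect differently shaped request bodies (`inputs` for embedding versus `query` plus `texts` for reranking, per the TEI README), so a caller that is hard-wired to the embedding route has no way to send a rerank request. The hypothetical helper below simply pairs each task with its route and body. The `task` values and the function name are my own illustration, not a TEI or Vertex AI API.

```python
import json


def tei_request(task: str, *, inputs=None, query=None, texts=None) -> tuple[str, bytes]:
    """Return (route, JSON body) for a TEI call of the given kind (hypothetical helper)."""
    if task == "embed":
        if inputs is None:
            raise ValueError("embed requires `inputs`")
        # /embed takes the text (or list of texts) to embed under "inputs".
        return "/embed", json.dumps({"inputs": inputs}).encode("utf-8")
    if task == "rerank":
        if query is None or texts is None:
            raise ValueError("rerank requires `query` and `texts`")
        # /rerank takes one query plus the candidate passages to score against it.
        return "/rerank", json.dumps({"query": query, "texts": texts}).encode("utf-8")
    raise ValueError(f"unsupported task: {task!r}")
```

An embedding-style body such as `{"inputs": [...]}` carries neither a query nor a candidate list, which is consistent with the errors seen when the reranker is only ever reached through the embedding path.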