So images are resized to 512x512 pixels by the multimodal Embeddings API, as stated here:
The maximum image size accepted is 20 MB. To avoid increased network latency, use smaller images. Additionally, the model resizes images to 512 x 512 pixel resolution. Consequently, you don't need to provide higher resolution images. ( https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings )
Is there any logic that uses the detected dominant object in the image when resizing it? Or should I be using the Cloud Vision Crop Hints API and then resizing to 512x512 myself, making sure the dominant object stays inside the bounding box? Also, is it possible to get the resized image/coordinates back from the model to check what it is actually processing? If not, that would be very useful to add in a future release.
I'm guessing the answer will be "No, yes, no" but just checking.
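For reference, here is a rough sketch of the workaround I have in mind, assuming the google-cloud-vision Python client (v2+) and Pillow for the local crop/resize; the file paths and helper names are just illustrative:

```python
# Sketch: ask Cloud Vision for a 1:1 crop hint around the dominant content,
# crop to that box locally, then resize to 512x512 before embedding.
from google.cloud import vision
from PIL import Image

IMAGE_PATH = "product.jpg"  # hypothetical input image


def crop_hint_box(path: str) -> tuple[int, int, int, int]:
    """Request a square (1:1) crop hint and return it as (left, top, right, bottom)."""
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    request = vision.AnnotateImageRequest(
        image=image,
        features=[vision.Feature(type_=vision.Feature.Type.CROP_HINTS)],
        image_context=vision.ImageContext(
            crop_hints_params=vision.CropHintsParams(aspect_ratios=[1.0])
        ),
    )
    response = client.annotate_image(request)
    # Take the first (highest-confidence) hint; a real pipeline should
    # handle the case where no hints are returned.
    vertices = response.crop_hints_annotation.crop_hints[0].bounding_poly.vertices
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def crop_and_resize(path: str, out_path: str, size: int = 512) -> None:
    """Crop to the hint box and resize to size x size so the dominant object is kept."""
    box = crop_hint_box(path)
    with Image.open(path) as img:
        img.crop(box).resize((size, size)).save(out_path)


crop_and_resize(IMAGE_PATH, "product_512.jpg")
```

This is only a sketch of the "pre-crop it myself" route; if the embeddings model already does something smarter than a plain resize, I'd rather not duplicate it.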