So images are resized to 512x512 pixels by the multimodal Embeddings API, as stated here:
The maximum image size accepted is 20 MB. To avoid increased network latency, use smaller images. Additionally, the model resizes images to 512 x 512 pixel resolution. Consequently, you don't need to provide higher resolution images. ( https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings )
Is there any logic that uses the detected dominant object in the image when resizing it? Or should I be using the Cloud Vision Crop Hints API and then resizing to 512x512 myself, making sure the dominant object stays inside the bounding box? Also, is it possible to get the resized image/coordinates back from the model to check what it is actually processing? If not, that would be very useful to add in a future release.
I'm guessing the answer will be "No, yes, no" but just checking.
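For reference, here is a rough sketch of the workaround I have in mind, assuming the google-cloud-vision Python client (v2+) and Pillow for the local crop/resize; the file paths and helper names are just illustrative:

```python
# Sketch: ask Cloud Vision for a 1:1 crop hint around the dominant content,
# crop to that box locally, then resize to 512x512 before embedding.
from google.cloud import vision
from PIL import Image

IMAGE_PATH = "product.jpg"  # hypothetical input image


def crop_hint_box(path: str) -> tuple[int, int, int, int]:
    """Request a square (1:1) crop hint and return it as (left, top, right, bottom)."""
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    request = vision.AnnotateImageRequest(
        image=image,
        features=[vision.Feature(type_=vision.Feature.Type.CROP_HINTS)],
        image_context=vision.ImageContext(
            crop_hints_params=vision.CropHintsParams(aspect_ratios=[1.0])
        ),
    )
    response = client.annotate_image(request)
    # Take the first (highest-confidence) hint; a real pipeline should
    # handle the case where no hints are returned.
    vertices = response.crop_hints_annotation.crop_hints[0].bounding_poly.vertices
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def crop_and_resize(path: str, out_path: str, size: int = 512) -> None:
    """Crop to the hint box and resize to size x size so the dominant object is kept."""
    box = crop_hint_box(path)
    with Image.open(path) as img:
        img.crop(box).resize((size, size)).save(out_path)


crop_and_resize(IMAGE_PATH, "product_512.jpg")
```

This is only a sketch of the "pre-crop it myself" route; if the embeddings model already does something smarter than a plain resize, I'd rather not duplicate it.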