Image-to-Image Translation Possible w/ Google Translate API?

yhjc2692 · 06-04-2023 01:30 AM

Hi. I plan to utilize a multimodal vision-language transformer that only (reasonably) supports English. The image translation feature in the web version of Google almost perfectly suits my needs. Is this available for use via an API? Thanks!

kvandres

Good day @yhjc2692,

Welcome to Google Cloud Community!

Yes, every pre-trained model in Google Cloud can be accessed through an API. Here is the documentation for the Vision API and the Translation API:
https://cloud.google.com/vision/docs
https://cloud.google.com/translate/docs/overview
You can also check the following documentation to get started with the Vision and Translation client libraries:
https://cloud.google.com/vision/docs/reference/libraries
https://cloud.google.com/translate/docs/reference/libraries
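
As a rough sketch of how the two client libraries might be chained, the snippet below runs OCR on an image and then translates each detected word. It assumes the `google-cloud-vision` and `google-cloud-translate` packages are installed and credentials are configured. `translate_image_text` is a hypothetical helper name, not part of either library, and translating word by word is a simplification that loses sentence context.

```python
from typing import List, Tuple

Box = List[Tuple[int, int]]  # (x, y) corners of one detected word


def detect_words(image_bytes: bytes) -> List[Tuple[str, Box]]:
    """OCR with Cloud Vision; needs google-cloud-vision and credentials."""
    from google.cloud import vision  # lazy import so the module loads without it

    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=image_bytes))
    # text_annotations[0] is the full text block; the rest are single words.
    return [
        (ann.description, [(v.x, v.y) for v in ann.bounding_poly.vertices])
        for ann in response.text_annotations[1:]
    ]


def translate_text(text: str, target: str = "en") -> str:
    """Translate with the Cloud Translation v2 client; needs google-cloud-translate."""
    from google.cloud import translate_v2 as translate  # lazy import

    result = translate.Client().translate(
        text, target_language=target, format_="text"
    )
    return result["translatedText"]


def translate_image_text(image_bytes, detect=detect_words, translate=translate_text):
    """Hypothetical glue: OCR each word, translate it, and keep its box."""
    return [(word, box, translate(word)) for word, box in detect(image_bytes)]
```

This only returns translated strings with their boxes. It does not draw anything back onto the image. The `detect` and `translate` parameters can be swapped for stubs to try the flow without calling Google Cloud.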

However, the Translation API is generally used to translate text from one language to another (e.g. English to Japanese), while the Vision API is used for classifying images, detecting objects, assigning labels, and reading printed and handwritten text. For your use case, you might need to create an AutoML or a custom model.

AutoML: Create and train models with minimal technical knowledge and effort. To learn more about AutoML, see AutoML beginner's guide.
Custom training: Create and train models at scale using any ML framework. To learn more about custom training on Vertex AI, see Custom training overview.

You can check this link for more information: https://cloud.google.com/vertex-ai/docs/training-overview#image

Hope this helps!

yhjc2692

Hi @kvandres, thank you for the answer. However, my initial question is left unanswered. I understand that training a custom model (using AutoML or Vertex AI) is possible, but I want to know whether the end-to-end, image-to-image translation feature currently deployed in Google Translate is accessible via an API endpoint. Implementing a generalized img2img translation pipeline, including scene text detection, translation, and text replacement (with text size, orientation, and color estimation), would easily exceed the scope of my project and/or my capabilities.

From my understanding, achieving this with the current suite of Vision and Translation APIs would be a nontrivial task, which is why I wanted to know whether it has a dedicated API. Have you already tested the feature that I am referring to? If not, please feel free to check it out at https://translate.google.com by pressing the document/image button on the top bar.
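
To illustrate just the size and orientation part of the text replacement step, here is a minimal sketch that estimates where and how a translated string might be drawn from one detected bounding box. It assumes the box corners arrive ordered top-left, top-right, bottom-right, bottom-left relative to the text. `Overlay` and `plan_overlay` are hypothetical names, and color estimation, background inpainting, and font fitting are left out.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Overlay:
    text: str         # translated string to draw
    center: Point     # where to center the new text, in pixels
    angle_deg: float  # baseline rotation; image y grows downward
    font_px: float    # rough font height taken from the box height


def plan_overlay(box: List[Point], translated: str) -> Overlay:
    """Estimate placement for replacement text from one quadrilateral.

    Assumes corners are ordered top-left, top-right, bottom-right,
    bottom-left relative to the reading direction of the text.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = box
    # Orientation: direction of the top edge, from top-left to top-right.
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    # Size: length of the left edge approximates the text height.
    height = math.hypot(x3 - x0, y3 - y0)
    # Position: centroid of the four corners.
    center = ((x0 + x1 + x2 + x3) / 4, (y0 + y1 + y2 + y3) / 4)
    return Overlay(translated, center, angle, height)


if __name__ == "__main__":
    # An axis-aligned box, 100 px wide and 20 px tall.
    print(plan_overlay([(0, 0), (100, 0), (100, 20), (0, 20)], "hello"))
```

Even this small piece shows why the full pipeline is nontrivial: the estimate ignores perspective distortion, and a translated string rarely has the same width as the original.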

geeforce

Hi @kvandres,

Thanks for the reply, but I agree with @yhjc2692's point: you didn't answer his question about an E2E pipeline via API. Can you please answer this explicitly?

JollyToday

We have an image translation API that supports 100+ languages.

https://github.com/JollyToday/AI_Image_Translator_Translate_Images

yhjc2692

This is cool! I was able to get the general gist of the API and it looks good. Do you have any disclosable technical documentation on how it works under the hood?

samikadze

Did you manage to find a solution for your request?

yhjc2692

Not really. This was not at the top of my priority list, so I didn’t end up spending more time on it. However, I think it is definitely feasible to implement a solution.

If you are interested in a custom implementation, please reach out! I have some ideas on how to make it work.
