
AutoML deprecation: Upgrade custom translation models to be managed by Cloud Translation API

Hello,

I have some custom legacy machine translation models that I'd like to upgrade, since the AutoML API will be deprecated in September this year.

I noticed that only models without an associated legacy dataset can be upgraded on their own. All my legacy models have an associated legacy dataset, so I was wondering what the best practice would be to upgrade them.

Thanks in advance.



Hi @elgonher,

Welcome to Google Cloud Community!

You're correct: legacy models with an associated legacy dataset can't be upgraded on their own, which makes migrating away from AutoML Translation ahead of its deprecation more involved.

The issue arises because the modern Cloud Translation API (specifically the Advanced edition, surfaced as "Custom Translation" in the Cloud Console) manages datasets and models as separate resources, whereas legacy AutoML Translation tightly coupled them. The recommended workflow now is:

  • Prepare your data as a Translation dataset independently from the model.
  • Train a custom translation model using that dataset.

Since your legacy models are intrinsically tied to their datasets, you'll essentially need to recreate the models using the modern approach.

Here are some strategies you can consider to upgrade your custom legacy machine translation models from AutoML Translation, given the associated legacy datasets:

1. Export Data: Export the parallel data (source & target language pairs) from your legacy AutoML Translation datasets. 

  • Format: Ensure you export the data in a format the Cloud Translation API can import, such as tab-separated values (TSV) or TMX.
  • Data Cleaning: Before importing, carefully examine your data. Clean up any inconsistencies, errors, or formatting issues. Low-quality data will negatively impact the performance of your new model.
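
As a sketch of the data-cleaning step, assuming your legacy datasets export as tab-separated source/target pairs, something like the following could drop malformed, empty, and duplicate pairs before re-import (the function name and filtering rules are illustrative, not part of any Google Cloud tooling):

```python
def clean_parallel_tsv(lines):
    """Drop malformed, empty, and duplicate source/target pairs
    from exported tab-separated parallel data."""
    seen = set()
    cleaned = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip rows without exactly one source and one target
        source, target = (p.strip() for p in parts)
        if not source or not target:
            continue  # skip pairs with an empty segment
        if (source, target) in seen:
            continue  # skip exact duplicates
        seen.add((source, target))
        cleaned.append(f"{source}\t{target}")
    return cleaned

raw = [
    "Hello world\tHola mundo",
    "Hello world\tHola mundo",   # duplicate
    "Broken row with no tab",
    "  \tMissing source",
    "Good morning\tBuenos días",
]
print(clean_parallel_tsv(raw))
```

You may want stricter checks for your domain (length ratios, encoding issues, markup leakage), but even this minimal pass prevents obviously bad rows from reaching the new dataset.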

2. Create New Datasets: Create new Translation Datasets within the Cloud Translation API, and import your exported data into these datasets.

  • Dataset Splits: You'll likely want separate training, validation, and test splits so you can tune and then evaluate the model reliably.
  • Dataset Size: If your dataset is extremely large, consider sampling it down to a representative subset. Google Cloud Storage is a useful place to stage large files for import.
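
To illustrate the splitting idea, a deterministic hash-based assignment keeps the same sentence pair in the same split across re-runs (the 80/10/10 ratios here are illustrative defaults, not an official recommendation):

```python
import hashlib

def assign_split(source, target, train=0.8, validation=0.1):
    """Deterministically assign a sentence pair to a split by hashing it,
    so repeated runs always place the same pair in the same split."""
    digest = hashlib.sha256(f"{source}\t{target}".encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    if fraction < train:
        return "train"
    if fraction < train + validation:
        return "validation"
    return "test"

for src, tgt in [("Hello", "Hola"), ("Goodbye", "Adiós"), ("Thanks", "Gracias")]:
    print(src, "->", assign_split(src, tgt))
```

Hashing the pair (rather than shuffling) also guards against the same sentence leaking into both training and test data when you re-export and re-split later.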

3. Train New Models: Train new custom translation models using the Advanced Translation API, selecting the new Translation Datasets you created.

  • Model Settings: Explore different model settings during training. Adjust parameters to optimize performance for your specific language pair and domain.

4. Evaluate: Thoroughly evaluate the performance of your new models against your legacy models.

  • Metrics: Use appropriate evaluation metrics, such as BLEU score, to compare the performance of the new and legacy models.
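
As a toy illustration of BLEU-style scoring (for real evaluation, use an established library such as sacreBLEU rather than this sketch), a simplified sentence-level score combines clipped n-gram precisions with a brevity penalty:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of smoothed n-gram
    precisions times a brevity penalty. Illustrative only."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        overlap = sum((cand_ngrams & ngrams(ref, n)).values())
        total = sum(cand_ngrams.values())
        # +1 smoothing keeps one missing n-gram order from zeroing the score
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical pair scores 1.0
```

Running both the legacy and the new model over the same held-out test split and comparing scores like this gives you a concrete before/after signal, though human review of sample translations is still worthwhile.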

You can also refer to the Cloud Translation documentation on migrating from AutoML Translation for more details.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.


Hi ibaui,

Thank you for your answer.

I was investigating a bit further and was wondering if I could just upgrade the legacy datasets first and then the legacy models. I found this option within the datasets section:

Select existing AutoML (legacy) datasets to manage through the Cloud Translation API instead of the AutoML API. During the upgrade process, a new dataset is created that is a copy of your existing dataset with a new ID. Models that are associated with the upgraded datasets are also upgraded. Your existing legacy datasets and models remain accessible and unchanged during and after the upgrade process.

As I have quite a few models, I assume this would avoid manually creating new datasets and training new models. Is that correct?

Thanks,

Elvira