Google Translate v3 API is ignoring my glossaries

maurice85 · 08-18-2024 03:47 PM

I have a Spring Boot Java application that uses the Java SDK for Google Translate v3.

I am able to send translation requests via this SDK, and I'm also able to create glossaries from CSV files stored in Google Cloud Storage. Lastly, I am able to retrieve a list of the glossaries in Google Cloud using the API.

However, when I make a translate request with a glossary configured, Google Translate flat-out ignores the glossary and translates the terms as it sees fit. This happens with every glossary I've made.

I have verified that the service account in the JSON authentication file has the required roles. In fact, the assigned role is Owner, so it has all glossary-related permissions by default.

I have verified that this isn't caused by a case-sensitivity issue. The terms that I use for testing have exactly the same case as the terms stored in the glossary.

I have also verified that the glossaries were all stored in the location us-central1. As far as I can tell I'm not doing anything wrong, but the API still ignores the glossary configuration. When I change the configuration to a non-existent glossary ID, the code throws an exception, so the configuration for the existing glossary must be valid.

Lastly, I have also validated the CSV files with an online validation tool before uploading them to the Google Cloud Storage bucket.

Can anyone tell me what I'm doing wrong? This is the code that I have now.

When making the request:

    private String translateDocumentWithV3(Language sourceLanguage, Language targetLanguage, String textToTranslate, TranslateContentType translateContentType) {
            String projectId = "myproject";
            String location = "us-central1";
            LocationName locationName = LocationName.of(projectId, location);
            GlossaryName glossaryName = GlossaryName.of(projectId, location, String.format("glossary-%s", targetLanguage.getLanguageCode()));


            TranslateTextGlossaryConfig glossaryConfig = TranslateTextGlossaryConfig.newBuilder().setIgnoreCase(true)
                    .setGlossary(glossaryName.toString())
                    .build();



            log.info("The glossary name is {}", glossaryName.toString());

            TranslateTextRequest request =
                    TranslateTextRequest.newBuilder()
                            .setParent(locationName.toString())
                            .setMimeType(translateContentType.toString())
                            .setSourceLanguageCode(sourceLanguage.getLanguageCode())
                            .setTargetLanguageCode(targetLanguage.getLanguageCode())
                            .setGlossaryConfig(glossaryConfig)
                            .addContents(textToTranslate)
                            .build();

            TranslateTextResponse response = googleTranslateV3.translateText(request);
            String translatedText = response.getTranslationsList().get(0).getTranslatedText();
            return translatedText;
    }

For creating the glossary I've tried this approach (without using LanguageCodesSet):

        String projectId = "myproject";
        String location = "us-central1";
        LocationName locationName = LocationName.of(projectId, location);

        languageCodes.stream().forEach(languageCode -> {

            GlossaryName glossaryName = GlossaryName.of(projectId, location, String.format("glossary-%s", languageCode));

            GcsSource gcsSource = GcsSource.newBuilder().setInputUri(String.format("gs://gp-glossary-bucket/glossary-%s.csv", languageCode)).build();
            GlossaryInputConfig inputConfig = GlossaryInputConfig.newBuilder().setGcsSource(gcsSource).build();

            Glossary glossary = Glossary.newBuilder()
                    .setName(glossaryName.toString())
                    .setLanguagePair(Glossary.LanguageCodePair.newBuilder()
                            .setSourceLanguageCode("en")
                            .setTargetLanguageCode(languageCode)
                            .build())
                    .setInputConfig(inputConfig)
                    .build();

            CreateGlossaryRequest createRequest =
                    CreateGlossaryRequest.newBuilder()
                            .setParent(locationName.toString())
                            .setGlossary(glossary)
                            .build();

and the other approach, which does use Glossary.LanguageCodesSet (as in the code example in the documentation):

            Glossary.LanguageCodesSet languageCodesSet =
                    Glossary.LanguageCodesSet.newBuilder().addAllLanguageCodes(List.of("en", languageCode)).build();

            // Configure the source of the file from a GCS bucket

            Glossary glossary =
                    Glossary.newBuilder()
                            .setName(glossaryName.toString())
                            .setLanguageCodesSet(languageCodesSet)
                            .setInputConfig(inputConfig)
                            .build();

            CreateGlossaryRequest createRequest =
                    CreateGlossaryRequest.newBuilder()
                            .setParent(locationName.toString())
                            .setGlossary(glossary)
                            .build();

The source language code and target language code used always match the glossary's source and target languages.

But despite all this, it still ignores the glossaries. Can anyone help?

EDIT: I tried switching the package from v3 to v3beta1, but the same thing happens. However, when I make the request using cURL, I get back both a translation that respects the glossary terms and one that doesn't, in a single JSON object. Example below:

    {
      "translations": [
        {
          "translatedText": "groeiende gids, jaarlijks, rapportvlag, inzendingen, open bestoven"
        }
      ],
      "glossaryTranslations": [
        {
          "translatedText": "kweekgids , Eenjarige plant , meldingsvlag , Inzendingen , open pollinated",
          "glossaryConfig": {
            "glossary": "projects/993347816880/locations/us-central1/glossaries/glossary-nl",
            "ignoreCase": true
          }
        }
      ]
    }

The correctly translated text is the value of 'glossaryTranslations'. I see that the TranslateTextResponse object returned by the Java SDK's translation client also exposes this JSON property through the method

getGlossaryTranslations(index)

So I guess this solves my case.

McMaco

Hello maurice85,

Welcome to Google Cloud Community!

I understand that the correctly translated text is found within the glossaryTranslations property of the API response. It's great that you've identified this.

The getGlossaryTranslations(index) method in the Java SDK's TranslateTextResponse class returns the glossary translation at the given index. The glossary translations are listed in the same order as the contents of the request, so index 0 corresponds to the first text you sent.

By using this method, you can efficiently retrieve the desired glossary translation from the API response and process it according to your needs.
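As a minimal sketch of that idea, the snippet below prefers the glossary translation for each input and falls back to the plain translation when no glossary result is present. To keep it runnable without the google-cloud-translate dependency, it works on plain lists of strings. The class name GlossaryAwareResult and the method preferGlossary are hypothetical names chosen for illustration, not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of the SDK. The two lists stand in for the translated
 * texts found under "translations" and "glossaryTranslations" in the response.
 */
final class GlossaryAwareResult {

    private GlossaryAwareResult() {}

    /**
     * Returns one translated string per input, preferring the glossary translation
     * at the same index and falling back to the plain translation when none exists.
     */
    static List<String> preferGlossary(List<String> translations, List<String> glossaryTranslations) {
        List<String> result = new ArrayList<>(translations.size());
        for (int i = 0; i < translations.size(); i++) {
            boolean hasGlossaryText = i < glossaryTranslations.size()
                    && !glossaryTranslations.get(i).isEmpty();
            result.add(hasGlossaryText ? glossaryTranslations.get(i) : translations.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the cURL response quoted earlier in the thread.
        List<String> plain = List.of(
                "groeiende gids, jaarlijks, rapportvlag, inzendingen, open bestoven");
        List<String> withGlossary = List.of(
                "kweekgids , Eenjarige plant , meldingsvlag , Inzendingen , open pollinated");

        // Glossary result present: the glossary-aware text is chosen.
        System.out.println(preferGlossary(plain, withGlossary).get(0));
        // No glossary result (empty list): the plain translation is used instead.
        System.out.println(preferGlossary(plain, List.of()).get(0));
    }
}
```

In the posted translateDocumentWithV3 method, the equivalent change would be to read response.getGlossaryTranslations(0).getTranslatedText() instead of response.getTranslationsList().get(0).getTranslatedText(), ideally after checking that response.getGlossaryTranslationsCount() is greater than zero.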

I hope the above information is helpful.