Speech to text V2 Java Help?

aPeaceOfAdam · 05-25-2024 11:52 PM

Hi Everyone,

I'm tearing my hair out trying to get my first Java speech-to-text transcript working. The code is below. The WAV file uploads to the storage bucket fine, and if I run the transcription manually through the web front end it works, but my code fails with the error "INVALID_ARGUMENT: Invalid resource field value in the request" at the line speech.batchRecognizeOperationCallable().call(request);. Unfortunately I don't get any more information than that, so I'm kinda debugging blind.

Any help would be greatly appreciated - I've hit a wall on this one.

private static String getTranscript(byte[] audio) throws IOException, ExecutionException, InterruptedException {
    InputStream credentialsStream = TestCompressor.class.getResourceAsStream("/keys/google.json");
    GoogleCredentials credentials = GoogleCredentials.fromStream(credentialsStream);

    Storage storage = StorageOptions.newBuilder().setCredentials(credentials).build().getService();
    BlobId blobId = BlobId.of("isaidusaid", "testfile.wav");
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("audio/wav").build();
    Blob blob = storage.create(blobInfo, audio);

    FixedCredentialsProvider credentialsProvider = FixedCredentialsProvider.create(credentials);

    SpeechSettings speechSettings =
            SpeechSettings.newBuilder()
                    .setCredentialsProvider(credentialsProvider)
                    .build();

    String gcsUri = "gs://isaidusaid/testfile.wav";

    SpeechClient speech = SpeechClient.create(speechSettings);


    String parent = "projects/isaidusaid/locations/global";

    RecognitionConfig recognitionConfig = RecognitionConfig.newBuilder()
            .setExplicitDecodingConfig(
                    ExplicitDecodingConfig.newBuilder()
                            .setEncoding(ExplicitDecodingConfig.AudioEncoding.LINEAR16)
                            .setSampleRateHertz(16000)
                            .build())
            .addLanguageCodes("en-US")
            .setModel("long")
            .build();
    BatchRecognizeFileMetadata metadata = BatchRecognizeFileMetadata.newBuilder().setUri(gcsUri).build();
    RecognitionOutputConfig outputConfig = RecognitionOutputConfig.newBuilder().setInlineResponseConfig(
            InlineOutputConfig.newBuilder().build()
    ).build();
    BatchRecognizeRequest request = BatchRecognizeRequest.newBuilder()
            .setConfig(recognitionConfig)
            .addFiles(metadata)
            .setRecognitionOutputConfig(outputConfig)
            .build();

    BatchRecognizeResponse response = speech.batchRecognizeOperationCallable().call(request);

    StringBuilder builder = new StringBuilder();
    for (SpeechRecognitionResult result : response.getResultsMap().get(gcsUri).getInlineResult().getTranscript().getResultsList()) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        if (result.getAlternativesCount() > 0) {
            SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
            builder.append(alternative.getTranscript());
        }
    }

    storage.delete(blobId);
    return builder.toString();
}
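
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the request above never sets a recognizer, and the parent string is built but never used. As far as I can tell from the V2 API reference, BatchRecognizeRequest has a required recognizer field of the form projects/{project}/locations/{location}/recognizers/{recognizer}, where the last segment can be _ to use an implicit default recognizer. The sketch below only builds that resource name; RecognizerNames and defaultRecognizer are hypothetical names made up for illustration, not part of the Google client library.

```java
// Hypothetical helper for illustration only; not part of the Google client library.
final class RecognizerNames {
    private RecognizerNames() {}

    /**
     * Builds a recognizer resource name that points at the implicit default
     * recognizer ("_") for the given project and location.
     */
    static String defaultRecognizer(String project, String location) {
        if (project == null || project.isEmpty()) {
            throw new IllegalArgumentException("project must not be empty");
        }
        if (location == null || location.isEmpty()) {
            throw new IllegalArgumentException("location must not be empty");
        }
        return "projects/" + project + "/locations/" + location + "/recognizers/_";
    }

    public static void main(String[] args) {
        // Project ID and location taken from the code in the post.
        String recognizer = defaultRecognizer("isaidusaid", "global");
        System.out.println(recognizer);
        // prints "projects/isaidusaid/locations/global/recognizers/_"
    }
}
```

If this is indeed the cause, the resulting string would be passed to the request builder with .setRecognizer(recognizer) before .build(), assuming the V2 builder exposes that setter as documented.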