The speech generated by Google Text-to-Speech has background noise that sounds like a mouse click

thanhdx · 04-27-2024 04:04 AM

I am developing an AI voice assistant project with a TTS feature that uses Google Text-to-Speech. However, the output of synthesizeSpeech contains background noise that sounds like a mouse click.

Has anyone had a similar problem, or is there a workaround? Thanks a lot.

This is my current config:

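            // Assumes `protos` is imported from "@google-cloud/text-to-speech"
            // and `this.client` is a TextToSpeechClient instance.
            const request = {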
                input: { text: text },
                voice: {
                    languageCode: "en-US",
                    name: "en-US-Studio-O",
                    ssmlGender: protos.google.cloud.texttospeech.v1.SsmlVoiceGender.FEMALE
                },
                audioConfig: {
                    audioEncoding: protos.google.cloud.texttospeech.v1.AudioEncoding.LINEAR16,
                    sampleRateHertz: 24000,
                    effectsProfileId: ['handset-class-device'],
                    speakingRate: 1.45
                },
            };

            const [response] = await this.client.synthesizeSpeech(request);

Audio sample: https://jmp.sh/s/SQM0UJsOBlGikL5aFuJG

I tried other values for audioEncoding, sampleRateHertz, and effectsProfileId, but the noise is still there.
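
One possible cause worth ruling out (an assumption, not something confirmed in this thread): the Text-to-Speech documentation states that audio returned as LINEAR16 also contains a WAV header. If those bytes are passed to a player or stream that expects raw PCM, the header is played as samples, which can sound like a short click. The sketch below shows one way to drop the header before playback. `pcmFromLinear16` is a hypothetical helper name, not part of the client library, and it expects the Buffer/Uint8Array form of `audioContent`.

```javascript
// Hypothetical helper: returns the raw PCM payload of a WAV buffer,
// or the bytes unchanged if they do not start with a RIFF/WAVE header.
// Expects a Buffer or Uint8Array (not a base64 string).
function pcmFromLinear16(audioContent) {
  const buf = Buffer.from(audioContent);
  const isWav =
    buf.length >= 12 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE";
  if (!isWav) return buf;

  // Walk the RIFF chunks until the "data" chunk is found.
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const chunkId = buf.toString("ascii", offset, offset + 4);
    const chunkSize = buf.readUInt32LE(offset + 4);
    const start = offset + 8;
    if (chunkId === "data") {
      // Clamp in case the declared size exceeds the actual buffer length.
      return buf.subarray(start, Math.min(start + chunkSize, buf.length));
    }
    // Chunks are word-aligned: an odd size is followed by one pad byte.
    offset = start + chunkSize + (chunkSize % 2);
  }
  return buf; // no "data" chunk found; leave the bytes untouched
}
```

With the request above, usage would look like `const pcm = pcmFromLinear16(response.audioContent);`. If the audio is synthesized sentence by sentence and the pieces are concatenated, the same step would apply to each piece, since each response would carry its own header.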

Update 27/4: I tried both ElevenLabs and Google TTS with the input text "Hello". When the audio is emitted, the Google TTS output contains a click noise, while the ElevenLabs output does not.
Audio base64 string here: https://drive.google.com/file/d/1DG5KxvllqaQHJj6FK0L5Ovj3zHiLreIU/view?usp=sharing
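
For anyone looking at a base64 string like this one, a quick diagnostic is to decode it and read the first bytes, to see whether they form a WAV header and which sample rate it declares. This is only a sketch: `inspectBase64Audio` is a hypothetical name, and the field offsets assume the canonical 44-byte PCM WAV layout.

```javascript
// Hypothetical diagnostic: decode a base64 audio string and report
// the WAV header fields, if a RIFF/WAVE header is present.
function inspectBase64Audio(base64) {
  const buf = Buffer.from(base64, "base64");
  const hasWavHeader =
    buf.length >= 44 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE";
  if (!hasWavHeader) return { hasWavHeader, byteLength: buf.length };

  // Offsets assume the canonical layout with the "fmt " chunk at byte 12.
  return {
    hasWavHeader,
    byteLength: buf.length,
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34),
  };
}
```

If `hasWavHeader` is true and the playback side treats the bytes as raw PCM, the header itself is a plausible source of a click at the start of the audio.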
