I am developing an AI voice assistant project with a TTS feature that uses Google Text-to-Speech. However, the output of synthesizeSpeech contains a background noise that sounds like a mouse click.
Has anyone had a similar problem, or is there a workaround for it? Thanks a lot.
This is my current config:
const request = {
  input: { text: text },
  voice: {
    languageCode: "en-US",
    name: "en-US-Studio-O",
    ssmlGender: protos.google.cloud.texttospeech.v1.SsmlVoiceGender.FEMALE
  },
  audioConfig: {
    audioEncoding: protos.google.cloud.texttospeech.v1.AudioEncoding.LINEAR16,
    sampleRateHertz: 24000,
    effectsProfileId: ['handset-class-device'],
    speakingRate: 1.45
  },
};
const [response] = await this.client.synthesizeSpeech(request);
Audio sample: https://jmp.sh/s/SQM0UJsOBlGikL5aFuJG
I tried other values for audioEncoding, sampleRateHertz, and effectsProfileId, but the click is still there.
Update 27/4: I tried both ElevenLabs and Google TTS with the input text "Hello". When the audio is played, the Google TTS output contains the click noise while the ElevenLabs output does not.
Audio base64 string here: https://drive.google.com/file/d/1DG5KxvllqaQHJj6FK0L5Ovj3zHiLreIU/view?usp=sharing
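One possible cause worth ruling out: the Google documentation states that audio returned as LINEAR16 also contains a WAV header. If the bytes are passed to a player that expects raw PCM, that header may be played as samples and heard as a short click at the start. The sketch below is only an assumption about how the audio is consumed here; stripWavHeader is a hypothetical helper, not part of the Google client library. It returns the PCM payload of the "data" chunk, or the input unchanged if no WAV header is found.

```javascript
// Hypothetical helper (not part of @google-cloud/text-to-speech).
// Assumes audioContent is a Buffer/Uint8Array, or a base64 string.
function stripWavHeader(audioContent) {
  const buf =
    typeof audioContent === "string"
      ? Buffer.from(audioContent, "base64")
      : Buffer.from(audioContent);

  // A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
  const isWav =
    buf.length >= 12 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE";
  if (!isWav) return buf; // looks like raw PCM already

  // Walk the RIFF chunks until the "data" chunk, whose payload is the PCM.
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const start = offset + 8;
    if (id === "data") {
      // Some encoders write 0 or an oversized length; fall back to "rest of buffer".
      const end =
        size === 0 || start + size > buf.length ? buf.length : start + size;
      return buf.subarray(start, end);
    }
    offset = start + size + (size % 2); // chunks are padded to even length
  }
  return buf; // no "data" chunk found; leave the bytes untouched
}
```

With the request above, this would be applied as stripWavHeader(response.audioContent) before the bytes are sent to the raw PCM player. If the audio is instead saved or played as a .wav file, the header should be kept and the click has a different cause.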