Google Text-to-Speech
Google Text-to-Speech AI
Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.
Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 100+ voices, available in multiple languages and variants.
It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible. With this easy-to-use API, you can create lifelike interactions with your users across many applications and devices.
Features and Demo:
All features
Custom Voice (beta)
Train a custom speech synthesis model using your own audio recordings to create a unique and more natural-sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without recording new phrases.
Voice and language selection
Choose from an extensive selection of 220+ voices across 40+ languages and variants, with more to come soon.
WaveNet voices
Take advantage of 90+ WaveNet voices, built on DeepMind’s groundbreaking research, to generate speech that significantly closes the gap with human performance.
Text and SSML support
Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions.
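As a rough illustration of the SSML support described above, the following Python sketch assembles a small SSML document with a pause and a formatted date. The helper names (pause, say_as, speak) are hypothetical and not part of any Google library; the break and say-as tags are standard SSML elements, and the date format value is an assumption to check against the SSML reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pause(ms: int) -> str:
    """Return an SSML break tag for a pause of `ms` milliseconds."""
    return f'<break time="{ms}ms"/>'


def say_as(text: str, interpret_as: str, fmt: str = "") -> str:
    """Wrap text in a say-as tag, e.g. to read it as a date or a number."""
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt:
        attrs += f" format={quoteattr(fmt)}"
    return f"<say-as {attrs}>{escape(text)}</say-as>"


def speak(*parts: str) -> str:
    """Join already-escaped parts into one SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Your order ships on "),
    say_as("2024-03-15", "date", "yyyymmdd"),
    pause(500),
    escape("Thank you & goodbye."),  # "&" must be escaped in SSML
)

# Parsing the result confirms that the document is well-formed XML.
root = ET.fromstring(ssml)
print(ssml)
```

Escaping plain text before wrapping it keeps characters such as & and < from breaking the markup.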
Pitch tuning
Personalize the pitch of your selected voice, up to 20 semitones more or less than the default.
Speaking rate tuning
Adjust the speaking rate to be up to 4x faster or 4x slower than the normal rate.
Volume gain control
Increase the volume of the output by up to 16 dB or decrease it by up to 96 dB.
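The three tuning controls above (pitch, speaking rate, and volume gain) each have a stated range, so a client can validate values before sending a request. The sketch below is one possible way to do that in Python. The function name make_audio_config is hypothetical, and the dictionary keys (pitch, speakingRate, volumeGainDb) are assumed to match the audio configuration fields of the REST API; verify them against the API reference before relying on them.

```python
# Allowed ranges, taken from the feature descriptions above.
PITCH_RANGE = (-20.0, 20.0)            # semitones relative to the default
SPEAKING_RATE_RANGE = (0.25, 4.0)      # 1.0 is the normal rate
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)   # decibels relative to the default


def _check(name: str, value: float, bounds: tuple) -> float:
    """Return value as a float, or raise ValueError if it is out of range."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return float(value)


def make_audio_config(pitch: float = 0.0,
                      speaking_rate: float = 1.0,
                      volume_gain_db: float = 0.0) -> dict:
    """Build a dictionary of tuning settings after validating each value."""
    return {
        "pitch": _check("pitch", pitch, PITCH_RANGE),
        "speakingRate": _check("speaking_rate", speaking_rate, SPEAKING_RATE_RANGE),
        "volumeGainDb": _check("volume_gain_db", volume_gain_db, VOLUME_GAIN_DB_RANGE),
    }


# Slightly lower pitch, 25% faster speech, a little louder.
config = make_audio_config(pitch=-2.0, speaking_rate=1.25, volume_gain_db=3.0)
print(config)
```

Rejecting out-of-range values locally gives a clearer error than waiting for the service to refuse the request.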
Integrated REST and gRPC APIs
Easily integrate with any application or device that can send a REST or gRPC request including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).
Audio format flexibility
Convert text to MP3, Linear16, OGG Opus, and a number of other audio formats.
Audio profiles
Optimize for the type of speaker from which your speech is intended to play, such as headphones or phone lines.
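To make the REST integration, audio format, and audio profile features more concrete, here is a minimal Python sketch that builds a synthesis request body and decodes a response. It makes several assumptions that are not stated on this page: the endpoint URL, the JSON field names, the voice name en-US-Wavenet-D, and the profile ID headphone-class-device are taken from the author's understanding of the v1 REST API and should be checked against the current reference. The helper names are hypothetical, authentication is omitted, and no network call is made; the response is a stand-in.

```python
import base64
import json

# Assumed v1 REST endpoint; not stated in the text above.
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_body(text: str,
                          language_code: str = "en-US",
                          voice_name: str = "en-US-Wavenet-D",
                          audio_encoding: str = "MP3",
                          effects_profile: str = "headphone-class-device") -> dict:
    """Assemble the JSON body: what to say, which voice, which output format."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": audio_encoding,          # e.g. MP3, LINEAR16, OGG_OPUS
            "effectsProfileId": [effects_profile],    # audio profile for the playback device
        },
    }


def extract_audio(response_json: str) -> bytes:
    """Decode the base64 audio payload from a synthesis response."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


payload = json.dumps(build_synthesize_body("Hello, world!"))

# Stand-in for a real response, used here so the sketch runs offline.
fake_response = json.dumps(
    {"audioContent": base64.b64encode(b"fake-mp3-bytes").decode("ascii")}
)
audio = extract_audio(fake_response)
print(len(payload), len(audio))
```

In a real client, the payload would be sent as an authenticated POST request to the endpoint, and the decoded bytes would be written to a file such as output.mp3.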
Text-to-Speech pricing
Text-to-Speech is priced per character, based on the number of characters sent to the service to be synthesized into audio each month. You must enable billing to use Text-to-Speech, and you will be charged automatically if your usage exceeds the number of free characters allowed per month. For information about how to keep track of your character totals, see Monitoring API usage.
The total number of characters in the input string is counted for billing purposes, including spaces. All Speech Synthesis Markup Language (SSML) tags except mark are also included in the character count. For example, this input string counts as 79 characters, including the SSML tags, newlines, and spaces:
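Independently of the example string referenced above, the counting rule can be approximated locally. The Python sketch below is an illustration only, not Google's billing logic: the function name billable_characters and the sample input are hypothetical, and the regular expression assumes that mark tags are written in the usual self-closing or paired form.

```python
import re

# Matches <mark ...>, <mark .../> and </mark>, the only tags excluded from the count.
MARK_TAG = re.compile(r"</?mark\b[^>]*>")


def billable_characters(ssml: str) -> int:
    """Count every character, including spaces, newlines and tags, except mark tags."""
    return len(MARK_TAG.sub("", ssml))


sample = '<speak>Hi<mark name="a"/> there</speak>'
total = len(sample)                      # raw length of the input
billable = billable_characters(sample)   # length after dropping the mark tag
print(total, billable)
```

An estimate like this can help track character totals between billing reports, but the totals reported by the service remain authoritative.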