Text to Speech (TTS)

This page looks at Text-to-Speech (TTS) technologies, focusing on the advancements that Generative AI has brought to speech synthesis. The video highlights how contemporary TTS systems, built on neural networks, can now closely emulate human speech. These advancements find applications in a wide range of areas, including accessibility, entertainment, and customer service.

The Text to Speech ecosystem has been booming lately:

Chat TTS - English + Chinese TTS model optimised for daily conversations/dialogues + Voice Cloning
MARS5 TTS - English only but gives insane prosodic control paired with voice cloning
Parler TTS - Smol but powerful text prompt controlled TTS (we’re scaling it up right now)
Toucan - Massively Multilingual TTS in 4000+ languages (works even on CPU)
MetaVoice - 1B param model with deep voice cloning control. English only.

We’re only halfway through the year, pumped to see what the rest has in store for us!

What else am I missing from this year?

Below is a list of features that we think a good AI Text-to-Speech tool should have. This is a high bar, and it is unlikely that any one tool will have all of them, but as the technology evolves, it is not unreasonable to expect most tools to offer them:

1. Naturalness: The voice should sound human-like, without robotic intonation. Speaking styles should be expressive and emotional, including the ability to emphasize specific words to convey a range of emotions such as happiness, excitement, and sadness.

2. Voice Customization: This means options to modify pitch, speed, tone, accents and more.

3. Pronunciation Editor: The tool lets you correct pronunciation and control emphasis and pitch.

4. Both Audio and Text input support: The tool accepts audio as input in addition to text and converts both to your chosen speech profile.

5. Language & Accent Variety: The availability of multiple languages and regional accents.

6. Large library offering: A diverse and large collection of AI voices across languages.

7. AI Voice-Over features: These are features for adding speech to videos.

8. Integration Capabilities: This means ease of integration with other software or platforms.

9. Accessibility Features: Support for users with disabilities.

10. Scalability: The ability to handle large volumes of text efficiently.

11. Add pauses: The option to insert pauses, for when the user wants to give voiceovers an even more human feel.

12. Preview mode: The ability to hear results and apply changes without being charged when pricing is by ‘word’ or ‘character count’.

13. Document scanning: The ability to scan documents and convert printed text to speech.

14. Ease of use: An easy-to-use, friendly user interface.
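
Several of the features above (voice customization, emphasis control, added pauses) are commonly exposed through SSML, the W3C Speech Synthesis Markup Language, which many TTS services accept in some form. The sketch below is illustrative only: the function names, default values, and the price used in the example are assumptions, not any particular tool's API, and each vendor supports a different subset of SSML tags. It also shows one way to split long text at sentence boundaries (scalability) and to count characters up front, which can help preview cost when pricing is per character.

```python
import re
from xml.sax.saxutils import escape


def _mark_emphasis(sentence, emphasize):
    """Escape a sentence for XML and wrap chosen whole words in <emphasis>."""
    if not emphasize:
        return escape(sentence)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in emphasize) + r")\b")
    # With one capturing group, split() alternates: plain text, match, plain text, ...
    pieces = pattern.split(sentence)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            out.append(f'<emphasis level="strong">{escape(piece)}</emphasis>')
        else:
            out.append(escape(piece))
    return "".join(out)


def build_ssml(sentences, pause_ms=400, rate="medium", pitch="medium", emphasize=()):
    """Build an SSML document with a pause after every sentence.

    rate and pitch are passed straight to <prosody>; supported values
    vary by TTS engine, so treat the defaults as placeholders.
    """
    body = "".join(
        f'{_mark_emphasis(s, emphasize)}<break time="{pause_ms}ms"/>' for s in sentences
    )
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{body}</prosody></speak>'


def chunk_text(text, max_chars=200):
    """Group sentences into chunks no longer than max_chars where possible.

    A single sentence longer than max_chars is kept whole, not cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def estimate_cost(chunks, price_per_million_chars):
    """Return (character count, estimated cost) before anything is synthesized."""
    total_chars = sum(len(chunk) for chunk in chunks)
    return total_chars, total_chars * price_per_million_chars / 1_000_000


if __name__ == "__main__":
    chunks = chunk_text("I am so happy. Thanks for listening!", max_chars=20)
    # 16.0 per million characters is a made-up price for illustration.
    print(estimate_cost(chunks, price_per_million_chars=16.0))
    for chunk in chunks:
        print(build_ssml([chunk], pause_ms=300, rate="slow", emphasize=["happy"]))
```

The resulting SSML string would then be sent to whichever TTS service is in use; the request format for that step differs per provider and is not shown here.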

The tools listed below offer a variety of features and cater to different user needs, ranging from professional voiceover creation to enhancing accessibility, but none has all of the above features.

Here are the 18 tools:

NaturalReader
Microsoft Azure
Uberduck
Murf AI Generator
Google Cloud Text-to-Speech
ElevenLabs
Kapwing
Resemble AI
Synthesys
Lovo.ai
Speechify
Verbatik
Clipchamp (by Microsoft)
WellSaid Labs
Deepbrain AI
Fliki
FineShare
Play.ht

