# Text to Speech (TTS)

This section looks at Text-to-Speech (TTS) technologies, focusing on the advances that generative AI has brought to speech synthesis. The video highlights how contemporary TTS systems, built on neural networks, can closely emulate human speech. These advances have applications in a wide range of areas, including accessibility, entertainment, and customer service.

The text-to-speech ecosystem has been booming lately:

1. **Chat TTS** - English + Chinese TTS model optimised for daily conversations and dialogues, with voice cloning
2. **MARS5 TTS** - English only, but gives insane prosodic control paired with voice cloning
3. **Parler TTS** - Smol but powerful TTS controlled by a text prompt (we’re scaling it up right now)
4. **Toucan** - Massively multilingual TTS in 4000+ languages (works even on CPU)
5. **MetaVoice** - 1B param model with deep voice cloning control. English only.

We’re only halfway through the year, and I’m pumped to see what the rest has in store for us!

What else am I missing from this year?

Below is a list of features that we think a good AI Text-to-Speech tool should have. This is a high bar, and it is unlikely that any one tool will have all of them, but as the technology evolves it is not unreasonable to expect most tools to have these features.

**1. Naturalness:** The voice should sound human-like, without robotic intonation. Speaking styles should be expressive and emotional. This includes emphasizing specific words to convey a range of emotions such as happiness, excitement, and sadness.

**2.    Voice Customization:** This means options to modify pitch, speed, tone, accents and more.

**3. Pronunciation Editor:** The tool lets you correct pronunciation and control emphasis and pitch.

**4. Both Audio and Text input support:** The tool accepts audio as input in addition to text and converts either one to your chosen speech profile.

**5. Language & Accent Variety:** The availability of multiple languages and regional accents.

**6.    Large library offering:** A diverse and large collection of AI voices across languages.

**7. AI Voice-Over features:** These are features for adding speech to videos.

**8.    Integration Capabilities:** This means ease of integration with other software or platforms.

**9.  Accessibility Features:** Support for users with disabilities.

**10. Scalability:** The ability to handle large volumes of text efficiently.

**11. Add pauses:** This is for when the user wants to give voiceovers an even more human feel.

**12. Preview mode:** This means being able to preview results and apply changes without being charged when pricing is by word or character count.

**13. Document scanning:** The ability to scan documents and convert printed text to speech.

**14. Ease of use:** An easy-to-use, friendly user interface.
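Several of the features above (voice customization, pronunciation control, and pauses) are commonly exposed through SSML, the W3C Speech Synthesis Markup Language. The sketch below shows how such markup might be assembled in Python. The helper names (`say`, `pause`, `emphasize`, `pronounce`, `build_ssml`) are illustrative and not part of any tool's API, and which tags and attributes a given tool honors varies, so treat this as an assumption to check against each tool's documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def say(text, rate=None, pitch=None):
    """Return escaped text, wrapped in <prosody> if a rate or pitch is given."""
    attrs = ""
    if rate is not None:
        attrs += f" rate={quoteattr(rate)}"
    if pitch is not None:
        attrs += f" pitch={quoteattr(pitch)}"
    body = escape(text)
    return f"<prosody{attrs}>{body}</prosody>" if attrs else body


def pause(milliseconds):
    """Return a <break> element that inserts a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def emphasize(text, level="strong"):
    """Return text wrapped in <emphasis> so the engine stresses it."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pronounce(text, ipa):
    """Return text wrapped in <phoneme> with an explicit IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def build_ssml(*parts):
    """Join already-built parts inside a bare <speak> root element."""
    # Some services also require version, xmlns, or voice attributes here.
    return "<speak>" + "".join(parts) + "</speak>"


ssml = build_ssml(
    say("Welcome back."),
    pause(400),
    say("Today's total is ", rate="slow"),
    emphasize("forty"),
    say(" dollars & change. I say "),
    pronounce("tomato", "təˈmeɪtoʊ"),
    say("."),
)
print(ssml)
```

The text is escaped before it is embedded, so characters such as `&` do not break the markup. The resulting string would then be passed to whichever tool accepts SSML input.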

The tools listed below offer a variety of features and cater to different user needs, ranging from professional voiceover creation to enhancing accessibility, but none has all of the features above.
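Two items on the feature list, scalability and pricing by character count, can be reasoned about before committing to any tool. The following is a minimal sketch, assuming a service that limits the length of each request and bills per character. The 45-character limit and the price of 16.0 per million characters are made-up values for illustration, and `split_into_chunks` and `estimate_cost` are hypothetical helpers, not functions of any listed tool.

```python
import re


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so such a chunk can exceed the limit.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimate_cost(text, price_per_million_chars):
    """Estimate the charge for text under per-character pricing."""
    return len(text) * price_per_million_chars / 1_000_000


text = (
    "First sentence here. Second one follows! "
    "A third, slightly longer sentence ends it?"
)
chunks = split_into_chunks(text, max_chars=45)  # hypothetical request limit
for chunk in chunks:
    print(len(chunk), chunk)

# Hypothetical price: 16.0 currency units per one million characters.
print(estimate_cost(text, price_per_million_chars=16.0))
```

Splitting on sentence boundaries keeps each request within the limit without cutting words or phrases in half, which tends to matter for prosody. Running the estimate before synthesis gives a rough substitute for a preview mode when a tool does not offer one.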

### Here are the 18 tools

1. NaturalReader
2. Microsoft Azure
3. Uberduck
4. Murf AI Generator
5. Google Cloud Text-to-Speech
6. ElevenLabs
7. Kapwing
8. Resemble AI
9. Synthesys
10. Lovo.ai
11. Speechify
12. Verbatik
13. Clipchamp (by Microsoft)
14. WellSaid Labs
15. Deepbrain AI
16. Fliki
17. FineShare
18. Play.ht
