Biggest ChatGPT Update: DALL·E 3, Voice-Chat, Image Input!
ChatGPT is getting a massive upgrade with the release of DALL·E 3, voice chat, and image inputs.
When OpenAI unveiled ChatGPT, it took the world by storm. The ability to converse with an intelligent system felt almost like a dream. As time went on we got a big update in the form of GPT-4, but some of the features unveiled in the keynote were missing. Those dream features are finally here: DALL·E 3, voice chat, and image inputs are being rolled out over the next two weeks.
Let’s start with one of the biggest updates coming to ChatGPT in the next couple of weeks: the direct integration of DALL·E 3, OpenAI’s next-generation image model. A major update to DALL·E 2 was overdue, as other image models such as Stable Diffusion and Midjourney had caught up with and surpassed OpenAI’s model.
OpenAI’s blog gives a sneak peek at the major upgrades in DALL·E 3:
DALL·E 3 picks up significantly more detail and nuance than DALL·E 2.
Requires less prompt engineering than other models, making it easier to get the image you want.
Produces more accurate and photorealistic images than the previous model, though it will still need thorough testing against the best models out there.
Can now render text within images.
Direct integration with ChatGPT lets users easily tailor images to their needs. This is probably DALL·E 3’s biggest advantage over the competition.
Declines to generate violent, hateful, or adult images.
Declines requests to generate images of public figures.
Images are copyright-free, allowing anyone to use, modify, and sell images generated by DALL·E 3.
Below are some examples of images created using DALL·E 3, along with their associated prompts. Thanks to Logan for providing the images.
ChatGPT is getting a voice-chat feature, allowing users to go back and forth without constantly pressing a send button. Voice chat is rolling out to ChatGPT Plus and Enterprise users over the next two weeks, and is exclusive to mobile devices (iOS and Android).
On your mobile device, head over to Settings > New Features and enable voice conversations. You can then talk to ChatGPT and choose between different voices; currently there are ‘Juniper’, ‘Sky’, ‘Cove’, ‘Ember’ and ‘Breeze’.
Behind the scenes, your audio is converted to text by Whisper (OpenAI’s own open-source speech-to-text model) and then fed to GPT-4. That part isn’t new; the major difference is the Text-to-Speech (TTS) model, which is very high quality. We have seen similar TTS models from ElevenLabs and Microsoft, though ElevenLabs additionally offers multi-language support.
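Conceptually, one voice-chat turn is just a three-stage round trip. Here is a minimal sketch of that pipeline; the function names are hypothetical, and the lambdas are stubs standing in for the real Whisper, GPT-4, and TTS services (which this article does not specify programmatic access to):

```python
# Hypothetical sketch of one voice-chat round trip:
# audio in -> Whisper (speech-to-text) -> GPT-4 -> TTS -> audio out.
# The three models are injected as callables so real services or
# stubs (as below) can be swapped in.

def voice_chat_turn(audio, transcribe, chat, synthesize):
    """Run one voice-chat round trip and return the spoken reply."""
    user_text = transcribe(audio)     # Whisper: speech -> text
    reply_text = chat(user_text)      # GPT-4: text -> text
    return synthesize(reply_text)     # TTS: text -> speech

# Stub models stand in for the real services in this sketch:
reply_audio = voice_chat_turn(
    b"<raw user speech>",
    transcribe=lambda audio: "What's the weather like on Mars?",
    chat=lambda text: "Cold, dry, and dusty.",
    synthesize=lambda text: text.encode("utf-8"),
)
print(reply_audio)  # b'Cold, dry, and dusty.'
```

The point of the sketch is simply that the speech-to-text and chat stages already existed; only the final synthesis stage is new here.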
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms).
ChatGPT is finally getting the image inputs that were unveiled back in March. Although not a ‘new’ feature, image input will make ChatGPT a truly multimodal system.
You can upload an image directly to ChatGPT and ask questions about it. In an example provided by OpenAI (see tweet below), a user uploads an image of a bike and asks how to lower the seat, and ChatGPT provides step-by-step instructions on how to do so. What makes this amazing is that you can upload image after image, and each time ChatGPT will adjust to your needs based on your questions.
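Under the hood, a multi-image back-and-forth like the bike example can be thought of as an ordinary chat transcript where a user turn mixes text and image parts. The sketch below only illustrates that structure; it is not ChatGPT’s internal format, and the URLs and reply text are placeholders:

```python
# Illustrative sketch: a multi-turn image conversation as a message list.
# Each user turn can carry both text and image parts; the assistant
# adjusts its next answer to whatever image arrives. URLs are placeholders.

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "How do I lower the seat on this bike?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/bike.jpg"}},
    ]},
    {"role": "assistant",
     "content": "Loosen the clamp under the saddle, slide the seatpost down, retighten."},
    # Follow-up: the user sends a close-up, and the model refines its answer.
    {"role": "user", "content": [
        {"type": "text", "text": "Here's a close-up of the clamp. Which tool do I need?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/clamp.jpg"}},
    ]},
]

# Count how many images the conversation carries so far:
image_count = sum(
    1
    for m in messages
    if isinstance(m["content"], list)
    for part in m["content"]
    if part["type"] == "image_url"
)
print(image_count)  # 2
```

Because every earlier image stays in the transcript, each new question is answered in the context of all the images sent so far.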
Since the underlying model is still GPT-4, it will refuse many queries and can still hallucinate. So avoid asking medical questions and blindly following its answers.
These are some incredible features (https://openai.com/blog/chatgpt-can-now-see-hear-and-speak) being rolled out to ChatGPT Plus and Enterprise users over the coming weeks, and personally, I can’t wait to see the incredible things people are going to create with them.