OpenAI is rolling out new voice and image features for its chatbot, ChatGPT, allowing for a more human-like interaction, the company announced Monday.
ChatGPT users will now be able to hold spoken conversations with the chatbot. The feature is powered by a new text-to-speech model that can generate human-like audio. OpenAI collaborated with professional voice actors to create the feature.
ChatGPT will also be able to interpret photos from users to answer questions. The image recognition service is powered by multimodal GPT-3.5 and GPT-4, which apply language reasoning skills to images, with or without text.
OpenAI will roll out the features over the next two weeks to subscribers of ChatGPT’s Plus and Enterprise tiers. Voice is coming to iOS and Android, while image recognition will be available on all platforms.
In announcing the new capabilities, OpenAI included warnings about privacy and the features’ limitations. The company said it has taken “technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.”
“Real world usage and feedback will help us make these safeguards even better while keeping the tool useful,” OpenAI wrote in its announcement.
“We are transparent about the model’s limitations and discourage higher risk use cases without proper verification,” the company continued. “Furthermore, the model is proficient at transcribing English text but performs poorly with some other languages, especially those with non-roman script. We advise our non-English users against using ChatGPT for this purpose.”
Also on Monday, Spotify launched a podcast translation system that uses OpenAI technology to clone hosts’ voices in other languages.