ChatGPT has gained built-in image generation in one of its major updates of the year. In a live stream on 25 March, OpenAI CEO Sam Altman announced that the company’s GPT-4o model can now create and modify images. Until now, the AI model could only generate and edit text.
The feature is currently available in ChatGPT and Sora for subscribers to the company’s $200-a-month Pro plan. The company also said the feature will soon reach Plus and free users, as well as developers using its API service.
OpenAI announced the GPT-4o model in May 2024. The ‘o’ in the name stands for ‘omni’, reflecting the model’s ability to handle text, speech, and video. The company says it spent a year training the model to generate realistic images, with the help of more than 100 human workers. The company told the Wall Street Journal:
“Today’s refined GPT-4o model makes it easier for consumers, and businesses, to create more life-like images and paragraphs of comprehensible text, and even company logos and slide decks.”
Replacing DALL-E 3
According to the company, the new model ‘thinks’ a bit longer than DALL-E 3, the image-generation model it replaces, but produces more accurate and detailed images. The feature also lets the chatbot edit pictures with people in them and adjust details such as the foreground and background.
Reinforcement Learning From Human Feedback
The new feature is based on ‘reinforcement learning from human feedback’ (RLHF), a technique widely used by AI companies to train their models. Because the chatbot has more than 400 million weekly users, the work of the human trainers behind the feature could have a significant impact.
In a review, GoDaddy’s Chief Data and Analytics Officer, Travis Muhlestein, said the chatbot is
“helping us embrace AI-driven content creation.”
GoDaddy uses the platform to create stock images and logos.
Artists’ Rights
Responding to artists’ concerns over copyright, OpenAI’s Chief Operating Officer said:
“We’re respecting of the artists’ rights in terms of how we do the output, and we have policies in place that prevent us from generating images that directly mimic any living artists’ work.”
Gemini 2.0
The new GPT-4o feature in ChatGPT follows Google’s launch of Gemini 2.0, which includes an image-generation feature. That feature allowed users to remove watermarks and depict copyrighted characters.