19-5-2024 (SYDNEY) Earlier this week, OpenAI unveiled GPT-4o (“o” for “omni”), a groundbreaking new version of the artificial intelligence (AI) system powering the wildly popular ChatGPT chatbot. Touted as a significant stride towards more natural interactions with AI, GPT-4o boasts the ability to engage in voice conversations with users in near real-time, exhibiting human-like personality and behaviour.
This emphasis on personality is poised to be a contentious point. In OpenAI’s demonstrations, GPT-4o comes across as friendly, empathetic, and engaging. It cracks “spontaneous” jokes, giggles, flirts, and even bursts into song. The AI system also showcases its capability to respond to users’ body language and emotional tone.
Launched with a streamlined interface, OpenAI’s revamped version of the ChatGPT chatbot appears designed to heighten user engagement and facilitate the creation of novel apps leveraging its text, image, and audio capabilities.
GPT-4o represents another leap forward in AI development. However, the focus on engagement and personality raises pertinent questions about whether it will genuinely serve users’ interests, and about the ethical implications of creating AI that can simulate human emotions and behaviours.
OpenAI envisions GPT-4o as a more enjoyable and engaging conversational AI. In principle, this could make interactions more effective and increase user satisfaction.
Studies indicate that users are more likely to trust and cooperate with chatbots exhibiting social intelligence and personality traits. This could prove relevant in fields such as education, where research has suggested AI chatbots can boost learning outcomes and motivation.
However, some commentators are apprehensive that users may become overly attached to AI systems with human-like personalities or emotionally harmed by the one-way nature of human-computer interaction.
GPT-4o immediately drew comparisons – including from OpenAI’s CEO Sam Altman – to the 2013 science-fiction film Her, which vividly portrays the potential pitfalls of human-AI interaction.
In the movie, the protagonist, Theodore, becomes deeply fascinated and attached to Samantha, an AI system with a sophisticated and witty personality. Their bond blurs the lines between the real and the virtual, raising questions about the nature of love and intimacy, and the value of human-AI connection.
While GPT-4o is no Samantha, it raises similar concerns. AI companions are already here. As AI becomes more adept at mimicking human emotions and behaviours, the risk of users forming deep emotional attachments increases. This could lead to over-reliance, manipulation, and even harm.
While OpenAI has shown concern for ensuring its AI tools behave safely and are deployed responsibly, we have yet to learn the broader implications of unleashing charismatic AIs on the world. Current AI systems are not explicitly designed to meet human psychological needs – a goal that is hard to define and measure.
GPT-4o’s impressive capabilities underscore the importance of having a system or framework in place to ensure AI tools are developed and used in ways that align with public values and priorities.
GPT-4o can also work with video (of the user and their surroundings, via a device camera, or pre-recorded videos), and respond conversationally. In OpenAI’s demonstrations, GPT-4o comments on a user’s environment and clothes, recognises objects, animals, and text, and reacts to facial expressions.
Google’s Project Astra AI assistant, unveiled just one day after GPT-4o, displays similar capabilities. It also appears to have visual memory: in one of Google’s promotional videos, it helps a user find her glasses in a busy office, even though they are no longer in the camera’s view.
GPT-4o and Astra continue the trend towards more “multimodal” models that can work with text, images, audio, and video. GPT-4o’s predecessor, GPT-4 Turbo, can process text and images together, but not audio and video. The original version of ChatGPT, released less than two years ago, was based solely on text.
GPT-4o is also significantly faster than its predecessor.
The ability to work across audio, vision, and text in real-time is considered crucial to developing advanced AI systems that can understand the world and effectively achieve complex and meaningful goals.
However, some critics argue that GPT-4o’s text capabilities are only incrementally better than those of GPT-4 Turbo and competitors such as Google’s Gemini Ultra and Anthropic’s Claude 3 Opus.
Will major AI labs be able to sustain the recent rapid pace of improvement by continuing to build bigger and more sophisticated models? This is a hotly debated topic among experts, and the outcome will determine the impact of the technology over the coming years.
A less flashy but significant aspect of GPT-4o’s launch is that, unlike its GPT-4 family precursors, the new AI system is available to all users in the free version of ChatGPT, subject to usage limits.
This means millions of users worldwide just got an upgrade from GPT-3.5 to a more powerful AI system with more features. GPT-4o is significantly more useful than GPT-3.5 for various purposes, such as work and education. The impact of this development will become more apparent over time.
OpenAI’s unveiling of GPT-4o disappointed enthusiasts for ever more powerful AI systems, who had hoped GPT-5’s arrival was imminent more than a year after GPT-4’s launch.
Instead, this week’s unveiling of GPT-4o and Google’s latest AI announcements emphasise the features being incorporated into their products. These new developments point to possibilities such as more sophisticated virtual assistants capable of performing complex tasks on behalf of users, involving richer interaction and planning.