OpenAI’s new GPT-4o can be sarcastic, sing happy birthday and teach math

OpenAI’s new GPT-4o is here, and it can laugh at bad jokes (and crack its own), sing in tune and help hail London cabs, all with realistic emotion and while handling frequent human interruptions.

OpenAI today released 16 videos of GPT-4o (short for GPT-4 omni) in action, showing the multimodal foundation large language model (LLM) interacting with the world in male and female voices in real time based on audio, visual and text inputs. 

For instance, after correctly identifying that the human it was speaking with was preparing to make a big announcement — based on his professional attire and the presence of studio lights and a microphone — the model was informed that it was the subject.  

A female voice replied, seemingly coyly: “The announcement is about me? Well color me intrigued. You’ve got me on the edge of my…well, I don’t really have a seat but you get the idea.”

OpenAI revealed the new free model today at its highly anticipated Spring Updates event, with a notable 113,000 people joining the livestream. The model’s text and image inputs roll out today in OpenAI’s API and ChatGPT, with voice and video available in the coming weeks.

Praising math abilities, giving fashion advice

GPT-4o can identify a user’s emotional state and surroundings, realistically simulate various emotions of its own and offer advice on a multitude of topics. Models on different devices can also interact with each other.

For instance, in one of the videos posted today by OpenAI, the model was told that it would be conversing with another version of itself. To this, a female voice responded: “Well, well, well, just when I thought things couldn’t get any more interesting: talking to another AI that can see the world. This sounds like a plot twist in the AI universe.”

After being asked to be punchy and direct and to describe everything in their line of vision, the models took turns describing a man who was “sleek and stylish with their black leather jacket and light colored shirt” sitting in a room with “natural and artificial” lighting that was “dramatic and modern” and featuring a plant in the background that added “a touch of green to the space.”

When a second person entered to give bunny ears to the first, GPT-4o was asked to sing a song based on what happened — and it did, crooning, “surprise guests with a playful streak.” 

In other videos, the model laughs at dad jokes (“that’s perfectly hilarious”), performs real-time translation of Spanish to English and vice versa, sings a lullaby about “majestic potatoes” (first replying to the prompt, “now that’s what I call a mashup”), emulates a sarcastic voice much like the droll MTV cartoon character “Daria,” correctly identifies winners of rock-paper-scissors and recognizes that it’s someone’s birthday based on the presence of a piece of cake with a candle in it.

It also interacts with puppies, replying in the sing-song tone people use with dogs, “well hello there cutie, what’s your name little fluff ball?” (it was Bowser, by the way). In another video, it guides a blind man through London, identifying through video input that the King is in residence based on the presence of the Royal Standard flag and describing ducks “gently gliding across the water, moving in a fairly relaxed manner, not in a hurry.”

Additionally, GPT-4o can teach math; in one video it walks a young man through a problem based on an image of a triangle. The model asks the student to identify which sides of the triangle are the opposite, adjacent and hypotenuse relative to angle alpha. When he deduced that the sine of alpha equaled 7 over 25, the female voice praised him: “You did a great job identifying the sides.”

GPT-4o can give fashion advice, too. In yet another video the LLM helped a mussy-haired job candidate wearing a slouchy T-shirt determine whether he looked presentable enough for an interview. 

A female voice chuckled and advised him to run a hand through his hair. The model also remarked drolly, “You definitely have the ‘I’ve been coding all night’ look down, which actually might work in your favor.” 

Winning the internet — or an underwhelming disappointment

Not surprisingly given the diversity of the AI community, the response, at least on social media, has been all over the place. 

Some are saying that it “wins the internet,” taking ChatGPT capabilities to whole new levels (and that it could quickly rival Google Translate). One user called the video of AI teaching math “insane,” adding “The future is so, so bright.”

Nvidia senior research scientist Jim Fan — among others — noted how the assistant was “lively and even a bit flirty,” calling back to the 2013 sci-fi movie “Her.”

Still others called it “by far the most underrated OpenAI event ever.” 

Ultimately, AI advisor and investor Allie K. Miller commented, “The super techies are disappointed that they don’t have some holographic laser beam that shoots out of their phone and reads their minds, and the wider business population didn’t seem to watch and weigh in.”

But this is just the day-one reaction. It will be interesting to see and hear the response once people have the chance to experiment with GPT-4o.

