As anticipated based on updates and new settings in the mobile app for Elon Musk’s social network X, a new large language model (LLM) called Grok-2 from Musk’s sister company xAI landed last night — and it’s a doozy.
Integrated within X itself and available through the Premium ($7 USD/month) and Premium+ ($14/month with no ads) subscription tiers, Grok-2 comes, fittingly, in two model sizes: Grok-2 and Grok-2 mini. Grok-2 offers state-of-the-art performance in a wide range of tasks including chat, coding, reasoning, and vision-based applications, while Grok-2 mini is a smaller, faster version optimized for efficiency and suited to simpler text-based prompts that need quicker responses.
Grok-2 not only boasts image generation capabilities based on a partnership with Black Forest Labs and its new and surprisingly photorealistic open source diffusion AI model Flux.1, but it also shockingly outperforms the AI models from leading rivals including OpenAI (GPT-4o) and Anthropic (Claude 3.5 Sonnet) and even Google (Gemini Pro 1.5) on leading third-party benchmark tests.
A new, surprising leader across multiple benchmarks
Specifically, Grok-2 and Grok-2 mini outperform all other models on the GPQA, MMLU, MMLU-Pro, MATH, HumanEval, MMMU, MathVista, and DocVQA benchmarks.
Even the LMSYS Chatbot Arena, where many companies covertly test their AI models under alternate names in advance of release (including xAI, where Grok-2 was initially called “sus-column-r”), congratulated xAI on the milestone.
As AI influencer and University of Pennsylvania Wharton School of Business professor Ethan Mollick observed on X, “There are now five GPT-4 class models: GPT-4o, Claude 3.5, Gemini 1.5, Llama 3.1, and now Grok 2.”
Musk congratulated his “hardworking xAI team!” on the similarly named social network.
Image generations steal the show
Although Grok-2 boasts leading performance on these benchmarks covering math, writing, code, and other tasks, the marquee feature capturing the most attention from the start is its integration with Black Forest Labs’ Flux.1 image generation model.
Prior to the release of Grok-2, Flux.1 had already been making waves in AI circles, and AI art circles in particular, over the last few weeks. People discovered that they could achieve incredibly photorealistic generations from the open source model, realistic enough to resemble familiar situations like a speaker at a TED talk, and that they could adapt the model using low-rank adaptation (LoRA) to generate their own likeness in different situations.
Now a version of Flux.1 is integrated directly into Grok-2, much in the same way OpenAI integrated its image generation model DALL-E 3 directly into ChatGPT, allowing users to simply type text prompts to the chatbot and ask it to make images on command. Users testing this capability in Grok-2 are finding it notably permissive: it generates controversial, compromising images even of public figures such as U.S. presidential candidates Kamala Harris and Donald Trump.
Other leading image generators including Midjourney and DALL-E 3 and Microsoft Designer have prohibitions around generating this type of content — especially in the wake of the controversy earlier this year over unauthorized explicit deepfakes of popular musician Taylor Swift (made by prompt engineering around the Designer restrictions) — so it is notable that Grok-2 is bucking that trend and allowing for more freedom, and potential risk. However, that is in keeping with Musk’s stated “free speech” ethos for X.
Yet users are raising concerns about what the capability means for the proliferation of deepfakes and misinformation across the web.
As user @Omiron33 put it well: “Yes, we’ve had MJ and Flux, but this is the first to make it usable and quick. Advertising, Propaganda and everything good or bad that comes with that just happened (IMO, the good outweighs the bad)”