Future Tech

OpenAI’s GPT-4o Mini is indeed small – like its lead over rivals in certain tests

Tan KW
Publish date: Fri, 19 Jul 2024, 09:50 AM

AI Roundup: OpenAI has made GPT-4o Mini, a smaller and cheaper version of its GPT-4o generative large language model (LLM), available via its cloud.

The Microsoft-backed super lab said on Thursday that GPT-4o Mini, like regular GPT-4o, is multimodal (it can handle more than just the written word), has a context window of 128,000 tokens, and was trained on material dated up to October 2023. The Mini can emit up to 16,000 tokens of output.
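To make those limits concrete, here is a minimal Python sketch of a request budget check. It assumes, as is typical for OpenAI chat models but not stated in the article, that the prompt and the reply share the 128,000-token context window. The constant names and the `max_output_budget` helper are hypothetical, and the 16,000-token output cap is the article's rounded figure.

```python
# Limits as reported in the article; treat them as approximate.
CONTEXT_WINDOW = 128_000  # total tokens shared by prompt and reply (assumption)
MAX_OUTPUT = 16_000       # most tokens the Mini can emit in one reply


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still be requested for a prompt of this size.

    Hypothetical helper: assumes input and output share the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # The reply is capped both by the output limit and by what is left of the window.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for size in (1_000, 120_000, 130_000):
        print(size, max_output_budget(size))
```

Under these assumptions a short prompt gets the full output allowance, while a very long prompt squeezes the reply and an oversized one leaves no room at all.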

While GPT-4o, OpenAI's top-end model, costs $5 and $15 per million input and output tokens, respectively, the Mini edition costs 15 and 60 cents, again respectively. You can halve those numbers if using delayed batch processing.
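The pricing arithmetic is easy to sketch. The Python snippet below uses only the per-million-token list prices quoted above and the 50 percent batch discount; the dictionary keys and the `estimate_cost` helper are illustrative, not part of any OpenAI SDK.

```python
# Per-million-token list prices in US dollars, as quoted in the article (July 2024).
PRICES_PER_MILLION = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate the USD cost of a workload; batch=True applies the 50% batch discount."""
    price = PRICES_PER_MILLION[model]
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return cost / 2 if batch else cost


if __name__ == "__main__":
    # Hypothetical workload: 10 million input tokens and 2 million output tokens.
    for name in PRICES_PER_MILLION:
        regular = estimate_cost(name, 10_000_000, 2_000_000)
        batched = estimate_cost(name, 10_000_000, 2_000_000, batch=True)
        print(f"{name}: ${regular:.2f} regular, ${batched:.2f} batched")
```

For that hypothetical workload, this works out to $80 on GPT-4o versus $2.70 on the Mini, or half of each with batching.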

We're told the cut-down version is not fully featured yet, supporting just text and vision via its API, with other input and output formats, such as audio, coming in the indeterminate future. In creating GPT-4o Mini, OpenAI emphasized how safe it had made the thing, claiming to filter out offensive data from training materials and giving it the same guardrails that GPT-4o has.

OpenAI also claimed GPT-4o Mini is ahead of comparable LLMs in benchmarks. According to its figures, compared with Google's lighter-weight Gemini Flash and Anthropic's Claude Haiku, the Mini was usually between five and 15 percent more accurate in tests such as MMLU. In two outliers it was allegedly nearly twice as accurate as the competition, and in another it scored a little worse than Gemini Flash while still beating Claude Haiku.

Seemingly beating Anthropic is particularly personal for OpenAI, as Anthropic was co-founded and built by former OpenAI executives and engineers, among others.

GPT-4o Mini certainly looks good in OpenAI's benchmark graph, though it doesn't hold an overall commanding lead, and that reflects OpenAI's recent loss of absolute leadership in the modern LLM arena. As veteran open source developer Simon Willison detailed in his keynote at the AI Engineer World's Fair last month, 2024 has seen many of OpenAI's competitors release their own GPT-4o-class models.

"The best models are grouped together: GPT-4o, the brand new Claude 3.5 Sonnet and Google Gemini 1.5 Pro," Willison said. "I would classify all of these as GPT-4 class. These are the best available models, and we have options other than GPT-4 now. The pricing isn't too bad either - significantly cheaper than in the past."

At 82 percent accuracy in MMLU and a cost of 15 cents per million input tokens, GPT-4o Mini is mostly ahead of the pack. However, Willison says the LMSYS Chatbot Arena benchmark gives a more realistic evaluation of LLM quality because real humans compare outputs and choose which is better, a brute-force but effective way of ranking models.
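One common way to turn head-to-head human votes like these into a leaderboard is the Elo rating system, borrowed from chess. The Python sketch below applies the classic Elo update to a handful of hypothetical votes. The model names, the starting rating of 1,000, and the K-factor of 32 are assumptions for illustration, and LMSYS's own pipeline may differ in its details.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one Elo update in place after a voter prefers the winner's output."""
    e_win = expected_score(ratings[winner], ratings[loser])
    # The winner gains exactly what the loser gives up, so the total stays constant.
    ratings[winner] += k * (1.0 - e_win)
    ratings[loser] -= k * (1.0 - e_win)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]
ratings = {name: 1000.0 for name in ("model_a", "model_b", "model_c")}
for winner, loser in votes:
    update(ratings, winner, loser)

for name, score in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}")
```

An upset (a low-rated model beating a high-rated one) moves ratings more than an expected win does, which is why many cheap human comparisons can converge on a stable ranking.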

GPT-4o Mini is too new to appear in the tournament-style benchmark, though Willison notes that full-size GPT-4o is only barely ahead of its rivals there. Anthropic's flagship Claude 3.5 Sonnet currently has 1,271 points to GPT-4o's 1,287, and Gemini 1.5 Pro isn't far behind at 1,267. Slightly less performant but still respectable models include Nvidia's brand-new Nemotron 4 340B Instruct at 1,209 points and Meta's Llama 3 70B Instruct at 1,201.
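Because these are Elo-style scores, the gaps can be read as head-to-head odds. The minimal Python sketch below assumes the standard Elo logistic curve on a 400-point scale (an assumption about how the Arena scale behaves, not something the article states) and converts the points quoted above into the expected share of votes GPT-4o would win against each rival. The dictionary labels are shorthand, not official model identifiers.

```python
# Arena scores quoted in the article (July 2024).
ARENA_SCORES = {
    "GPT-4o": 1287,
    "Claude 3.5 Sonnet": 1271,
    "Gemini 1.5 Pro": 1267,
    "Nemotron 4 340B Instruct": 1209,
    "Llama 3 70B Instruct": 1201,
}


def win_probability(r_a: float, r_b: float) -> float:
    """Expected share of head-to-head votes A wins against B (Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


leader = "GPT-4o"
for rival, score in ARENA_SCORES.items():
    if rival != leader:
        p = win_probability(ARENA_SCORES[leader], score)
        print(f"{leader} vs {rival}: {p:.1%}")
```

Read this way, GPT-4o's 16-point edge over Claude 3.5 Sonnet corresponds to winning only about 52 percent of head-to-head votes, which is what "barely ahead" looks like in practice.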

Willison also noted the Mini is cheaper than Claude 3 Haiku and Gemini 1.5 Flash.

In terms of these test scores, OpenAI may have the best LLMs from small to big, though it no longer has the dominating lead it once had. That's probably a good thing: between costly AI hardware and high power usage, the last thing AI needed was a monopoly on LLMs. ®

https://www.theregister.com//2024/07/19/openaigpt4o_mini/
