
OpenAI develops AI model to critique its AI models

Tan KW
Publish date: Fri, 28 Jun 2024, 02:50 PM

To help catch code errors made by ChatGPT, OpenAI uses human AI trainers in the hope of improving the model. And to help those trainers, OpenAI has developed another AI model, called CriticGPT, to flag the mistakes the humans might otherwise miss.

The Microsoft-championed super lab on Thursday issued a paper [PDF] titled "LLM Critics Help Catch LLM Bugs" that explains the approach.

Generative AI models like GPT-4o get trained on massive amounts of data and then go through a refinement process called Reinforcement Learning from Human Feedback (RLHF).

This commonly involves human workers, often hired through crowdsourcing platforms, interacting with models and annotating their responses to various questions. When Time Magazine looked into this last year, it found OpenAI using Kenyan workers paid less than $2 per hour to improve its models.

The goal is to teach the model which answer is preferred, so it performs better. But RLHF becomes less effective as models become more capable. Human AI trainers find it harder to identify flawed answers, particularly when the chatbot reaches the point that it knows more than its teachers.
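In code terms, that preference signal is usually distilled into a reward model trained with a pairwise loss. Below is a minimal sketch of that step in PyTorch, assuming toy embeddings in place of real model outputs; it illustrates the general RLHF recipe, not OpenAI's actual training code.

```python
# A minimal sketch of the RLHF preference step. The toy embeddings and the
# reward_model here are illustrative assumptions, not OpenAI's pipeline.
import torch
import torch.nn as nn

# Stand-in reward model: scores a response embedding with a single scalar.
reward_model = nn.Linear(16, 1)

# Toy embeddings for a preferred and a rejected response to the same prompt.
preferred = torch.randn(4, 16)   # batch of annotator-preferred responses
rejected = torch.randn(4, 16)    # batch of responses the annotators rejected

# Bradley-Terry pairwise loss: push the preferred score above the rejected one.
r_pref = reward_model(preferred)
r_rej = reward_model(rejected)
loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()

loss.backward()  # gradients would feed an optimizer step in a real pipeline
print(f"pairwise preference loss: {loss.item():.4f}")
```

A real pipeline would then use the trained reward model to fine-tune the chatbot itself with a policy-gradient method such as PPO.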

So, as an aid to the people tasked with providing the feedback that makes its models better at generating programming code, OpenAI created another model to critique those generated responses.

"We've trained a model, based on GPT-4, called CriticGPT, to catch errors in ChatGPT's code output," the AI startup explained in a blog post. "We found that when people get help from CriticGPT to review ChatGPT code they outperform those without help 60 percent of the time."

In other words, this isn't an autonomous feedback loop from one chatbot to another - it's a way to augment the knowledge of those administering reinforcement learning.
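In practice, that kind of augmentation might look like the sketch below. CriticGPT itself is not publicly available, so this uses OpenAI's public chat completions API with a generic model as a stand-in; the reviewer prompt and the critique_code helper are assumptions made for illustration, and the human trainer still makes the final call on each flagged bug.

```python
# A sketch of critique-assisted code review, not OpenAI's internal pipeline.
# CriticGPT is not a public model, so a generic chat model stands in here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def critique_code(code: str) -> str:
    """Ask a chat model to point out likely bugs for a human reviewer."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the internal critic model is not exposed
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. List likely bugs, one per line."},
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content

snippet = "def mean(xs):\n    return sum(xs) / len(xs)"  # fails on empty input
print(critique_code(snippet))  # the human trainer verifies each flagged bug
```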

This approach apparently leads to better results than just relying on crowdsourced workers - who at $2 per hour, or whatever the prevailing annotation rate happens to be, probably aren't computer science professors or trenchant technical writers.

According to the paper, the results show "that LLMs catch substantially more inserted bugs than qualified humans paid for code review, and further that model critiques are preferred over human critiques more than 80 percent of the time."

The finding that CriticGPT enables AI trainers to write better model response critiques isn't entirely surprising. Mediocre office temps presumably would write better crafted email messages with the help of generative AI too.

But AI help comes at a cost. When human contractors work in conjunction with CriticGPT, the resulting critiques of ChatGPT responses have a lower rate of hallucinations (invented bugs) than critiques from CriticGPT alone - but that error rate is still higher than if a human AI trainer had been left to respond without AI assistance.
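One way to frame that tradeoff is in precision and recall terms: flagging more real bugs (higher recall) tends to come at the cost of more invented ones (lower precision). The numbers in the following sketch are purely illustrative and do not come from the paper; they simply mirror the qualitative ordering described above.

```python
# Illustrative numbers only (not figures from the paper): quantifying the
# hallucination / bug-detection tradeoff for three review setups.
setups = {
    "human alone":       {"real_bugs_caught": 10, "invented_bugs": 1, "total_real_bugs": 40},
    "human + CriticGPT": {"real_bugs_caught": 24, "invented_bugs": 4, "total_real_bugs": 40},
    "CriticGPT alone":   {"real_bugs_caught": 28, "invented_bugs": 9, "total_real_bugs": 40},
}

for name, s in setups.items():
    recall = s["real_bugs_caught"] / s["total_real_bugs"]
    precision = s["real_bugs_caught"] / (s["real_bugs_caught"] + s["invented_bugs"])
    print(f"{name:18s} recall={recall:.2f} precision={precision:.2f}")
```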

"Unfortunately, it's not obvious what the right tradeoff between hallucinations and bug detection is for an overall RLHF system that uses critiques to enhance model performance," the paper concedes. ®

 

https://www.theregister.com/2024/06/28/openai_criticgpt_ai/
