A US judge has thrown out a case against ChatGPT developer OpenAI which alleged it unlawfully removed copyright management information (CMI) when building training sets for its chatbots.
Publishers Raw Story and AltNet alleged that OpenAI's removal of the copyright status information from their articles caused a "concrete injury." The plaintiffs also argued there was a substantial risk that OpenAI's systems could "provide responses to users that incorporate … material from Plaintiffs' copyright-protected work or regurgitate copyright-protected works verbatim or nearly verbatim."
In a statement to Reuters, an OpenAI spokesperson said: "We build our AI models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents."
In February, Raw Story and AltNet alleged that OpenAI populated its training sets with works of journalism, choosing to strip away CMI protected by the Digital Millennium Copyright Act.
However, US District Judge Colleen McMahon granted OpenAI's motion to dismiss the case.
In her ruling [PDF], she said Raw Story and AltNet had not alleged that the information in their articles was copyrighted, nor could they do so.
"When a user inputs a question into ChatGPT, ChatGPT synthesizes the relevant information in its repository into an answer. Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs' articles seems remote," she said.
However, the legal ruling has a bearing on whether OpenAI was allowed to develop its products using journalists' articles.
"Let us be clear about what is really at stake here. The alleged injury for which plaintiffs truly seek redress is not the exclusion of CMI from [OpenAI's] training sets, but rather [the] use of plaintiffs' articles to develop ChatGPT without compensation to plaintiffs," she said.
McMahon said that questions about these kinds of harms had not been put before the court. The judge said she would allow an amended complaint from the publishers.
The Raw Story and AltNet case against OpenAI is one among many challenging AI developers' use of copyrighted material in training sets. OpenAI also faces a suit from authors Paul Tremblay, Sarah Silverman, Michael Chabon, David Henry Hwang, and Ta-Nehisi Coates.
Another group of authors are suing Anthropic, alleging it unlawfully used their copyrighted work to train its Claude AI model.
Last year, Dan Conway, CEO of the UK's Publishers Association, told the House of Lords Communications and Digital Committee that large language models were infringing copyrighted content on an "absolutely massive scale," arguing that the Books3 database – which lists 120,000 pirated book titles – had been ingested by large language models.
However, AI developers have argued that maintaining broad access to information on the internet is important for innovation. ®
https://www.theregister.com//2024/11/08/openai_copyright_suit_dismissed/
Created by Tan KW | Dec 13, 2024