A federal court has dismissed Raw Story’s copyright case against OpenAI, ruling no concrete harm was proven.
A lawsuit brought by Raw Story Media and Alternet Media against OpenAI over alleged misuse of their articles for training generative AI has been dismissed by a federal judge in New York. The plaintiffs had argued that OpenAI’s practice of scraping web content, without retaining copyright management information (CMI) such as author names, violated the Digital Millennium Copyright Act (DMCA). However, Judge Colleen McMahon ruled that the case failed to show actual harm, a necessary component for legal standing.
Lawsuit Based on DMCA Rules
The plaintiffs’ claims hinged on Section 1202(b) of the DMCA, which prohibits intentionally removing or altering copyright management information (CMI), or distributing works knowing that CMI has been removed, where this is done knowing it will induce, enable, facilitate, or conceal infringement. CMI covers identifying details conveyed alongside copies, performances, or displays of a work, such as its title, the author’s name, the copyright owner, and the terms of use.
Raw Story and Alternet asserted that OpenAI removed this information when training its AI models, potentially leading to future reproductions of their work without appropriate attribution. They viewed the practice as a violation that risked their content appearing in AI outputs devoid of credit.
Court’s Reasoning for Dismissal
Judge McMahon determined that the plaintiffs did not demonstrate the kind of specific injury required by Article III of the U.S. Constitution. The court noted that generative AI operates by synthesizing information rather than copying it verbatim, reducing the likelihood of exact replication.
The judge wrote that the likelihood of ChatGPT outputting plagiarized content from one of the plaintiffs’ articles “seems remote,” emphasizing the vast scope of data the model draws on when responding to user prompts.
“I agree with Defendants. Plaintiffs allege that ChatGPT has been trained on ‘a scrape of most of the internet,’ […] which includes massive amounts of information from innumerable sources on almost any given subject. Plaintiffs have nowhere alleged that the information in their articles is copyrighted, nor could they do so. When a user inputs a question into ChatGPT, ChatGPT synthesizes the relevant information in its repository into an answer. Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote.

And while Plaintiffs provide third-party statistics indicating that an earlier version of ChatGPT generated responses containing significant amounts of plagiarized content, […] Plaintiffs have not plausibly alleged that there is a ‘substantial risk’ that the current version of ChatGPT will generate a response plagiarizing one of Plaintiffs’ articles. Accordingly, Plaintiffs lack Article III standing to seek injunctive relief for their alleged injury.”
Comparison to Related Legal Cases
The ruling follows similar outcomes in cases involving AI technology, such as Doe 1 v. GitHub, the lawsuit against Microsoft-owned GitHub over its Copilot coding assistant. In that case, claims under Section 1202(b) were dismissed because of the nature of AI content generation, which reconfigures rather than duplicates source material.
However, interpretations can differ between courts; for example, a Texas court found that partial reproductions might meet the threshold for violations if CMI was deliberately removed.
The Argument on Future Risk
Raw Story and Alternet expressed concerns about potential future harm, arguing that ChatGPT might reproduce their content without CMI. The court, however, did not find these arguments compelling enough to establish standing.
While data suggested older versions of ChatGPT could produce outputs with substantial similarity to source material, no evidence showed the current version posed such a risk. McMahon noted that without solid proof of probable injury, speculative claims could not establish standing for an injunction.
Content Licensing and Industry Standards
The lawsuit highlighted the challenges smaller publishers face in securing compensation when their work is used for training AI. Raw Story pointed out that OpenAI has struck licensing agreements with large publishers, including Condé Nast, for similar purposes.
The absence of similar agreements for smaller outlets was seen as an inequity, but the court ruled that this concern did not meet the requirements of Section 1202(b). Discussions on how content should be licensed and compensated in the AI era continue as generative models evolve.
Legal Path Ahead and Industry Implications
Judge McMahon allowed Raw Story and Alternet the option to amend their complaint if they could present new, more compelling evidence of actual harm. Their legal representation, Loevy & Loevy, may need to reconsider their approach if they choose to pursue this route.
OpenAI, represented by Latham & Watkins LLP, Morrison & Foerster LLP, and Keker, Van Nest & Peters LLP, emerges from the dismissal without any immediate liability. The case, Raw Story Media Inc. v. OpenAI Inc., is docketed as No. 24-cv-01514 in the U.S. District Court for the Southern District of New York.