Dive Brief:
- The death of Suchir Balaji, a former OpenAI large language model researcher who had said his employer and other generative artificial intelligence companies violated U.S. copyright law, has been ruled a suicide. Police found Balaji, 26, in his San Francisco apartment on Nov. 26 after responding to a call for a welfare check, according to media reports.
- In October, The New York Times published a story featuring Balaji and his views on the copyright implications of the models underlying ChatGPT and other AI tools, reporting that he left OpenAI because he disagreed with its methods. He also published an article on his personal website analyzing the four-factor fair use defense Congress codified in the 1976 Copyright Act, arguing that OpenAI’s use of internet content did not satisfy that test.
- Balaji was one of four additional OpenAI custodians The New York Times asked a U.S. magistrate last month to approve for its copyright infringement complaint against Microsoft and OpenAI. Balaji left OpenAI in August after nearly four years with the company, according to his LinkedIn profile page.
Dive Insight:
“We are devastated to learn of this incredibly sad news today and our hearts go out to Suchir’s loved ones during this difficult time,” a spokesperson for San Francisco-based OpenAI said Friday in an email to CNBC.
The newspaper filed its lawsuit in December 2023; several other news organizations, including the Chicago Tribune and San Jose Mercury News, have since filed their own copyright complaints over OpenAI’s use of their articles to train ChatGPT.
In its Nov. 18 custodian request to the court, The New York Times argued that Balaji would “have unique and relevant documents supporting The News Plaintiffs’ allegations of willful infringement, among other issues.” The Times attorneys had previously asked the court to designate eight other current or former OpenAI employees as custodians in the case, including Ilya Sutskever, a co-founder and former chief scientist, and Brad Lightcap, the chief operating officer.
OpenAI contends that its use of internet material is fair use, “supported by long-standing and widely accepted precedent.”
“We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness,” the company says in a statement on its website responding to the newspaper’s lawsuit. OpenAI says it implemented an opt-out option for publishers, which the Times began using in August 2023.
The publishers’ lawsuits, if they go to trial, will present several untested questions about copyright law that could inform how generative AI models evolve, Mason Kortz, an instructor at the Harvard Law School Cyberlaw Clinic at the Berkman Klein Center for Internet & Society, told Harvard Law Today in March.
Among them: Is OpenAI’s large language model a “derivative work” of The New York Times’ content for copyright purposes? And does ChatGPT reproduce the newspaper’s articles verbatim in some of its responses, constituting infringement of its copyright?
There’s good reason to believe the parties may settle before trial as “it would be an incredibly expensive suit for both sides,” Kortz said.
“Because of that, and because of the uncertainty of some of the legal claims — you put 10 intellectual property lawyers in a room and you will get 11 different opinions on this — both sides are going to be facing uncertainty,” he told Harvard Law Today. “And if they are risk averse, as most entities are, when it comes to the legal system, there’s probably a relatively wide band of settlement.”
In his analysis of whether OpenAI’s use of training data met the fair use test, Balaji concluded it did not, in part because ChatGPT’s output harmed the market for the organizations whose data was scraped by serving as a substitute for the original works.
Balaji was from Cupertino, California, and had a computer science degree from the University of California, Berkeley, according to the Mercury News.