Disruptions

Silicon Valley’s Big A.I. Dreams Are Headed for a Copyright Crash

The tech industry thought that the legal issues over its shiny new toys were a settled matter. It thought wrong.

Andreessen Horowitz co-founder Marc Andreessen speaks at a conference. (Steve Jennings/Getty Images)

In the latest Mission: Impossible film, Tom Cruise’s character must defeat his most daunting threat yet: a renegade artificial intelligence program that became sentient and quietly seized control of the world’s financial, political, and military systems. His only way to stop it is a two-piece physical key that he must insert into a computer in a sunken Russian submarine somewhere in the ocean depths.

It’s a good action movie, but there’s a glaring plot hole in it. Cruise goes through all these extensive fight sequences—fights on moving trains, chases through narrow alleys, motorcycle jumps from cliffs—to find the key and stop the A.I. program. It’s a lot of time and effort. Why didn’t he just file a copyright lawsuit against it instead?

According to Silicon Valley, legal battles like that could endanger its attempts to develop sophisticated A.I. programs of its own. That, at least, is what the industry is telling the U.S. Copyright Office, which is currently conducting a policy review of how federal copyright law applies to A.I.-generated works. Andreessen Horowitz, the most influential venture-capital firm in the tech industry, used the review as an opportunity to set off alarm bells.

“The bottom line is this: imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development,” Andreessen Horowitz warned. It claimed that generative A.I. programs represent a “new computing paradigm” that could be as transformative as the microchip, and that applying existing copyright laws to it would threaten America’s place “as the leader in global AI development.”

Generative A.I. programs have become the next big thing over the last few years. Programs like ChatGPT, which “answers” questions that you pose to it, and DALL-E, which “creates” images based on user inputs, have broken through into public consciousness. And yet looming questions over copyright violations threaten to bring the boom to a grinding halt.

A.I.’s problem with copyright law starts with its programming. Despite the branding, generative A.I. programs are not really “artificial intelligence.” They are not sentient. They cannot imagine or dream or fall in love. They cannot generate an unprompted idea or “create” anything in the traditional sense. What they are is an (often impressive) simulacrum of genuine intelligence. The programs achieve this by ingesting, for lack of a better word, large amounts of raw material. Early versions of ChatGPT were seeded with a wide variety of information to serve as the raw matter into which the A.I. would shape content, including the entire English-language Wikipedia, substantial libraries of ebooks, and large swaths of Reddit.

From this raw material, ChatGPT and similar programs use algorithms to find patterns and then regurgitate them in response to user prompts. So, for example, if you were to ask a generative A.I. program for the names of all 50 U.S. states, the program isn’t reciting that song that we all learned in first grade to remember them. Instead, it constructs an answer from the statistical patterns in the text it was “trained” on and then presents that answer in a conversational tone.
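
To make that concrete, here is a deliberately tiny sketch in Python of the underlying statistical idea. Real systems use neural networks trained on billions of words rather than simple word-pair counts, so this illustrates the principle, not how ChatGPT is actually built: the program “answers” by predicting likely next words from ingested text, not by looking anything up.

```python
import random
from collections import defaultdict

# "Training": ingest raw text and record which word follows which.
# (A toy stand-in for the statistical patterns a real model learns.)
corpus = (
    "alabama alaska arizona arkansas california "
    "alabama alaska arizona arkansas colorado"
).split()

transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

def generate(start: str, length: int = 4) -> str:
    """Produce an "answer" by repeatedly sampling a plausible next word."""
    words = [start]
    for _ in range(length):
        followers = transitions.get(words[-1])
        if not followers:
            break
        # Probabilistic continuation, not retrieval: the model has no
        # concept of "the 50 states," only of what tends to follow what.
        words.append(random.choice(followers))
    return " ".join(words)

print(generate("alabama"))  # e.g. "alabama alaska arizona arkansas colorado"
```

Note that the output depends entirely on what was ingested: feed the toy model a different corpus and it will just as confidently produce a different “answer.”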

There are pitfalls to this approach to generating information. A program is only as good as the material it ingests. And since it does not actually “look up” the answer but instead constructs it from probabilistic patterns in a limited corpus of text, it can be disastrously wrong. Earlier this year, a lawyer used ChatGPT for research in a personal-injury lawsuit against an airline, only to discover that it gave him a bunch of fake case names and nonexistent court rulings. ChatGPT had created them out of thin air based on the appearance of similar case names and court rulings in the text used to build it; programs of this kind are known as “large language models,” or LLMs.
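
The same mechanics explain those fake citations. Continuing the toy sketch above, with names invented purely for illustration: a model that has absorbed the pattern of legal citations can recombine their parts into something statistically plausible and factually nonexistent.

```python
import random

# Hypothetical fragments in the style of citations the model ingested.
# None of these combinations need ever have appeared in the source text.
plaintiffs = ["Varghese", "Martinez", "Shaboon"]  # invented names
defendants = ["China Southern Airlines", "Delta Airlines"]
reporters = ["925 F.3d 1339 (11th Cir. 2019)", "277 F.3d 152 (2d Cir. 2001)"]

# Recombining learned fragments yields a citation that is plausible
# in form but nonexistent in fact, i.e., a "hallucination."
fake_case = (
    f"{random.choice(plaintiffs)} v. {random.choice(defendants)}, "
    f"{random.choice(reporters)}"
)
print(fake_case)
```

The output satisfies every formal pattern a citation should have, which is exactly why it can fool a reader who assumes the program is searching a database rather than completing a pattern.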

Starting with how the technology actually works, instead of how it appears to work, is essential to understanding the copyright issues involved with generative A.I. programs. These programs, strictly speaking, do not create anything genuinely new or original. They only break down existing works into their constituent parts and stitch them back together again. There is no imagination or inspiration or contemplation or anything else that humans typically associate with knowledge and learning. There is just regurgitation—albeit at a very complicated level—done at computing speeds that the human mind cannot match.

Copyright law is designed to protect that human spark of creativity. Conversely, it does not apply when that spark is missing. Federal courts have previously held that copyright protections only extend to humans, most famously by ruling in 2016 that a selfie taken by a monkey with a photographer’s camera was in the public domain. The U.S. Copyright Office has taken a similar approach. Earlier this year, the office partially rescinded a copyright for a comic book after learning the author had used Midjourney, a generative image program, to illustrate it. The agency reasoned that “artificial intelligence” can’t hold a copyright.

It only gets worse for the programs from there. Companies that create generative A.I. programs are also facing a wave of lawsuits over the alleged unauthorized use of copyrighted materials to develop them. In September, for example, the Authors Guild and a coalition of high-profile authors sued OpenAI, arguing that its use of their works to create potentially derivative ones infringed on their copyrights and threatened their livelihoods.

“Unfairly, and perversely, without Plaintiffs’ copyrighted works on which to ‘train’ their LLMs, [OpenAI] would have no commercial product with which to damage—if not usurp—the market for these professional authors’ works,” the authors argued in their complaint. “[OpenAI’s] willful copying thus makes Plaintiffs’ works into engines of their own destruction.”

Most A.I. companies—but not all; we’ll come back to that later—have argued that their use of copyrighted material falls under a doctrine known as fair use. That doctrine is a limited exception of sorts to copyright laws. It allows for unauthorized use in a narrow set of circumstances, most often when the use is limited in nature and does not affect the copyright owner’s ability to profit from their own works.

Andreessen Horowitz, in its comment to the U.S. Copyright Office earlier this month, argued that “training” generative A.I. programs is fair use. Two of its reasons were grounded in how the programs actually work. According to the firm, the programs do not technically “store” the material they use; they merely “extract facts and statistical patterns across a broad body of examples of content—i.e., information that is not copyrightable.” Additionally, the firm claimed, it is extremely unlikely that a program will generate output that is “substantially similar” to any individual work used to “train” it; “substantial similarity” is the test courts use to weigh infringement claims.

But the other two reasons the firm provided in defense of its A.I. investments were more revealing. For starters, it noted that “the only practical way generative AI models can exist is if they can be trained on an almost unimaginably massive amount of content, much of which (because of the ease with which copyright protection can be obtained) will be subject to copyright.” Since there is no right to create or use generative A.I. programs, this is not so much an argument as it is a plea for mercy. Applying copyright laws to A.I. would be, in other words, an existential threat to A.I.

It would also be an existential threat to the venture-capital firms that have funded A.I. start-ups in recent years, as Andreessen Horowitz itself acknowledged in its fourth and final reason for fair use. “Over the last decade or more, there has been an enormous amount of investment—billions and billions of dollars—in the development of AI technologies, premised on an understanding that, under current copyright law, any copying necessary to extract statistical facts is permitted,” the firm claimed. “A change in this regime will significantly disrupt settled expectations in this area.”

“Settled” is a strange choice of phrasing here. While investments in A.I. start-ups have surely been made over the last decade, generative A.I. programs did not become publicly available or widely used until the last few years. Once they appeared, a variety of plaintiffs took those companies to court to stop them.

Multiple music publishers are suing Anthropic over its alleged use of copyrighted song lyrics. Getty Images sued Stability AI earlier this year for using its photo library to train an image generator. Companies like Google, Microsoft, and Adobe have promised to indemnify customers who face legal threats for using their A.I. products. The actors’ and writers’ unions in Hollywood even went on strike this summer in no small part to halt the studios’ threats to use A.I. to replace them. Silicon Valley has “disrupted” multiple industries over the years by moving fast and breaking things, so to speak. This time, the affected industries are trying to slow down the tech start-ups so that those expectations never become “settled” in the first place.

One straightforward solution would be to simply pay to use the copyrighted material. Some major tech companies that can afford to do it already do. But Andreessen Horowitz doesn’t want to do that. “A staggering quantity of individual works is required to train AI models,” the firm told the U.S. Copyright Office. “That means that, under any licensing framework that provided for more than negligible payment to individual rights holders, AI developers would be liable for tens or hundreds of billions of dollars a year in royalty payments.” That would create an “impossibly high financial barrier to AI development” for “small businesses or individual innovators.”

That would also be bad for Andreessen Horowitz, which probably did not invest billions of dollars in A.I. start-ups just for them to spend billions of dollars on royalties. Paying for creative work also runs counter to the zeitgeist that animates Silicon Valley’s A.I. push. The fundamental goal of A.I. is to reap the benefits of creative or intellectual labor without having to pay a human being—writers, artists, musicians, lawyers, journalists, architects, and so on—to perform it. A.I. developers, in other words, seek to create something from nothing. But that is not how the laws of thermodynamics work. And unless the courts and federal regulators suddenly embrace the tech industry’s novel theory of fair use, it will not be how the laws of copyright work either.