ChatGPT, for example, was “trained” on a large corpus of English-language written materials. (The precise details of its “large language model,” or LLM, are supposed to be a secret, but we’ll come back to that later.) It then uses algorithms to regurgitate what it thinks is the statistically likeliest answer to a query. These chatbots can be pretty convincing, as anyone who has ever used one can attest. But they also have obvious limitations.
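
To make “statistically likeliest” a little more concrete, here is a toy sketch in Python. It is nothing like OpenAI’s actual system, whose details are not public; it simply counts which word most often follows another in a tiny made-up corpus and strings those guesses together. The function names (`train_bigrams`, `likeliest_next`, `generate`) and the sample text are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def likeliest_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(follows, start, max_words=3):
    """Greedily chain the likeliest next word, starting from `start`."""
    out = [start.lower()]
    for _ in range(max_words - 1):
        nxt = likeliest_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Hypothetical training text, invented for this example.
corpus = (
    "the court ruled for the plaintiff . "
    "the court ruled against the defendant . "
    "the court dismissed the case ."
)
model = train_bigrams(corpus)

print(likeliest_next(model, "court"))  # "ruled" follows "court" twice, "dismissed" once
print(generate(model, "the"))          # chains the most common followers
```

The sketch never checks whether its output is true. It only knows which words tended to appear together in the text it was given, which is the limitation at issue here.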

In the last year or so, for example, a small but growing number of lawyers have been sharply criticized by courts for filing A.I.-written legal briefs that contained inaccurate legal citations. When asked a legal question, a human lawyer would likely research prior cases to get a clearer understanding of precedent. ChatGPT, on the other hand, does not have the power to research things that are outside of what its LLM was trained on. Instead, it simply creates what its algorithms think is the likeliest answer, with algorithmically assembled fake case names to go along with it.

This “training” process is opaque in many of the newest chatbots. While OpenAI’s earliest iterations of ChatGPT’s large language model were open source, the company has declined to disclose how it trained its most recent models. OpenAI has argued that the secrecy is necessary given the competitive nature of the A.I. industry these days. The newspapers argued, however, that it also serves as a convenient way to cover up what they describe as “mass copyright infringement.”