I have a confession to make. I am not a bot.
That's not the confession. That's simply an observation. The confession is that I am not as good as a bot in much of what I do for a living.
I was thinking about this in the context of the current blockbuster lawsuit of the New York Times against OpenAI for copyright infringement. Which, since I'm a historian, got me thinking about where copyright came from and what it's for.
The first copyright law in America was passed in 1783 in Connecticut. Its sponsor was Noah Webster, who was writing a dictionary he didn't want pirated.
The idea caught on. Several states followed Connecticut's example. In 1790, the U.S. Congress passed a federal copyright law.
Copyright law served two masters: the creator of the copyrighted work and the general public. The creator was granted a monopoly on the work for a limited period, originally fourteen years. During this time, only the creator could legally profit from the publication or other use of the work. But at the end of the period, the work became public property, freely available for all to use and profit from.
The logic behind copyright is that a creator incurs costs in the creation process. These can be actual costs or opportunity costs. If the creator is not granted a period of monopoly over the product, during which to recoup the costs, he or she will stop creating. The public will be deprived of what otherwise would have been created.
The trick is setting the length of the copyright monopoly correctly. Too short and the incentive to the creator is not great enough. Too long and the public is short-changed.
Originally copyright applied to books, maps and charts. Maps particularly illustrate the principle behind copyright. Making a map is an expensive project, often taking years and requiring large outlays. But once made, a map is easy to copy. If the map maker isn't guaranteed a monopoly, others will copy the map and deprive the map maker of a market. The map maker won't make any more maps, and the public won't benefit from them. Ships will wreck, and lives and cargoes will be lost.
The application to books is similar but not identical. From the beginning it was understood that books must not be copied in whole. But what about parts of books? Could a chapter be copied without violating copyright? How about a paragraph? A sentence?
Obviously, people couldn't be allowed to copyright single words. But a string of ten words? A hundred words? A thousand words?
And what about the meaning of the words? Could ideas be copyrighted? Could facts be copyrighted?
I wondered about this some years ago after I had published a book on the California gold rush. A movie producer contacted me and asked if the film rights were available. A standard book contract, including mine on this book, specifies that the book publisher has the right to publish, which is to say print, the book. But other rights, including the right to make the book into a film, are reserved to the author. I said that the film rights were available.
I was paid a modest amount of money to give the producer a year to decide whether to make the book into a movie. During that time, I couldn't sell the rights to anyone else. If the producer decided to go ahead, I would be paid a larger sum. In the end, the producer decided not to go ahead. In case anyone is interested, the rights are once again available.
But the curious thing to me in all this was what exactly I was selling the rights to. If I had written a novel, the characters and plot would have been my creations. I could understand having a property right in those. But I didn't create the characters and events in my book on the gold rush. They lived and happened long before I ever came along. If the movie producer wanted to make a movie about them, he didn't need me and my book.
But in fact he did need me. Not in a legal sense, but as a matter of convenience and expertise. The producer wasn't a historian. I am. I had done the research, gathered the information, and fashioned it into a story. As a practical matter, the producer couldn't do that. Or more precisely, it was cheaper to purchase my finished product than for the producer to reproduce it from scratch.
Such considerations are at the heart of the suit of the New York Times against OpenAI. The Times says OpenAI violated the paper’s copyright by training its artificial intelligence system, ChatGPT, on articles from the Times.
At first glance, the Times would appear not to have much of a case. The paper simply reported facts available in principle to anyone. AI algorithms usually do not reproduce long strings of words from any particular article, so originality of expression is not at issue. If I read a newspaper article and summarize it in my own words, no one would accuse me of violating copyright. But what if I read ten newspaper articles and summarize them? A hundred articles?
In fact, I do something very much like this all the time. I read history books and articles, and old letters and diaries, and newspaper and magazine articles, and lots of other things. I stir them around in my head and produce my books. If I couldn't do this, I couldn't be a historian.
So what is AI doing that I am not? Very little, it seems to me. Except that it's doing it enormously faster than I ever could. It makes mistakes, but so do I. And it gets better with age. I don't. Anyway, the mistakes undercut the Times’s case. But does the difference in scale amount to a difference in kind?
I don't know what the courts will decide. I'm no legal expert. But I wouldn't be surprised if this suit is settled before it goes to a jury. Lawsuits position the parties as adversaries. In fact AI firms and media companies are better thought of as partners. OpenAI needs training data, and media are an obvious source. OpenAI can't object in principle to paying something for the data. And the media companies need customers. If AI proves as lucrative as stock markets seem to think it will, OpenAI and other AI outfits could be the best customers the Times and other media companies ever had.
I could find out most of what appears in the Times without purchasing a subscription. Facts are facts. But I do have a subscription, for the convenience and efficiency. I wouldn't get one, though, if the price weren't reasonable. OpenAI could survive without the Times. But the convenience and efficiency are worth paying for, if the price isn't unreasonable.
Expect a deal, not a verdict.
What is sad today is that, while the print book industry continues to reward the "creators" of work financially, other areas of entertainment deny the true "creators" ownership of the copyright: through corporate skullduggery, the corporations reserve that right for themselves. Those of us who know and love the material involved would prefer that the creator alone make the decisions, which would keep the work in question from being tarnished through inferior reproduction.