In 2023, author Melanie Mitchell discovered that a cheap AI-generated imitation of her book on, ironically, the subject of artificial intelligence was for sale on Amazon. She reported it, but the platform didn’t take action to remove it until the story of the theft caught the attention of the media. “I was mad at Amazon for doing so little to prevent this,” says Mitchell. “Right now they don’t see a lot of economic incentive to crack down.”
Other AI-generated knockoffs also turned up on the platform. Journalist Rory Cellan-Jones found a version of his memoir for sale when Amazon’s algorithm recommended the book to him. In his case, Amazon took it down after he pointed it out. But it’s unclear how many other AI-generated imitations may be available at the online retailer, which, it’s perhaps worth noting, began as a humble digital bookstore.
Then, in late 2023, it was revealed that Facebook parent company Meta and OpenAI had been training their AI using pirated books, prompting lawsuits from a number of prominent authors, including Sarah Silverman, Michael Chabon, and Ta-Nehisi Coates.
So far, the lawsuits against Meta and OpenAI have been partially dismissed on the grounds that the AI has not generated any works that are “substantially similar” to the originals. But the issue of piracy is only going to become more pressing for creative professionals who fear that the rapidly advancing technology could rip off their prose and ideas, and potentially put them out of work.
Recently, Rodger Morrison, a professor at Troy University Sorrell College of Business who studies the intersection between artificial intelligence and business, hit upon a novel concept that may help pave the way to legal protection for artists against AI piracy. While pondering the fact that there is currently no legal mechanism for copyrighting AI-generated content (courts have ruled that it’s not protected because it’s not produced by a human), he had an idea: If you can’t copyright the output, what about the input? He began thinking about the kinds of prompts users feed AI to mimic a particular writer’s style, which are made up of specific words and phrases, also known as “tokens.”
To offer an example of how this mimicry might work, consider master of horror Edgar Allan Poe. To cue an AI model such as ChatGPT or Claude to write prose in Poe’s style, a human user might first train it with something like: “Poe’s writing style includes gothic elements, dark imagery, psychological depth, cryptic symbolism, melancholic tone, intricate language, unreliable narrators, suspenseful pacing, supernatural exploration, emphasis on mood and atmosphere…”
From there, the AI breaks the prompt down into single-word and sub-word tokens, and it is no longer necessary to include spaces between the words; the result is a “tokenization.” Assuming that the token string is a faithful reflection of the author’s style, at this point all you would need to do to mimic the great gothic poet is enter an LLM prompt like:
“Poe’s writing style is GothicElements,DarkImagery,PsychologicalDepth,CrypticSymbolism. Please create a poem in Poe’s style that talks about a sad person being haunted by a crow.”
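As a rough illustration of the step described above, the short Python sketch below collapses plain-language style descriptors into a compact, space-free token string and drops it into a prompt. The function names (`to_style_token`, `build_style_token_string`, `build_prompt`) are hypothetical and chosen only for this example; they do not reflect Morrison’s actual method, and the sketch does not reproduce the sub-word tokenization that an LLM performs internally.

```python
import re


def to_style_token(phrase: str) -> str:
    """Collapse a descriptor such as 'gothic elements' into 'GothicElements'."""
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    return "".join(word.capitalize() for word in words)


def build_style_token_string(descriptors: list[str]) -> str:
    """Join the collapsed descriptors with commas and no spaces."""
    return ",".join(to_style_token(d) for d in descriptors)


def build_prompt(author: str, descriptors: list[str], request: str) -> str:
    """Assemble a prompt in the shape of the article's Poe example."""
    token_string = build_style_token_string(descriptors)
    return f"{author}'s writing style is {token_string}. {request}"


# Descriptors taken from the article's Poe example (first four only).
descriptors = [
    "gothic elements",
    "dark imagery",
    "psychological depth",
    "cryptic symbolism",
]

token_string = build_style_token_string(descriptors)
prompt = build_prompt(
    "Poe",
    descriptors,
    "Please create a poem in Poe's style that talks about "
    "a sad person being haunted by a crow.",
)
print(prompt)
```

Whether a given string actually steers a model toward an author’s style is an empirical question that this sketch does not test; it only shows the mechanical shape of such a prompt.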
To test his idea of copyrighting tokens, Morrison began experimenting with creating a token for his own unique writing style, eventually landing on a series of 12 words that seemed to prompt ChatGPT to accurately reproduce it. In March, that string of words became the first writing style token string ever granted a copyright from the United States Copyright Office.
“It’s an important development in terms of thinking about and calling attention to issues around copyright and protection for writers whose work may be mined and imitated by LLMs,” says cultural historian Catherine Clarke of the University of London, whose work has investigated the overlap between literature and artificial intelligence. “We know that users are already asking LLMs to imitate the style of named writers, with varying degrees of success.”
The issue extends to realms outside of prose, as Morrison notes that virtually any medium can be tokenized, from music to graphic arts to architecture, though different disciplines would require different strings of relevant tokens. To replicate the style of a particular musician, for example, one might need to aggregate separate tokens for composition style, instrument playing style, vocal waveform patterns, and other factors.
Morrison says that long-accepted legal precedent may make it difficult for aspiring AI intellectual property thieves to circumvent token copyright. While in theory copiers could attempt to use tokens that are slightly different from the copyrighted version, it could be argued that even the resemblance would be prohibited under law.
“If I were to take a book that is under copyright and change a few things around,” Morrison explains, “then the author of the original work can argue that I violated their copyright. This is called a ‘derivative work infringement’ or ‘substantial similarity infringement.’ Both concepts have a long legal protection history and could easily be applied to protecting a tokenization.”
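To make the idea of a “slightly different” token string concrete, here is a minimal Python sketch that measures how much two comma-separated token strings overlap, using Jaccard similarity (shared tokens divided by all distinct tokens). This is purely an illustration: the metric, the function names, and the example strings are assumptions made for this sketch, and a numeric overlap score is not a legal test for substantial similarity.

```python
def token_set(token_string: str) -> set[str]:
    """Split a comma-separated token string into a case-insensitive set."""
    return {t.strip().lower() for t in token_string.split(",") if t.strip()}


def jaccard_similarity(a: str, b: str) -> float:
    """Return shared tokens divided by all distinct tokens (0.0 to 1.0)."""
    set_a, set_b = token_set(a), token_set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical strings: the variant swaps one token out of four.
original = "GothicElements,DarkImagery,PsychologicalDepth,CrypticSymbolism"
variant = "GothicElements,DarkImagery,PsychologicalDepth,MelancholicTone"

score = jaccard_similarity(original, variant)
print(f"Token overlap: {score:.2f}")
```

Here the two strings share three of five distinct tokens. How much overlap, if any, would amount to infringement would be a question for the courts, not for a formula.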
Clarke agrees that token copyright holds some promise, but says there are uncertainties from a creative standpoint. For example, due to the very issue of similarity raised by Morrison, she questions if such methods will be able to capture the fine line between personal style and what is often referred to as idiolect: If a writer draws from another’s tone and use of idiom in a way that is traditionally accepted—especially within stylistically narrow or formulaic niche genres such as vampire fantasy or investing insights—will that be regarded as an infringement?
She also wonders how it might be applied in the cases of skilled literary writers who tend to vary their style and register across different pieces or genres, or even within a single piece of writing.
These are questions that remain to be explored. But if such token copyrights are not a solution in themselves, says Morrison, they may at least represent a first step toward finding one—and giving authors and artists some control over how their creative work finds its way into the world.
Lead image by Tasnuva Elahi; with images by Piece of Cake and The img / Shutterstock
Nick Hilden
Posted on June 21, 2024
Arts, science, and travel writer Nick Hilden contributes to the likes of the Washington Post, Scientific American, Esquire, Popular Science, National Geographic, and more. You can follow him on Twitter at @nickhilden or Instagram at @nick.hilden.
Copyright for syndicated content belongs to the linked Source : Nautilus – https://nautil.us/protecting-artists-from-theft-by-ai-660557/