Intellectual Property and Generative AI: Many Questions, Few Answers
October 3, 2023
In celebration of Small Business Month, Vector, in collaboration with Smart & Biggar LLP, developed a series exploring the dynamic relationship between AI and IP. In this series, designed to support Canadian startups and research professionals, Vector dives into how AI innovation intersects with intellectual property, discussing the trends, challenges, and strategies that shape this ever-evolving landscape.
The sudden ubiquity of generative AI that accompanied the public introduction of ChatGPT has left many fields scrambling to adapt, and intellectual property (IP) law is no exception. The first wave of IP problems associated with generative AI, such as conflicts surrounding training data and copyright, is currently making its way through the legal system.
Traditionally, the purpose of copyright is to provide creators with the authority to control how their works are used, reproduced, distributed, and displayed, granting them some ability to protect their intellectual property and derive economic benefits from their creations.
One of the earliest questions surrounding generative AI is whether an AI can be considered the author or creator of a work. Early indications point towards a negative answer: copyright does not protect content produced by generative AI. A landmark ruling in favor of this position was made in the U.S. case Thaler v. Perlmutter, which held that “Human authorship is a bedrock requirement of copyright.”
This case is one of a series of pro bono legal test cases, initiated by Dr. Stephen Thaler and dubbed the “Artificial Inventor Project”, seeking intellectual property rights for AI-generated output in the absence of a traditional human inventor or author. The project is intended to promote dialogue about the social, economic, and legal impact of frontier technologies such as AI and to generate stakeholder guidance on the protectability of AI-generated output. The project’s patent application, which sought to name DABUS (Device for the Autonomous Bootstrapping of Unified Sentience) as the inventor on a patent, was denied by the USPTO, which took the position that “a machine does not qualify as an inventor under the patent laws”. Almost all patent offices (with the exception of South Africa, which does not have a substantive examination process) followed the USPTO ruling.
A potentially more serious conflict is brewing between copyright owners and LLM developers, such as OpenAI, over the use of copyrighted materials in training data. A recent example of direct pushback from artists is the launch of the web tool “Have I Been Trained?” by the artist collective Spawning. The tool searches the LAION-5B dataset, which was used to train Stable Diffusion and Google’s Imagen, for a user’s artwork. Spawning aims to give artists opt-in/opt-out rights over inclusion in training data. The New York Times is also considering taking action, arguing that “OpenAI’s use of the paper’s articles to spit out descriptions of news events should not be protected by fair use” and that “it risks becoming something of a replacement for the paper’s coverage.”
Organizations like OpenAI have started commercially licensing artwork created by their models, highlighting the need to address copyright ownership. Legal disputes may arise over who owns the copyright when AI-generated works resemble copyrighted content, as in the LLM Litigation lawsuits filed against OpenAI, GitHub Copilot, and Stable Diffusion.
Another potential IP pitfall for AI-generated content is unauthorized use and/or poorly executed licensing schemes. This type of problem is exemplified by Getty’s ongoing litigation against Stability AI, the creators of Stable Diffusion. A group of artists has also sued Stability AI for using copyrighted works in its training data; OpenAI is notably excluded from this suit, as it has not released information about its training dataset and has a license agreement with Shutterstock.
One aspect that adds complexity to this issue is the risk of training AI models on databases that may contain others’ IP. For example, a model trained on open-source code from GitHub may inadvertently reproduce portions of copyrighted code, raising concerns about unintentional infringement or plagiarism; this is the accusation being levelled at GitHub Copilot. Another example is the recent accusation by book authors that OpenAI’s models have been trained on “shadow libraries”, websites that host repositories of pirated books and publications.
In hiQ Labs v. LinkedIn, U.S. courts held that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act. However, a subsequent ruling found that hiQ Labs had breached LinkedIn’s User Agreement, resulting in a settlement between the parties. This demonstrates that, even where scraping publicly accessible data is lawful, it is wise to ensure that the source of the data is not protected by contractual agreements or IP rights.
The legal landscape surrounding IP and generative AI remains murky and in flux. Current indications are that an AI cannot be an author or inventor under either copyright or patent law. It is as yet unclear what rights copyright holders will be granted over the use of their works in training datasets. To avoid IP-related issues, it is best practice to establish contracts governing any training data used in AI development, whether through a licensing agreement or outright ownership.
IP is a key business asset, especially in AI and machine learning. To gain a competitive edge, prioritize integrating IP into your business plans, including strategies for commercialization and monetization.