Before we all get too deep into using ChatGPT or other artificial intelligence tools to create things or make our lives easier, we need to address some legal questions about copyright as it relates to training data, the division of labor and output. In short, we need to understand and analyze copyright law vis-à-vis AI tools.
To reach an accurate conclusion, it is necessary to ensure that the related legal issues can be explained within the framework of China's copyright law.
To begin with, we need to examine how ChatGPT works, which can be described as a process of "input" and "output". According to Stephen Wolfram, a British-American computer scientist, the basic concept of ChatGPT is "starting from a huge sample of human-created text" from the web, books and other sources, then training "a neural net to generate" text that's "like this".
This complicated internal learning process requires language training, knowledge selection and a huge corpus of texts. It can be considered a process of mixing materials without adding human emotions, values or beliefs.
However, the output may differ because of deviations in the underlying corpus. In some cases, users may intentionally input a vast corpus to guide the AI tool toward certain results. Such users, obviously, do not care about copyright.
So it is better to expand databases for ChatGPT than to focus on the copyright of the content it creates. And it wouldn't be wise to file a lawsuit against its developer for infringing copyright, because the issue of copyright liability can often be efficiently resolved through other means.
Judging from the characteristics of the "input" process, the internal process of language training is unlikely to cause copyright infringement.
Copyright infringement may occur when anyone other than the copyright owner circumvents technological protection measures to obtain all or part of the content without authorization, and uses it for corpus training.
But people do not infringe copyright if they obtain published content that is not technologically protected and use it for language training and/or for building an internal knowledge structure.
As for the "input" process of human-computer interaction, a questioner is dispensable, and although a question itself has no legal risk, it may be collected by the AI system. This kind of inequality exists not only in the training process of machine language, but also between the questioner and ChatGPT, which is also a cross-linguistic database.
Language carries values and value conflicts, but ChatGPT cannot judge human values or express emotions. Yet the language a questioner uses can become important personal data that the AI uses to build a user profile and cater to the user's likes, or even to intentionally mislead the questioner.
As such, if different people ask the same question, ChatGPT may give different or even contradictory answers. And when the volume of personal data collected by ChatGPT is huge or vital enough to be used to sully an individual's reputation or mischievously smear a person, ChatGPT's developer will need to bear civil liability for it.
There are exceptions to this rule, though. In China, for instance, certain reasonable or personal use of copyrighted works without permission or remuneration is permitted if it falls within the scope of fair use. However, "fair use" is an exception that can be used as a defense against an infringement claim.
The key to applying the fair use principle is commercialization. If someone is using content or text for commercial purposes, it is less likely to be fair use. Although ChatGPT is free of charge now, OpenAI's ultimate goal in developing it is to make profits after testing the waters through a pilot subscription plan.
Perhaps ChatGPT (and other new AI products) will be mainly used for commercial purposes in the future, which will lead to rampant copyright infringement, making it hard to exempt it from tort liability.
ChatGPT does not necessarily pose a challenge to copyright protection. It may not reshape or rebuild it either. Still, it is important to be aware of the pros and cons of AI tools and their achievements.