ChatGPT: Putting the “AI” in “Plagiarism” worldwide?
With the release of ChatGPT and its sudden rise in popularity across the internet, some of my professors began to praise it as a homework tool, while others branded it a form of plagiarism. To this day, there remains no clear consensus across my classes: some encourage its use on homework problems, while others ban it completely.
Plagiarism detectors are a common way to determine if an individual is paraphrasing or otherwise appropriating someone else’s content without attribution. Similar to search engines, plagiarism detectors crawl through the web and index pages to look for similar content in an individual’s work. Beyond this, different companies employ different methods to evaluate how similar the content is.
📓 Approaches to Plagiarism Detection
A popular method is the n-grams algorithm, which breaks text into short units called n-grams and looks for matches against a pre-existing database of these units. However, a downside is that the n-grams method cannot tell when a person is copying another’s style or paraphrasing poorly, because it can only sniff out string-for-string matches rather than evaluate the text as a whole.
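To make the n-gram idea concrete, here is a minimal sketch in Python. It assumes word-level trigrams and a single source document standing in for the database; the function names `word_ngrams` and `ngram_overlap` are made up for this illustration and do not come from any particular detector.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(submission, source, n=3):
    """Fraction of the submission's n-grams that also appear in the source."""
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0
    return len(sub & word_ngrams(source, n)) / len(sub)


source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
paraphrased = "a fast brown fox leaps over a sleepy dog"

print(ngram_overlap(copied, source))       # 1.0
print(ngram_overlap(paraphrased, source))  # 0.0
```

The paraphrase scores zero even though it says the same thing, which is exactly the blind spot of string-for-string matching.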
Turnitin, a service millions of educators use to detect similarity, builds on this by “fingerprinting.” The service scans submitted documents for unique sequences of word fragments to produce a “fingerprint” of the document’s style. Each document has its unique cadence, phrasing, and tone; however, if a document has unoriginal phrasing, parts of its fingerprint will match (or at least overlap with) other documents. Comparing fingerprints supports the detection of poor paraphrasing and won’t raise needless matches like both authors using common words such as “and” or “the.”
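Turnitin’s actual algorithm is proprietary, so the following is only a generic sketch of document fingerprinting, not the company’s method. It assumes a tiny illustrative stopword list, hashes every run of three content words, and compares two fingerprints with Jaccard similarity. The names `fingerprint` and `fingerprint_similarity` are hypothetical.

```python
import hashlib

# Illustrative stopword list (an assumption); real systems use far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def fingerprint(text, k=3):
    """Hash every run of k content words into a set of integers."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    prints = set()
    for i in range(len(words) - k + 1):
        chunk = " ".join(words[i:i + k])
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
        prints.add(int(digest[:16], 16))
    return prints


def fingerprint_similarity(doc_a, doc_b, k=3):
    """Jaccard similarity of the two fingerprints, between 0 and 1."""
    a, b = fingerprint(doc_a, k), fingerprint(doc_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "The detector scans the essay and flags copied passages."
edited = "Detector scans an essay, flags the copied passages"
unrelated = "Baseball cards became popular collectibles in the nineteenth century"

print(fingerprint_similarity(original, edited))     # 1.0
print(fingerprint_similarity(original, unrelated))  # 0.0
```

Because common words are dropped before hashing, shuffling “the” and “and” around does not hide the copy, and sharing those words does not create a match.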
Another approach is to use fuzzy-based methods, which evaluate text similarity and produce a score between 0 and 1. Words of similar meaning are grouped into “fuzzy” sets, and each word is assigned a degree of similarity. Correlation factors between pairs of words from different documents are then used to determine the degree of similarity across sentences. Unlike the n-grams algorithm, this method can detect forms of plagiarism beyond direct copying and pasting.
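A rough sketch of the fuzzy idea might look like the following. The `SIMILARITY` table and its degrees are invented for illustration (a real system would derive them from a thesaurus or corpus), and averaging each word’s best match is just one simple way to combine word scores into a sentence score.

```python
# Hypothetical degrees of similarity between word pairs (made-up values).
SIMILARITY = {
    frozenset(("quick", "fast")): 0.9,
    frozenset(("jumps", "leaps")): 0.8,
    frozenset(("lazy", "sleepy")): 0.6,
}


def word_similarity(a, b):
    """Degree in [0, 1] to which two words are considered the same."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def sentence_similarity(sent_a, sent_b):
    """Average, over words in sent_a, of the best match found in sent_b."""
    words_a, words_b = sent_a.lower().split(), sent_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    best = [max(word_similarity(a, b) for b in words_b) for a in words_a]
    return sum(best) / len(best)


# Scores close to 1 even though half the words were swapped for synonyms.
print(sentence_similarity("the quick fox jumps", "the fast fox leaps"))
print(sentence_similarity("green ideas sleep", "the fast fox leaps"))  # 0.0
```

Note that this particular score is not symmetric: it measures how well the first sentence is covered by the second, so a production system would likely compare in both directions.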
Other methods utilize specific linguistic features beyond the characters, words or even syntax. Semantic-based methods determine if the set of words is the same, just reordered. Stylometric-based methods, which aim to catch a student who enlists someone else to produce their work, quantify the writer’s style by looking at features such as the parts of speech used or words typical of a social or cultural background different from the student’s.
🤖 Bot-Written Text Detection
Plagiarism detectors face the daunting challenge of scouring the World Wide Web and then paraphrasing, summarizing or translating the pages they find so that disguised copies can still be matched, a task that is especially demanding for methods that attempt to detect plagiarism across different languages. Furthermore, obtaining large amounts of plagiarized data raises ethical issues.
Now, with the rise of ChatGPT and the success of its newest version, GPT-4, a bigger challenge arises: how can we differentiate between bot-written text and human-produced work? A growing number of detectors are attempting to do just that.
GPTZero, created by Edward Tian, generates a score that specifies the probability of a document being AI-generated and highlights the sentences that contribute to this score. The classifier is trained on datasets pairing human-written text and AI-written text on the same topics, and it looks to minimize false positives, meaning human text flagged as AI writing. When Axios tested it on a GPT-3 essay about the history of baseball cards, GPTZero identified the text as “likely to be written entirely by AI.”
This classifier also supposedly works on text from models other than ChatGPT, such as LLaMA. GPTZero says that at a probability threshold of 0.65, it classifies 85% of AI documents as AI and 99% of human documents as human.
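To see what figures like these mean at a given threshold, consider the small sketch below. The scores and labels are made up for illustration and are not GPTZero output; `rates_at_threshold` is a hypothetical helper that assumes a document is called AI-written when its score is at or above the threshold.

```python
def rates_at_threshold(scores, labels, threshold=0.65):
    """Return (share of AI docs flagged as AI, share of human docs kept as human)."""
    ai = [s for s, y in zip(scores, labels) if y == "ai"]
    human = [s for s, y in zip(scores, labels) if y == "human"]
    ai_caught = sum(s >= threshold for s in ai) / len(ai)
    human_cleared = sum(s < threshold for s in human) / len(human)
    return ai_caught, human_cleared


# Made-up detector scores for eight documents (illustration only).
scores = [0.95, 0.80, 0.70, 0.40, 0.10, 0.20, 0.30, 0.66]
labels = ["ai", "ai", "ai", "ai", "human", "human", "human", "human"]

print(rates_at_threshold(scores, labels, 0.65))  # (0.75, 0.75)
print(rates_at_threshold(scores, labels, 0.90))  # (0.25, 1.0)
```

Raising the threshold clears more human writers but lets more AI text through, which is the trade-off behind any single pair of accuracy numbers.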
In February, K16 Solutions, a large edtech company, began combining technologies with GPTZero to expand the reach of this classifier to universities nationwide. GPTZero also fixed a loophole where users could input text that resembled English but used Cyrillic characters to slip past the algorithm.
OpenAI’s classifier takes a similar approach, training on pairs of human-written and AI-written text. OpenAI collected presumably human-written prompts and answers from InstructGPT, then generated answers via GPT and other LLMs to serve as samples of bot-written text. It then trained its classifier on this dataset.
However, OpenAI’s classifier is limited in its scope. The classifier only correctly identifies 26% of AI-written text as “likely AI-written” and labels human-written text as AI-written 9% of the time.
Furthermore, as with GPTZero, the classifier becomes more accurate the longer the text is, and it is far better at detecting bot-written text in English than in other languages. Also, if machine-written text has been heavily modified, both GPTZero and OpenAI’s classifier have difficulty detecting that it was originally bot-written.
Another model created by scholars at Stanford is DetectGPT. The tool differs from GPTZero and OpenAI’s classifier because it does not train a new model and instead relies on the LLM to detect its own outputs. If the LLM assigns a high probability to a sequence of words appearing together, it means the model “likes” that sequence. Based on this and the intuition that LLMs greatly prefer their own way of phrasing sentences, graduate students Eric Anthony Mitchell and Alexander Khazatsky built DetectGPT, which detects AI-written text with up to 95% accuracy.
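The intuition can be sketched with a toy example: score a text with the model, score slightly reworded variants of it, and see how much the original stands out. Everything here is a stand-in. `WORD_PROBS` is an invented word-probability table playing the role of the LLM, and the variants are simple one-word swaps rather than rewrites produced by a second model, so this illustrates the idea rather than reimplementing DetectGPT.

```python
import math

# Toy stand-in for an LLM: invented word probabilities (an assumption).
WORD_PROBS = {
    "the": 0.30, "model": 0.25, "writes": 0.20, "text": 0.15,
    "quirky": 0.04, "zebras": 0.03, "yodel": 0.02, "nightly": 0.01,
}
FLOOR = 0.001  # probability assigned to words the toy model has never seen


def log_prob(words):
    """Log-probability the toy model assigns to a list of words."""
    return sum(math.log(WORD_PROBS.get(w, FLOOR)) for w in words)


def perturbations(words):
    """Yield every variant with exactly one word swapped for another vocabulary word."""
    for i, original in enumerate(words):
        for candidate in WORD_PROBS:
            if candidate != original:
                yield words[:i] + [candidate] + words[i + 1:]


def perturbation_discrepancy(text):
    """How much more the model likes the text than its reworded variants."""
    words = text.lower().split()
    variants = list(perturbations(words))
    if not variants:
        return 0.0
    mean_perturbed = sum(log_prob(v) for v in variants) / len(variants)
    return log_prob(words) - mean_perturbed


model_like = "the model writes the text"
human_like = "quirky zebras yodel nightly"

print(perturbation_discrepancy(model_like))  # positive: rewording hurts the score
print(perturbation_discrepancy(human_like))  # negative: rewording helps the score
```

A large positive gap suggests the text already sits where the model “likes” it, which is the signal a detector of this kind would threshold on.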
Despite this plethora of tools, these classifiers remain limited when it comes to other languages or to detecting bot-written code. There is quite a way to go before tools can accurately identify bot-produced essays. However, on the brighter side, LLMs, while greatly improving in performance, do not possess the capability to write fiction for the New Yorker.
Editor’s note: AI Poetry vs. Human Poetry
Hi! It’s the editor here for a brief comment and ✨bonus content ✨
If you’d like to see one way that we here at Deepgram have used AI to write “for” us, check out the video below. Our very own author and content-creator Jose Francisco used ChatGPT to write some haikus.
It’s actually quite difficult to tell which poems are bot-written, and which are human-written. How well will you do? Check out the video below!