
Rethinking GenAI Detection

This guide offers faculty information on tools that claim to detect generative artificial intelligence (GenAI) and alternatives to their use.

As you will see, because these tools are unreliable, we generally recommend against their use and encourage you to reconsider your pedagogy in light of the ubiquity of GenAI tools.

We highly recommend that faculty create and communicate a clear classroom policy about GenAI use that is in line with CSUSB FAM 803.5: Policy and Procedures Concerning Academic Dishonesty. Visit our GenAI Course Policies page for sample course policies and other resources for developing a course policy. 

Visit the CSUSB FCE Teaching with Generative AI website for additional GenAI-related resources. We invite you to schedule an appointment with an FCE Instructional Designer to discuss GenAI in the context of your course(s). 

Material in this guide was adapted from:  Generative AI Detection Tools (USD Law)
Are AI Detectors Reliable?

As GenAI use has increased in the classroom, so has faculty interest in AI detection tools. If a faculty member believes AI has been used in an unapproved way, they may choose to use an AI detector. 

Do AI detection tools even work?

In short, no. A June 2023 study of a dozen AI detectors found that detection tools were "neither accurate nor reliable." Similar studies from the University of Maryland and the University of Adelaide documented the pitfalls inherent in accurately detecting AI-generated text and the ease of fooling AI detectors into believing text was human-generated.

AI detectors are problematic, and we do not recommend them as a sole indicator of academic misconduct. Given the widespread concerns about the accuracy of AI detection tools, faculty and institutions must balance preventing plagiarism with minimizing false accusations. AI detectors should be used with caution and discernment, or not at all.

Reminder: This guide is not an endorsement of any particular tool.

How AI Detectors are Supposed to Work

The details of each detection tool are proprietary. In general, however, AI detectors assess some combination of perplexity, burstiness, and brevity in the text of a document. Perplexity measures how accurately an AI language model can predict the next word; low perplexity means the model guesses correctly most of the time. Burstiness measures how much sentences vary in structure and length. Short sentences are snappy; longer, more elaborate sentences slow things down and require readers to take their time; medium sentences convey information quickly but not too quickly. This variation keeps readers engaged. Burstiness also encompasses how often terms are repeated.

Low levels of perplexity and burstiness can sometimes indicate use of a GenAI tool. This is because when compared with typical human writing, GenAI tools often produce less variety in sentence length and structure. High levels of brevity or language that seems like a vague summarization are also often red flags for detection tools. 
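The two signals above can be illustrated with toy metrics. The sketch below is purely illustrative and is not how any commercial detector actually works: burstiness is approximated as the standard deviation of sentence lengths, and perplexity is computed against a simple unigram word-frequency model, whereas real detectors rely on large neural language models. The function names and the unigram model are assumptions made for this example.

```python
import math
import re
from collections import Counter

def burstiness(text):
    """Standard deviation of sentence lengths (in words).
    Low values mean uniformly sized sentences -- one signal
    detectors associate with AI-generated text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    return math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))

def unigram_perplexity(text, counts):
    """Perplexity of `text` under a toy unigram model built from the
    word-frequency table `counts`. Lower perplexity means the model
    predicts each word well. Unseen words are given a count of 1."""
    total = sum(counts.values())
    words = text.lower().split()
    log_prob = sum(math.log(counts.get(w, 1) / total) for w in words)
    return math.exp(-log_prob / len(words))

# Two sentences of identical length produce zero burstiness,
# while mixing very short and very long sentences produces a high score.
uniform = "The cat sat down. The dog ran off."
varied = "Stop. Then the long winding sentence unspools at its own leisurely pace."
model = Counter("the cat sat on the mat".split())
```

For instance, `burstiness(uniform)` is 0.0 because both sentences are four words long, while `burstiness(varied)` is much higher; a detector treats the former pattern as more suspicious.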

The links below provide more information on burstiness and perplexity. 

Challenges with AI Detectors: False Positives and False Negatives

AI Detectors: Neither Accurate Nor Reliable

In theory, AI detectors analyze a piece of writing and assess what percentage of the text is AI-generated versus human-generated. However, multiple studies have shown that AI detectors are "neither accurate nor reliable," producing a high number of both false positives and false negatives.

False Positives

False positives incorrectly flag content written by humans as having been written by a generative AI tool. False positives and the resulting accusations of academic misconduct can have serious repercussions for a student's academic record. They can also create an environment of distrust in which students are treated as suspicious by default, undermining the faculty-student relationship.

False positive rates vary widely. Turnitin has previously stated that its AI checker had a less than 1% false positive rate, though a later study by the Washington Post produced a much higher rate of 50% (albeit with a much smaller sample size). Recent studies also indicate that neurodivergent students (e.g., those with autism, ADHD, or dyslexia) and students for whom English is a second language are flagged by AI detection tools at higher rates than other students, because their writing more often relies on repeated phrases, terms, and words.

False Negatives

False negatives fail to identify documents that do contain AI-generated text. They occur most often because of a detection tool's sensitivity settings or because users intentionally employ evasive techniques to make their text appear more human-like.

First, AI detection companies need to balance the false positive and false negative rates in light of the serious academic ramifications that result from false positives.  For example, Turnitin’s AI checker can miss roughly 15 percent of AI-generated text in a document. “We’re comfortable with that [false negative rate] since we do not want to highlight human-written text as AI text,” says the company, noting its 1 percent false positive rate. 

Second, individuals are able to circumvent AI detection tools by paraphrasing, inserting emotion or anecdotes, increasing word or structure diversity, or simply using other AI tools (e.g., Writesonic's AI Humanizer or UndetectableAI) to add human-like elements to their writing. Cat Casey, chief growth officer at Reveal and a member of the New York State Bar AI Task Force, noted, "I could pass any generative AI detector by simply engineering my prompts in such a way that it creates the fallibility or the lack of pattern in human language." She added that she is often able to fool detectors 80-90% of the time simply by adding the single word "cheeky" to her prompt, since it implies irreverent metaphors.

The Arms Race

AI generators and AI detectors are locked in an eternal arms race, with both getting better over time. “As text-generating AI improves, so will the detectors — a never-ending back-and-forth similar to that between cybercriminals and security researchers… That’s all to say that there’s no silver bullet to solve the problems AI-generated text poses. Quite likely, there won’t ever be.” TechCrunch (January 31, 2023)

What (Not) to Do When You Suspect Unauthorized GenAI Use

Best Practices

As noted, AI detection tools are generally unreliable, though they can often detect lower-quality AI-generated content. Use of GenAI in violation of course policy is subject to the provisions relating to Academic Dishonesty found in CSUSB FAM 803.5. Enforcement works best when faculty have clearly communicated a GenAI policy in their syllabus. See the CSUSB FCE Course Policies page for more resources on developing a GenAI syllabus policy.

AI detectors are only one of many ways instructors can discern academic misconduct. There is no substitute for knowing a student and understanding their writing style and background.

Don't:

  • Don't rush to discipline 

It is tempting to rush ahead when you suspect the use of generative AI. However, all academic misconduct issues (plagiarism, ghostwriting, cheating) must be thoroughly investigated in accordance with the Office of Student Conduct and Ethical Development procedures.

  • Don't fail the entire class (probably goes without saying but...)

In May 2023, a Texas A&M instructor falsely accused an entire class of using ChatGPT to write their essays, putting them at risk of failing. While ultimately no students flunked or were prevented from graduating, the fallout and subsequent scrutiny were irreversible, and certainly avoidable.

Do:

  • Do make comparisons to previous work

Compare the student's final submission with the student’s previous work. Consider requiring first drafts if you don’t already require them; this is something you can add in your syllabus or assignment requirements. Encourage students to maintain evidence of their notes and outlines.  Consider having students submit their research documentation as part of their assignment(s). 

  • Do talk with the student

If you do choose to use a detection tool, share the results with students. Even though the score itself is not an indictment, talk through the areas the detector has flagged. If the writing is formulaic and word choice is repetitive (often a hallmark of AI-generated text), make sure that the student understands why this detracts from their argument. For example, you could say: "From my experience, formulaic writing and repetitive word choice can signal GenAI use. Here is what I'm seeing in your work. Let's talk about how this affects your argument." This could lead into a discussion of the student's writing choices, which could, in turn, lead to a discussion of GenAI tools. If the work is substantially different from the student's previous work, point out the key differences. Also consider asking the student about their process.

  • Do offer a second chance if the evidence is inconclusive

Given the unreliability of detection tools, if you do use one (or more than one) and the evidence is inconclusive, consider offering students a chance to redo the work. If you have prohibited GenAI use in your course, in whole or in part, and you can prove the student used it in violation of your course policy, that use is subject to the penalties or discipline outlined in your course policy. These can include grade reductions or withdrawal from the course.

  • If necessary...

Please see the "Plagiarism and Cheating" section on the "Academic Regulations" page of the CSUSB Catalog. Follow the steps outlined in the Student Academic Dishonesty Form to initiate a formal inquiry. You can also contact the Office of Student Conduct and Ethical Development at student-conduct@csusb.edu or 909-537-7172.

Incorporating (Gen)AI vs. Detecting (Gen)AI 
Some Available Tools

It bears repeating here that this guide is not an endorsement of any particular tool. 

AI detection tools have been plagued by low accuracy rates, and even once-leading tools such as OpenAI's classifier have quietly shut down. Many U.S. and international universities have banned or recommended against these tools. Below is a list of available tools.

If you choose to use detection tools in your course, we strongly recommend transparency in their use. Costs are subject to change per provider. CSUSB does not endorse or provide technical support for the use of these tools.

GPTZero launched in January 2023 and later that year announced a partnership with the American Federation of Teachers. It has tiered pricing: the basic level is free, Essential is $8.33/month, Premium is $12.99/month, and Professional is $24.99/month.

Created specifically for educators and publishers, this tool incorporates a plagiarism checker and a percentage score, like Turnitin and GPTZero. It can also scan handwritten documents. Users can try the tool for free for up to 2,000 words; after that, pricing starts at $12/month (80,000 words) or $19/month (200,000 words).

Originality.AI is inexpensive and easy to use, though some claim it has a high rate of false positives, incorrectly labeling human-written text as AI-generated. Originality.AI relies on grammar, spelling, and syntactic errors to identify human-written text. After a free trial, it costs $0.01 per credit (1 credit scans 100 words).

Copyleaks integrates easily with some of the most popular learning management systems (LMS) used by instructors, including Canvas and Blackboard. Copyleaks claims a 99.12% detection accuracy rate but acknowledges that accuracy is lower for some content styles. Copyleaks is a subscription service at $7.99/month.

Turnitin, a leading academic anti-plagiarism company, launched its AI checker as an add-on to its plagiarism tool in early 2023. However, CSUSB is not licensed for the Turnitin AI checker due to its unreliability.