AI Content Detectors: How Accurate Are They + Tools You Can Use

We believe AI content detection tools should be transparent, honest, and accountable. Anyone who needs to determine whether a text was written by AI should be able to find out which tool performs best for their needs.

AI Content Detector Check

AI detection tools are useful but not infallible. People need to understand what these tools can and cannot do, and the companies that build them should be transparent about how well they actually perform.

Information about the Bypass Engine detector's accuracy is available, including results from several independent studies.

You can test the Bypass Engine tool for free by clicking here.

Check Out Our AI Detector 

This guide shows you how to choose the best AI content detector, explains how detectors work, and provides tools for testing their accuracy. It includes:

  • An open dataset anyone can use to benchmark AI detectors.
  • An open-source tool that measures detector accuracy across multiple datasets.
  • Guidance and a calculator for choosing the right evaluation metrics.
  • Transparent reports backed by data everyone can inspect.

This guide will help you understand how AI detectors work, evaluate whether they perform well, run your own experiments, and compare the options.

Have any questions? Get in touch with us!

Short and Sweet:

  • Lite is now the default model for light AI assistance (Grammarly-style edits), replacing the Standard version.
  • Turbo 3.0.1 is the most sensitive model, catching even the smallest traces of AI.
  • Pick Turbo 3.0.1 for work that must contain no AI at all.
  • Choose Lite when light AI assistance is acceptable and you want fewer false positives.
  • Bypass Engine's detection has improved again, making it the most accurate AI detector we tested.

AI detection isn't always right. Some people assume it catches everything; others assume it misses nearly everything. The reality is somewhere in between, and we're all waiting to see whether OpenAI, or anyone else, can demonstrate otherwise.

– Being clear and responsible is important when developing and using AI. 

– AI detectors can help stop some of the bad effects generative AI might have on society.

Transparent, accountable AI detection helps keep things fair and safe for students and workers alike. Bypass Engine supports responsible AI use and pushes for openness, so people can trust what they read.

The FTC has warned companies to be careful about claims regarding how well AI tools can detect AI-generated content.

Tools that claim to detect AI-generated content need to prove they can. Unsupported accuracy claims can cause real harm.

The FTC's position is that tools that can't demonstrate they work shouldn't be trusted. The makers of this tool agree, and their results are published so anyone can verify them.

There have already been incidents, such as a professor who mistakenly failed an entire class based on a detector's errors.

The consequences of AI-generated content that people can't identify are serious.

We need tools that can spot it. Right now, the available tools and studies aren't strong enough to address the problems this content can cause:

  1. Misinformation spread at scale.
  2. Fabricated stories.
  3. Harmful spam.
  4. Academic plagiarism using AI.
  5. Reinforced false beliefs.
  6. Writers passing off AI work as their own.
  7. Agencies and companies doing the same.
  8. Fake product reviews.
  9. Fake job applications.
  10. Fake college admissions essays.
  11. Fake scholarship applications.

Version History of Originality.ai:

November 2022: Launched a beta version, released before ChatGPT. It could detect GPT-2, GPT-NEO, GPT-J, and GPT-3, though small edits could still get around it.

April 2023: Improved detection of ChatGPT output, even after paraphrasing, with fewer false positives.

August 2023: Improved accuracy and detection of AI-paraphrased content. Also released a free dataset and launched a new tool for evaluating AI detectors.

February 2024: Improved Turbo 3.0 by training on newer models, including Grok, Mixtral, GPT-4 Turbo, Gemini, and Claude 2. Accuracy reached 98.8% with a lower error rate.

July 2024: Launched Lite 1.0.0, which is 98% accurate with an error rate under 1%. It can also identify writing that used AI assistance, such as Grammarly.

September 2024: Made Lite the default model and retired the older ones.

October 2024: Released Turbo 3.0.1, which is highly accurate with an error rate under 3%, and is built to resist adversarial attempts to evade detection.

How Our AI Detector Works: A Simple Guide

We built a system that can tell whether a text was written by a machine or a human.

The system improves by studying large numbers of AI-written and human-written examples, using a powerful model to learn the features that separate them.

Curious about spotting stuff written by computers?

How Machines Spot AI-Written Text

Here's a simple explanation of the three main ways a tool (often called a "classifier" in machine learning) can tell whether a text was written by an AI or a person.

1. Feature-Based Detection

  • Looks for statistical patterns that distinguish AI-generated text.
  • Burstiness measures how much sentence length and structure vary. Human writing mixes short and long sentences; AI output tends to be more uniform.
  • Perplexity measures how predictable the next word is. AI-generated text is usually easy to predict (low perplexity), while human writing is harder to predict (high perplexity).
  • Frequency Features – Looks at how often words or phrases appear; unusual distributions can indicate AI involvement.
  • Studies have found that AI-generated text, especially from older models, is usually highly predictable.
  • AI uses punctuation correctly, but often less naturally than humans do.
  • Benefits: Fast, cheap, effective against older AI systems.
  • Downsides: Struggles with newer models like GPT-4 and Bard.
  • Examples: GPTZero, Winston AI
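As a sketch of the feature-based approach, the snippet below computes two such signals in plain Python: burstiness as the variation in sentence length, and a simple vocabulary-diversity feature. The example texts and the features themselves are illustrative only; production detectors combine many more signals.

```python
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length. Human writing mixes
    short and long sentences (high burstiness); AI output tends to be
    more uniform (low burstiness)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2 or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)

def type_token_ratio(text: str) -> float:
    """Simple frequency feature: unique words / total words. Unusually
    repetitive word distributions can hint at machine generation."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

human = ("I ran. Then, exhausted and soaked, I stopped under the bridge "
         "and waited. Nothing. The rain kept falling for hours.")
ai = ("The weather was rainy today. The rain continued for many hours. "
      "The rain made the streets wet. The rain finally stopped at night.")

print(burstiness(human) > burstiness(ai))  # True: varied vs. uniform sentences
```

A real detector would feed features like these into a trained classifier rather than comparing them directly.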

2. A Zero-Shot Approach

  • Uses a language model's own probability estimates to score how likely it is to have produced the text, with no detector-specific training.
  • Benefits: Simple to apply, no extra training needed.
  • Downsides: Easy to defeat by paraphrasing.
  • Examples: GPTZero, ZeroGPT
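To make the zero-shot idea concrete, the sketch below scores text by its perplexity under a toy unigram model built from a reference corpus. This is a simplification: real zero-shot detectors use the log-probabilities of a large language model itself, but the principle, flagging text the model finds unusually predictable, is the same.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def perplexity(text, corpus):
    """Perplexity of `text` under a unigram model estimated from `corpus`,
    with add-one smoothing. Lower perplexity = more predictable text.
    Zero-shot detectors apply the same idea using an LLM's own token
    probabilities instead of a toy unigram model."""
    counts = Counter(tokenize(corpus))
    total, vocab = sum(counts.values()), len(counts) + 1
    words = tokenize(text)
    if not words:
        return float("inf")
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))

corpus = "the cat sat on the mat and the cat ran after the dog"
# In-distribution text is predictable (low perplexity); out-of-distribution is not.
print(perplexity("the cat sat", corpus) < perplexity("zebra quantum xylophone", corpus))  # True
```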

3. Fine-Tuned Language Models

  • Fine-tunes a pretrained model (such as BERT or RoBERTa) on labeled examples to learn what separates AI writing from human writing.
  • Benefits: The most accurate approach for attributing text.
  • Downsides: Expensive and slow to keep up with the latest AI models.
  • Examples: Originality.ai AI Detector, OpenAI Text Classifier (taken offline)
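To illustrate the training idea, here is a minimal bag-of-words Naive Bayes classifier standing in for a fine-tuned transformer like BERT or RoBERTa. The phrases and labels below are invented for the example; a production detector fine-tunes a large pretrained model on millions of labeled samples, but the supervised loop is the same: show the model labeled human and AI text, let it learn the separating features.

```python
import math
import re
from collections import Counter, defaultdict

class TinyDetector:
    """Minimal Naive Bayes text classifier (a stand-in for the fine-tuned
    transformer detectors described above)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word counts
        self.doc_counts = Counter()              # label -> number of docs

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self._tokens(text))

    def predict(self, text):
        best_label, best_score = None, -math.inf
        total_docs = sum(self.doc_counts.values())
        vocab = len(set().union(*self.word_counts.values())) + 1
        for label in self.doc_counts:
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for w in self._tokens(text):
                score += math.log((counts[w] + 1) / (total + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

det = TinyDetector()
det.train("as an ai language model i cannot provide that", "ai")
det.train("in conclusion it is important to note the following", "ai")
det.train("lol i totally forgot my keys again this morning", "human")
det.train("honestly the gig last night was a mess but fun", "human")
print(det.predict("it is important to note that as an ai model"))  # ai
```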

The benchmark discussed below evaluates detectors built on all of the strategies above.

Test Plan:

We evaluated the new Bypass Engine AI Content Checker by doing the following:

1. Tested it on three different dataset types with many samples each.

2. Measured its performance on a small, deliberately difficult dataset.

3. Ran the tests with our own tooling, entering data manually where automation wasn't possible.

4. Used well-known public datasets to see how well Originality.ai really performs.

The second test can also be reproduced by anyone, using a publicly available tool and the published datasets.

Check Out Our New Adversarial AI Detection Dataset:

We are releasing a dataset for evaluating AI detectors and exposing their weaknesses. It contains:

  • Challenging AI-generated samples (e.g., GPT-4 output and paraphrased AI text).
  • Human-written samples for comparison.

The data is drawn from a larger corpus and was not used to select the main tests, which helps keep the evaluation fair and unbiased.

Check out the table below. It has a list of datasets and a short description for each one. 

You can download the dataset by clicking here.


What's the best way to evaluate a detector? Test it with your own data!

Testing a detector on your own content shows how accurate it will be for your use case, though you may need to adjust settings for tasks like catching fake posts. Try our free tool to measure detector accuracy with your own data.

How We Test, and Free Tools for Everyone:

We built tools to make evaluating AI detectors easier, with clear metrics and full details. The main tool connects to AI detectors through their APIs, which makes testing fast and consistent by:

  1. Running all tests within a single day, nearly simultaneously.
  2. Sending every tool the exact same text, with no formatting changes.
  3. Evaluating new datasets quickly as they appear.
  4. Letting potential customers or researchers test their own data to find the best AI detector for their needs.
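The batching logic above can be sketched as a small harness that feeds identical, unmodified text to several detectors and collects their scores side by side. The detector functions below are stand-ins for illustration; the real tool wraps each vendor's HTTP API, whose endpoints and authentication are vendor-specific and not shown here.

```python
from typing import Callable, Dict, List

def run_benchmark(
    texts: List[str],
    detectors: Dict[str, Callable[[str], float]],
) -> Dict[str, List[float]]:
    """Send the exact same strings to every detector and collect
    their AI-probability scores for side-by-side comparison."""
    results: Dict[str, List[float]] = {name: [] for name in detectors}
    for text in texts:
        for name, detect in detectors.items():
            results[name].append(detect(text))  # identical input for all tools
    return results

# Stand-in detectors: trivial keyword scores, purely for demonstration.
def fake_detector_a(text: str) -> float:
    return 1.0 if "as an ai" in text.lower() else 0.1

def fake_detector_b(text: str) -> float:
    return 0.9 if "in conclusion" in text.lower() else 0.2

scores = run_benchmark(
    ["As an AI, I cannot do that.", "In conclusion, we hiked all day."],
    {"detector_a": fake_detector_a, "detector_b": fake_detector_b},
)
print(scores["detector_a"], scores["detector_b"])  # [1.0, 0.1] [0.2, 0.9]
```

Because every detector receives the same unmodified input, any score differences reflect the detectors themselves rather than formatting artifacts.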

New LLMs arrive fast, and so does better AI detection; a study that takes four months to publish is outdated on arrival.

Features of This Tool:

  • Free and open source
  • Runs batches of text through multiple AI detectors
  • Returns results quickly
  • Automatically computes how well each detector performs (accuracy, error rates, and more)

Check out the tool here: https://github.com/OriginalityAI/AI-detector-research-tool

We’ve also added three easy methods to use our tool with your data sets.

  1. Detect AI content in Microsoft Excel
  2. Detect AI content in Google Sheets
  3. Detect AI content in AirTable

Our Take on Using AI Detectors in Schools and the Issue with Mistakes

Relying on an AI score alone to decide whether someone cheated is a mistake.

To reduce false accusations, the makers provide a guide for writers and students, plus a free Chrome extension that helps demonstrate a document's originality.

The new 1.0.0 Lite model is designed for education, tolerating light edits from tools similar to Grammarly.

How to Check if AI Detectors Are Right

Here’s how we figure out if AI detectors for spotting AI-created content are any good. It might get a bit techy, but if you want to dive deeper, we’ve got more details ready for you. 

A single number describing how good a detector is doesn't tell you much on its own.

Never trust one "accuracy" score without knowing how it was measured.

Let’s check out the different ways we can see how well these detectors work…

Confusion Matrix 

The confusion matrix, and the F1 score derived from it, shows how well a model separates human-written from AI-generated content in a single view.

  • True Positive (TP) – AI-generated text correctly identified as AI.
  • False Negative (FN) – AI-generated text incorrectly labeled as human.
  • False Positive (FP) – Human-written text incorrectly labeled as AI.
  • True Negative (TN) – Human-written text correctly identified as human.

Understanding a Confusion Matrix with GPT-4 and Human Data, Easy Version 1.4

True Positive Rate — Catching AI Text

The percentage of AI-written text the detector correctly identifies. True Positive Rate (TPR) is also called sensitivity, hit rate, or recall.

  • True Positive Rate: TPR = TP / (TP + FN)

True Negative Rate — Recognizing Human Writing:

The percentage of human-written content the detector correctly identifies. True Negative Rate (TNR) is also called specificity or selectivity.

  • True Negative Rate: TNR = TN / (TN + FP) = 1 − FPR

Accuracy:

Accuracy alone isn't everything; some AI detection tools advertise high accuracy without showing how it was measured.

  • Accuracy = (TP + TN) / (TP + TN + FP + FN)

F1:

A score that balances precision and recall to judge detectors more fairly.

  • F1 = 2 × (PPV × TPR) / (PPV + TPR), where Precision (PPV) = TP / (TP + FP)
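All of the metrics above fall out of the four confusion-matrix counts. The sketch below computes them; the counts in the example are made up for illustration.

```python
def detector_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Headline metrics from a detector's confusion matrix.
    AI-generated text is the positive class."""
    tpr = tp / (tp + fn)                        # recall / sensitivity
    tnr = tn / (tn + fp)                        # specificity
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)                  # PPV
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "tnr": tnr, "accuracy": accuracy, "f1": f1}

# Hypothetical detector: 95 of 100 AI texts caught, 2 of 100 human texts flagged.
m = detector_metrics(tp=95, fn=5, fp=2, tn=98)
print(round(m["accuracy"], 3), round(m["f1"], 3))  # 0.965 0.964
```

Note how a detector can post a high accuracy while its false positive rate (1 − TNR) is still too high for a setting like grading student work.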

Metrics We Considered But Didn't Use:

  • ROC and AUROC, because we can't adjust other tools' sensitivity thresholds, and some tools don't return percentage scores.
  • Precision (PPV = TP / (TP + FP)) on its own is also not reported.

So, what exactly counts as AI content?

What counts as AI content and what doesn't? With more and more writing produced jointly by people and AI assistants, it's getting hard to say what is purely AI content.

Some research has made odd choices about what counts as genuinely human or AI-made writing.

For example, one study took texts written by humans in various languages, machine-translated them into English with AI, and still labeled them genuine human work.

Dataset Description:


The study labels this AI-translated dataset as human-written.

https://arxiv.org/pdf/2306.15666.pdf

AI detectors also have to cope with AI text that has been edited after generation; even small edits can mask AI authorship. So we categorize AI-touched content as follows:

  • AI-generated = AI
  • AI-generated, polished by humans = AI
  • AI-generated with substantial human edits = AI
  • AI outline, written and heavily edited by humans = AI
  • Human-written with light AI assistance = It depends
  • AI-assisted research, human-written = Human
  • Human-written and human-edited = Human
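The categorization above amounts to a simple lookup. The sketch below encodes it in Python; the key names are invented identifiers for this article's categories, not any standard vocabulary.

```python
# Maps how a text was produced to the label used in the article's taxonomy.
# These key strings are hypothetical identifiers, chosen for readability.
PROVENANCE_LABELS = {
    "ai_generated": "AI",
    "ai_generated_human_polished": "AI",
    "ai_generated_heavy_human_edits": "AI",
    "ai_outline_human_written_and_edited": "AI",
    "human_written_light_ai_assist": "depends",
    "ai_research_human_written": "human",
    "human_written_human_edited": "human",
}

def provenance_label(provenance: str) -> str:
    """Return the article's label for a given production workflow."""
    return PROVENANCE_LABELS.get(provenance, "unknown")

print(provenance_label("human_written_light_ai_assist"))  # depends
```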

The more AI is involved in producing a text, the easier it is to detect. Journalist Kristi Hines and others examined this and published their findings in the Search Engine Journal.

Below, we evaluate how Bypass Engine's new Lite model, from the July 2024 release, performs.

At last! It’s time for some tests

New and better LLMs keep coming, so we have to refresh our models and how we test them. 

Meet Version Lite 1.0.0.

Our July 2024 release is bringing out Version Lite 1.0.0. Check out how this new Lite model did in our tests.

  1. The Lite model is very good at recognizing human-written content and makes few mistakes, even when detecting small AI edits.
  2. Performance across datasets:
  • Lite 1.0.0 can tell whether content contains some AI edits or is entirely AI-generated.
  • Its false positive rate is just 1% (better than the older Version 2.0.0), and its false negative rate is only about 2%.
  • It was the most accurate model on 4 of the 10 AI benchmarks it ran.
  3. Key takeaway: a very low false positive rate and strong detection of content with small AI changes.

Version Lite 1.0.0 is very good at spotting changes made by AI tools like Grammarly and rarely produces false positives, which makes it a good fit for schools and learning.

Benchmarking 6 AI Detection Tools on Adversarial Content

Next, we’re going to check how well different AI detectors work by testing them.

To do these tests again and again, we used…

  1. Our new tool: the Open Source AI Detector Accuracy Checker.
  2. Free datasets you can use to compare the tools:
  • Open-Source Adversarial AI Detection Benchmark
  • ChatGPT Detector Bias Dataset

Test #2 is deliberately difficult: it uses adversarial GPT-3, GPT-4, ChatGPT, and paraphraser data, unlike the ordinary AI output most detector evaluations use.

Check out Test #2 — What We Found 

You can see the scores, data, and details by downloading them here:

  • https://drive.google.com/drive/folders/14uCk9s8ypSp63Yd-M4EGYr9uDyO4U9D0?usp=drive_link

Test Number 2 – Results of Checking if AI Can Spot AI Writing:

We did this test with an older version, 2.0.0, but we’re not using it anymore. 

Now, let’s look at how well our latest version, called Lite, did in these tests.

  • The table is sorted by F1, a single score that reflects how well each tool both detects AI content and recognizes human writing.
  • Every tool did reasonably well at avoiding false positives, with rates from 0.8% to 7.6%.
  • Ability to detect AI content varied widely, from 19.8% all the way to 98.4%.

Test #2 — Confusion Matrices for Each AI Detector:

Bypass Engine — Confusion Matrix — Test #2, Adversarial Datasets


[Turbo Model 3.0.1]


[Lite Model 1.0.0]

Winston.ai — Confusion Matrix — Test #2, Adversarial Data


Winston.ai F1 = 52.0

Sapling.ai — Confusion Matrix — Test #2, Adversarial Data


Sapling.ai F1 = 37.9

GPTZero — Confusion Matrix — Test #2, Adversarial Data


GPTZero F1 = 34.2

ContentAtScale — Confusion Matrix — Test #2, Adversarial Data


ContentatScale F1 = 33.4

CopyLeaks — Confusion Matrix — Test #2, Adversarial Data


CopyLeaks F1 = 32.9

Limitations of Test #2:

Manual entry: Some tools (ContentAtScale, TurnItIn, WinstonAI) had to be tested by hand because full automation wasn't possible. This introduces some risk of error, though we double-checked anything we weren't sure about.

Data quality: The dataset was sampled from a much larger corpus and wasn't fully cleaned, so some examples may be imperfect.

Timing: We tested the new detector on February 13, 2024, with earlier runs on July 24-28, 2023. The results reflect performance at that time and may not predict future performance.

Sample size: Our research team notes that roughly 2,000 samples can't capture everything about how a detector behaves.

Need to check how right it is using different info? Use our no-cost tool with many kinds of data.

  • Open-Source AI Detector Comparison Tool & Dataset: Now Available

Test #3 – Other AI Detection Datasets and 5 Additional Tests:

Here’s more data for you to try out. 

We didn’t test all the tools with this data, but we did check out Bypass Engine on them. Here’s how Bypass Engine did.

All this data is from research papers that anyone can read.

We’re looking at the differences between the Lite and Turbo models below.


Test 3-A: ChatGPT vs. Human Experts (HC3 Dataset)

  • In this study's setup, our model outperformed the human evaluators, beating even the best expert baseline by 8.8%, and scored higher across the benchmarks than the other models tested.
  • https://arxiv.org/pdf/2301.07597.pdf
  • https://huggingface.co/datasets/Hello-SimpleAI/HC3

Bypass Engine Test #3-A: Distinguishing ChatGPT Output from Human Writing


[Turbo Model 3.0.1]


[Lite Model 1.0.0]

Test 3-B: Detecting AI-Generated Scientific Papers

  • https://aclanthology.org/2023.trustnlp-1.17.pdf
  • https://huggingface.co/datasets/tum-nlp/IDMGSP
  • Our model is more accurate than the four others on the test scale, and it didn't need to train on research papers. Only when the other three models are trained on nearly all of the test data types do they edge slightly ahead of ours.

Bypass Engine: Test #3-B on Spotting AI-Written Scientific Papers


[Turbo Model 3.0.1]


[Lite Model 1.0.0]

Dataset and Results

Test 3-C: Detecting Text Written by Large Language Models

  • The paper found that Originality.ai beat all of the detectors it tested.
  • Paper: https://arxiv.org/pdf/2305.15047.pdf
  • Dataset: https://github.com/vivek3141/ghostbuster (2,895 Samples)

Ghostbuster Dataset — Confusion Matrix — Bypass Engine, Test 3-C


[Turbo Model 3.0.1]


[Lite Model 1.0.0]

Dataset and Results

Test 3-D — Detecting AI-Generated Essays

  • We scored a perfect 100% on both classes.
  • The essays, written by Swedish students learning English, show the model also works well on text from English learners.
  • We outperformed every model mentioned in the study.
  • https://www.mdpi.com/2076-3417/13/13/7901
  • https://github.com/rcorizzo/one-class-essay-detection

Bypass Engine — Confusion Matrix — Test #3-D — Spotting Essays Written by AI


[Turbo Model 3.0.1]

Dataset & Results

Test 3-E — Spot the AI: Finding ChatGPT’s School Essays with CheckGPT

  • Our system detected AI-generated text 96.7% of the time, including on difficult subjects like physics, computer science, and the social sciences that we didn't specifically target. GPTZero scored 61.2%.
  • Across 9,000 samples we were correct 94.5% of the time, nearly matching the best comparable study. Despite differences in test setup, we performed well and were far better than GPTZero at identifying AI-generated content.
  • https://arxiv.org/pdf/2306.05524.pdf
  • https://huggingface.co/datasets/julianzy/GPABenchmark

Bypass Engine — Confusion Matrix — Test #3-E — CheckGPT Dataset


[Turbo Model 3.0.1]

Dataset & Results

Other Studies

The studies and datasets we haven't covered share the same problems:

  • Small samples (100 samples is just not enough!)
  • Long publication delays, a real problem when the field changes this fast.
  • Unshared data; studies that don't publish their data are always a letdown.
  • Tool comparisons without the underlying data; whenever tools or accuracy are compared, the data used should be shared.
  • Too-easy benchmarks; building hard AI-vs-human tests is difficult and expensive, while a benchmark built on GPT-2 output tells us little.

Other tests worth a look:
  • HuggingChat AI Detector
  • Paraphrase Plagiarism Checker

Six additional studies by third-party groups found Originality.ai to be the most accurate detector.

  • Exploring how well we can spot AI-written texts
  • Studying how to catch text made by AI
  • How good are programs at finding AI-written pieces?
  •  A big look into “RAID” study 
  • When AI writes for medical topics
  • Research on AI’s role in science summaries
  • Checking if students use AI for their homework

We used our own data and other data everyone can access. The tests keep showing that Originality.ai is good at spotting AI work.

Every Tool That Detects AI-Generated Text

Here's a quick guide to AI content detection tools. To compare more detectors and their features, see this article: "22 AI Content Detection Tools."

Tools List:

  1. HuggingFace
  2. GLTR.io AI 
  3. Passed.AI
  4. Writer.com 
  5. Willieai.com 
  6. GPTZero
  7. ContentAtScale
  8. CopyLeaks
  9. POE Poem of Quotes
  10. DetectGPT
  11. On-Page.AI
  12. GPTRadar.com
  13. Percent Human
  14. Grover 
  15. KazanSEO
  16. Sapling
  17. CrossPlag
  18. CheckForAI.com
  19. Draft & Goal
  20. GPTkit.ai
  21. ParaphrasingTool.ai 
  22. OpenAI Text Classifier (removed)
  23. AI Writing Check
  24. Winston AI
  25. InkForAll
  26. ContentDetector.ai
  27. WriteFull
  28. ZeroGPT
  29. TurnItIn
  30. Bypass Engine

The tests showed that some tools are clearly better than others. Many were built quickly and simply wrap a popular open-source GPT-2 detector (195k downloads last month).

Why does our model work better?

Here's why we believe Originality.ai does a much better job of detecting AI-written text than the alternatives:

  1. The model is computationally expensive to run, so it can't be free or ad-supported.
  2. It is built for web content: it trains on online writing to distinguish human from AI text, rather than on older books, which are written differently.
  3. It trains on hard, adversarial examples and adapts as new evasion techniques appear.

Closing Words 

The Bypass Engine group has worked really hard to make their AI content checker better and aim for top performance.

The Outcomes…

  • Bypass Engine Launches Lite 1.0.0.
    • Lite 1.0.0 is 98% accurate with a false positive rate of about 1%. It also tolerates light writing assistance, like Grammarly suggestions, which makes it well suited to schoolwork.
  • Bypass Engine Launches Version 3.0.1 Turbo.
    • Turbo 3.0.1 is right almost all the time, with very few mistakes, and is hard to trick.
    • If you want no AI in the writing at all, pick Turbo 3.0.1.
    • If light AI assistance is acceptable, go with Lite.
  • In six independent studies, the newest Bypass Engine version was the top performer every time.
  • We tested six AI detection tools on a tough new adversarial benchmark; Bypass Engine was the best at finding AI-written content.
  • We built and released an easy-to-use tool and dataset for quickly testing detectors.

We hope this post makes it easier for you to learn about AI detectors and how accurate they are. We want to give you the tools to check things out on your own if you’re interested. 

We believe…

  1. We should be open and responsible when making and using AI.
  2. Tools that spot AI can help lessen some bad effects it might have on society. 
  3. How right or wrong AI finding tools are should be shared clearly and responsibly, just like we ask for in AI’s making and use.

We hope this study is useful. Got questions about Bypass Engine? 

Get in touch with us.

Thinking of doing your own tests? Let us know. If you are a student, a journalist, or just want to learn, we’re here to assist. 

See how accurate Bypass Engine is in different studies and give our AI detector a go yourself!