Text generated by AI tools like ChatGPT is starting to make a difference in everyday life. Teachers are testing it in the classroom. Marketers are champing at the bit to replace your interns. Memers are going buck wild. And me? It would be a lie to say I'm not a little worried about the robots coming for my writing gig. (Thankfully, ChatGPT can't hop on Zoom calls or conduct interviews yet.)
Now that generative AI tools are available to the public, you're likely to encounter more synthetic content while browsing the web. Some instances may be benign, like an auto-generated BuzzFeed quiz about which deep-fried dessert matches your political beliefs. (Are you a beignet Democrat or a zeppole Republican?) Other instances could be more sinister, like a sophisticated propaganda campaign from a foreign government.
Academic researchers are looking into ways to determine whether a string of words was generated by a program like ChatGPT. For now, though, there is no decisive indicator that what you're reading was whipped up by AI.
Algorithms that can mimic natural writing patterns have been around for more years than you might expect. Back in 2019, Harvard and the MIT-IBM Watson AI Lab released an experimental tool that scans text and highlights words based on how random they are.
Why would this be useful? An AI text generator is fundamentally a mystical pattern machine: superb at mimicry, weak at throwing curveballs. Sure, when you type out an email to your boss or send a group text to friends, your tone and cadence may feel predictable, but at the heart of our human communication style is an underlying capricious quality.
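To make that concrete, here's a minimal sketch of that kind of word-by-word highlighting. It's an illustration only: the real tool scores words with a neural language model's probabilities, while this sketch uses bigram counts over a tiny made-up corpus, and the corpus, function names, and "predictable"/"surprising" buckets are all assumptions for the example.

```python
from collections import Counter, defaultdict

# Toy stand-in corpus; the real tool uses a large neural language model.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate the fish .").split()

# Count which word tends to follow which.
bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

def rank_of(prev, word):
    """Rank of `word` among the model's guesses after `prev` (0 = top guess)."""
    ranking = [w for w, _ in bigrams[prev].most_common()]
    return ranking.index(word) if word in ranking else len(ranking)

def highlight(text):
    """Label each word by how predictable it was, in the spirit of the
    2019 highlighting tool: top-ranked words read as machine-like."""
    words = text.split()
    return [(word, "predictable" if rank_of(prev, word) == 0 else "surprising")
            for prev, word in zip(words, words[1:])]

# "mat" gets flagged as surprising: the model's top guess after "the" is "cat".
print(highlight("the cat sat on the mat"))
```

A wall of top-ranked, "predictable" words is the tell these tools look for; human writing mixes in more surprises.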
Edward Tian, a Princeton student, went viral earlier this year with a similar, experimental tool called GPTZero that's aimed at educators. It gauges the likelihood that a piece of content was generated by ChatGPT based on its "perplexity" (that is, randomness) and "burstiness" (that is, variance). OpenAI, the company behind ChatGPT, has released another tool designed to scan text of more than 1,000 characters and render a verdict. The company is upfront about the tool's limitations, including false positives and limited effectiveness outside of English. Just as English-language data is often the top priority for those behind AI text generators, most AI-text-detection tools are currently best suited to English speakers.
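Here's a rough sketch of how those two signals might be computed. This is not GPTZero's actual code: a toy unigram model with add-one smoothing stands in for the real neural language model, and all the names and numbers are assumptions for illustration.

```python
import math
from collections import Counter

# Toy unigram "language model" trained on a tiny corpus.
train = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(train)
total = sum(counts.values())
vocab_size = len(counts) + 1  # reserve one slot for unseen words

def prob(word):
    # Add-one smoothing so unseen words still get a nonzero probability.
    return (counts[word] + 1) / (total + vocab_size)

def perplexity(sentence):
    # How surprising the sentence is to the model: low = machine-like.
    words = sentence.split()
    log_prob = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_prob / len(words))

def burstiness(sentences):
    # Variance of per-sentence perplexity across the document:
    # human writing tends to swing more from sentence to sentence.
    pps = [perplexity(s) for s in sentences]
    mean = sum(pps) / len(pps)
    return sum((p - mean) ** 2 for p in pps) / len(pps)

doc = ["the cat sat on the mat", "quantum zeppole defy every beignet"]
print(perplexity(doc[0]), perplexity(doc[1]), burstiness(doc))
```

The in-domain sentence scores a much lower perplexity than the oddball one, and the gap between them is exactly the kind of variance a burstiness score rewards as human-like.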
Could you tell if a news article was written, at least in part, by AI? "These AI-generated texts, they can never do the work of a journalist like you, Rhys," says Tian. That's a kind sentiment. CNET, a technology-focused website, published multiple articles written by algorithms and dragged across the finish line by a human. ChatGPT, for now, lacks a certain chutzpah, and it sometimes hallucinates, which could be an issue for reliable reporting. Everyone knows skilled journalists save the psychedelics for after-hours.
While these detection tools are useful for the moment, Tom Goldstein, a computer science professor at the University of Maryland, sees a future where they become less effective as natural language processing grows more sophisticated. "These kinds of detectors rely on the fact that there are systematic differences between human text and machine text," says Goldstein. "But the goal of these companies is to make machine text that is as close as possible to human text." Does this mean all hope of spotting synthetic media is lost? Absolutely not.
Goldstein worked on a recent paper exploring possible watermarking techniques that could be built into the large language models powering AI text generators. It's not foolproof, but it's a fascinating idea. Remember, ChatGPT tries to predict the next likely word in a sentence and compares multiple options in the process. A watermark might be able to designate certain word patterns as off-limits for the AI text generator. Then, when scanned text violates the watermarking rules multiple times, that indicates a human probably wrote that masterpiece.
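A toy sketch of that detection logic, not the paper's actual scheme: assume the generator hashes the previous word to split a tiny vocabulary into a favored "green" half and an avoided "red" half, and the detector simply counts how often the text lands on green. The vocabulary, hash choice, and 50/50 split are all illustrative.

```python
import random
import zlib

# Tiny illustrative vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

def green_list(prev_token):
    # Seed a PRNG with a stable hash of the previous word, then mark half
    # the vocabulary "green". A human writer can't know which half is
    # which, so human text lands on green only about half the time.
    rng = random.Random(zlib.crc32(prev_token.encode()))
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[: len(shuffled) // 2])

def generate_watermarked(length, seed=0):
    # A watermarked "generator": always picks its next word from green.
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(length - 1):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens

def green_fraction(tokens):
    # Detector: how often does a word fall in its predecessor's green list?
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

machine = generate_watermarked(200)
rng = random.Random(1)
human = [rng.choice(VOCAB) for _ in range(200)]
print(green_fraction(machine), green_fraction(human))  # 1.0 vs. roughly 0.5
```

In the published work the watermark softly biases the model toward green words rather than banning red ones outright, which preserves text quality while keeping the statistical signal detectable.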