OpenAI has developed a tool capable of detecting text generated by ChatGPT with high accuracy, but the company is hesitant to release it to the public. The Wall Street Journal reports that internal debates over the tool’s potential impact have kept it under wraps for about a year.
Now, in an updated blog post, OpenAI has confirmed that it is researching text watermarking as part of a broader initiative to explore content provenance solutions. The company says its method has been “highly accurate” in certain situations, and even effective against localized tampering such as paraphrasing.
“Our teams have developed a text watermarking method that we continue to consider as we research alternatives,” OpenAI stated in the blog post.
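OpenAI has not disclosed how its watermark actually works, but statistical text watermarking is well documented in the research literature. As a rough illustration only (a generic “green list” scheme, not OpenAI’s method, with every name below hypothetical), detection boils down to checking whether a text’s token choices are skewed in a way that only a watermarking sampler would produce:

```python
import hashlib
import math

def is_green(prev_token: str, token: str) -> bool:
    # Pseudorandomly assign roughly half of all possible tokens to a
    # "green list" keyed on the preceding token. A watermarking sampler
    # would bias generation toward green tokens; unwatermarked text
    # lands on them only about half the time.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def detection_z_score(tokens: list[str]) -> float:
    # Count green (previous-token, token) pairs and compare against the
    # ~50% expected by chance; a large z-score suggests watermarked text.
    pairs = list(zip(tokens, tokens[1:]))
    greens = sum(is_green(a, b) for a, b in pairs)
    n = len(pairs)
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

# A long passage sampled with a green-list bias scores high;
# ordinary human-written text hovers near zero.
sample = "the quick brown fox jumps over the lazy dog".split()
print(round(detection_z_score(sample), 2))
```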
However, OpenAI cites several concerns that have delayed the tool’s release. The watermarking technique is less robust against more sophisticated tampering methods, “like using translation systems, rewording with another generative model, or asking the model to insert a special character in between every word and then deleting that character.”
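The last of those attacks is easy to picture: the model is prompted to emit some marker between every word, which changes its token-level choices at generation time (where a sampling-based watermark would live), and the user then strips the marker back out. A hypothetical sketch of the stripping step (the `~` separator is an arbitrary assumption):

```python
SEP = "~"  # arbitrary marker character the model was asked to insert

def strip_marker(model_output: str, sep: str = SEP) -> str:
    # Deleting the marker restores normal-looking prose, but the token
    # sequence the model actually sampled no longer matches what a
    # watermark detector expects, so the statistical signal is destroyed.
    return model_output.replace(f" {sep} ", " ").replace(sep, "")

print(strip_marker("The ~ quick ~ brown ~ fox ~ jumps ~ over ~ the ~ lazy ~ dog"))
# -> "The quick brown fox jumps over the lazy dog"
```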
There are also worries about potential negative impacts on certain user groups. OpenAI noted, “For example, it could stigmatize use of AI as a useful writing tool for non-native English speakers.”
The company is weighing these risks against the potential benefits of such a tool. A survey commissioned by OpenAI found that people worldwide supported the idea of an AI detection tool by a four-to-one margin, according to the Wall Street Journal. The technology could be particularly valuable for educators looking to deter students from submitting AI-generated assignments.
However, nearly 30% of surveyed ChatGPT users indicated they would use the software less if watermarking were implemented, presenting a potential conflict between ethical considerations and business interests.
As the debate continues, OpenAI is exploring alternative approaches to text provenance, including the use of metadata. The company suggests this method could offer advantages over watermarking, such as eliminating false positives through cryptographic signing.
“Unlike watermarking, metadata is cryptographically signed, which means that there are no false positives,” OpenAI explained.
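OpenAI has not said what that signing scheme would look like, but the no-false-positives property is easy to see with any cryptographic authentication primitive: verification either succeeds exactly or fails, with no statistical threshold to misfire. A minimal sketch using an HMAC as a stand-in (a real provenance system would more likely use public-key signatures, and every name below is an assumption):

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"provider-held secret"  # assumption: known only to the AI provider

def sign(text: str, metadata: dict) -> str:
    # Bind the text and its provenance metadata together under the key.
    payload = json.dumps({"text": text, "meta": metadata}, sort_keys=True)
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()

def verify(text: str, metadata: dict, tag: str) -> bool:
    # Exact check: human-written text with no valid tag can never be
    # mistakenly "detected" as AI-generated, hence no false positives.
    return hmac.compare_digest(sign(text, metadata), tag)

meta = {"model": "example-model", "generated": "2024-08-05"}
tag = sign("A generated paragraph.", meta)
print(verify("A generated paragraph.", meta, tag))  # True
print(verify("An edited paragraph.", meta, tag))    # False: any change breaks it
```

The trade-off is brittleness: any edit at all invalidates the tag, so an approach like this can only vouch for verbatim, tag-carrying text rather than flag modified copies.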
For now, OpenAI appears to be taking a cautious approach, prioritizing authentication tools for audiovisual content while continuing to research and refine its text-based solutions. The company’s decisions on when and how to release such detection tools will likely have significant implications for the broader ecosystem of AI-generated content. We’ll just have to wait and see which direction OpenAI goes in the coming months.