A machine-learning system that guesses whether text was produced by machine-learning systems
GLTR is a joint project of the MIT-IBM Watson AI Lab and Harvard NLP that analyzes a text and predicts whether it was generated by a machine-learning model.
Text generators use language models, derived from statistical analysis of vast corpora of human-written text, to produce output that can be very hard for a human to distinguish from text written by another human. These models could help malicious actors in many ways, including generating convincing spam, reviews, and comments, so it’s really important to develop tools that can help us distinguish between human-generated and machine-generated texts.
GLTR uses OpenAI’s GPT-2 117M language model, which is also widely used to generate text. GLTR looks for texts that fit the GPT-2 model too well, on the basis that texts produced by humans nearly always have “surprising” word combinations that are considered highly unlikely by GPT-2’s lights. In other words, if a text seems too predictable, it’s probably machine generated.
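The idea of a text fitting the model "too well" can be sketched in a few lines of Python. This is a toy illustration, not GLTR's code: the probability tables are made-up numbers standing in for GPT-2's predictions, and the names `token_rank` and `fraction_in_top_k` are hypothetical.

```python
from typing import Dict, List

Distribution = Dict[str, float]  # token -> probability that it comes next


def token_rank(dist: Distribution, actual: str) -> int:
    """1-based rank of `actual` among the predictions (1 = most likely)."""
    if actual not in dist:
        # Assumed convention for this sketch: unseen tokens rank last.
        return len(dist) + 1
    ordered = sorted(dist, key=dist.get, reverse=True)
    return ordered.index(actual) + 1


def fraction_in_top_k(dists: List[Distribution], tokens: List[str], k: int) -> float:
    """Share of tokens that the model ranked among its k most likely guesses."""
    ranks = [token_rank(d, t) for d, t in zip(dists, tokens)]
    return sum(r <= k for r in ranks) / len(ranks)


# Toy predictions for three positions (invented numbers, not GPT-2 output).
toy_dists = [
    {"the": 0.6, "a": 0.3, "zebra": 0.1},
    {"cat": 0.5, "dog": 0.4, "quasar": 0.1},
    {"sat": 0.7, "ran": 0.2, "evaporated": 0.1},
]
smooth_text = ["the", "cat", "sat"]           # always the model's first choice
surprising_text = ["zebra", "quasar", "sat"]  # mostly unlikely choices

print(fraction_in_top_k(toy_dists, smooth_text, k=1))
print(fraction_in_top_k(toy_dists, surprising_text, k=1))
```

A text whose words almost all land in the model's top guesses is the kind that would look suspiciously smooth; the more surprising text scores much lower on the same measure.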
It’s not hard to think of ways to defeat this tactic: GPT-2 could be modified to inject some random word choices that rough up the otherwise overly smooth statistical model. But GLTR does not issue verdicts on its own. It relies on human beings to review the confidence scores it gives to a text, which pop up when you hover your mouse over a word in a candidate text. This makes it harder to trick GLTR with random words, but it also means that GLTR is hard to scale up to analyzing large volumes of information, like all the tweets under a popular hashtag.
Even without the benefit of scale, there are some intriguing possibilities for integrating GLTR with a browser: you could toggle it on when you encounter a text that you are suspicious of and use its analysis to help you make up your own mind.
While it is possible to paste any text into the tool, we provided some examples of fake and real texts. Notice that the fraction of red and purple words, i.e. unlikely predictions, increases when you move to the real texts. Moreover, we found that the informative snippets within a text almost always appear in red or purple since these “surprising” terms carry the message of the text.
When you hover over a word in the display, a small box presents the top 5 predicted words, their associated probabilities, and the position of the following word in that ranking. It is a fun exercise to look into what a model would have predicted.
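What the hover box reports could be computed roughly as follows. This is a sketch under assumptions: `hover_box` is a hypothetical name, the probabilities are invented, and "position" is read here as the 1-based rank of the word that actually appears in the text.

```python
from typing import Dict, List, Tuple


def hover_box(
    dist: Dict[str, float], actual: str, top_n: int = 5
) -> Tuple[List[Tuple[str, float]], int]:
    """Return the top_n predictions with probabilities, plus the actual word's rank."""
    ordered = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    top = ordered[:top_n]
    words = [word for word, _ in ordered]
    # Assumed convention: a word the model never listed ranks after all others.
    position = words.index(actual) + 1 if actual in words else len(words) + 1
    return top, position


# Invented next-word distribution for a single position in a text.
toy_dist = {
    "the": 0.40, "a": 0.25, "this": 0.15, "my": 0.08,
    "his": 0.06, "zebra": 0.04, "quasar": 0.02,
}

top, position = hover_box(toy_dist, "zebra")
for word, prob in top:
    print(f"{word:>6}  {prob:.2f}")
print("position of the actual word:", position)
```

Here the actual word, "zebra", falls outside the top 5, which is exactly the sort of surprising choice the display is meant to surface.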
Finally, the tool shows three different histograms that aggregate the information over the whole text. The first one demonstrates how many words of each category appear in the text. The second one illustrates the ratio between the probabilities of the top predicted word and the following word. The last histogram shows the distribution over the entropies of the predictions. Low uncertainty implies that the model was very confident of each prediction, whereas high uncertainty implies low confidence. You can observe that for the academic text input, the uncertainty is generally higher than for the samples from the model.
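The three quantities behind those histograms can be sketched in Python. Several details are assumptions rather than anything stated above: the rank cut-offs of 10, 100, and 1000, the green and yellow category names, and the direction of the ratio (actual word over top prediction, so values fall between 0 and 1). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

Distribution = Dict[str, float]  # token -> probability that it comes next

# Assumed rank cut-offs for the colour categories.
BUCKETS = [(10, "green"), (100, "yellow"), (1000, "red")]


def colour(rank: int) -> str:
    """Map a 1-based rank to a colour category; anything beyond the last cut-off is purple."""
    for limit, name in BUCKETS:
        if rank <= limit:
            return name
    return "purple"


def top_ratio(dist: Distribution, actual: str) -> float:
    """Probability of the actual word divided by that of the top prediction."""
    return dist.get(actual, 0.0) / max(dist.values())


def entropy(dist: Distribution) -> float:
    """Shannon entropy in bits; higher means the model was less certain."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Histogram 1: how many words fall in each category (invented ranks).
ranks = [1, 7, 42, 350, 5000]
print(Counter(colour(r) for r in ranks))

# Histograms 2 and 3: ratio and entropy for a confident and an unsure prediction.
confident = {"the": 0.97, "a": 0.02, "zebra": 0.01}
unsure = {"the": 0.25, "a": 0.25, "zebra": 0.25, "quasar": 0.25}
print(top_ratio(confident, "zebra"))
print(entropy(confident), entropy(unsure))
```

The peaked distribution has much lower entropy than the flat one, which matches the reading of the last histogram: low values mean the model was confident about what came next.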