It sounds like something out of, well, a detective novel: the U.K.’s Sunday Times broke the news yesterday that Robert Galbraith, the “first time” writer behind the critically acclaimed crime novel The Cuckoo’s Calling, was, in fact, the nom de plume of Harry Potter creator J.K. Rowling. Galbraith was described as a former military police investigator with a surprising knack for language — that is, before “he” was unmasked as the megafamous author, who told the Times that writing under a fake name was “liberating.”
As explained by the New York Times, a writer for the British paper received an anonymous tip via Twitter, in which a now deleted user claimed that Rowling was the real author of The Cuckoo’s Calling. (Is it possible that the anonymous user was the book’s publisher? As the New York Times notes, there’s no way to rule it out.) Sunday Times editor Richard Brooks eventually confronted the publisher, but not before he investigated the similarities between Galbraith and Rowling.
One person involved in that process was Patrick Juola, a professor of computer science at Pittsburgh’s Duquesne University, who was called in by the Times to analyze the Calling text.
“The idea of looking at people’s language to know who they are goes back to the Book of Judges,” Juola says, referring to the history of the word shibboleth, the pronunciation of which was used to identify an enemy tribe in the biblical story of the Ephraimites. For his part, Juola has been researching the subject — now called forensic linguistics, with a focus on authorship attribution — for about a decade. He uses a computer program to analyze and compare word usage in different texts, with the goal of determining whether they were written by the same person. The science is more frequently applied in legal cases, such as with wills of questionable origin, but it works with literature too. (Another school of forensic linguistics puts an emphasis on impressions and style, but Juola says he’s always worried that people using that approach will just find whatever they’re looking for.)
But couldn’t an author trying to disguise herself just use different words? It’s not so easy, Juola explains. Word length, for example, is something the author might think to change — sure, some people are more prone to “utilize sesquipedalian lexical items,” he jokes, but that can change with their audiences. What the author won’t think to change are the short words, the articles and prepositions. Juola asked me where a fork goes relative to a plate; I answered “on the left” and wouldn’t ever think to change that, but another person might say “to the left” or “on the left side.”
As one part of his work, Juola uses a program called the Java Graphical Authorship Attribution Program (a free download that anyone can play around with) to pull out the hundred most frequent words across an author’s vocabulary. This step eliminates rare words, character names and plot points, leaving him with words like of and but, ranked by usage. Those words might seem inconsequential, but they leave an authorial fingerprint on any work.
“Prepositions and articles and similar little function words are actually very individual,” Juola says. “It’s actually very, very hard to change them because they’re so subconscious.”
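For readers who want to see what the frequent-word step looks like in practice, here is a minimal Python sketch. It is not the code of Juola’s program; the function name `top_words`, the tokenizing rule and the sample sentence are all assumptions made for illustration. It simply counts words and keeps the most common ones with their share of the text.

```python
import re
from collections import Counter


def top_words(text, n=100):
    """Return the n most frequent words as (word, relative_frequency) pairs.

    Hypothetical helper for illustration only; not taken from any real tool.
    """
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z'’]+", text.lower())
    total = len(words)
    # most_common(n) returns the n highest counts, largest first.
    return [(word, count / total) for word, count in Counter(words).most_common(n)]


# Made-up sample sentence, not a quotation from any of the novels.
sample = "The fork goes on the left of the plate, and the knife goes on the right of the plate."
for word, freq in top_words(sample, n=5):
    print(f"{word}: {freq:.3f}")
```

On a full novel, a list like this is dominated by articles, prepositions and conjunctions, which is the point: names and plot-specific vocabulary rarely appear often enough to make the cut.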
Such clues, Juola is careful to point out, do not necessarily constitute incontrovertible evidence. “It doesn’t prove that [the Cuckoo author] was Rowling, but it’s a starting point,” he says. “In this particular case, I wasn’t that certain at all.” That’s because Juola was provided with relatively few texts to compare against The Cuckoo’s Calling: Rowling’s The Casual Vacancy, Ruth Rendell’s The St. Zita Society, P.D. James’ The Private Patient and Val McDermid’s The Wire in the Blood. Of those four, Cuckoo showed the highest similarity to Rowling’s work, but that only means the author was more likely to be Rowling than to be one of three other writers.
“It’s like DNA,” Juola says. “If I find your DNA at the scene of a crime, I may be able to say that the chances are billions to one that it couldn’t have been any other random person — but that doesn’t prove it was you. It’s just very strong evidence that the jury has to consider. And one of the things that they have to consider is the possibility that you have a twin that you don’t know about.”
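The comparison against a fixed set of candidate authors can be sketched in the same spirit. The article does not say which similarity measure was used, so cosine similarity here is an assumption, as are the names `word_freqs`, `cosine_similarity` and `rank_candidates` and the toy sentences standing in for whole novels. Note that the output is only a ranking among the supplied candidates, which mirrors the caveat above: the top match is the likeliest of those offered, not proof of authorship.

```python
import math
import re
from collections import Counter

WORD_PATTERN = r"[a-z'’]+"


def word_freqs(text):
    """Map each word in text to its share of all words (hypothetical helper)."""
    words = re.findall(WORD_PATTERN, text.lower())
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def cosine_similarity(freqs_a, freqs_b, vocab):
    """Cosine similarity of two frequency profiles over a shared word list."""
    vec_a = [freqs_a.get(word, 0.0) for word in vocab]
    vec_b = [freqs_b.get(word, 0.0) for word in vocab]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0


def rank_candidates(unknown_text, candidates, n=100):
    """Rank candidate authors by similarity to the unknown text, best first."""
    # Shared vocabulary: the n most frequent words across all texts pooled together.
    pooled_text = " ".join([unknown_text, *candidates.values()]).lower()
    pooled = Counter(re.findall(WORD_PATTERN, pooled_text))
    vocab = [word for word, _ in pooled.most_common(n)]
    unknown = word_freqs(unknown_text)
    scores = {
        name: cosine_similarity(unknown, word_freqs(text), vocab)
        for name, text in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for full novels; real input would be entire book texts.
unknown = "the fork goes on the left of the plate"
candidates = {
    "Author A": "the spoon goes on the right of the bowl",
    "Author B": "a spoon goes to a right side by a bowl",
}
for name, score in rank_candidates(unknown, candidates):
    print(f"{name}: {score:.3f}")
```

With only a handful of candidates, a result like this says which of them is closest and nothing about writers who were never included in the comparison.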
But as limited as the evidence was, it apparently helped the Times reporters take their findings to Rowling’s publisher, where they received confirmation of their hunch.
And, though the beginnings of forensic linguistics may be ancient, Juola says this kind of sleuthing may be on the rise. He traces the statistical analysis of text back to the 19th century, but the limiting factors have always been time and energy. Even when computers became available to count words, running such a study involved manual entry of every word from the book. Running a study of The Cuckoo’s Calling against four other novels would probably have taken a whole team of researchers days or weeks of tedious labor. With e-books readily available, almost any book can be quickly analyzed. Rowling only got a few months of anonymity, but even that period of secrecy may not be possible for much longer for an author of her fame.
Which is too bad for authors looking for the liberation brought by a pseudonym — or maybe not so much. Since the Times unmasked Robert Galbraith, Amazon is reporting an increase of more than 500,000% in sales for The Cuckoo’s Calling.