Ranking Historical Figures: Skiena and Ward's "Who's Bigger?" Reviewed

Who was the greatest baseball player of all time? Some people say Willie Mays. They emphasize that he had all of baseball’s “five tools”: he could run, hit, field, throw, and hit with power. Other people insist on Ty Cobb, who had the highest career batting average in baseball history. Still others say Cy Young, on the ground that good pitchers are more important than good hitters, and Young won more games than any pitcher who ever lived. Joe DiMaggio has his advocates, who note that he had the longest hitting streak in baseball history, and who emphasize that hitters, unlike pitchers, play every day. Still others say Hank Aaron, who had the most career home runs (except for Barry Bonds, whose all-time record was marred by steroid use).

It is certainly possible to rank baseball players in terms of batting average, wins, hitting streaks, and home runs. But people vigorously disagree about the relationship between those particular rankings and overall “greatness.” Can we mediate these disagreements? Baseball statisticians are trying. After all, the goal of a baseball player is to help his team to win. Maybe we can measure greatness in baseball by exploring how much a player contributes to wins. In fact, a statistical measure called Wins Above Replacement Player (WARP) tries to isolate each player’s contribution, by specifying how many wins a player adds, compared with a standardized lesser player (say, a player who does not normally make it into the starting lineup). It turns out that Mays had 156 WARP over his career, Cobb 151, Young 168, DiMaggio 78, and Aaron 142. With these numbers, we might be inclined to conclude that Young was baseball’s all-time greatest player (with the exception of Babe Ruth, who heads the WARP list at 184).
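To make the arithmetic concrete, here is a minimal sketch that does nothing more than sort the career WARP figures quoted above. The variable names are mine, and the numbers are only the ones cited in this review, not a full WARP table.

```python
# Career WARP figures exactly as quoted in the text above.
warp = {
    "Willie Mays": 156,
    "Ty Cobb": 151,
    "Cy Young": 168,
    "Joe DiMaggio": 78,
    "Hank Aaron": 142,
    "Babe Ruth": 184,
}

# Rank players from most to fewest wins above a replacement player.
ranking = sorted(warp, key=warp.get, reverse=True)

for place, player in enumerate(ranking, start=1):
    print(f"{place}. {player}: {warp[player]}")
```

A single number makes the ranking mechanical, which is exactly its appeal; whether that number captures "greatness" is a separate question.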

Whatever its limitations, WARP is a far better measure than imaginable alternatives. To identify the greatest baseball player of all time, it would not make a lot of sense to use some kind of poll or referendum. People might choose a recent player, because he is familiar to them or because he is their personal favorite. (David Ortiz, president of Red Sox Nation, might do well this year.) Not knowing about statistics, people might rely on a measure, such as batting average, that is not a valid test of baseball greatness. A poll would enable us to identify the player that polltakers favor, but unless their own measures are fairly reliable, it will not tell us a lot more than that. As compared with polls, statistical analysis has real advantages.

And who was the greatest president in American history? Reasonable people might say George Washington, Abraham Lincoln, or Franklin Delano Roosevelt. But how is presidential greatness measured? Can we devise a WARP for presidents, something like Wins Above Replacement President? If so, we would need to specify the functional equivalent of “wins.” Maybe the term would refer to economic growth or to wars averted or to wars won, controlling for historical circumstances; if so, we would need to produce some kind of measure that would aggregate presidential “wins.” The problem is that history is run only once. Outside science fiction, it is not possible to say what a Replacement President would have done, and to specify how things would have turned out if he had done it. In the fullness of time, we might be able to make some progress in measuring presidential greatness in statistical terms, but it is not exactly surprising that, to date, rankings of presidents tend to rely on polls, flawed as they are.

And what about religious leaders, scientists, philosophers, artists, and novelists? Can they be ranked as well—in terms of greatness or importance? Might we be able to play some kind of Moneyball with Joyce Carol Oates, Stephen King, James Joyce, Charles Dickens, and Thomas Hardy? Can cultural figures from diverse fields be ranked against each other? How might we compare Einstein, Plato, Descartes, Hume, Michelangelo, Suzanne Farrell, and Bob Dylan? True, it might be ridiculous, or even a bit crazy, to try. Skeptics might wonder about the point of such efforts: what kind of game is this? This is a good and potentially devastating question, but if we want to understand the arc of history and the nature of social influence, the endeavor might turn out to be interesting and perhaps even worthwhile. (And besides, many people find it fun.)

We now have access to digital versions of millions of books, and we can search them to know who and what is mentioned, and where, and how much. The term “culturomics” sounds both faddish and ugly, but it refers to a promising new field, and we are going to be able to learn a lot from it. In 2011, Jean-Baptiste Michel and multiple co-authors published an article in Science, helpfully if not colorfully titled “Quantitative Analysis of Culture Using Millions of Digitized Books,” which announced that more than five million books had been digitized, thus giving us a new tool by which to identify cultural trends and to quantify changes over time. You can see when certain words become popular, how grammar evolves, when scientific developments begin to be discussed, which illnesses receive attention, which philosophers are mentioned and when and how much, and far more. From millions of digitized books, we should be able to learn a great deal about culture and social norms and how they change. In important ways, we might also be able to rank people, places, and things.

Steven Skiena and Charles Ward are keenly interested in, even delighted by, rankings. In particular, they are interested in ranking people along one dimension: significance. It would certainly be interesting to develop a kind of WARP for significance. How much did Einstein, Darwin, Descartes, Freud, Michelangelo, Mozart, Picasso, and Bob Dylan contribute to the world, compared with the average human being? That seems to be an interesting question, but it raises obvious conceptual and empirical challenges. We lack standards and tools to measure social contributions, certainly across time and across diverse fields and enterprises.

Skiena and Ward do not argue against this conclusion. Undaunted, they nonetheless offer a significance ranking. Here is their list of the twenty most significant people of all time:

1. Jesus
2. Napoleon
3. Mohammed
4. William Shakespeare
5. Abraham Lincoln
6. George Washington
7. Adolf Hitler
8. Aristotle
9. Alexander the Great
10. Thomas Jefferson
11. Henry VIII
12. Charles Darwin
13. Elizabeth I
14. Karl Marx
15. Julius Caesar
16. Queen Victoria
17. Martin Luther
18. Joseph Stalin
19. Albert Einstein
20. Christopher Columbus

Skiena and Ward compile this list by reference to what they see as five objective indicators, every one involving the English-language version of Wikipedia. (That is a big problem, and we will get to it in due course.) Their first two indicators draw on Google’s famous algorithm, called PageRank. Skiena and Ward contend that the pages of significant people end up getting a lot of links. If numerous Wikipedia pages end up linking to Abraham Lincoln, we have a clue that Lincoln was a major figure. With this point in mind, Skiena and Ward ask: what is the probability that a random Wikipedia page will link to a particular person’s page? The higher the probability, the more significant that person’s page.

Skiena and Ward are aware that you might come to Jesus (so to speak) not through surfing pages that involve people, but because Jesus’ page gets a lot of links from pages that involve institutions, animals, and inanimate objects. By the PageRank method, for example, Carl Linnaeus, the great scientist of classification, ends up third on their all-time list, which seems pretty absurd. Owing to this problem, they add a second measure, which limits the PageRank analysis to links among people. With this measure, Carl Linnaeus’s ranking plummets. (Jesus does great.)
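For readers who want to see the mechanics, the two link-based measures can be sketched in a few lines. This is an illustration under stated assumptions, not Skiena and Ward's code: the link graph is a made-up toy (these are not real Wikipedia link data), the function names are hypothetical, and the damping factor of 0.85 is the conventional textbook value rather than anything reported in the book. The sketch uses the standard "random surfer" formulation of PageRank, of which the description above is an informal summary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps a page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def restrict_to_people(links, people):
    """Keep only person pages and the links that run between them."""
    return {p: [t for t in links.get(p, []) if t in people] for p in people}


# Toy graph, invented for illustration only.
links = {
    "Oak": ["Carl Linnaeus"],
    "Wolf": ["Carl Linnaeus"],
    "Honey bee": ["Carl Linnaeus"],
    "Ulysses S. Grant": ["Abraham Lincoln"],
    "George Washington": ["Abraham Lincoln"],
    "Abraham Lincoln": [],
    "Carl Linnaeus": [],
}
people = {"Ulysses S. Grant", "George Washington",
          "Abraham Lincoln", "Carl Linnaeus"}

full = pagerank(links)
people_only = pagerank(restrict_to_people(links, people))
```

In this toy graph Linnaeus outranks Lincoln when species pages are counted, and falls behind him once the analysis is limited to links among people, which is the effect the second measure is meant to produce.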

For their third measure, Skiena and Ward focus on the number of “hits” that Wikipedia pages receive. They note that this measure can produce dramatically different rankings from those that emerge from PageRank. Many entertainers, such as Justin Bieber and Taylor Swift, get a phenomenal number of hits, even though they do not do especially well on PageRank. Their fourth measure involves the length of Wikipedia articles. In their view, more significant people will tend to end up with longer articles, reflecting the magnitude of their contribution. Fifth, and finally, Skiena and Ward explore the sheer number of times that a page is edited. They think that if a lot of people are contributing to a page, there is a great deal of interest in it, and that interest tells us something about significance.

Skiena and Ward are aware that their different indicators might measure different things. They find that by their two PageRank measures, famous presidents, scientists, and philosophers tend to do quite well, whereas famous movie stars do better in terms of hits, length of articles, and number of edits. They say that their first two measures capture “gravitas,” while the latter three reflect “celebrity.” For their judgment of “fame,” they add the two measures together.
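How five indicators might be folded into two factors and then one "fame" score can be sketched as follows. Everything specific here is an assumption made for illustration: the three figures and their numbers are invented, the indicator names are mine, and the recipe (take logarithms, standardize each indicator, average within each group, then add the two averages) is a simple stand-in, since the review does not spell out the authors' actual statistical procedure.

```python
import math
from statistics import mean, pstdev

# Hypothetical indicator values, invented purely for illustration.
raw = {
    "Statesman": {"pagerank_all": 0.030, "pagerank_people": 0.040,
                  "hits": 20_000, "length": 90_000, "edits": 4_000},
    "Pop star": {"pagerank_all": 0.002, "pagerank_people": 0.003,
                 "hits": 900_000, "length": 150_000, "edits": 20_000},
    "Minor figure": {"pagerank_all": 0.001, "pagerank_people": 0.001,
                     "hits": 1_000, "length": 5_000, "edits": 200},
}

GRAVITAS = ("pagerank_all", "pagerank_people")   # the two link-based measures
CELEBRITY = ("hits", "length", "edits")          # the three attention measures


def zscores(values):
    """Standardize log-transformed values so indicators share a common scale."""
    logs = {name: math.log(v) for name, v in values.items()}
    mu = mean(logs.values())
    sigma = pstdev(logs.values()) or 1.0  # avoid dividing by zero
    return {name: (x - mu) / sigma for name, x in logs.items()}


def fame_scores(raw):
    """Assumed recipe: average z-scores within each factor, then add the factors."""
    z = {ind: zscores({name: row[ind] for name, row in raw.items()})
         for ind in GRAVITAS + CELEBRITY}
    scores = {}
    for name in raw:
        gravitas = mean(z[ind][name] for ind in GRAVITAS)
        celebrity = mean(z[ind][name] for ind in CELEBRITY)
        scores[name] = {"gravitas": gravitas,
                        "celebrity": celebrity,
                        "fame": gravitas + celebrity}
    return scores


scores = fame_scores(raw)
```

With these invented inputs the statesman leads on gravitas and the pop star on celebrity, and simply adding the two says nothing about which factor ought to count for more.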


But Skiena and Ward used their five Wikipedia indicators on October 11, 2010, and they are aware of a source of serious distortion, which involves changes over time. If we don’t make some kind of adjustment, taking account of the skewing effects of recency, the results look truly bizarre. George W. Bush would rank as the most significant person of all time, whereas Barack Obama would be third, Ronald Reagan sixth, Bill Clinton seventh, and Michael Jackson ninth. (Also, Britney Spears and Aristotle would have similar rankings.) The list of the seven most significant people in the history of the human species cannot possibly include four American presidents since 1980. Skiena and Ward seek to convert “fame” to “significance” by including a kind of temporal adjustment.

To see how current fame will “decay,” Skiena and Ward consult Google Ngram, a fascinating source that shows how many times various words appear in millions of books. They contend that with the use of Ngram, it is possible to create a fairly reliable model of how significance falls over time, and they adjust their findings accordingly. With respect to particular people, Ngram displays a wide range of patterns. Paul Revere, John Lennon, Malcolm X, Karl Marx, and Vincent Van Gogh became famous, or at least far more famous, posthumously. By contrast, the references to some once-celebrated historical figures have fallen precipitously; these include Arthur Wellesley (the Duke of Wellington), the explorer John Franklin, and Napoleon II. Albert Einstein shows a pretty steady increase from 1915. Babe Ruth jumps from 1915 to 1949, then starts falling until 1968, only to enjoy steady increases since that time. Woodrow Wilson shows a high point in 1942, falls until 1982, and stays level from there.

Skiena and Ward are not interested only in what happens to specific people but also in the possibility of making general predictions about likely changes over time, and thus translating their “fame” measure into one of “significance.” It turns out that famous people tend to be most discussed about sixty or seventy years after they are born, and that there is a decline from that point—but that with the most famous people, discussion is reduced later in life, and also more slowly. (Jesus is the extreme case here.) The resulting statistical model allows them to make adjustments from current fame and thus to compute not only total significance rankings (producing the top-twenty list replicated above) but also rankings within fields.

The five most significant presidents? Lincoln, Washington, Jefferson, Theodore Roosevelt, and Ulysses S. Grant. Military leaders? Napoleon, Alexander the Great, Julius Caesar, Genghis Khan, and Oliver Cromwell. The most significant economists? Adam Smith, John Stuart Mill, Thomas Malthus, John Maynard Keynes, and David Ricardo. Literary figures? Shakespeare, Dickens, Mark Twain, Edgar Allan Poe, and Voltaire. Novelists (before 1900)? Dickens, Twain, Oscar Wilde, Goethe, and Lewis Carroll. Rock-and-roll hall-of-famers? Elvis Presley, Madonna, Bob Dylan, John Lennon, and Michael Jackson. Actresses? Marilyn Monroe, Judy Garland, Katharine Hepburn, Meryl Streep, and Marlene Dietrich. Television stars? Lucille Ball, Hilary Duff, Stephen Colbert, Roger Ebert, and Jennifer Aniston.

All this is a lot of fun, and it must be acknowledged that the authors’ enthusiasm and sense of play are infectious. But there is an obvious question, and it has to do with what exactly Skiena and Ward are measuring. For all their creativity, diligence, intelligence, and good nature, Skiena and Ward have produced a pretty wacky book, one that offers an important warning about the misuses of quantification. The warning is simple but easy to overlook: when we learn to measure certain things, we have to keep in mind exactly what we have measured, and in our excitement we must avoid reckless extrapolations, suggesting that we have also measured something else.

Consider some obvious anomalies. Does it make any sense for three American presidents (Washington, Lincoln, Jefferson) to be ranked among the ten most significant human beings who ever lived? Is it plausible to think that fourteen American presidents rank among the hundred most significant? Is Franklin Delano Roosevelt a less significant president than Ulysses S. Grant? Should Oscar Wilde be ranked above Goethe (or for that matter James Joyce, who comes out pretty low, below Jules Verne and Rudyard Kipling, and not so far above Ayn Rand)? Does Madonna deserve to be ranked higher than Bob Dylan? (Dylan has a cautionary note for Skiena and Ward: “I accept chaos, I’m not sure whether it accepts me.”) Is Elvis Presley really the sixty-ninth most significant human being ever to have lived? Is Hilary Duff the second most important television star in history? (If you have no idea who she is, try her much-edited and quite long Wikipedia page.)

Or put the clear anomalies to one side. If we say that Thomas Jefferson was a more significant figure than Charles Darwin, or Elizabeth I more significant than Columbus, what precisely are we saying? If we insist that Bob Dylan is more significant than Madonna, we might mean that he is simply better than she is (which is true), or we might mean that he had a larger impact on music than she did (which is also true), or we might mean that his music has affected more people, and more people more deeply, than hers (which had better be true). It is plausible to understand significance by reference to influence (as opposed to excellence), in which case we might have a clue about what we are trying to measure.

But are Skiena and Ward measuring influence? Not at all. Influence would be akin to WARP. To identify a measure such as this, we would want to focus on something other than the number of times a person’s English-language Wikipedia page has been edited, or the length of that page. Do Skiena and Ward mean to measure fame, pure and simple? Again, not at all. Their determined efforts to adjust for “decay” over time demonstrate that their concern is not fame as such. Besides, those who are interested in fame would probably want to dispense with Wikipedia and rely instead on surveys, which could measure both name recognition and specific knowledge.

As Skiena and Ward are aware, it would make no sense to measure significance only by reference to the number of hits, page length, and number of edits; such an approach would denominate today’s popular entertainers, and other celebrities, as history’s most significant figures. Measures of that kind tell us about the interests of Wikipedia readers and editors, and knowing about those interests tells us something about popular tastes. But it does not inform us about significance.

PageRank is a bit better. There aren’t going to be a lot of links to Justin Bieber’s page, and there will be a lot more to Lincoln’s and Washington’s, and one reason is that many people, and many topics, have some kind of connection with Lincoln and Washington. But here, too, Skiena and Ward do not have anything like a good measure of the significance of historical figures. Suppose that more Wikipedia pages link to Hitler than to Darwin, Newton, Plato, and Leonardo. That tells us about Wikipedia links, but it does not reveal much about comparative significance.

Wikipedia is an immensely valuable and in some ways astonishing resource; and if the goal is to measure what interests people, it is hardly senseless to consult it. But Wikipedia itself reports that in October 2010 it had about 116,000 editors (who made at least one edit), and there is no reason to think that the interests and concerns of those 116,000 people—as measured in October 2010—are an accurate reflection of the interests and concerns of the planet’s seven billion people. As I have noted, Skiena and Ward used the English-language version of Wikipedia, but there are more than 280 other versions, and other Wikipedias would likely produce different rankings. If the goal is to learn about worldwide fame or significance, it is more than a bit strange to rely exclusively on the English-language version of Wikipedia. At most, the resulting rankings reflect only the preoccupations of the English-speaking world, and mainly the United States. Surely Jesus would not have done so well in China, to say nothing of all those American presidents. What Skiena and Ward have really done is take a particular version of Wikipedia, as of a certain day in 2010, and use a statistical model from Ngram to project changes, over time, from specific measures of Wikipedia “fame” on that day. It is a nice trick, but it doesn’t help us to rank historical figures in terms of significance.

Human beings can measure countless things. It is easy to find out which baseball player hit the most home runs, which religion has the most followers, which book sold the most copies, which president is most admired by historians, which musician sold the most records, which Web pages get the most hits, which Wikipedia pages are longest. Our ability to measure is growing exponentially. It can be fun to rank people in terms of what we are measuring. But before doing that, we should be clear on what we have measured, and we should avoid nutty extrapolations. Bishop Butler, the eighteenth-century theologian, famously cautioned that “every thing is what it is, and not another thing.” Butler does not make it onto any of Skiena’s and Ward’s lists, but they would have done better if they had kept his point in mind.

Cass R. Sunstein is the Robert Walmsley University Professor at Harvard University and the author, most recently, of Simpler: The Future of Government (Simon & Schuster). He is a contributing editor at The New Republic.
