In other chapters, Yom-Tov describes findings that range from the provocative to the pedestrian. One example of the former is his study of whether Elisabeth Kübler-Ross’s famous “five stages of grief” can actually be observed in some quantitatively provable realm. To do this, his team looked for people who “suddenly developed an intense interest in a specific type of cancer,” as evidenced by their Yahoo searches, and reasonably concluded that those individuals (or their friends or family) had cancer. They then looked at the types of webpages these 20,808 people subsequently visited. They divided these pages into eleven discrete categories (using an odd form of piecework human labor available through the Internet, Amazon’s so-called “Mechanical Turk”), and then divided the categories into “hidden stages” (using mathematical modeling). Coincidentally or not, their findings conform, in some very rough quantitative sense, to Kübler-Ross’s schema: “The best model turns out to be the one that has exactly five hidden stages. This, though indirect, is one of the first quantitative pieces of evidence for a 40-year-old model.” They make some other interesting observations using this approach, for instance that people move through these “stages” differently depending on whether they are looking for information on acute or chronic forms of cancer.

There are, however, some important caveats here. First, of course, these were individuals with (presumably) new diagnoses of cancer (or their family and friends), not individuals who learned they were dying or even necessarily in grief (there is, of course, sadly some overlap). Nor are his stages the same, in a qualitative sense, as those of Kübler-Ross: They were determined by mathematical modeling, and not associated with particular human emotions. Such work therefore reveals both the power of this mode of research and its limits. Who really are these individuals, and what are they actually experiencing? Web searches can only say so much.

Finally, there is also the important issue of the potential dangers of this sort of research, an issue Yom-Tov is appropriately attentive to. “The data we leave on the Web when we ‘surf,’” he admits, “are highly revealing of who we are and what we are. Anonymity of data is no guarantee of complete anonymity.” He describes, for instance, the story of how the search information of some 650,000 AOL users became publicly available for research uses. Though the data was theoretically made anonymous, he notes, it became clear that individuals could be identified by piecing together their many highly personal searches. Though quickly taken off the web, the data was immediately duplicated, and according to Yom-Tov, it is still available online today (where it will presumably persist in perpetuity). The very permanence of such data breaches is what makes them so frightening: Once our privacy is violated, it can never truly be restored.

On a similar note, he describes the controversial “Facebook Experiment,” in which the company changed the order of users’ feeds (some saw happier posts on top, others saw sadder ones) in order to investigate how this affected their moods. This, he notes, provoked a great deal of discussion as to the ethics of the experiment. (It wasn’t done with the institutional review board approval typically required for interventional studies, for instance.) What are we to make of this? My own immediate take is strong skepticism towards research that basically amounts to corporate medical experimentation. Yom-Tov, however, makes a somewhat compelling counterpoint. As he describes, all of these companies are essentially already conducting these sorts of experiments all the time, adjusting various features of their sites and examining how the changes affect customer satisfaction. There is little controversy in this as long as it is done strictly for the purpose of profit, and not for research. Isn’t this a bit of an odd double standard? “If we allow such experimentation,” he asks, “shouldn’t we allow the use of data collected from such experiments to be used for medical research?” Perhaps, but this then raises the question of whether Facebook should have been allowed to perform the experiment to begin with. After all, this isn’t simply an instance of observing people on the Internet: It’s an intervention, however minor, in their actual lives.

This matters because it is not entirely inconceivable that such online mood interventions could have real-world consequences. For instance, the Guardian quoted tech thinker and entrepreneur Clay Johnson, who wondered on Twitter whether a social media company could aim to alter people’s moods online so as to sway political events: “Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent?” he tweeted. “Should that be legal?” There are also health consequences to consider. Imagine, for instance, a hypothetical and more drastic version of this experiment in which hundreds of millions of users are randomized into one of two groups: In one, breaking news about violent events goes straight to the top of the feed, whereas in the other, posts with pictures of baby animals have priority. Over time, might one of these groups experience greater anxiety or depression? Could there even be a statistically demonstrable difference in suicides between the two groups? It seems unlikely, but it’s worth pondering, particularly if such an experiment were conducted on a population-level scale.

Still, I’d contend that we should be even more concerned about the potential use of these types of methods for various non-scientific purposes. For instance, this type of internet-data research is increasingly being examined as a potential tool for political surveillance. An article headlined “Military-Funded Study Predicts When You’ll Protest on Twitter,” in Defense One, a defense-focused publication, describes how one Office of Naval Research-funded study sought to figure out ways to predict whether one’s “next tweet will be part of a protest.” “The real value, according to the researchers,” the article’s author Patrick Tucker writes, “lies in predicting how big a political storm could be before it hits.” As he notes, some have seen a potential for serious abuse in such research: Such data could theoretically be used to squelch political dissent, or for other unsavory purposes.

This is, of course, the fine line we always walk with science. The Internet no doubt constitutes a massive, if messy, repository of our thoughts, neuroses, desires, and unspoken fears. It would be folly not to study it. At the same time, we must be aware of its limits, which are intrinsic to its aggregated, impersonal, disembodied form, while keeping a watchful eye on those who would misuse it.