On March 11, 2011, a 71-year-old fisherman named Morihisa Kanouya stood on a small hill in Ukedo, Japan. In the distance, he saw “a black wall, five to six meters high, [with white spray] above it mixed with the sky, so you couldn’t tell where the sea ended and the sky began.” He wrapped himself around a tree and held his wife in his arms. This was the Tōhoku Tsunami.
When the foam settled, Morihisa had lost his wife and broken his knee. At the Fukushima Dai-ichi nuclear power station, the cooling system failed, three reactors melted down, and hydrogen explosions shook the buildings. The effects of this disaster were slow and subtle, but one estimate put the eventual death toll from exposure to leaked radiation at over 10,000. Thirty-six percent of kids in the Fukushima Prefecture were found to have abnormal thyroid growths, which may carry an increased risk of developing thyroid cancer. When the dust settled, the finger-pointing began. The Fukushima Nuclear Accident Independent Investigation Commission had the final word: the incident was “manmade.”
The commission found a crucial flaw in the design of the Fukushima plant’s safety system. The reactors shut down automatically when the earthquake hit, exactly as designed. But a shut-down reactor keeps generating decay heat and must be cooled for days. When a 14-meter-tall wave followed the earthquake, it knocked out all of the backup power systems, and there was nothing left to cool the reactors. The plant’s defenses had been designed without a sufficient understanding of what could go wrong.
The Fukushima disaster illustrates our pervasive uncertainty about what constitutes meaningful information. Which pieces of information will only confuse the situation and which pieces will help us predict a disaster? How do we recognize false positives? The engineers of Fukushima didn’t account for the chance that a big earthquake near the shore would bring a tsunami large enough to matter. But how big a tsunami? Big enough to flood the power plant and shut down the cooling system? If so, what would be the odds of a nuclear meltdown?
With three degrees of separation between the earthquake and the nuclear meltdown, there were a lot of moments when vital information could have been lost or distorted. If a nuclear meltdown had been predicted as soon as the earthquake hit, the Japanese government would have brushed it off as sensationalist. Deciding what information is important is crucial, but challenging. Recall Morihisa, looking back at the foam and the sky: he could not tell one from the other, but had the dividing line sat just a few meters lower, it could have been the difference between a couple of damaged houses and a nuclear meltdown.
In 2013, Edward Snowden revealed the existence of the NSA’s Special Source Operations (SSO) division. Tapped into the fiber optic cables through which the whole world’s data travels, SSO can snoop on virtually all online activity. The NSA needs to sift through this deluge of data in order to flag and inspect suspicious agents. To do so, it uses several computer programs to filter through the data, notably one called TURMOIL. TURMOIL searches through the data using a variety of “selectors”—things like geographic location, mentions of encryption software, or specific keywords—which, in turn, determine whether a person should be monitored further. The NSA’s task, then, is a reductionist one. They decide which pieces of information a person can be reduced to, which bits can lead them to determine, with reasonable certainty, whether someone fits their criteria for suspicion. One false assumption about what “suspicious people” tend to look like and the NSA would end up with tons of false positives or missed security threats.
Powering the NSA’s attempt to effectively sort massive quantities of data is a field of computer science called information theory, which quantifies information through a unit of measure called “entropy.” Say you are playing a game of 20 Questions. If you are a clever player, every question you ask brings you closer to guessing what your opponent is thinking. Your goal is to ask tricky questions that narrow down your opponent’s possibilities. The fewer the possibilities, the greater the probability that a random guess among the remaining options will be the right answer. This is what is meant when it’s said that information reduces uncertainty.
The question is, how much does it reduce uncertainty? Finding a numerical answer to this question explains how 20 Questions works. To get a rough estimate of our uncertainty, we can ask, “how many yes or no questions do we have to ask to figure out the answer?” If we’re trying to figure out the result of a coin flip, for instance, we only need to ask one: “is the coin heads?” So we say that the coin flip problem has an entropy of one. If we’re trying to figure out the result of a rolled die, things get a little trickier. We can ask, “is the number even?” Even if we find out the answer is yes, the result could still be two, four, or six. From there, we have to guess our way through the numbers. This way, we’ll end up asking a maximum of three questions, so we say rolling a die has an entropy of three. (Strictly, the entropy is log₂ 6, about 2.58—three questions is the worst case.)
But let’s say, for instance, that you know the die is loaded and is going to land on an even number, while your friend does not know a thing. For you to figure out what number the die lands on, you only have to ask two questions, while your friend has to ask three! All of a sudden we can quantify exactly how much you learned by knowing the die is loaded: one unit of entropy, which computer scientists call a bit.
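For readers who want the arithmetic, this question-counting is exactly Shannon’s entropy formula. Here is a minimal sketch in Python (the fractional values are the idealized averages; whole questions round them up):

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: the average number of yes/no
    questions needed to pin down the outcome."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy_bits([0.5, 0.5]))                 # fair coin: 1.0 bit
print(entropy_bits([1/6] * 6))                  # fair die: ~2.58 bits
print(entropy_bits([0, 1/3, 0, 1/3, 0, 1/3]))   # loaded die, evens only: ~1.58 bits
```

Knowing the die is loaded is worth log₂ 6 − log₂ 3 = 1 bit, matching the one-question difference.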
On computers, where everything is stored in binary digits, each digit is like an answer to a yes or no question. So if, for instance, we wanted a computer to store a die roll that came up six, it would need to record that the result was an even number, that it was not two, and that it was not four. You may be wondering: why can’t we just store whether or not the result was six, instead of ruling out two and four? Because you would still need to handle the cases when it was in fact two or four, which would require two additional questions; you would end up recording the answers to four yes or no questions instead of three, leading to wasted space on your hard drive. An efficient representation of the value of a die on a computer would use three binary digits, where each digit is a one or a zero and represents the answer to one of the yes or no questions. That is the best you can do with whole digits, since the entropy of a die roll is three.
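One concrete way to lay out those three digits, sketched in Python (the particular scheme here, writing the roll minus one in base two, is an illustrative choice, not the only one):

```python
def encode_roll(result):
    """Store a die roll (1-6) in three binary digits by writing
    result - 1 in base two, e.g. 6 -> 5 -> "101"."""
    assert 1 <= result <= 6
    return format(result - 1, "03b")

def decode_roll(bits):
    """Recover the roll from its three digits."""
    return int(bits, 2) + 1

print(encode_roll(6))      # 101
print(decode_roll("101"))  # 6
```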
But computers generally have to transmit more complex data than dice rolls or coin tosses. Even something as simple as a word has a degree of entropy that is hard to pin down. Sure, we could just ask, “what is the next letter in the word?” We would be most uncertain if each letter were chosen at random from all 26. But words are not random assortments of letters—they are full of patterns. If you are playing a game of hangman and you have “inform_tion” in front of you, is it really anyone’s guess what that missing letter is, or is it definitely an “a”? Isn’t telling someone that a word is “inform_tion” the same as telling them it’s “information”?
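A toy calculation makes the hangman point concrete: with no context, the next letter carries about 4.7 bits of uncertainty, but once context makes it certain, it carries none. A sketch in Python (the uniform-letters assumption is an idealization; even without context, real English letters are not equally likely):

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# No context: each of the 26 letters is equally likely.
print(entropy_bits([1/26] * 26))  # ~4.70 bits of uncertainty

# Context "inform_tion": the blank is "a" with certainty.
print(abs(entropy_bits([1.0])))   # 0.0 -- no uncertainty left
```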
The assumption that information theory makes is yes: with enough context, you can be sure. But it is not always clear how much context is enough. Finding the right amount of context becomes a problem on computers, where bandwidth and storage space are scarce. Take, for example, the problem of storing music files: if you had to store perfectly accurate reproductions of songs on your iPhone, it would only hold a few hundred songs. That is why audio encodings such as MP3 toss out as much information as possible, choosing to retain only a song’s key characteristics. For example, MP3s get rid of pitches above 20,000 Hz, more than two octaves above the highest note on the piano, because most people can’t hear that high. For some high-fidelity audio enthusiasts, though, it’s not okay to get rid of this much information. These audiophiles think that MP3s don’t faithfully represent their music, that a certain subtlety or warmth is lost when information is discarded.
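The spirit of that cutoff can be sketched in a few lines of Python. Real MP3 encoders use psychoacoustic models and far more sophisticated transforms; this toy low-pass filter, built on a naive discrete Fourier transform, only illustrates the “discard what you can’t hear” idea:

```python
import cmath

def low_pass(samples, sample_rate, cutoff_hz):
    """Drop every frequency above cutoff_hz from a signal, using a
    naive O(n^2) discrete Fourier transform (fine for a sketch)."""
    n = len(samples)
    # Forward DFT: how much of each frequency is in the signal?
    spectrum = [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)) for k in range(n)]
    # Silence the bins above the cutoff. Bin k holds frequency
    # k * sample_rate / n Hz; bins past n/2 mirror the low ones.
    for k in range(n):
        if min(k, n - k) * sample_rate / n > cutoff_hz:
            spectrum[k] = 0
    # Inverse DFT: rebuild the (now lossy) signal.
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]
```

Feed it a mix of a low hum and a 21 kHz whine with a 20 kHz cutoff, and only the hum comes back; the whine is gone for good, which is precisely the audiophiles’ complaint.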
This is the downside to any algorithm that tries to take advantage of information theory: whatever is deemed anomalous gets tossed aside. But anomalies are often the very thing that make something beautiful. Netflix, for example, streams most of its video at 24 frames per second. If a frame doesn’t reach a user in time, Netflix drops it. Will we really miss 1/24th of a second of footage? Watching House of Cards, perhaps not—but what if we’re a dancer trying to break down a quickly executed move? Even JPEG, the filetype most commonly used for storing images, tosses out massive quantities of information, representing images as patterns of frequencies that describe the general paintbrush strokes approximating an image, instead of storing every pixel itself. Again, for most of us this is vastly advantageous, but for a few—perhaps a painter trying to glean Picasso’s brush strokes from a photo found online—it’s useless. Tech companies are constantly asking themselves “how can we cut as much information as possible?” What they rarely ask is, “who do we alienate if we cut this information?”
In this way, modern technology is consistent with essentialist philosophy. Essentialism dates at least as far back as Plato, whose theory of forms is one of its original examples. Plato held that certain patterns of structure define objects, while other characteristics are unimportant. Almost everyone, he would note, thinks of a table as a four-legged structure with a level surface; whether the table is made of wood or metal is not important. In this way, Plato located the information of an object in its form, not in its material, an idea that became fundamental to Western philosophy.
But tech’s essentialism can quickly become stereotyping. The NSA’s SSO classifies communication in terms of a communicator’s race, gender, and geographic location to meet the US government’s preemptive counterterrorism demands. Aren’t they, then, committing more acts of stereotyping per day than any other organization on the planet?
Yet the idea of using “selectors” to determine, within some probability range, whether someone is a security threat is firmly rooted in information theory. This dubious assumption, that “selectors” like race and religion increase the probability of randomly selecting a terrorist, is the NSA’s operating principle, though it’s not entirely clear how the NSA would even obtain data to back it up: they can’t exactly test their algorithm and find out what percentage of the people they flag actually turn out to be security threats. But if there were some piece of information with an entropy of one for determining whether someone was a terrorist, then learning its answer would cut the pool of candidates in half, making any given person who fits it twice as likely to be a terrorist.
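A back-of-the-envelope calculation shows why that operating principle is dangerous. With hypothetical numbers (the rates below are invented for illustration, not drawn from any NSA data), Bayes’ rule says that even a selector that fires on every genuine threat drowns in false positives when real threats are rare:

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(threat | selector fires), by Bayes' rule."""
    fires = hit_rate * base_rate + false_alarm_rate * (1 - base_rate)
    return hit_rate * base_rate / fires

# Invented numbers: 1 in 100,000 people is a genuine threat; the
# selector fires on every threat and on 1% of everyone else.
p = posterior(base_rate=1e-5, hit_rate=1.0, false_alarm_rate=0.01)
print(p)  # ~0.001: about 999 of every 1,000 people flagged are innocent
```

Doubling a minuscule probability still leaves a minuscule probability, and a mountain of false positives.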
For the NSA, essentialism is the easy way out. It would be much more of a challenge to find a strategy that is sensitive to difference and doesn’t rely on simplistic rules. Instead, they lean on the past successes of tech essentialism, like MP3 and JPEG. By applying these principles to people, the NSA’s SSO brings to light that this essentialism, this willingness to make assumptions about the border between foam and sky, can come at a terrible cost.
Dash Elhauge B‘17 & Charlie Windolf B‘17 can be stored in two apartments, provided there is solid WiFi