There will be no edges, but curves.
-Tracy K. Smith
The black box is a metaphor for a device whose inner workings are illegible. Given an input, a black box provides only an output, without any insight into how it arrived there. This can happen at a scale as small as a calculator, when someone inputs “1+1” and receives “2,” or as large as a machine learning network, when someone inputs their face and the algorithm classifies them as a “convict.” In this way, describing a system as a black box is an admission of surrender. Sometimes knowledge is surrendered for a practical reason: Why would I need to see thousands of lines of code to order an Uber? Sometimes it is for a legal reason: Uber doesn’t want Lyft to steal its code. But what happens when what has been blackboxed works against the interest of the user? How can what has been hidden be revealed?
One day last September, dozens of selfies framed with a green border and a small line of typifications flooded my Instagram feed. Captions ranged from the innocuous (actor, drawer, goofball) to the far less innocuous (first offender, drug dealer, rape suspect). Deliberately provocative, Trevor Paglen developed ImageNet Roulette “to help us see into the ways that humans are classified in machine learning systems.” Inputting a selfie or photo into Paglen’s app led to an output of textual labels that, according to the algorithm, reflected the visual. How these systems worked—how Roulette classified me as “divorced man”—was unclear. But understanding wasn’t requisite for having a response. True to the name Roulette, the system’s outputs were unexpected—finding the most absurd became a game. To paraphrase Will Ferrell, “no one knows what it means, but it’s provocative.”
An example of an ImageNet Roulette output.
It is said that before the flood, Noah had to identify every living creature before taking them on board his ark. Eons later, somewhere in Palo Alto, researchers were picking up where the Bible left off, organizing the “entire world of objects” into a neat taxonomy. Stanford Professor Fei-Fei Li built the database ImageNet in 2009 by sorting 14 million images into 20,000 categories. Her hope was that the sheer quantity of images and classifications, inputs and outputs, would allow a computer to infer connections between the two, identifying entirely new pictures. Due to its breadth, the dataset became a sort of canon for machine learning engineers, who built programs just like ImageNet Roulette that could identify objects in images with 97.3 percent accuracy. But just as Noah left behind the unicorn in his grand taxonomy, a significant subset of ImageNet’s categories had to be taken out to achieve its accuracy: people.
In their initial classifications, Li’s researchers differentiated between cops and convicts just as they did apples and oranges. The only problem, of course, was that social categories couldn’t be essentialized into distinct physical features. A researcher’s assumption that a face is that of a drug dealer comes more from social biases than any visual characteristics. Since these biases are then written into any image recognition system that uses ImageNet as a reference, many engineers were hesitant to train their networks on this dataset. Trevor Paglen built ImageNet Roulette to illuminate a dataset he saw as problematic, to show how inequalities can be reproduced in data that purports to be objective. While this subset has since been removed from ImageNet, there still remains the problem of the black box—how a machine learning network is inherently non-illuminable.
Machine learning makes the metaphor of the black box much harder to visualize by granting a device the ability to “learn.” In many cases, a black box has no need to change—one plus one will always equal two. Machine learning, however, seeks to solve problems that don’t have concrete answers. When the computer guesses the wrong answer, it updates its architecture until reaching a threshold of certainty. If calibrated successfully, a machine learning system can take an input it has never encountered before and form an accurate output. However, even after calibration, the system continues to adjust its architecture to the user, meaning the same input can yield two distinct outputs over time. This is revelatory, but deeply disorienting. Unlike calculators, these black boxes can no longer be delineated by the edges of their knowledge; they mutate with every input.
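For readers who want to see the “guess and adjust” cycle concretely, here is a minimal sketch of that loop—a hypothetical one-parameter model, not any real system’s code. The model guesses, measures how wrong it was, and nudges its internal weight, which is exactly why its behavior mutates with every input it sees.

```python
# A toy "black box" that learns: the user sees only input and output,
# while the internal weight silently shifts after every wrong guess.

def train_step(weight, x, target, lr=0.1):
    """One guess-and-adjust cycle for a one-parameter model y = weight * x."""
    guess = weight * x
    error = guess - target          # how wrong the guess was
    return weight - lr * error * x  # nudge the weight to shrink the error

weight = 0.0                        # the hidden "architecture"
for _ in range(50):                 # repeated exposure to one example
    weight = train_step(weight, x=2.0, target=6.0)

# After calibration, inputting 2.0 yields (almost exactly) 6.0 —
# but nothing about the process is visible to the user.
print(round(weight * 2.0, 2))
```

The same code also illustrates the disorientation the paragraph describes: ask for the output of `2.0` early in training and again later, and the answers differ, because the box has changed in between.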
If I asked you how to recognize a llama, you might bring up its fluffy fur and silly body. Now, ask a computer to recognize a llama, and it becomes a bit more difficult. Instead of looking for fur, computers often use Convolutional Neural Networks, which recognize images by looking for patterns between pixel values after applying a batch of filters. These patterns are illegible to the human eye and impossible to calculate by hand. Even the language behind Convolutional Neural Networks reflects the algorithm’s inherent unknowability: the network’s “hidden layers” “convolute” an image until it can be recognized. While these convoluted images are often blackboxed, it’s a fruitful exercise to observe what an image looks like after convolution. Can you identify which parts of the image correspond to which filter?
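At its core, a filter is just a small grid of weights slid across the pixel values, multiplying and summing as it goes. The sketch below illustrates this with one hand-picked filter (a hypothetical edge detector chosen for demonstration; a real network learns thousands of filters whose values mean nothing to a human reader).

```python
# Sliding a 2x2 filter over a tiny 4x4 grayscale "image":
# dark left half (0s), bright right half (9s).

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

kernel = [  # responds where brightness changes from left to right
    [-1, 1],
    [-1, 1],
]

def convolve(img, k):
    """Multiply-and-sum the kernel at every position it fits in the image."""
    kh, kw = len(k), len(k[0])
    out = []
    for r in range(len(img) - kh + 1):
        row = []
        for c in range(len(img[0]) - kw + 1):
            row.append(sum(img[r + i][c + j] * k[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# The output "lights up" only along the dark-to-bright boundary —
# a pattern meaningful to the network, but nothing a person would
# ever recognize as part of a llama.
print(convolve(image, kernel))
```

Stack hundreds of thousands of these multiplications across many layers, and the result is the kind of unrecognizable filter images shown below.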
An input image of a llama.
9 of 64 filters after five “hidden” layers of convolution.
Perhaps the yellow dot in the top middle filter corresponds to the llama’s mouth? There’s no wrong answer here—in fact, there’s no right answer. After so many hundreds of thousands of tiny acts of multiplication, it’s impossible to connect a convoluted image to its source. The legibility of these filters is of no practical concern; all that matters is that a machine learning network can, within a certain threshold of accuracy, use them to identify a llama. This time, there are no stakes—we definitively know what is and isn’t a llama and can choose to ignore the network if it gets it wrong. However, what happens when a computer is asked a question for which we don’t have an answer? How could we follow its reasoning? It’s not enough to be transparent; strip away the face of the black box and there is nothing.
Kubrick’s 2001: A Space Odyssey stars both a literal and metaphorical black box: the soft-spoken artificial intelligence HAL, famous for having a “perfect operational record.” The axiom of his absolute accuracy engenders a crisis of faith as the computer commands his astronauts to act against their own well-being. HAL is a far more prescient model of the threat posed by machine learning than Hollywood’s more common destructive robots. As essayist Meghan O’Gieblyn observes, “While Elon Musk and Bill Gates maunder on about AI nightmare scenarios—self-replication, the singularity—the smartest critics recognize that the immediate threat these machines pose is not existential but epistemological.” She looks to the 2016 defeat of a championship Go player by an artificial intelligence, in what was described as “not a human move,” as proof the Enlightenment has reached its twilight. She writes, “Deep learning has revealed a truth that we’ve long tried to outrun: our human minds cannot fathom the full complexity of our world...If machines understand reality better than we can, what can we do but submit to their mysterious wisdom?”
This very logic is turned against HAL’s astronauts when they disobey him—how can a human know better than an all-knowing computer? Unable to glimpse into HAL’s black box, the astronauts can only decide on faith alone. However, one needn’t look to science fiction to watch this play out; every decision made by a machine learning algorithm mandates an act of submission. A self-driving car wants to turn left, so its passengers acquiesce with faith that the car won’t kill them. A criminal justice deep learning system deems a convict more likely to reoffend, so a judge defers the length of their sentence to a computer. Here, what has been blackboxed necessitates a level of trust that can only be described as theological: We don’t concretely know that the computer knows any better than us, and we certainly can’t find any answers in its algorithms, but we still submit to its authority. O’Gieblyn cites the prophet Job, who demands an explanation for God’s cruelty only to realize God contains “Things too wonderful for me,” as analogous to making sense of the black box. Again, Kubrick’s hero, upon looking into the black monolith: “My God, it’s full of stars.”
So, what happens next? O’Gieblyn turns to Dostoevsky, who writes in The Brothers Karamazov of a Grand Inquisitor “creat(ing) a system of childlike rituals—confession, indulgences—by which the laity can ascertain their salvation ... And they will be glad to believe our answer… for it will save them from the great anxiety and terrible agony they endure at present in making a free decision for themselves.” In her timeline, the authority of God, or a computer algorithm, does not arrive as an oppressor; we invite a system of faith into our lives as an antidote to our great anxiety. Certainly, it comes at a loss of agency, so such a system demands its subjects trust it will work in their interest. But what happens when the algorithm is oppressive? There’s a distinction, certainly, between a god that writes my Discover Weekly on Spotify and a god that assigns prison sentences. How, in those cases, do we hold accountable what we can barely understand?
ImageNet Roulette was radically successful in its goal, but perhaps not in the way Paglen intended. Suppose O’Gieblyn is right, that the public will be unable to comprehend the deity-like black box without the help of a Grand Inquisitor. What Paglen’s project did so masterfully was make interacting with this box a game. He invited users to participate in the network, to input their faces and be outraged at the output. The green-bordered images were perfect for Instagram virality, spreading exponentially across vectors of social media. Again, understanding wasn’t requisite for having a response.
As more and more decisions are sequestered into the black box, I introduce an alternative model of knowledge: the black ball. The black ball encourages users to play with it, to experiment with different inputs and find what they yield. Both metaphors assume that whatever it contains will be incomprehensible; this is not a solution, but a way to work around the unknowable. Whereas much of what is blackboxed only gets discussed in exclusive circles (up until Paglen’s piece, ImageNet was known only to data engineers and academics), the black ball would make this technology accessible—it would make experimenting with it fun. We still can’t see it, but we know what it does. By approaching machine learning as an artist, Paglen provides a model for how outsiders can reappropriate technology, interrogate it, and hold it accountable. These outsiders, after all, have the greatest stake in the risks concealed within the black box.
There already exists technology to generate articles indistinguishable from actual news, to create video footage of people saying things they never did. The response from the designers of this technology has been to keep it away from the general public so that it doesn’t fall into the wrong hands, but how will this public identify it when it does? Certainly, there are people who unequivocally shouldn’t have such technology, who would use it to harm others, but keeping something private doesn’t prevent them from reaching it. Oftentimes, tech companies will sell directly to these entities, as in 2014, when Palantir developed a digital profiling algorithm for Immigration and Customs Enforcement. The black ball holds technology accountable by inserting its effects into the popular discourse. This approach comes with risks, sure, but with education arises the possibility of subversion. Indeed, if peeling away the face of the black box will not be possible going forward, we can find agency through interrogating its edges.
ANDY RICKERT B’21 : person, individual, someone, somebody, somebody mortal, soul > simpleton, simple > fool, sap, saphead, muggins, tomfool > flibbertigibbet, foolish man.