THE COLLEGE HILL INDEPENDENT


Hard Coding

DNA, digital storage, and the matter of memory

by Jonah Max

Illustration by Isabelle Rea

published September 15, 2017


The world is beset by a global hunger for memory. SD memory cards, solid state drives, CD-ROMs, reels of magnetic tape—these devices drape the crust of the earth in networks of stable and temporary memory, orbit at the very edge of our exosphere in satellites, producing and holding fast to data we deem at once to be both vital and superfluous. By the stroke of the same machinery we use to safeguard climate readings recorded at the tip of the planet, we seamlessly archive our mobile boarding passes and the disappointing slideshows of images captured along the way. We most often save these files to a digital cloud, and often only with the knowledge that we will most certainly never look at any of this junk again.

And yet, behind these processes of nearly mindless digital production and storage lie explicitly material practices which ensnare labor, geology, and the energy gradients which drive the Earth, leaving in their wake unscalable mounds of waste. The rare earth mining operations at the Baotou Steel Company on the border of Mongolia, for instance, expose workers to toxic sulphates and ammonia, flooding the surrounding waters with hydrochloric acid as the corporation bores trenches deep into the Earth to retrieve the neodymium used in the magnets necessary for consumer hard drive production. At the Agbogbloshie e-waste facility in Accra along the coast of Ghana, migrant laborers from the north spark dangerous chemical fires in an effort to strip copper from electronic devices deemed obsolete by largely Western markets. Often these minerals re-enter the global market, finding new use in updated versions of their ghostly selves. In light of these treacherous operations, notions of immaterial “data mining” take on an explicitly geological, material, and exploitative hue, appearing more depressive and entropic than cyclic and self-sustaining.

As media theorist Jussi Parikka claims, where the contemporary logic of digital production finds in the Global South a source of heat and labor, it is within the Global North that it finds the cold. As rare earth minerals, now transformed into hard drives and flash drives, are ferried towards Europe and North America, they find their home in storage facilities and data centers, often located near or atop permafrost, frozen layers of soil and rock beneath the Earth’s surface. Here deionized water sprays, arrays of industrial fans, and complex cooling ducts ensure that data rests in a near-frozen state, momentarily stilled against the inevitable process of erosion. While popular images of these centers present them as sleek, textureless, and unpopulated zones of pure information, they too rely on highly specific, localized geological conditions and finite resources, chiefly that of coolness. From the inside, contemporary data centers may look little like traditional factories, but from across the yard, plumes of steam still pour from their stacks, releasing the pent-up heat which would otherwise antagonize data’s desire for coldness.

Haunting this entire transnational operation of memory production and storage, however, is the illusion of indefinite continuity. How long can we imagine that conditions remain like this? How long until the water at Baotou is wholly undrinkable? How many years until the Alaskan permafrost melts for good? What timeline can an e-waste site like the one in Ghana reasonably expect? According to a study conducted at Boise State University last spring, even the supply of silicon, an exceedingly common element essential to memory storage, will be outpaced by its global demand as soon as 2040. This systematic excavation of the Earth’s crust and its transformation into consumer electronic goods proceeds along a temporality that artist Robert Smithson deems “fluvial entropy…where everything is gradually wearing down”—an “irreversible process” which cannot be escaped, only changed into new forms.

+++ 

In part stoked by these fears of material depletion, computer scientists and biologists have sought a familiar new home for our digital memories: DNA. Whereas a top-of-the-line hard drive might offer a terabyte or two of data storage and often require hundreds of grams of rare earth metals like neodymium, cerium, yttrium, and ytterbium (as well as various aluminum casings, voice coil actuators, stepper motor actuators, the list goes on), synthetic DNA can hold nearly 215 petabytes (215,000 terabytes) of data in a single gram. Beyond its extraordinary material efficiency, DNA is startlingly resilient, lasting on the order of millions of years. In comparison, a hard drive might last us a decade if we’re lucky. A brief glance at contemporary schlock like CSI: Miami or NCIS, with their unending discoveries of forensic evidence, serves as a quick reminder. Our DNA, biology’s fundamental and nearly universal storage mechanism, is all but destined to outlive us, our rising tides, our depleting ozone, and perhaps for a handful of our species, even the heat death of our sun itself.

This nearly transcendent quality of our most essential and intimate information, however, does not evade all concerns, or even the most basic material ones that long troubled more conventional forms of data storage. Rather, it seems to abstract and exacerbate them. For practical storage, DNA still lives best in the cold: artificially chilled data centers will be replaced with cryogenic chest freezers, the already frosty temperatures now turned down to an icy negative 150 degrees Celsius. Rare earth mineral excavation will continue, now only to provide materials for the apparatuses of a new sort of cultivation entirely, oligonucleotide farming, where organic chemists tediously string together nucleotides in solid-phase synthesis. And the crude exploitative practices of electronic waste management will be further distanced behind the sterilized, white-washed walls of the laboratory, which carries its own politics and power structures. As data storage travels further into abstraction, anthropologist Bruno Latour, writing in Laboratory Life, reminds us that even the modern laboratory carries with it the distorted essence of the factory. A lab relies on the mass production of scientific apparatuses and the adoption of factory-borne bureaucracies and hierarchies to produce its own “material goods,” chiefly scientific writing. That is, though the laboratory may isolate itself from the punch card, the assembly line, and the smoke stack, it quietly leans on these systems of order, repetition, and waste to produce abstract commodities for publication and patents, not fundamental truths.

+++

Cast in this light, DNA data storage looks less like some return to the founding ingenuity of biology or the facts of life, and more like a fresh silvery skein to wrap around traditional modes of technological production. At the rudimentary level, this new DNA-based storage will presumably be employed to store the same old junk we cast to the dusty archives in the first place. This is not to claim, however, that nothing changes here in the laboratory, and that the accompanying troubles of DNA storage map seamlessly to our present difficulties with hard drives and magnetic tape. First, and perhaps most importantly, whereas old world data storage endlessly worked against the problem of quantity (diminishing reserves of silicon, increased desire for minimizing data’s cost of materials and space), DNA’s material quandaries circle around the stability of the material itself. To begin to appreciate this problem, it is first necessary to understand exactly how biologists and computer scientists write digital information to DNA.

As all computer languages are built atop binary strings, any scrap of digital information, be it a .pdf file, a .jpeg, a full-length film, or an operating system, can be represented as a sequence of 0’s and 1’s. From here the operator can divide this sequence into the pairs 00, 01, 11, and 10, which will in turn be mapped to adenine, thymine, guanine, and cytosine, the four bases that serve as the building blocks of DNA. Once a strand of DNA is synthesized according to the binary sequence of the original digital file, a sequencer must simply read the chain of bases to reconstruct the binary file, and in turn, restore the original document. Though this mapping procedure appears elegant in its simplicity, the frailty of the material itself can manifest in unfortunate ways. Chiefly, as binary strings often contain a high degree of repetition (0’s followed by 0’s followed by 0’s), the synthesizer is asked to construct what amounts to a homopolymer run (in this case, we can imagine [adenine-adenine-adenine]), a chain of identical bases which threatens the structural integrity of the DNA and can cause the entire strand to collapse. Moreover, in its current state, this relatively new technology of synthetic DNA is equally prone to error even when given a presumably stable sequence of bases: nucleotides mismatch, fail to traverse the synthesis channel, or lose their correct position in the ordered chain. Last year, in an effort to circumvent these difficulties, a team of scientists at Columbia began employing a procedure called fountain coding, creating quasi-randomized arrangements of pairs from the original binary string such that, given a healthy surplus of pairings, homopolymer chains are identified and discarded before synthesis. This way, an entire file can be successfully recovered even if particular bases are lost in the transmission. Though computer scientists have long toiled over the problem of transferring data through noisy, leaky channels, the finicky material nature of DNA has impelled the field to move beyond simple error-correcting codes and traditional erasure procedures.
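
For readers who want to see the two-bit mapping in action, here is a minimal sketch in Python. It is an illustration only, not the procedure used by any of the labs mentioned: the particular pairing of bit pairs to bases (00 to A, 01 to T, 11 to G, 10 to C) simply follows the order in which they are listed above and is an assumption, as are the function names, which are hypothetical. The last function measures the longest run of a repeated base, the kind of homopolymer stretch that a naive encoding of repetitive data produces.

```python
# Assumed mapping: the pairing of bit pairs to bases is illustrative only.
BITS_TO_BASE = {"00": "A", "01": "T", "11": "G", "10": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Read a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_homopolymer_run(strand: str) -> int:
    """Length of the longest stretch of one repeated base (0 if empty)."""
    longest = current = 0
    previous = None
    for base in strand:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand, decode(strand), longest_homopolymer_run(strand))
    # A file padded with zero bytes becomes one long run of adenine.
    print(longest_homopolymer_run(encode(bytes(4))))
```

Under this assumed mapping, four zero bytes encode to sixteen adenines in a row, which is exactly the sort of sequence a real encoder would need to avoid or screen out before synthesis.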

+++

Although even greater troubles still plague the practice of writing information to and reading from DNA, particularly when it comes to its prohibitive cost, the promises of such a system warrant a degree of consideration. Writing for Science, Robert Service notes the promised scale and efficiency of bio-storage could replace the “giant Facebook and Amazon data centers” with “a couple of pickup trucks of DNA.” Moreover, biologists intrigued by DNA’s potential have begun to employ the service of other organic actors to help create more robust bio-infrastructures for digital storage. Leaning on bacteria’s capacity to expand their genome through virus acquisition, scientists at Harvard have successfully transferred files between E. coli bacteria: a week after infecting E. coli with one of the earliest pieces of animation ever created, footage of Eadweard Muybridge’s galloping horse, the scientists were able to retrieve the very same horse footage from the entire population of E. coli, even though the bacteria carrying the DNA sequence had grown, divided, and repopulated repeatedly over the week. This alternate vision of file distribution and document backup, if scalable, would render our current assemblages of remote servers, external hard drives, and USB sticks quickly obsolete.

While a future of chilled petri dishes and ever-expansive memory may seem nearly utopian to some data scientists, others working in the field of DNA storage have postulated much darker futures. This summer, scientists at the University of Washington successfully synthesized a DNA strand with malware, designed to exploit the computer which sequences the strand. Much like a malicious email attachment or phishing site which silently hijacks a personal computer, this DNA malware could seamlessly take control of the receiver’s computer. If one were to send malware-ridden DNA to a popular genome-sequencing corporation such as Ancestry or 23andMe, simply reading the strand could offer one surreptitious access to thousands of private medical records. While little, if any, of this is realizable today, it speaks to a potential future one might glimpse in sci-fi fare like Johnny Mnemonic, where transnational corporations (and organized crime) fight for control of a DNA courier, a man whose brain itself has been synthetically implanted with sensitive files locked in his own DNA. Before any of these futures, be they utopian or apocalyptic, can be realized, corporations have already made their first move. In May of this year, Microsoft announced that in the coming years, users will have the ability to relocate their cloud storage to DNA storage facilities.

JONAH MAX B’18 has never seen Gattaca.