A female voice greets you by name, and you know you should feel happy that this girl, six paces of café floor away, is making eye contact and smiling warmly. But in the time it takes you to register the smile you're far from happy, because you know that when time unfreezes you'll be awarded nothing more than a beat, roughly 1.7 seconds of silence--0.95 to return the smile and 0.75 of acceptably timed pause--before you have to speak, and those 1,700 milliseconds might not be enough for your now-frantic neurons to remember who the hell this girl is and why she knows your name.
"Processing Datastream" flashes across your brain, and you can almost see the translucent letters of a Terminator-esque head-up display begin scrolling down your retinas: on the left, a list of every place you might've met her; in the center, Law and Order's face-matching program working furiously; on the right, the progress of your smile-bent mouth muscles; and in the top corner, the 1,700-clock, ominously ticking toward zero. A moment of neural panic, and finally the words "Kylie from PoliSci section" blink triumphantly across the display--but far too late: the clock reads -1,820, and you've already uttered that telling "Heyyy," meaning: "Hey... you."
Why did it take so long? What goes on in the brain during that HUD-endowed instant, and is it at all similar to how a computer might attempt to recognize a face? For those who like to make a good second impression, perhaps most importantly: why does facial recognition sometimes fail?
As yet, no single theory explains every aspect of facial recognition in humans, Rakover and Cahlon warn in their book Face Recognition: Cognitive and Computational Processes, so wondering why your interactions with Kylie are now forever tainted will most likely just lead to more confusion. To pick out an acquaintance from a group of strangers is to flex a huge variety of cognitive skills, none of which is fully understood. Scientists can't even agree on Step One. Image processing--mentally converting points of light into a three-dimensional object--seems like the obvious choice, but simple anecdotal evidence throws that into doubt. Recognizing a professor on campus takes a moment, but recognizing that same professor in a strip club might take a slightly longer moment, if it happens at all. Spotting a familiar face in a pack of fast-approaching joggers may take less time if the face belongs to a clingy ex. Face recognition might very well be a top-down process, but to what extent? Does the brain process the image before applying assumptions and memories to it, or--think of someone stooping to pick up a coin-sized wad of flattened gum, mistaking it for a real coin--do those assumptions determine what a person sees?
Top-down influences--context, as with the lonely professor, and emotion, as with the jogging ex--factor in, but scientists don't know whether they actually precede the bottom-up processes (figuring out which objects are faces, distinguishing age, sex, etc.) or somehow muddle along at the same time. And that's only the tip of the big melty iceberg. After the question of top-down vs. bottom-up come uncertainties about how the image of a face gets compared to the memory of a face. Some scientists swear by template matching, insisting that the brain reads a face holistically, but others counter by asking why, then, humans can recognize someone from a few features, despite changed facial hair, or even after years apart. Surely, they argue, the brain must process each feature individually. The riposte: if recognition weren't at least partly a Gestalt-happy, whole-face process, then why would rearranging a nose, mouth, ears and eyes Photoshop-style mkae a fcae look so fcuikng wreid? Does the brain somehow template-match and feature-match at the same time? The confusion runs on and on and on, all the way to -1,820 and Kylie's dejected frown.
Some cognitive scientists hope to learn more about how humans recognize a face by trying to teach machines how to do it. Unfortunately, if cognitive science makes the problem of picking Kylie out from a crowd sound complicated, computer science takes that to new extremes.
Japanese programmer Takeo Kanade was the first to teach a computer to recognize a face with Cog Sci-inspired top-down processing: in 1973, Kanade taught his machine to compare spots of light and dark on a profile photo to those of a generic template, looking for particular features. Turk and Pentland sent the bar soaring in 1991 with their now-famous Eigenface method, which held gold-standard status for about half a decade, until it was finally dethroned in 1997 by Belhumeur's shadow-resistant Fisherface method. A year later, just as Rakover and Cahlon began to test whether humans pay attention to some facial features more than others, Bartlett tried the concept out in code, and Bartlett's 1998 software improved on Fisherface's accuracy even further.
But what does it mean to analyze a photo, find a face, and compare it to another face? Cognitive scientists can only guess at how this works in humans, but programmers know precisely how their machines operate. The last decade saw the emergence of a dizzying number of photo-scanning techniques: one method stacks all the pixels of an image into a 1-D column (DFT), another divides the image into 8x8 blocks (DCT), while still another looks at columns and rows individually (DWT). Pixel arrays are compared based on light intensity, à la Eigenface. Accuracy tends to hover around 90 percent for each of these, but in 2007 researchers at the University of Nottingham-Malaysia pushed that number to 99 percent by running many of these techniques in parallel.
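To make "stack the pixels and compare intensities" concrete, here is a minimal Python sketch under loudly stated assumptions: the "images" are tiny hypothetical grids of grayscale values, and matching is plain nearest-neighbor on raw pixel distance. This is the naive baseline, not Eigenface itself (Eigenface first projects the stacked vectors onto a handful of principal components learned from training faces), and it is not the Nottingham team's method. The names `flatten`, `distance`, and `recognize` are made up for illustration.

```python
import math

# Hypothetical toy "images": tiny 2-D grids of grayscale intensities (0-255).
Image = list[list[int]]


def flatten(img: Image) -> list[float]:
    """Stack the rows of a 2-D image into a single 1-D vector of intensities."""
    return [float(p) for row in img for p in row]


def distance(a: Image, b: Image) -> float:
    """Euclidean distance between two same-sized images, compared pixel by pixel."""
    va, vb = flatten(a), flatten(b)
    if len(va) != len(vb):
        raise ValueError("images must have the same dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def recognize(probe: Image, gallery: dict[str, Image]) -> str:
    """Return the name of the stored face whose pixels are closest to the probe."""
    return min(gallery, key=lambda name: distance(probe, gallery[name]))


# Usage: a hypothetical two-person gallery of 2x2 "faces".
gallery = {
    "Kylie": [[10, 200], [200, 10]],
    "Fred": [[200, 10], [10, 200]],
}
probe = [[20, 190], [210, 15]]  # Kylie again, in a slightly different photo
print(recognize(probe, gallery))  # Kylie
```

Real systems work on thousands of pixels per face and compress them before comparing, but the core move, turning a face into a list of numbers and measuring how far apart two lists are, is the same.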
Hold on, though. Stacking pixels? 8x8 blocks? This can't be how the human brain does it. So if we've left let's-study-computers-to-learn-about-the-brain behind us, what's Goal #2? Is there a practical application for teaching a MacBook to know when its master has come home? Some, like John Woodward, Jr. in his issue paper Biometrics: Facing Up to Terrorism, say that automated face recognition has massive potential, as it could help fight everything from terrorism to identity theft. He outlines a plan to put what he calls "FaceCheck"--ooh, catchy--in ports, stadiums, police cruisers, train stations, banks, ATMs.... (Wasn't it the bad guys who had FaceCheck in Minority Report?)
Don't get excited (or worried) yet: as promising as 99 percent might sound in the lab, no face-scanning computer has proved quite so reliable in the outside world. On a pixel-by-pixel level, the authors of 2-D 3-D Mixed Face Recognition Schemes point out, changes in lighting, camera angle and facial expression can make Fred-while-frowning resemble Mark-while-frowning much more than Fred-while-smiling. Conditions in the lab are still a long way from the visual clutter that a machine would have to sift through in an airport or some similar place, and loosening control over any variable can bring that shiny one-percent-'til-perfect down to something less pretty.
Researchers have taken up the challenge regardless. Some (Pedersini in 1999 and Onfrio in 2004) took a cue from human anatomy and gave their machines more than one eye, while others (Blanz in 2003, Bowyer in 2004, and Bronstein in 2005) did exactly the opposite, abandoning human anatomy altogether and trading cameras for lasers, 2-D photos for 3-D texture maps. In 2005, the Bronstein twins came up with a program based on the idea that faces can stretch, but only up to a certain maximum; smile- and frown-resistant, their software boasted perfect accuracy even as their 30 subjects changed expression from trial to trial. A year later, Feng found a way to compensate for variations in angle by discovering that almost all human faces share 12 basic curves.
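The Fred-and-Mark problem is easy to reproduce with raw pixel comparison. The Python sketch below is a toy under stated assumptions: hypothetical four-pixel "faces," Euclidean distance as the similarity measure, and lighting modeled as a uniform brightness boost (real lighting is far messier). It illustrates the general pitfall, not the method from 2-D 3-D Mixed Face Recognition Schemes; `relight` is a made-up helper.

```python
import math


def distance(a: list[int], b: list[int]) -> float:
    """Euclidean distance between two same-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relight(img: list[int], amount: int) -> list[int]:
    """Crudely simulate brighter lighting: add a constant, clipped to 0-255."""
    return [max(0, min(255, p + amount)) for p in img]


# Hypothetical flattened 4-pixel "faces" of two different people.
fred = [60, 120, 120, 60]
mark = [70, 110, 130, 50]

fred_sunlit = relight(fred, 80)
mark_sunlit = relight(mark, 80)

print(distance(fred_sunlit, fred))         # same person, new lighting: 160.0
print(distance(fred_sunlit, mark_sunlit))  # different people, same lighting: 20.0
```

Under these assumptions, sunlit Fred sits eight times closer to sunlit Mark than to his own indoor photo. Subtracting each vector's average brightness would cancel this particular uniform shift, but real shadows fall unevenly across a face, so a fix that simple won't fully carry over.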
But here comes the second major issue, rowdy as always: logistics. As hardware and software become more complex, their implementation becomes less and less convenient. Rigging security cameras all around a train terminal isn't quite as costly and obtrusive as doing the same with 3-D laser scanning devices equipped with texture modeling software.
Small breakthroughs keep automated face recognition moving forward, and the list of techniques multiplies with each new idea; methods are combined, used in parallel, and improved upon. Major limitations exist, though, that might keep Feng's and Bronstein's code out of the real world for some time.
And could it be for the best? Floating in the background of all this are the ethical issues, questions of where and who and to what extent, search and seizure, surveillance, human rights. Does a person, as Woodward insists, really have no right to the privacy of his or her own face and all the information it contains? When Terminator does start to recognize that Kylie is in the airport, and she looks sad, how far should he be allowed to go with that new talent?
No heat-seeking spider-bot is finding MATT SURKA B'11 in this cold, cold bathtub.