
A wearable face-recognizer

Many people (author included) have difficulty remembering faces. However, I have found that I can remember faces much better by using computer-induced flashbacks.

Computers are quite good at recognizing faces. Previous work has been based on a fixed camera [21][22]. Typical applications of that work include video surveillance, in which people move through the field of view of a fixed camera. The FBI-funded FERET project comprises a large database (more than 7000 faces) that can be searched on a workstation-class system in a fraction of a second.

Computational resources, attached to a person, suggest the possibility of turning the tables on the traditional third-person perspective (such as a ceiling-mounted surveillance camera), and, instead, using face recognition from a person-worn perspective (e.g. as a memory aid to those with a visual memory disability).

A variety of implementations are possible with the current WearCam apparatus. These range from storing the database of candidate faces on the body-worn apparatus, to connecting remotely from the apparatus to the database (any database accessible from the Internet will do).

In one simple implementation, faces were captured and transmitted to a workstation-class computer, and the closest match was transmitted back. The identity was displayed to the wearer in an enlarged font (so that a visually handicapped person would still be able to read it). This implementation has been run in both the cyborgian (free-running) mode and the conscious mode. In the former, it captures images continuously and attempts to prompt the wearer with a name whenever it can. In the latter, the wearer initiates a query. Due to the low frame rate, the cyborgian mode was not as successful as the conscious mode, but that problem could be fixed using faster hardware.
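The difference between the two modes can be illustrated with a minimal sketch. It is not the software used in these experiments: it assumes that each face has already been reduced to a numeric feature vector, that "closest match" means smallest Euclidean distance, and that the free-running mode stays silent when no match is close enough. The names closest_match, prompts, triggers, and max_distance are all hypothetical.

```python
import math

def closest_match(query, database):
    """Return (name, distance) of the stored face vector nearest to query.

    Assumption: faces are feature vectors compared by Euclidean distance.
    """
    best_name, best_dist = None, math.inf
    for name, vector in database.items():
        dist = math.dist(query, vector)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

def prompts(frames, database, mode, triggers=None, max_distance=0.5):
    """Return one prompt (a name or None) per captured frame.

    mode="cyborgian": every frame is searched; a name is shown only when
                      the match is within max_distance (hypothetical cutoff).
    mode="conscious": a frame is searched only if the wearer pressed the
                      trigger for it; the closest match is always shown.
    """
    if mode not in ("cyborgian", "conscious"):
        raise ValueError("mode must be 'cyborgian' or 'conscious'")
    results = []
    for i, frame in enumerate(frames):
        if mode == "conscious" and not (triggers and triggers[i]):
            results.append(None)  # wearer did not ask
            continue
        name, dist = closest_match(frame, database)
        if mode == "cyborgian" and dist > max_distance:
            results.append(None)  # nothing recognizable in this frame
        else:
            results.append(name)
    return results

# Toy two-dimensional "face vectors", purely for illustration.
database = {"alice": (0.0, 0.0), "bob": (1.0, 1.0)}
frames = [(0.1, 0.0), (5.0, 5.0), (0.9, 1.0)]
free_running = prompts(frames, database, "cyborgian")
on_demand = prompts(frames, database, "conscious", triggers=[False, True, False])
```

In this sketch the free-running mode searches every frame, which is why its usefulness depends so directly on frame rate, whereas the conscious mode performs a single search per button press.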

I tried several different implementations of the capture/display configuration (e.g. having both the camera and display rotated 90 degrees, having the camera rotated 90 degrees with the display still in landscape orientation, etc.). I found that the best overall configuration was to have the camera rotated 90 degrees (portrait) but with the display still in landscape orientation, and with cursors displayed on top of the video to facilitate manual alignment of the face with the cursors (the cursors themselves being implemented as a JPEG image). My implementation of manual alignment was rather simple: I merely waited until the person happened to be facing toward me, then centered the face on my screen by tilting my neck appropriately, and pressed the "trigger". The trigger was a pushbutton switch connected to the parallel port of the computer attached to my person. Such a switch may, for example, be shoe-mounted (as in the roulette computers described by Thomas Bass [10]) for unobtrusive use, or attached to a belt or waist bag within easy reach.

I experimented with a variety of different focal lengths, and found that a focal length of 11 millimeters with a 1/3-inch CCD provided the best tradeoff between image stability and reach. The longer focal lengths, I found, were harder to aim, while the shorter focal lengths required me to be so close as to invade the personal space of the candidate. In particular, in situations where the candidate was someone behind a physical barrier, such as the deli counter at the grocery store or the returns desk in a department store, the 11mm lens gave me enough throw to reach across such a barrier.
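The tradeoff between aim and reach can be made concrete with a rough field-of-view calculation. The sketch below assumes a simple pinhole model and the nominal 4.8 mm by 3.6 mm image area of the 1/3-inch format; the function names and the 1.5 m subject distance are illustrative choices, not measurements from these experiments.

```python
import math

# Nominal image area of a 1/3-inch format sensor, in millimeters.
SENSOR_LONG_MM = 4.8
SENSOR_SHORT_MM = 3.6

def field_of_view_deg(sensor_mm, focal_mm):
    """Angle of view along one sensor dimension (pinhole approximation)."""
    return math.degrees(2 * math.atan(sensor_mm / (2 * focal_mm)))

def coverage_m(sensor_mm, focal_mm, distance_m):
    """Extent of the scene imaged along one dimension at a given distance."""
    return distance_m * sensor_mm / focal_mm

focal_mm = 11.0
fov_long = field_of_view_deg(SENSOR_LONG_MM, focal_mm)
fov_short = field_of_view_deg(SENSOR_SHORT_MM, focal_mm)

# With the camera rotated 90 degrees, the long axis runs vertically.
height_at_counter = coverage_m(SENSOR_LONG_MM, focal_mm, 1.5)

print(f"{fov_long:.1f} x {fov_short:.1f} degrees")
print(f"{height_at_counter:.2f} m of scene height at 1.5 m")
```

Under these assumptions the 11mm lens sees roughly 25 degrees along the long axis, so a subject about 1.5 m away is framed within about 0.65 m of scene height. A longer lens narrows this angle and magnifies any head movement, and a shorter lens widens it so that a face fills the frame only at close range.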

The simplest implementation involved using the filename to convey the identity of the subject. This required no additional software, because it just so happened that the image display program showed the filename in a very large font while the image was being "painted" down the screen. Two successive frames of the video going into my right eye are shown in Fig. 4. Because the subject is myself (my own face), this figure also depicts the apparatus used in the face-recognition experiments.
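The filename convention described above can be sketched in a few lines. This is only an illustration of the idea: the naming scheme (one image per person, underscores in place of spaces) and the function name identity_from_filename are assumptions, since the original setup relied on the display program alone.

```python
from pathlib import PurePosixPath

def identity_from_filename(path):
    """Recover a displayable name from the filename of a matched face image.

    Assumption: files are named after the person, with underscores
    standing in for spaces (for example "faces/steve_mann.jpg").
    """
    stem = PurePosixPath(path).stem  # drop directory and extension
    return stem.replace("_", " ")

label = identity_from_filename("faces/steve_mann.jpg")
print(label.upper())  # large-font display is approximated here by capitals
```

Storing the identity in the filename means the matcher only has to return a path, and any viewer that shows the filename doubles as the name display.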



Figure 4: Annotated computer-induced flashbacks. Two frames of the video sequence entering my right eye. The identity of the face is found to be myself (here I am standing in front of a mirror, hence this figure also depicts the apparatus). Note the 90-degree rotation of the image, which serves two purposes: (1) to match the aspect ratio of the human face to the camera, and (2) to create a distinct dual-adaptation space. (top) The very large font is quite readable. (bottom) The text remains on the screen until covered by the last few rasters of the image, which displays slowly enough that the name can be easily read before being replaced by the facial freeze-frame.


Steve Mann
Wed Feb 14 01:19:59 EST 1996