
The `wearable face-recognizer'

Many people (author included) have difficulty remembering faces. However, the author has found that faces can be remembered much better by using computer-induced `flashbacks' (short visual stimuli -- perhaps as short as a single frame -- that either prime the memory subconsciously, or consciously but as a background or secondary task).

The wearable multimedia apparatus is easily and quickly reconfigurable under program control. In one of the many modes of operation of WearCam, after the author met someone for whom there was a desire to form an associative memory between name and face, a picture of that person's face was presented as a `flashback' 1 minute later, then 2, 4, 8, 16, etc. minutes later. The author found that this repetition greatly improved the ability to remember the face. Furthermore, it was much easier to learn the names associated with faces that had been `flashed back' periodically, even though neither the names of those faces nor those of a control group of faces that were not `flashed back' were known until after the exposure phase of the experiment was completed.
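The doubling schedule described above (1, 2, 4, 8, 16, ... minutes after the meeting) can be sketched in a few lines. This is only an illustration: the function name `flashback_delays` and its parameters are hypothetical and do not come from the WearCam software.

```python
def flashback_delays(count, first_minutes=1.0, factor=2.0):
    """Return `count` delays, in minutes after the meeting.

    Each delay is `factor` times the previous one, starting at
    `first_minutes`. Both defaults are assumptions chosen to match
    the 1, 2, 4, 8, 16 pattern described in the text.
    """
    delays = []
    delay = first_minutes
    for _ in range(count):
        delays.append(delay)
        delay *= factor
    return delays


print(flashback_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Because the delays are measured from the meeting itself, five flashbacks are spread over 16 minutes, with most of them concentrated shortly after the meeting.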

Humans are quite good at recognizing faces, but computers are also good at recognizing faces. Previous work in the computer-face-recognition community is based on using a fixed camera [23][24]. Face recognition work has not previously been directed toward wearable cameras but instead seems aimed more at video surveillance, with a fixed camera and people moving through its field of view.

An object of this paper is to propose the `wearable face recognizer'. The use of a wearable face-recognizer suggests the possibility of turning the tables on the traditional third-person perspective (such as a ceiling-mounted surveillance camera), and, instead, using face recognition from a first-person perspective. In particular, the apparatus may be used as a prosthetic device for those suffering from visual amnesia, or even those with visual impairment, who are unable to see the face (or see it clearly enough to recognize it).

In researching the best form of the `wearable face recognizer', the author tried a variety of implementations. These ranged from storing the database of candidate faces on the body-worn apparatus to connecting remotely from the apparatus to the database (because WearCam is connected to the Internet, any database accessible from the Internet may, at least in principle, be used).

In one implementation, faces were captured using an early version of the `wearable multimedia' system, running under KA9Q's Network Operating System (NOS). Images of candidate faces were transmitted to a workstation-class computer over the inbound channel, while the name associated with the closest match was received back over the outbound channel.
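The text does not say how the workstation chose the closest match. As a minimal sketch of what a `closest match' lookup could look like, the following assumes each face has already been reduced to a fixed-length feature vector and uses plain Euclidean distance. The function name, the feature vectors, and the example names are all hypothetical.

```python
import math


def closest_match(candidate, database):
    """Return (name, distance) for the stored face nearest to `candidate`.

    `candidate` is a sequence of floats; `database` maps a name to a
    feature vector of the same length. Euclidean distance is an
    assumption made for this sketch, not the method used in the paper.
    """
    if not database:
        raise ValueError("database is empty")
    best_name, best_distance = None, math.inf
    for name, features in database.items():
        distance = math.dist(candidate, features)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


# Hypothetical two-dimensional feature vectors, for illustration only.
faces = {"alice": [0.0, 0.0], "bob": [3.0, 4.0]}
name, distance = closest_match([2.9, 4.1], faces)
print(name)
```

In the arrangement described above, only the candidate image travels to the workstation and only the resulting name travels back, so the body-worn apparatus needs neither the database nor the matching code.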

To display the identity to the wearer, an enlarged font was used (so that even a visually handicapped person would be able to read it).

The simplest implementation involved assigning a new filename to the image of the candidate face to convey the identity of the subject. This required no additional software development because it just so happened that the image display program created a very large font display of the filename while the image was being ``painted'' down the screen (the early `wearable multimedia' system was quite slow at displaying pictures, providing plenty of time for the wearer to read the filename -- the identity of the face). Two successive frames of the video going into the author's right eye are shown in Fig 5.

  
Figure: Annotated computer-induced flashbacks. Two frames of the video sequence entering the author's right eye. The identity of the face is found to be the author (in actual fact, standing in front of a mirror; hence this figure also depicts the apparatus used). Note the 90 degree rotation of the image, which serves two purposes: (1) to match the aspect ratio of the human face to the camera, and (2) to create a distinct dual-adaptation space. (a) The very large font is quite readable. (b) The text remains on the screen until covered by the last few rasters of the image, which displays slowly enough that the name can be easily read before being replaced by the facial freeze-frame.

The `wearable face recognizer' may be either `free running' or in `query mode'. In the former, the system captures images continuously and attempts to prompt the wearer (inserting a name into the reality stream) whenever it can. In the latter, the wearer initiates each query.
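The difference between the two modes can be sketched as a single loop over captured frames. This is an illustrative outline only: `prompts`, `recognize`, and `triggers` are hypothetical names, and the recognizer is passed in as a plain function that returns a name or `None`.

```python
def prompts(frames, recognize, mode, triggers=()):
    """Return a list of (frame_index, name) prompts for the wearer.

    mode == "free_running": every frame is passed to `recognize`.
    mode == "query": only frames whose index is in `triggers`
    (the moments at which the wearer asked) are passed on.
    """
    if mode not in ("free_running", "query"):
        raise ValueError("unknown mode: " + repr(mode))
    triggered = set(triggers)
    results = []
    for index, frame in enumerate(frames):
        if mode == "query" and index not in triggered:
            continue  # the wearer did not initiate a query on this frame
        name = recognize(frame)
        if name is not None:
            results.append((index, name))
    return results


# Stand-in recognizer: "knows" a single hypothetical frame label.
known = {"frame_with_alice": "alice"}
frames = ["empty", "frame_with_alice", "frame_with_alice"]
print(prompts(frames, known.get, "free_running"))
print(prompts(frames, known.get, "query", triggers=[2]))
```

Free-running mode may prompt repeatedly for the same face, whereas query mode produces at most one prompt per press of the trigger.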

Several different implementations of the capture/display configuration (e.g. having both the camera and display rotated 90 degrees, having the camera rotated 90 degrees with the display still in landscape orientation, etc.) were tried and tested. It was found that the best overall configuration was to have the camera rotated 90 degrees (portrait) but with the display still in landscape orientation.

Improvements to the `wearable face recognizer' included providing a means of alignment, using a registration template (Fig 6). This made use in the `query mode' much simpler and more precise: the author would wait until the candidate happened to be facing toward the camera, then center the face on the computer screen by tilting the apparatus (the orientation of the apparatus, of course, can be controlled by head movements), and then press the ``trigger''. The trigger was a pushbutton switch connected to one of the eight lines on the parallel port of the computer (actually a full chording keyboard can be easily implemented by using 7 such switches).

Such a switch may, for example, be shoe-mounted (as in the roulette computers described by Thomas Bass [13]) for unobtrusive use, or attached to a belt or waist bag within easy reach.
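To illustrate how a single trigger line and a 7-switch chord could be read from one 8-bit port value, here is a minimal sketch. It assumes the byte has already been obtained from the port (that step is hardware-specific and not shown), that a pressed switch reads as 1, and that the bit assignments are as given in the code. All of these are assumptions for illustration, not details from the original apparatus.

```python
TRIGGER_BIT = 0  # assumed line for the single trigger switch


def is_pressed(port_byte, bit):
    """True if the given line (0-7) reads as 1 in `port_byte`."""
    return bool((port_byte >> bit) & 1)


def chord_value(port_byte, switch_bits=range(7)):
    """Pack the states of the chord switches into one small integer.

    `switch_bits` lists which port lines carry the chord switches;
    using lines 0-6 is an assumption. The result is 0 when nothing
    is pressed.
    """
    chord = 0
    for position, bit in enumerate(switch_bits):
        if is_pressed(port_byte, bit):
            chord |= 1 << position
    return chord


print(is_pressed(0b00000001, TRIGGER_BIT))  # True
print(chord_value(0b10000101))              # 5 (lines 0 and 2; line 7 ignored)
print(2 ** 7 - 1)                           # 127 distinct non-empty chords
```

Seven switches give 127 distinct non-empty combinations, which is why so few switches can stand in for a full keyboard.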

The author experimented with lenses having a variety of different focal lengths, and found that a focal length of 11 millimeters with a 1/3-inch CCD provided the best tradeoff between image stability and reach. The longer focal lengths were found to be harder to aim, while the shorter focal lengths were found to require that the apparatus be so close to the candidate as to invade the personal space of the candidate. In particular, in situations where the candidate was someone behind a physical barrier, such as the deli counter at the grocery store or the returns desk in a department store, the 11mm lens provided enough throw to reach across such a barrier.
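A rough feel for this tradeoff comes from the angle of view of the lens. The sketch below uses the simple pinhole relation and assumes the commonly quoted nominal active area of a 1/3-inch sensor, about 4.8 mm by 3.6 mm. The exact dimensions of the CCD used are not given in the text, so the numbers are indicative only.

```python
import math

# Assumed nominal active area of a 1/3-inch sensor, in millimeters.
SENSOR_LONG_MM = 4.8
SENSOR_SHORT_MM = 3.6


def field_of_view_deg(sensor_mm, focal_mm):
    """Angle of view in degrees along one sensor dimension (pinhole model)."""
    return math.degrees(2 * math.atan(sensor_mm / (2 * focal_mm)))


def coverage_m(sensor_mm, focal_mm, distance_m):
    """Extent of the scene, in meters, imaged along one sensor
    dimension at the given subject distance (pinhole model)."""
    return distance_m * sensor_mm / focal_mm


focal_mm = 11.0
print(round(field_of_view_deg(SENSOR_LONG_MM, focal_mm), 1))   # about 24.6
print(round(field_of_view_deg(SENSOR_SHORT_MM, focal_mm), 1))  # about 18.6
print(round(coverage_m(SENSOR_LONG_MM, focal_mm, 1.0), 2))     # about 0.44
```

Under these assumptions, with the camera in portrait orientation, the long side of the frame spans roughly 0.44 m at a distance of 1 m. That is consistent with framing a face from across a counter rather than from within arm's length.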


  
Figure: Template-based wearable face-recognizer. (a) As the candidate approaches, an effort is made to orient the apparatus (by turning of the head) so that the candidate is centered. This is easy because the full-motion color video input stream appears on the computer screen together with the template. (b) At some point, the distance to the candidate will be such that the scale (size of the face on the image plane) will be appropriate, and, while still keeping the orientation appropriate, the match is made. (c) After the match is made, the template image drops away, revealing a radioteletype (RTTY) window behind it, upon which is displayed the desired information (for example, the name of the candidate, ``Alan Alda'', and possibly additional parameters or other relevant information).


Steve Mann
1998-09-18