The wearable multimedia apparatus is easily and quickly reconfigurable, under program control. One of the many modes of operation of WearCam was such that, after meeting someone for whom there was a desire to form an associative memory between name and face, a picture of that person's face was presented as a `flashback' 1 minute later, then 2, 4, 8, 16, etc. minutes later. The author found that this repetition greatly improved the ability to remember the face. Furthermore, it was much easier to learn the names associated with faces that had been `flashed back' periodically, even though neither the names of those faces, nor the names of a control group that were not `flashed back', were known until after the exposure phase of the experiment was completed.
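The exponentially spaced flashback schedule described above can be sketched as follows; the function name and parameters are illustrative assumptions, not the original WearCam code:

```python
def flashback_times(encounter_t, base_minutes=1.0, doublings=5):
    """Return the times (in seconds) at which to 'flash back' a face
    captured at encounter_t: 1, 2, 4, 8, 16, ... minutes afterward,
    doubling the interval each time."""
    return [encounter_t + base_minutes * 60 * (2 ** k) for k in range(doublings)]

# Example: an encounter at t = 0 seconds
minutes = [t / 60 for t in flashback_times(0.0)]
print(minutes)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A scheduler on the apparatus could simply redisplay the stored face image whenever the current time passes the next entry in this list.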
Humans are quite good at recognizing faces, but computers have also become adept at the task. Previous work in the computer face-recognition community is based on using a fixed camera. Face recognition work has not previously been directed toward wearable cameras; instead, it seems more aimed at video surveillance with a fixed camera and people moving through its field of view.
An object of this paper is to propose the `wearable face recognizer'. The use of a wearable face-recognizer suggests the possibility of turning the tables on the traditional third-person perspective (such as a ceiling-mounted surveillance camera), and, instead, using face recognition from a first-person perspective. In particular, the apparatus may be used as a prosthetic device for those suffering from visual amnesia, or even those with visual impairment, who are unable to see the face (or see it clearly enough to recognize it).
In researching the best form of the `wearable face recognizer', the author tried a variety of implementations. These ranged from storing the database of candidate faces on the body-worn apparatus, to connecting remotely from the apparatus to the database (because WearCam is connected to the Internet, any database accessible from the Internet may, at least in principle, be used).
In one implementation, faces were captured using an early version of the `wearable multimedia' system, running under KA9Q's Network Operating System (NOS). Images of candidate faces were transmitted to a workstation-class computer over the inbound channel, while the name associated with the closest match was received back over the outbound channel.
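The round trip described above (image out over the inbound channel, name back over the outbound channel) has roughly the following shape. This is a minimal modern sketch over TCP; the wire format (length-prefixed messages) and function name are assumptions, not the original NOS-based protocol:

```python
import socket
import struct

def query_face_server(host, port, jpeg_bytes):
    """Send one candidate-face image to a remote recognizer and read
    back the name of the closest match. Both directions use a 4-byte
    big-endian length prefix (an illustrative convention)."""
    with socket.create_connection((host, port)) as s:
        s.sendall(struct.pack("!I", len(jpeg_bytes)) + jpeg_bytes)
        (n,) = struct.unpack("!I", s.recv(4))
        name = b""
        while len(name) < n:
            name += s.recv(n - len(name))
        return name.decode("utf-8")
```

The design point is that the heavy matching computation lives on the workstation, while the body-worn side only ships an image and displays a short string.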
In order to display the identity to the wearer, an enlarged font was used, so that even a visually impaired wearer would still be able to read it.
The simplest implementation involved assigning a new filename to the image of the candidate face to convey the identity of the subject. This required no additional software development because it just so happened that the image display program created a very large font display of the filename while the image was being ``painted'' down the screen (the early `wearable multimedia' system was quite slow at displaying pictures, providing plenty of time for the wearer to read the filename -- the identity of the face). Two successive frames of the video going into the author's right eye are shown in Fig 5.
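The filename trick amounts to nothing more than a copy under a new name, along the lines of the following sketch (the function and naming convention are illustrative, not the original system's code):

```python
import shutil
from pathlib import Path

def label_capture(image_path, name):
    """Copy a captured face image to a filename that encodes the
    matched identity, so that a filename-displaying image viewer
    announces the match while painting the picture."""
    src = Path(image_path)
    dst = src.with_name(name.replace(" ", "_") + src.suffix)
    shutil.copy(src, dst)
    return dst
```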
The `wearable face recognizer' may be either `free running' or in `query mode'. In the former, the system captures images continuously and attempts to prompt the wearer (inserting a name into the reality stream) whenever it can. In the latter, the wearer initiates each query.
Several different implementations of the capture/display configuration (e.g. having both the camera and display rotated 90 degrees, having the camera rotated 90 degrees with the display still in landscape orientation, etc.) were tried. It was found that the best overall configuration was to have the camera rotated 90 degrees (portrait) but the display still in landscape orientation.
Improvements to the `wearable face recognizer' included providing a means of alignment, using a registration template (Fig 6). This made use in `query mode' much simpler and more precise: the author would wait until the candidate happened to be facing toward the camera, then center the face on the computer screen by tilting the apparatus (the orientation of the apparatus, of course, can be controlled by head movements), and then press the ``trigger''. The trigger was a pushbutton switch connected to one of the eight lines on the parallel port of the computer (in fact, a full chording keyboard can easily be implemented using 7 such switches).
Such a switch may, for example, be shoe-mounted (as in the roulette computers described by Thomas Bass) for unobtrusive use, or attached to a belt or waist bag within easy reach.
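Why 7 switches suffice: they give 2^7 - 1 = 127 distinct non-empty chords, far more than the alphabet requires. A decoder for such a chord, read as a 7-bit mask from the parallel port, might look as follows; the particular chord-to-letter mapping is hypothetical, as the text specifies only the switch count:

```python
def decode_chord(mask):
    """Decode a 7-bit chord (one bit per pressed switch) into a
    character. Only the seven single-switch chords are mapped here,
    to 'a'..'g', as a minimal illustration; multi-switch chords
    (returning '?') could cover the rest of a full keyboard."""
    if not 0 < mask < 128:
        raise ValueError("chord must press between 1 and 7 switches")
    table = {1 << i: chr(ord("a") + i) for i in range(7)}
    return table.get(mask, "?")
```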
The author experimented with lenses of a variety of focal lengths, and found that a focal length of 11 millimeters with a 1/3-inch CCD provided the best tradeoff between image stability and reach. The longer focal lengths were found to be harder to aim, while the shorter focal lengths required that the apparatus be so close to the candidate as to invade the candidate's personal space. In particular, in situations where the candidate was someone behind a physical barrier, such as the deli counter at the grocery store or the returns desk in a department store, the 11mm lens provided enough throw to reach across the barrier.
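As a rough check on the reach of this configuration: assuming a nominal 1/3-inch CCD active area of about 4.8 mm x 3.6 mm (a common figure for that format, not stated in the text), the field of view of the 11 mm lens follows from the usual pinhole relation:

```python
import math

def fov_degrees(sensor_mm, focal_mm):
    """Angular field of view along one sensor dimension:
    FOV = 2 * atan(sensor / (2 * focal))."""
    return math.degrees(2 * math.atan(sensor_mm / (2 * focal_mm)))

h = fov_degrees(4.8, 11.0)  # ~24.6 degrees along the 4.8 mm dimension
v = fov_degrees(3.6, 11.0)  # ~18.6 degrees along the 3.6 mm dimension
```

A field of view in the low-20s of degrees is consistent with the reported behavior: narrow enough that a face fills the frame from across a deli counter, but wide enough that head-movement aiming remains practical.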