Oldenbourg Electronic Journals
ISSN 0944-2774 http://www.it-ti.de
it+ti - Informationstechnik und Technische Informatik,
2001,
Volume 43, Issue 02, pp. 97-106
S. Mann

Can Humans Being Clerks make Clerks be Human? - Exploring the Fundamental Difference between UbiComp and WearComp

Author: Steve Mann Dept. of Electrical and Computer Engineering, University of Toronto, Canada


ABSTRACT

This paper presents experimental results from the author's more than twenty years of wearing a computer imaging system that includes various information technology devices. It contextualizes and quantifies both a First Person Detriment and a Second Person Detriment, and provides fundamental results on mitigating Second Person Detriment through actual or apparent reduction of existentiality (perceived self-determination). When an individual carries out an action unacceptable to a representative of a larger organization (e.g. a clerk), as compared with a clerk carrying out an action unacceptable to the individual, the clerk has an obfuscatory mechanism by virtue of an actual or pretended lack of control over the situation. The overtness of the action is considered in the context of an existentiality axis (the degree to which the action appears to originate within the mind of the individual, versus being remotely controlled or required) and a selectivity axis (the degree to which the action appears to be directed at the clerk, versus an action that was already under way before the approach). The primary actions explored are those of visual memory prosthesis, video capture, and personal photographic memory. The hypothesis is that the action, even when highly overt, is acceptable in inverse proportion to both of these axes. Moreover, an inverse relation in both axes is confirmed for a fixed level of overtness. For example, a highly overt and highly selective image capture is accepted if existentiality is deniable. The series of experiments was conducted in several different countries around the world, in numerous different kinds of settings, primarily where photography and video capture are prohibited, and the results do not appear to vary substantially with country or culture.
This paper is the first scientific quantitative presentation of a relationship described roughly in the author's Plenary Symposium session at Ars Electronica, entitled "Sicherheitsglaeser: Sicherheit Zuerst" in 1998. The qualitative finding is that the use of the wearable computer for denial of existentiality (and denial of selectivity) allows the individual person to break through the organizational shell surrounding clerks, and expose their human element.

Background: The fundamental issue of Wearable Computing versus Environmental Intelligence

Recently it has become fashionable to suggest that ubiquitous computing and environmental intelligence can eliminate the need for carrying a portable computer. The reasoning is generally as follows: If we had projectors and computer screens and sensors (cameras, microphones, etc.) everywhere in the environment, you would not need to carry a computer with you because whenever you needed to do something with a computer, you could just summon the environment to help you. You would not need a keyboard or mouse, because there would be cameras and microphones everywhere to help you. All you would need to do is say for example, "Computer, show me my calendar and today's schedule!" and it would appear on one of the walls, since perhaps all of the walls would be displays.

However, the fundamental issue that separates the underlying philosophy of wearable computing from that of ubiquitous computing is not really the fact that it is wearable (the Wearability/Portability axis). Rather, the fundamental difference between the two philosophies is best captured by the Existentiality axis (Mann 1998, Proc. IEEE). See the following figure and discussion:

Here are shown some examples of devices that have differing degrees of two parameters:

It is evident from this plot that there are a large number of devices along or near the X=Y (Wearability=Existentiality) axis. Examples of outliers away from this axis are shown, but these tend to be less common in our everyday life. Therefore, we tend to think of portable (hand-held) and wearable devices as being liberating, or freedom inducing, whereas environmental technology (such as surveillance cameras) is often installed without our knowledge or consent.

Why carry a camera when there are so many cameras in the environment?

We do not need to dream about a future in which computers are everywhere in the environment, because we already have a parallel ecology we can use as an analogy. We can, right now with today's technology, ask the same kind of question about cameras: why bother carrying a camera when there are so many in the environment? For example, when vacationing at Disney, why bother to bring a camera, when they already have cameras nearly everywhere to watch you and make your life better? You can even purchase a picture of yourself on the rides, so why would you ever want to bother bringing your own camera?

The answer, of course, is in the existentiality axis, not the portability (wearability) axis. By bringing our own camera, we have much greater control of the picture taking process. For example, we can take a picture and own the copyright, whereas if we rely on pictures taken by others, they own the copyright in our image, even though it might be a picture of us. The real issue here is control of information, not the degree to which the apparatus is portable or wearable. However, there is an important relationship between the two axes, because if we wear (or carry) the camera, we control the data from it, but if we rely on an organization's cameras, they control the data.

Indeed, we could, as a society, take all the money we spend on our own cameras, and spend this money instead on public cameras owned by the government. If we did this, we could have our dream of cameras and microphones and computers everywhere come true, and we would never need to carry or wear our own, because there would be so many of them in the environment. However, it is the author's opinion that this would not turn out to be the utopian world we would want to live in. Indeed, it has been the author's experience that the more cameras that are installed in the environment, the less likely we are to be permitted to have our own. Establishments like gambling casinos, department stores, and government buildings with surveillance cameras often try to watch for, and prevent persons from using their own cameras to take their own pictures.

Thus the fundamental issue explored in this paper is the Existentiality axis, and its relation to the Wearability/Portability axis.

Introduction: The Reality Mediator (RM)

Over the past two decades, the author has invented, designed, and built more than a hundred different kinds of wearable computer systems, for the purposes of altering his visual perception of reality, both as a form of visual art and personal exploration, as well as for producing cybernetic photographs (e.g. as appeared in the author's solo exhibit at Night Gallery, 185 Richmond Street, in Toronto, during the summer of 1985).

What was learned from the wide range of experiences attained in inventing, designing, building, and actually wearing these machines in a wide variety of ordinary day-to-day settings (e.g. not just in a lab) was that there are two fundamental classes of problems, quite apart from the technical feat of getting the machines to actually work. These classes of problems are (1) the effect, often undesirable, that the apparatus has on the wearer in long-term use, and (2) the effects the apparatus has on other people. The first will be called "first-person detriment", whereas the second will be called "second-person detriment".

Although there are many facets to each of these two classes of problems, such as:

However, the most interesting (in the author's opinion) and deeply philosophical fundamental aspects of each of these two detriments, which have come to light only from actually wearing the apparatus in a wide variety of real life situations for many years, are (1) the inducing of visual confusion disorder, flashbacks, etc., and (2) the strong visceral reaction others have against what are known as the personal empowerment (Mann98, Proc. IEEE) aspects of the invention.

This strong reaction (2) goes beyond a mild social stigma or avoidance leading to loneliness; it also includes, in the author's personal experience:

This fundamental second-person issue really centers on the authority of space, and on the ability of an individual to claim or reclaim ownership of that personal space. The most notable dimension of the second-person issue is the photographic or visual dimension. While many aspects of assertion of personal space are involved, in the author's opinion the most powerful assertion of personal space is that of visual image capture. In particular, just as space protection is often facilitated through video surveillance in establishments, personal image capture (e.g. any kind of apparatus that assists with visual memory) is a very strong assertion of personal space. While it has been argued that audio capture may be a greater violation of privacy than video capture, the author's own experience is that there is a much stronger and more visceral reaction to the visual aspects. These social interactions have been explored in the author's documentary "shootingback" (http://wearcam.org/shootingback.html), and, indeed, even the metaphor, to "shoot a movie", or to "go out on a shoot", suggests something stronger than one can obtain with audio or other nonvisual informatic capture means.

Accordingly, the author identifies the two fundamental axes of the Reality Mediator (RM) in day-to-day life:

These more fundamental philosophical aspects are the far more important ones, especially in view of the fact that the author has invented, designed, and built systems that are completely covert and relatively comfortable to wear, and that these systems are clearly easy to manufacture. Therefore, the cumbersome and burdensome aspects of the apparatus, as well as the social stigma associated with looking strange, are moot points, once we can mass produce comfortable and covert embodiments.

It is the author's view that a Reality Mediator is of most benefit when it is worn over a long period of time, so that, as a computational framework, it leads to a constancy of user interface. In order for there to be widespread acceptance of this apparatus, it may well need to have the appearance of ordinary eyewear (e.g. be covert).

More than just an information display

It is important to emphasize that the RM is more than just an information display, like the many eyeglass-like or goggle-like headworn displays that are commercially available. It is far more than just a TV set or word processor built into eyeglasses, although it certainly can be used to watch television and send email messages while walking around in ordinary day-to-day life.

Headworn camera with TV versus a true Eye Tap Reality Mediator

A traditional camcorder with viewfinder, held up to the eye, provides a form of personal imaging experience, whether intended or not. Visual perception of reality is altered (mediated) by the device. This alteration of reality arises from optical distortion in the viewfinder, some amount of offset between the camera's center of projection and the actual center of projection of the eye (to the extent that one cannot readily remove one's eye and locate the camera in the eye socket, exactly where the eye's normal center of projection resides), as well as other attributes of the system. Cameras with electronic viewfinders alter our perception of reality when we look through them, such as by removing or altering color, or inserting overlays of text such as the letters "REC", or graphics. However, what is desired is a more natural apparatus, in which the visual perception of reality is computationally altered in a controlled way. This requires a much more refined personal imaging system, such as an Eye Tap device.
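The offset between the camera's center of projection and that of the eye can be given a rough scale with a small calculation. The following is only an illustrative sketch, not a description of the author's apparatus: it assumes a simple pinhole geometry in which the camera is displaced sideways from the eye, and the function name and the offset and distance values are hypothetical.

```python
import math


def parallax_error_deg(offset_m: float, subject_distance_m: float) -> float:
    """Angle, in degrees, between the eye's and the camera's lines of sight
    to a point straight ahead of the eye.

    Assumption: the camera's center of projection is displaced laterally
    from the eye's center of projection by offset_m (pinhole model).
    """
    if offset_m < 0 or subject_distance_m <= 0:
        raise ValueError("offset must be >= 0 and distance must be > 0")
    return math.degrees(math.atan2(offset_m, subject_distance_m))


if __name__ == "__main__":
    # Hypothetical 5 cm offset, e.g. a camera mounted beside the eye.
    offset = 0.05
    for distance in (0.3, 1.0, 3.0, 10.0):
        error = parallax_error_deg(offset, distance)
        print(f"subject at {distance:5.1f} m: parallax error {error:.2f} degrees")
```

Under these assumptions the error grows quickly for nearby subject matter and vanishes only when the offset is zero, which is the condition an Eye Tap device aims to meet.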

Experimental methodology

It is important to distinguish between internal validity and external validity in the proposed experiments. While it has become fashionable to constrain experiments to a lab-like setting, especially in the behaviourist tradition of psychology, the author believes that this trend takes away much of the human element, and that experiments done in this way often lack applicability to the natural world as a whole. Therefore the experiments presented in this paper were done in the ecological framework of ordinary day-to-day life over the past twenty or thirty years.

There are two classes of experimental subjects: the long term subject (the author), and external subjects, primarily officials, authority figures, and the like. The long term subject (the author) represents, admittedly, a small sample size of one person, but given the length of the experiment (more than twenty years), new and insightful results were obtained. It would be unreasonable, at this point, to have a large sample population wear these devices for a twenty or thirty year time period. Thus results based on the long term subject (the author) fall under an experimental paradigm related to that of George Stratton's experiments, published in 1896 and 1897 in Psychological Review, in which the experimental subject was himself.

More recently Gibson (Gibson 1972) formalized this ecological approach to the study of perception in general, in which he emphasized using available environmental information in visual perception studies. The ecological approach differs from the conventional approach to psychology in this sense of external versus internal experimental validity.

Experimental apparatus: Eye Tap devices for mediating reality

Eye Tap devices have three main parts:

There are two embodiments of the aremac:
  1. one in which a focuser (such as an electronically focusable lens) tracks the focus of the camera, to reconstruct rays of diverted light in the same depth plane as imaged by the camera; and
  2. another in which the aremac has extended or infinite depth of focus so that the eye itself can focus on different objects in a scene viewed through the apparatus.

Focus tracking Eye Tap systems

This paper describes only the focus tracking embodiment of the Eye Tap system. The aremac has focus linked to the measurement system (e.g. "camera") focus, so that objects seen depicted on the aremac of the device appear to be at the same distance from the user of the device as the real objects so depicted. In manual focus systems the user of the device is given a focus control that simultaneously adjusts both the aremac focus and the "camera" focus. In automatic focus embodiments, the camera focus also controls the aremac focus. Such a linked focus gives rise to a more natural viewfinder experience, as well as reduced eyestrain. Reduced eyestrain is important because the device is intended to be worn continually.
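The idea of linked focus can be sketched numerically. The following is a minimal illustration under stated assumptions, not the author's focus controller: it models the objective lens as an ideal thin lens, and it expresses the target for the aremac as the accommodation demand (in diopters) that the eye would have experienced without the device. The function names, the focal length, and the subject distances are hypothetical, and the mapping from that demand to an actual aremac lens position depends on the aremac optics and is not modeled here.

```python
import math
from typing import NamedTuple


class LinkedFocus(NamedTuple):
    lens_to_sensor_m: float       # objective lens position for a sharp image
    accommodation_diopters: float  # vergence the aremac should reproduce


def lens_to_sensor_distance_m(focal_length_m: float, subject_distance_m: float) -> float:
    """Thin-lens image distance from 1/f = 1/d_o + 1/d_i (idealized model)."""
    if math.isinf(subject_distance_m):
        return focal_length_m  # parallel rays focus at the focal plane
    if subject_distance_m <= focal_length_m:
        raise ValueError("subject must be farther away than the focal length")
    return focal_length_m * subject_distance_m / (subject_distance_m - focal_length_m)


def accommodation_demand_diopters(subject_distance_m: float) -> float:
    """Focus demand on an unaided eye looking at the subject (0 at infinity)."""
    if subject_distance_m <= 0:
        raise ValueError("subject distance must be positive")
    return 0.0 if math.isinf(subject_distance_m) else 1.0 / subject_distance_m


def linked_focus(focal_length_m: float, subject_distance_m: float) -> LinkedFocus:
    """Derive both settings from one subject distance, so they stay linked."""
    return LinkedFocus(
        lens_to_sensor_distance_m(focal_length_m, subject_distance_m),
        accommodation_demand_diopters(subject_distance_m),
    )


if __name__ == "__main__":
    f = 0.008  # hypothetical 8 mm objective lens
    for d in (0.25, 1.0, math.inf):
        setting = linked_focus(f, d)
        print(d, setting.lens_to_sensor_m, setting.accommodation_diopters)
```

Deriving both values from a single subject distance is one simple way to guarantee that the two focus settings cannot drift apart; a real system might instead read the objective lens position and look up the aremac setting from a calibration table.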

The operation of the depth tracking aremac is shown in the Figure:

CAPTION: Focus tracking aremac: (a) with a NEARBY SUBJECT, a point P0 that would otherwise be imaged at P3 in the EYE of a user of the device is instead imaged to point P1 on the image SENSOR, because the DIVERTER diverts EYEward bound light to lens L1. When subject matter is nearby, the L1 FOCUSER moves objective lens L1 out away from the SENSOR automatically, as an automatic focus camera would. A signal from the L1 FOCUSER directs the L2 FOCUSER, by way of the FOCUS CONTROLLER, to move lens L2 outward away from the light SYNTHesizer. At the same time, an image from the SENSOR is directed through an image PROCessor, into the light SYNTHesizer. Point P2 of the display element is responsive to point P1 of the SENSOR. Likewise other points on the light SYNTHesizer are each responsive to corresponding points on the SENSOR, so that the SYNTHesizer produces a complete image for viewing through lens L2 by the EYE, after reflection off of the back side of the DIVERTER. The position of L2 is such that the EYE's own lens L3 will focus to the same distance as it would have focused in the absence of the entire device. (b) With DISTANT SUBJECT MATTER, rays of parallel light are diverted toward the SENSOR where lens L1 automatically retracts to focus these rays at point P1. When lens L1 retracts, so does lens L2, and the light SYNTHesizer ends up generating parallel rays of light that bounce off the backside of the DIVERTER. These parallel rays of light enter the EYE and cause its own lens L3 to relax to infinity, as it would have in the absence of the entire device.


Because the eye's own lens L3 experiences what it would have experienced in the absence of the apparatus, the apparatus, in effect, taps into and out of the eye, causing the eye to become both the camera and the viewfinder (display). Therefore the device is called an Eye Tap device.

In stereo versions of the proposed device, there are two cameras or measurement systems and two aremacs that each regenerate the respective outputs of the camera or measurement systems.

The apparatus is usually concealed in dark sunglasses that obstruct vision except for what the apparatus allows to pass through. Because the experimental apparatus is built to be used in ordinary day-to-day life, and not the lab, it must have an appearance of ordinary eyewear and ordinary clothing, so that the test subjects do not seem to regard it as an unusual apparatus. The experimental apparatus is shown in the Figure:

CAPTION: The author's wearable computer system (as pictured on the cover of Toronto Computes, 1999) consists of a small computer that fits in a shirt pocket, and apparatus concealed under ordinary clothing. The eyeglasses, which provide an infinite depth of focus image, have a normal (e.g. not an unusual) appearance.

Experiments in first person detriment

In addition to providing reduced eyestrain, the author has found that the Eye Tap system allows the user to capture dynamic events, such as a volleyball game, from the perspective of a participant. In order to confirm the benefits of the new device, the author has done extensive performance evaluation testing of the device as compared to wearable camera systems. A more detailed presentation of these results is the subject of another forthcoming paper. An example of one of the performance test results appears in the Fig:

CAPTION: There exists a sharp knee in the curve of frame rate versus ability to do many tasks. Many tasks require a certain minimum frame rate, below which performance drops off rapidly. Eye Tap systems work better than wearable camera systems at a given frame rate. Moreover, Eye Tap systems can be used at lower frame rates to obtain the same degree of performance as can be obtained with a wearable camera system operating at a higher frame rate.

Can humans being clerks make clerks be human?

The author has designed, built, and tested various wearable computer systems that are completely covert (e.g. do not have an unusual appearance), such that image capture and documentation are possible, so in many ways, the second person detriment problem is a solved problem.

However, another aspect of the experiment is to explore, in an experimental fashion, the second person detriment.

In order to do this, the author built a variety of systems in which the overtness (degree of obviousness that a camera was present) could be varied.

The goal of this work was to set forth an hypothesis that the overtness is actually a function of the other variables, and to understand the relationships between these variables.

The test subjects were chosen from among those who appeared to show the greatest anger toward the author from earlier years of wearing the less covert (more cumbersome) variations of the apparatus. It was found previously that persons who are part of an organization making extensive use of video surveillance were more likely to complain when the author had a personal safety device. Most notably, it was the representatives of surveillance regimes who complained about being held accountable. Thus the experimental subjects were drawn from:

Ranking methodology

Rather than using questionnaires, or asking subjects how they felt about the experiment, results were instead based on the immediate reactions of the subjects within the context of their natural environments. The reactions of subjects being photographed or videotaped without their permission were surprisingly diverse, especially given their expectations of being able to videotape or photograph the author without his permission. In all cases, to be fair, the experiments were conducted in situations and settings where the author was being videotaped or photographed by the subject, or the subject's organization. Thus the reflectionist approach (e.g. "shooting back" at persons already shooting) was used to ensure fair play, from an ethical standpoint (Mann97, Leonardo).

A ranking scale, based on the immediate reaction of the subject to being photographed, or imaged with video apparatus, was used as follows:

Obviously, when the camera was completely hidden, it did not produce any reaction, so the axis of interest is overtness.

It is hypothesized that the overtness axis may be considered to be an independent variable, in the context of the ranking scale listed above. In particular, to reduce the dimensionality of the problem, isoscore lines (lines of constant score) passing through a multidimensional space provide overtness as a function of the other independent variables.

In this way, overtness was varied as an independent variable while noting the effect of other concomitant variables, and noting what level of overtness at one point would provide the same score as another level of overtness at another point, and so on.
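The notion of an isoscore line can be illustrated with a short sketch. This is not the author's analysis procedure or data: it assumes only that the score rises (or stays level) as overtness increases, and it uses bisection to find the overtness that yields a chosen score at given values of the other variables. The toy score function, its scale, and the normalization of all three variables to the range 0 to 1 are hypothetical, chosen purely to exercise the search.

```python
from typing import Callable, Optional

# score_fn(overtness, existentiality, selectivity) -> score
ScoreFn = Callable[[float, float, float], float]


def isoscore_overtness(
    score_fn: ScoreFn,
    target_score: float,
    existentiality: float,
    selectivity: float,
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-6,
) -> Optional[float]:
    """Overtness in [lo, hi] at which score_fn equals target_score.

    Assumes score_fn is non-decreasing in overtness. Returns None when the
    target score is not reachable within [lo, hi] at this point.
    """
    if score_fn(lo, existentiality, selectivity) > target_score:
        return None  # even the least overt case already exceeds the target
    if score_fn(hi, existentiality, selectivity) < target_score:
        return None  # even the most overt case stays below the target
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_fn(mid, existentiality, selectivity) < target_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def toy_score(overtness: float, existentiality: float, selectivity: float) -> float:
    """Hypothetical reaction-severity score, for illustration only."""
    return 10.0 * overtness * existentiality * selectivity


if __name__ == "__main__":
    # Trace one isoscore line across several existentiality levels.
    for existentiality in (1.0, 0.5, 0.25):
        overtness = isoscore_overtness(toy_score, 2.0, existentiality, 1.0)
        print(existentiality, overtness)
```

Passing the score as a function keeps the search independent of any particular model; measured reactions could be substituted through an interpolating function without changing the search itself.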

This approach answered the fundamental question as to which of the following functions of overtness and score were most important, and how they were related: