michael naimark
SPIE/SPSE Electronic Imaging Proceedings, vol. 1457,
San Jose, 1991

Elements of Realspace Imaging: A Proposed Taxonomy

Michael Naimark

independent media artist
San Francisco


Along with the marriage of motion pictures and computers has come an increasing interest in making images appear to have a greater degree of realness or presence, which I call "realspace imaging." Such topics as high definition television, 3D, fisheye lenses, surrogate travel, and "cyberspace" reflect such interest. These topics are usually piled together and are unparsable, with the implicit assumptions that "the more resolution, the more presence" and "the more presence, the better." This paper proposes a taxonomy of the elements of realspace imaging. The taxonomy is organized around six sections: 1) monoscopic imaging, 2) stereoscopic imaging, 3) multiscopic imaging, 4) panoramics, 5) surrogate travel, and 6) realtime imaging.


Realspace imaging is the process of recording and displaying sensory information indistinguishable from unmediated reality. Imagine looking at a framed image as if it were a window. Fooling the eye into believing the image is real is a difficult task. Fooling two eyes is even more difficult. Fooling two eyes while allowing freedom of head motion is yet more difficult. Now imagine removing the frame and having the freedom to look around. Now imagine having the freedom to move around. Now add time-based phenomena such as motion and sound. These are the elements of realspace imaging proposed.


1. Monoscopic Imaging

Monoscopic images represent one single point of view.

1.1. Orthoscopy (Scale)

Orthoscopic images, images viewed in proper scale, require that the viewer see from the same angle of view as that of the camera lens. Orthoscopically correct images must be viewed from one single point of view; hence virtually every image we see is non-orthoscopic. In addition to scale change, off-axis viewing results in trapezoidal distortion. We are rarely cognizant that we are looking at a trapezoid when we sit to the side of a movie theater, though it has been shown that additional cognitive processing is required to "straighten it out."1

1.2. Spatial Resolution and Color

Spatial resolution is possibly the most-discussed aspect of realspace imaging. Today, the images we see on television and in movie theaters are recorded on a wide variety of formats whose principal difference is spatial resolution (and color, partly a subset of spatial resolution and partly a subset of recording and display technology). The current dilemma over "how much is enough?" has been additionally fueled by standardization arguments for high definition television (HDTV). To some, the research is disheartening: a display with a 45° by 90° field of view would ideally require a 3000 by 6000 pixel display, many times the resolution of even present HDTV standards.2
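The arithmetic behind such figures can be sketched, assuming (as a rough rule of thumb, not a figure taken from the cited study) that the eye resolves detail down to about one arcminute and that one display pixel per arcminute suffices:

```python
# Back-of-envelope check of display resolution requirements, assuming
# (hypothetically) one-arcminute visual acuity and one pixel per
# arcminute of field of view.

def pixels_needed(fov_degrees, acuity_arcmin=1.0):
    """Pixels needed across a field of view at the given visual acuity."""
    return int(fov_degrees * 60 / acuity_arcmin)

print(pixels_needed(45), pixels_needed(90))  # 2700 5400
```

At one pixel per arcminute this yields 2700 by 5400 pixels for a 45° by 90° field of view, the same order of magnitude as the cited 3000 by 6000 figure.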

1.3. Dynamic Range and Brightness

Dynamic range is the span from the whitest whites to the blackest blacks in visual recording and display. The eye has a broad dynamic range allowing us to see bright outdoor scenes and shadow detail simultaneously. Film has less dynamic range than the eye. Scenes shot in film must be carefully lit to "squeeze" the dynamic range into film's limits, such as "fill lighting" the shaded areas more and "key lighting" the bright areas less. Video has even less dynamic range than film, requiring more complicated lighting to achieve the same effect as film. Film shot and transferred to video results in a greater dynamic range than video-originated material, which is one reason why many cinematographers prefer the "film look" (another reason is frame rate, see below).

1.4. Spatial Consistency and Spatialization

Many kinds of lower resolution images appear acceptable, particularly if the noise or artifacts have realworld analogs, such as looking at the world through a veil (or a fan blade). But when the styles of resolution are inconsistent with each other in the same image, it looks "wrong." Steve Yelick once referred to this as the "Gumby in Manhattan" problem.

Similarly, an overlay can be "spatialized" by making it appear as a contiguous part of the image, where lighting and shadow, scale, and synchronous movement must correspond with the background. The motion picture special effects industry has long been aware of such importance, and the so-called "virtual reality" field is currently realizing the need for spatializing data. A videodisc-based example of spatializing data was recently produced by the Apple Multimedia Lab, where electronic graphics were inserted along a filmed desert highway to teach scale to schoolchildren.3

1.5. Monoscopic Depth Cues

1.5.1. Content Cues

There are several monoscopic cues to depth perception which are based on image content, such as perspective, overlapping or occlusion, aerial perspective or atmospherics, light and shade, and textural gradient.4 It is noteworthy that these cues are automatically captured with a camera but must be addressed explicitly with computer-generated imagery, where these factors have historically been of great concern.

1.5.2. Accommodation (Focus)

With one eye open and a fixed head position, a prominent depth cue is accommodation, the focussing of the eye's lens by the surrounding muscles. It is similar to focussing the lens of a camera. There are two ways to determine depth from focussing a camera. One is to focus on an object and read the calibrated focus setting. We sense our eye muscles in a similar way when we change focus. The other way to determine depth is by the amount of blur that exists for objects out-of-focus, which is partly a function of distance and partly a function of brightness. Obtaining depth data in an image by comparing two samples of blurriness has been demonstrated.5

Accommodation discrepancies are often prominent while viewing landscape images, where the eyes should be focussed on infinity (and where parallax diminishes to near zero). One method of achieving "infinity focus" is simply to project farfield imagery far away on a large screen, but a large space is required. Another technique, common in flight simulators, uses a large concave mirror which magnifies and refocusses the image from a small monitor. But even with this large mirror, the effective viewing area is small, enough for only one person. Smaller and less expensive optics can be used instead of a mirror if the viewer is further restricted to an even smaller viewing area, like a peephole.

Similarly, a concave mirror may be used for nearfield imaging by projecting a real image in front of the mirror's surface. The "floating nickel" illusion and video "genies" hovering in space are popular examples.


2. Stereoscopic Imaging

Stereoscopic images represent two single points of view, one for each eye, separated to give a noticeable lateral displacement, or parallax. Parallax is often erroneously pitched as all that is necessary for depth in imagery (the stereoscopic movies of the 1950s were simply labelled "3D"). There is no easy way to record and reproduce parallax. First, two simultaneous views must be recorded, with care taken for proper convergence and disparity. Then, each view must be seen exclusively by each eye.

Stereoscopic photography has a lively history dating back at least to Wheatstone's invention of the stereoscope in 1833.6 The most popular techniques require glasses to be worn (such as anaglyphic, polarized, or shutter). Methods not requiring glasses usually require the head to be held in a particular position (using mirrors, peepholes, or lenticular screens, for example).


3. Multiscopic Imaging

Multiscopic imaging represents multiple points of view: lateral head movement while the body is more-or-less still. The result is local motion parallax (global motion parallax equals travel). Local motion parallax is a stronger depth cue than stereoscopic parallax because more than two points of view can be seen in a relatively short period of time.

That local motion parallax occurs when we rotate our head is a wonderful evolutionary feature of humans (and most other animals) because our eyes are displaced from our neck's axis of rotation. If our eyes were on the neck's axis of rotation (like a camera mounted on a tripod), there would be no lateral displacement when we turn our head and therefore no parallax.

3.1. Mirrors

A mirror is multiscopic by nature. When we view ourselves in a mirror, each eye is seeing a different point of view, so we see stereoscopically. But also when we move our head, we see correspondingly different points of view.

An early technique for achieving multiscopic images required a giant half-silvered mirror with which to reflect hidden images and props over an actual stage set. Like normal mirrors, both parallax and accommodation are preserved, but since it is half-silvered, the reflected imagery appears transparent, making it great for ghosts but not much else. Such giant half-silvered mirrors date back to the Phantasmagoria shows of the 18th century and have been popularized by the ballroom of Disney's Haunted Mansion. Similar examples, but where the 3D floating images were of 2D film and video screens (and if done cleverly appear 3D), were popular attractions at the last three World's Fairs: the GM "Spirit Lodge" (EXPO '86, Vancouver), the Australian Pavilion (EXPO '88, Brisbane), and the Ginko, Gas, and Mitsui Toshiba pavilions (EXPO '90, Osaka).

A technique for producing small multiscopic images employs a flexible vibrating mirror to rapidly change focal lengths. This varifocal mirror reflects a video display whose image is in sync with the vibration, resulting in a relatively small volumetric display.7 Since the video must be from a computer-generated 3D model, direct display of camera-originated images is not possible. Indeed, no such camera exists.

3.2. Relief Projection

Another multiscopic technique is sometimes called relief projection, where an image is projected onto a screen whose shape physically matches the image itself. Historically, the most popular application of relief projection is "talking heads," where a mask is made of a person's face to be used as the projection screen. The person is filmed with their head totally motionless but their mouth and eyes moving. The film is projected onto the facemask screen, with careful alignment such that the eyes fall in the eye sockets, the mouth along the mouth line, etc. The illusion is very powerful, and the fact that the image of the eyes and mouth move and the screen does not is barely noticeable. The talking head in Disney's Haunted Mansion uses this technique. A more advanced version, where the mask screen moved in sync with the image, was produced at MIT in 1980.8 The author has produced room-sized relief projection by painting entire stage sets white after filming them and projecting the original image back on the white-painted surfaces.9 The limit of relief projection, of course, is that the shape of the screen cannot easily change.

One method of making a more flexible relief projection display is to rapidly spin a disc or corkscrew-shaped screen while projecting on it with synchronized lasers.10 The result is a volumetric display (usually inside a clear housing for safety) whose size, detail, and flicker rate are related to the computational horsepower and the mechanics of the system. And like varifocal mirror displays, all the imagery must come from a 3D computer model rather than directly from a camera.

3.3. Holography

Holography achieves both parallax and accommodation, but is a filmic medium (with the extremely significant exception of Benton's most recent work at the MIT Media Lab). Being film-based has its implications. It cannot be transmitted live like video. Also, it is near-impossible for any kind of computer control and interactivity.

Another popular misconception about holography is its "projection." The holographic image can only be seen while viewing through the film: it may appear behind the film, in front of the film, or both, but one must always be looking through the film. The concept of both the audience and the hologram being on the ground and a holographic image "projected" in the sky is simply inaccurate.

"Stereograms" (or integrams) are holograms made from filmed or computer-generated material, where images are recorded from multiple points of view along a single straight or curved track. If the material is shot with a single moving camera, any motion counteracts the stereoscopy. Real holographic movies, though demonstrated, require massive amounts of holographic film and still offer only very limited viewing.

3.4. Viewpoint-Dependent Imaging

Local motion parallax is possible with a conventional display if driven by the user's head position. An example was produced at MIT, where an outdoor scene was shot laterally and mastered on videodisc. The videodisc speed and direction were controlled by a single user wearing a position-tracking device on his head. As the user sways back and forth, the video image changes correspondingly.11 Because such a display is interactive, it is limited to one single user. Though trivial with a virtual camera, recording with a real camera becomes increasingly difficult when more than one dimension is shot (allowing the user to sway back and forth, in and out, and up and down simultaneously). And like other single camera applications for multiple points of view, time artifacts (motion) counteract the multiscopic effect.
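The control loop described can be sketched as follows; the tracker range, frame count, and linear position-to-frame mapping are illustrative assumptions, not details of the MIT system:

```python
# Sketch of viewpoint-dependent imaging: a tracked lateral head
# position selects which pre-recorded videodisc frame to display.

def head_to_frame(head_x, x_min, x_max, num_frames):
    """Map a head position within the tracker's range to a frame index."""
    clamped = min(max(head_x, x_min), x_max)   # stay on the recorded track
    t = (clamped - x_min) / (x_max - x_min)    # normalize to 0..1
    return round(t * (num_frames - 1))         # linear frame lookup

# As the user sways from -0.5 m to +0.5 m, frames 0..299 are shown.
print(head_to_frame(-0.5, -0.5, 0.5, 300), head_to_frame(0.5, -0.5, 0.5, 300))  # 0 299
```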


4. Panoramics

Panoramics is the ability to look around. An image is considered panoramic as it approaches framelessness, when the image is larger than the viewer's field of vision. When this occurs, there is a sense of immersion, of being inside rather than outside looking in. Panoramic imagery allows freedom of angular movement.

4.1. Rectilinear Perspective

Rectilinear perspective is shot on flat film and displayed on a flat surface. When viewing rectilinear perspective images off-axis, the frame will appear trapezoidal, but straight lines will always appear straight. Practically every camera we have ever seen or used, and practically every image we have ever seen or made, has been of rectilinear perspective.

Because of their flat nature, all rectilinear perspective images must have less than a 180° angle of view, and therefore full panoramic construction requires multiple images. For example, MIT's Aspen Moviemap was shot with four 16mm cameras with slightly less than 90° lenses, pointing front, back, left, and right. For these images to be viewed together properly, one must stand in the center of a four-walled projection space, otherwise trapezoidal distortion will occur. The four images, when laid flat, will exhibit discontinuities which can be computer-corrected by "undistorting" them linearly.12

4.2. Cylindrical Perspective

Cylindrical perspective is shot on cylindrically-positioned film and displayed on a cylindrical surface. Unlike rectilinear perspective, only one cylindrical perspective image is required for a 360° panorama, and since it is a single image, there are no discontinuities. Cameras with rotating slits and lenses (such as the Widelux, Hulcher, Globus, and Roundshot cameras) can shoot a single image over a relatively short period of time.

Optimal viewing is from the center of the cylinder, like being inside a large lampshade. When the viewer is off-axis, straight horizontal lines appear curved, while straight vertical lines remain straight. The distortion can be "dewarped" with a computer by non-linear correction in one dimension and linear correction in the other dimension for a flat display.
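One way to sketch this dewarp, assuming an idealized pinhole view of focal length f and a cylindrical image whose columns are proportional to azimuth (the coordinate conventions here are illustrative, not from the cited work):

```python
import math

# Inverse mapping for cylinder-to-flat dewarping: for each pixel (u, v)
# of the flat output image (measured from the image center), find the
# source sample in the cylindrical panorama.

def flat_to_cylinder(u, v, f):
    """Return (column, row) in the cylindrical image for flat pixel (u, v)."""
    col = f * math.atan2(u, f)      # horizontal: non-linear (angular) correction
    row = f * v / math.hypot(u, f)  # vertical: linear in v for a given column
    return col, row
```

Note how the mapping matches the description above: one dimension needs a non-linear (arctangent) correction while the other remains linear for each column.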

4.3. Spherical Perspective

Spherical perspective is shot with spherical optics (such as fisheye lenses) and displayed on a spherical surface. Optimal viewing is from the center of the sphere, like being inside a large dome or an Omnimax theater. When the viewer is off-axis, straight lines in both dimensions will appear curved.

Spherical recording is most often associated with fisheye lenses, but other such specialty lenses exist. For example, the Peri-Appolar lens made by Volpi was used for the Aspen Moviemap. It produces a donut-shaped image representing a 360° azimuth by ±30° elevation, centered on the horizon rather than on the zenith when pointing upward. Shooting off convex mirrors also produces spherical perspective (a Legg's pantyhose package is a favorite), but the camera will be visible in the middle of the frame.

Spherical perspective images can capture the most in a single shot, but flat viewing results in distortion. The distortion can be "dewarped" with a computer by non-linear correction in both dimensions.13
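The two-dimensional dewarp can be sketched similarly, assuming an equidistant fisheye model (image radius proportional to the angle off the optical axis), which is a common idealization rather than the behavior of any particular lens:

```python
import math

# Inverse mapping for fisheye-to-flat dewarping: for each rectilinear
# output pixel (u, v), find the source sample in the fisheye image.

def flat_to_fisheye(u, v, f_flat, f_fish):
    """Return fisheye image coordinates for rectilinear pixel (u, v)."""
    theta = math.atan2(math.hypot(u, v), f_flat)  # angle off the optical axis
    r = f_fish * theta                            # equidistant model: r = f * theta
    phi = math.atan2(v, u)                        # direction around the axis
    return r * math.cos(phi), r * math.sin(phi)
```

Unlike the cylindrical case, the correction here is non-linear in both dimensions, since the radius depends on both u and v.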

4.4. Substituting Interactivity for Wholeness

For each type of perspective, it is possible to store the entire panoramic image in such a way that the user may access a subset of it. The obvious advantages are that it eliminates the need for a 360° projection space and requires less display bandwidth.

A reasonable method of viewing panoramic imagery is through a small flat rectilinear window such as a video display if the user has control of the point of view, using a joystick, for example. Intel's DVI technology has such a method for "dewarping" and displaying imagery shot with a fisheye lens.14

A less reasonable method (but one that kept this author obsessed for several years) is where a projected image physically moves around the playback space in order to retain the spatial correspondence.15 A "moving movie" requires neither the power nor bandwidth to fill the entire playback space, but it nevertheless requires a special playback space.

It is possible to combine "interactive small window" viewing with spatial correspondence by wearing the display on one's head and tracking head position. These head-mounted displays (HMDs) can also offer properly accommodated, stereoscopic, wide angle optics16 and have received a great deal of recent attention under such labels as "virtual realities," "virtual environments," and "cyberspace."

Virtually all imagery shown in HMDs today is either computer-generated or from live telerobotic cameras. Realworld recording and storage for HMDs presents novel challenges. For example, shooting for both stereoscopy and panoramics has no simple solution, since two panoramic cameras separated for stereoscopy result in variable parallax as the "interactive small windows" rotate.


5. Surrogate Travel

Surrogate travel, or "moviemaps," is the ability to move around, allowing the user to laterally move through a recorded or created place. Moving around any virtual space presents some problems not present when looking around a panoramic scene. Looking around need not be explicitly interactive: the entire view can be displayed. But moving around under one's own control must be explicitly interactive: one must tell the system to change lateral position. Thus, while an audience in a panoramic theater can all look in different directions, an audience in a surrogate travel theater must somehow come to grips with who is navigating.

Problems arise when surrogate travel is shot with real cameras, rather than generated from 3D databases. Though it is possible to record an entire panorama from a given point in a single instant, the only way to record surrogate travel in a single instant is with one camera at each location. The more realistic alternative is to move a single camera from one location to another, but time artifacts caused by moving clouds, shadows, cars, and people may result.

Another major difference between panoramic recording and surrogate travel recording is in continuousness. Once a panoramic scene is recorded and stored as a 2D single image, the user may have continuous access to any portion of it. But surrogate travel requires recording many 2D images at spatial intervals (like one frame every ten feet) and creating in-between images from these is a state-of-the-art computing problem. Hence surrogate travel material made from realworld recording is currently stored as many 2D images, on fast-access lookup media such as optical videodiscs.

5.1. One-Dimensional Movement

One-dimensional surrogate travel is along one particular path. The user may go forward and backward, at any speed, but cannot stray from this path.

5.1.1. Distance-Dependent Recording

In order to give the user a predictable sense of speed control, realworld images along a route are best shot at regular spatial intervals. Motion picture cameras, both film and video, are time-triggered instruments, for example, recording one frame every 1/24 or 1/30 of a second. If the camera tracking speed can be held constant, then time triggering is equivalent to distance triggering. Otherwise explicit distance triggering is necessary, such as from an odometer or an external "fifth wheel."
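Explicit distance triggering can be sketched as follows; the ten-foot interval and the form of the odometer readings are illustrative, not from any particular moviemap rig:

```python
# Sketch of distance triggering: expose a frame each time the odometer
# crosses another fixed interval, regardless of vehicle speed.

def trigger_frames(odometer_readings, interval=10.0):
    """Return the odometer distances (feet) at which frames are exposed."""
    triggers = []
    next_trigger = 0.0
    for distance in odometer_readings:
        while distance >= next_trigger:   # may fire more than once per reading
            triggers.append(next_trigger)
            next_trigger += interval
    return triggers

print(trigger_frames([0, 7, 23, 31]))  # [0.0, 10.0, 20.0, 30.0]
```

Because the trigger depends only on distance travelled, the spacing of recorded frames stays regular even when the vehicle's speed varies.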

The triggering distance affects visual continuity on the one hand and frame storage "real estate" on the other. The more images, the smoother the apparent movement, but the more storage space required. Smoothness is also related to angle of view of the camera, height and distance to the nearest objects, and camera stability.

5.1.2. Image Stabilization

Camera stability is a realworld problem, not relevant for virtual cameras or for model cameras on motion control systems. Instability results from any variance of the lateral path or the angular position of the camera during shooting. High frequency instabilities such as vibrations will produce blur or smear and affect individual frames. They can be minimized by using a short exposure time, a wide angle lens, and staying away from close or fast-moving objects.

Low frequency instabilities will produce a "wobble" from frame to frame. Since moviemaps are often shot at frame rates less than normal motion pictures' (one frame per second may be an average recording speed, for example), such instabilities are exaggerated. Consequently, closed-loop gyroscopic stabilizers (such as Wescams, Gyrospheres, or Tyler "Sea Mounts") perform better than either passive gyroscopic stabilizers (such as some helicopter mounts) or passive inertial stabilizers (such as Steadicams). In-camera and in-lens stabilizers (such as the 1962 "Dynalens," Arriflex's Image Stabilizer, and Schwem's "Gyrozoom") can only correct for pan and tilt but not for rotation.

5.2. "1.1" Dimensional Movement

Moving along a path with occasional choice-points is a far cry from being able to "travel anywhere." One might call this class of surrogate travel "1.1 dimensional" because only some of the points along the path have a two dimensional choice and most have only a one dimensional choice.

5.2.1. Match-Cuts

At nodes (points with a two-dimensional choice), the better the match-cut between two intersecting routes, the greater the sense of seamlessness. Several factors contribute to matching cuts. First, the camera has to be in the same position and pointing in the same direction for both routes as it passes through the node. One may use lines on the street or one may use compass coordinates, but there is no easy way to do this in the real world. Also, temporal artifacts are inevitable, since the matching shots must be recorded at different times. Lighting and shadow discrepancies can be minimized by shooting during a narrow window of time, like from 10 am to 2 pm, or shooting on cloudy days. For 3D database recording, as well as for motion control model shooting, these problems don't exist.

5.2.2. Camera Angle

Since panoramic recording is often impractical, the camera's angular position becomes an issue, since a less-than-360° lens must be explicitly pointed. The simplest technique is to fix the camera angle to the lateral direction of motion, either pointing straight ahead or pointing sideways. At each node, every possible turn must be separately recorded in order to match-cut the intersecting routes. MIT's Aspen Moviemap was shot in this fashion.17

Another, more complicated way to point the camera is in an absolute direction independent of lateral position. For example, a camera could always point north regardless of whether it is facing forward, sideways, or backward. An advantage is that shooting turns is not required since the camera is pointing in the same direction at any given point.

An even more complex way is to point the camera at an absolute location, such as tracking a central object. For example, the "Golden Gate Videodisc" produced by Advanced Interaction Inc. and directed by the author for the Exploratorium is an aerial moviemap over the Bay Area where the camera always pointed at the center of the Golden Gate Bridge. As with absolute-direction pointing, the payoff is that separate turn sequences are not necessary to record since the camera always points in the same direction at any given point.
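The geometry of absolute-location pointing can be sketched as follows; the flat-ground coordinates, north = +y convention, and degree units are illustrative simplifications, not details of the Golden Gate shoot:

```python
import math

# Sketch of "absolute location" pointing: from the vehicle's position
# and heading, compute the camera pan that keeps a fixed landmark
# centered. Headings and pans are in degrees, clockwise from north.

def pan_to_target(cam_x, cam_y, heading_deg, target_x, target_y):
    """Camera pan (degrees relative to vehicle heading) toward the target."""
    bearing = math.degrees(math.atan2(target_x - cam_x, target_y - cam_y))
    return (bearing - heading_deg + 180) % 360 - 180  # normalize to [-180, 180)
```

At any given position the computed pan always aims at the same landmark regardless of which way the vehicle is travelling, which is why separate turn sequences need not be shot.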

5.3. 2-D and 3-D Movement

Recording and storing two-dimensional grids and three-dimensional lattices, where the user has freedom of movement, is problematic because the numbers grow quickly. Consider that a 15 by 20 foot space with a 10 foot high ceiling requires 3,000 frames if shot at intervals of one foot and over 5 million frames at intervals of one inch!
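The arithmetic is easy to verify (dimensions are kept in inches so the counts stay exact integers):

```python
# Checking the frame-storage arithmetic above: frames needed to sample
# a 15 x 20 x 10 foot volume at a given spatial interval.

def frames_for_volume(w_in, d_in, h_in, interval_in):
    """Frames needed to sample a volume, all dimensions in inches."""
    return (w_in // interval_in) * (d_in // interval_in) * (h_in // interval_in)

print(frames_for_volume(15 * 12, 20 * 12, 10 * 12, 12))  # 3000 at one-foot intervals
print(frames_for_volume(15 * 12, 20 * 12, 10 * 12, 1))   # 5184000 at one-inch intervals
```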

In the future, the very idea of discrete frame storage will be obsolete. Computers will store information in spatial databases based on whatever data has been collected (and will interpolate what is missing). It has been demonstrated that significant bandwidth compression occurs when the visual information from separate (highly redundant) movie frames are stored as a single computer model.18 But data will still need to be collected, and visual data will be collected with cameras.


6. Realtime Imaging

Realtime imaging is the process of recording and displaying temporal sensory information indistinguishable from unmediated reality.

6.1. Dynamic Visual Cues

6.1.1. Frame Rate

At least 15 updates per second are necessary for continuous motion to appear on a screen. The upper limit is arguable. Modern American film runs at 24 frames per second (fps), American video updates at 60 fps, but 80 or 90 fps may be necessary.19

Apparently, part of our association with the "film look" is film's lack of a sufficient frame rate. When video is "defluttered" (every other field removed, reducing the effective update rate from 60 to 30 fps), the result takes on a film look.20 Similarly, the Showscan film format, which records and projects at 60 fps, has a video look.

6.1.2. Temporal Continuity

Temporal continuity is the opposite of cuts, or montage. The real world always exhibits temporal continuity, regardless of whether it is seen looking out from a train or from a racecar or sitting still. There are no cuts in the real world. (Believing that you really are instantly somewhere else, as opposed to imagining it, is the definition of psychosis.) Temporal continuity is the temporal equivalent of spatial consistency.

Cinema, on the other hand, consists of adjacent frames which are either continuous (those within a shot) or not (those between shots, the "cuts"). Cinema is the counterpoint between "respect for spatial unity"21 and its "first and foremost" characteristic, montage.22 Noteworthy is Alfred Hitchcock's Rope, a feature film shot with a carefully orchestrated camera, which has virtually no cuts.

6.2. Dynamic Non-Visual Cues

6.2.1. Audio

Audio in synchronization with image is part of our association with cinema's ability to convey presence, and audio has its own resolution specifications. Of particular relevance here is the spatialization of sound. Sound can be spatialized in one of two ways: by using multiple speakers, each positioned at the point of origin of the sound source, or by using binaural sound.23

6.2.2. Inertial Motion

In addition to visual and auditory cues, we receive temporal cues by how we physically feel. This feeling of motion is based primarily in the vestibular system in the inner ear and is sensitive to linear and angular acceleration.24 Flight simulators (as well as Disney's Star Tours and Body Tours) move the viewers on a motion platform synchronized with the image and sound to enhance their effect.

6.2.3. Force Feedback

Force feedback is the ability to "touch" a virtual object inside an image. For example, a force-feedback joystick has been used to simulate textures.25 Similarly, a hand grip made of a four inch bar with three computer-actuated springs on each end can simulate angular and lateral force, and has been used successfully to augment visual display for spatial tasks.26


Each of these elements of realspace imaging can either be respected or violated. An image either is orthoscopic, stereoscopic, or panoramic or it's not. Sometimes violations of these elements are by default: it's more convenient to carry around non-orthoscopic images of your family than "actual size," stereoscopic cameras are expensive, and panoramic movies require special theaters.

But sometimes violations of these elements are intentional: a cut in a film, slow frame rate in a rock video, a simple line drawing rather than a high resolution image, silence rather than sound. The very idea of respecting all elements of realspace imaging is ultimately a losing battle. Giving the user everything is rarely possible. There is never enough bandwidth. There will always be artifacts.

The trick is to give the sense of everything without actually giving everything. The question, then, is how to choose what is most important. And what is most important is always context-dependent. This report is an attempt to lay out the choices. Choose wisely: that is where the art lies.


Acknowledgements

This paper is a much-condensed version of a forthcoming Apple Computer Technical Report entitled "Elements of Realspace Imaging" written for the multimedia community and supported by the Apple Multimedia Lab in San Francisco. The author wishes to thank Phil Agre, Doug Crockford, Scott Fisher, Brenda Laurel, Robert Mohl, and Rachel Strickland as well as the members of the Apple Multimedia Lab, particularly its Director, Kristina Hooper, all for their lively discussions and criticisms throughout the course of that report.


References

1. Hochberg, J. and Gellman, I. "Feature Saliency, 'Mental Rotation' Times and the Integration of Successive Views," Memory and Cognition, No. 5, 1977.

2. Schreiber, W.F. "Psychophysics and the Improvement of Television Image Quality"; SMPTE Journal, pp. 717-725, August, 1984.

3. (report on the "Visual Almanac" forthcoming from the Apple Multimedia Lab).

4. Lipton, L. Foundations of the Stereoscopic Cinema, New York: Van Nostrand Reinhold Company, 1982.

5. Bove, V. M. Synthetic Movies Derived from Multi-Dimensional Image Sensors, Cambridge, MA: Ph.D. dissertation, Media Laboratory, M.I.T., 1989.

6. Lipton, Stereoscopic Cinema.

7. "3D Display - Without Glasses"; Design News, p. 13, December 1988.

8. Negroponte, N. "Media Room"; SID 22, no. 2, 1981.

9. Reveaux, A. "Displacing Time and Image"; Artweek, June 30 1984.

10. Williams, R. D. and Garcia, F. "Volume Visualization Displays"; Information Display 5, no. 4, pp. 8-10, April 1989.

11. Fisher, S. S. "Viewpoint Dependent Imaging: An Interactive Stereoscopic Display"; SPIE, Volume 367, Processing and Display of Three-Dimensional Data, 1982.

12. Yelick, S. Anamorphic Image Processing, B.S. thesis, Architecture Machine Group, M.I.T., 1980.

13. Ibid.

14. Wilson, K.S. The Palenque Design: Children's Discovery Learning Experiences in an Interactive Multimedia Environment, PhD dissertation, Graduate School of Education, Harvard University, 1988.

15. Naimark, M. "Spatial Correspondence in Motion Picture Display"; SPIE, Volume 462, Optics and Entertainment, 1984.

16. Fisher, S.S. et al. "Virtual Environment Display System"; ACM, Workshop on Interactive 3D Graphics, 1986.

17. Mohl, R. Cognitive Space in the Interactive Movie Map: An Investigation of Spatial Learning in Virtual Environments, PhD dissertation, Education and Media Technology, M.I.T., 1981.

18. Bove, Synthetic Movies.

19. Schreiber, "Psychophysics."

20. Naimark, M. "Videodisc Production of the 'Visual Almanac'"; Technical Report #14, The Multimedia Lab, Apple Computer, Inc., 1988.

21. Bazin, A. What Is Cinema?, Volume 1, Berkeley: University of California Press, 1967.

22. Eisenstein, S. Film Form, New York: Harcourt Brace Jovanovich, Inc., 1949.

23. Wenzel, E.M. et al. "A Virtual Display System for Conveying Three-Dimensional Acoustic Information"; Proceedings of the Human Factors Society, 32nd Annual Meeting, 1988.

24. Deyo, R. and Ingebretsen, D., "Notes on Real-Time Vehicle Simulation"; ACM Siggraph, Course Notes 29, Implementing and Interacting with Real-Time Microworlds, 1989.

25. Minsky, M. "Texture Identification Experiments"; ACM Siggraph, Course Notes 29, Implementing and Interacting with Real-Time Microworlds, 1989.

26. Ming, O. et al. "Force Display Performs Better than Visual Display in a Simple 6-D Docking Task"; ACM Siggraph, Course Notes 29, Implementing and Interacting with Real-Time Microworlds, 1989.