Immersive Display for 3D Earth Models

Michael Naimark

michael@naimark.net

Abstract

An immersive display for 3D Earth models is described which seamlessly combines a floor display, for optimal flying around and viewing of the ground below, with one or more wall displays, for optimal viewing of ground-level scenery and action. Done properly, views from the ground level appear “like being there” and done artfully, flying around feels as if Earth is literally falling out from below the viewers’ feet as they skyrocket upward and land somewhere else. Two preferred embodiments are described, a large, theater-like space and a smaller, more portable version.

Background

Three-dimensional Earth models have recently become widely available to the general public. These models, viewed either through a web browser or using a downloaded application, allow users to freely navigate around the planet, from seeing the “Whole Earth” from any viewpoint to zooming down, literally, to ground level. The amount of data is enormous, with the potential for photorealistic ground-level quality, for seamlessly posed photos and videos (1), and for geo-located graphical annotations. Such rich, comprehensive Earth models are inevitable, with a wide variety of applications ranging from education and virtual travel to entertainment and gaming. The experience of “flying” around Earth anywhere in seconds and “landing” on the ground at particular locations is unprecedented: it profoundly changes our perception of our planet. Even on a small screen, this experience has a visceral quality.

But what would a 3D Earth model look like on a big screen? How would a large-scale system for experiencing such models be optimally designed? What sort of innovations and novelties could be incorporated to specifically enhance this experience? How could such a system be scalable and economical?

Immersive displays have existed as long as displays have existed, going back to panoramic paintings in the late eighteenth century and to giant and multi-screen cinema shown at the 1900 Paris Exposition (less than five years after the birth of cinema itself). Large-scale cinema continues as “special venue films” such as Imax, Showscan, and CircleVision. World Expos offer even more variety, with special theaters for projecting on spheres, hexagons, in giant pits, and through the floor, often in stereoscopic 3D. In parallel and for considerably less expense, immersive systems developed as art installations have included robotic projection, unusual multi-screen configurations, relief projection, and moving floors.

With advances in computing and, particularly, in projection technology, real-time interactive digital immersive displays are increasingly popular. One of the first such systems, shown in 1992, was the University of Illinois’s “CAVE” (2), a room-size cube with digital stereoscopic projection on 3, 4, 5, or 6 surfaces. CAVEs typically require rear projection and are optimized for a single user who wears a position tracker to determine the proper point of view. Smaller, less expensive systems offer more limited features, such as the single-projector hemispheric Panoscope (3) developed at the University of Montreal, while larger, less interactive systems offer group experiences, such as 360-degree cylindrical digital projections developed at the University of New South Wales (4) and at Rensselaer Polytechnic Institute (5).

A new public immersive space, one of particular relevance, is the DeepSpace Theatre (6), which opened in early 2009 in the Ars Electronica Centre in Linz, Austria. DeepSpace has a 50-foot wide wall projection integrated with a 50-foot wide floor projection, both using very high resolution “4k” video and both capable of stereoscopic 3D. Several dozen people, standing, can comfortably occupy the space. Both wall and floor are front-projected, so people standing in the projection area of the floor cast shadows, but since the space is so big and the ceiling very high, the shadows are often barely noticed. The Ars Electronica Centre had the first CAVE in Europe, and the developers of DeepSpace consider it an informed solution for larger-scale immersion. They are currently experimenting with various applications, including real-time 3D gaming, giga-pixel image display, and stereoscopic video. Hence, like the CAVE and most other immersive systems, DeepSpace is still considered a “general purpose” immersive display.

Disclosure

The system and methods disclosed here consist of a horizontal floor-level display under or near the audience’s feet together with at least one vertical wall display. The source of imagery for the displays is a 3D computer model of Earth, capable of displaying a detailed view of the “Whole Earth” in its entirety as well as zooming down to detailed ground-level views. These ground-level views may also contain seamlessly posed photos and videos as well as geo-located graphical annotations. The 3D Earth model system consists of a 3D Earth database and computer hardware capable of rendering arbitrary views. The actual imagery fed to the displays may be pre-rendered and recorded on a storage medium or may be computed and output in real time. The system may display both linear and interactive programs and may be either monoscopic or stereoscopic.

The integration of a horizontal floor display is a unique feature of this system, since, as an Earth model, much of the view and navigation involves “looking down” from above. Integration with a vertical wall display additionally affords the opportunity to view the world from ground level. The floor display synchronized with one or more wall displays also adds a strong visceral element because part of the entire display configuration is always viewed peripherally. Also, since most people are not accustomed to horizontal displays below or near their feet, the novelty element adds to the sense of immersion in and of itself.

This system is designed to display an immersive version of 3D Earth models in two principal modes: flying around the planet and from a landed, typically ground, position. In both modes, all displays present a synchronized, visually seamless view.

The landed position mode represents standing in a single place, with the goal of feeling “like being there”. The floor imagery is of the ground itself at ground level. Such imagery could be a two-dimensional orthogonal view of “looking down” from standing height. The wall imagery is the orthoscopically correspondent viewpoint “looking out” in a direction parallel to the ground, i.e., where both elevation and rotation are zero. From this basic configuration, the system is correctly registered to the physical floor of the actual space (assuming it’s level), and both the image of the floor and the image of the horizon on the wall should “feel level” with respect to the real world. From this configuration, the contents of all displays could contain dynamic elements, either in the 3D model itself such as 3D moving vehicles, people, etc., or embedded in the model such as posed videos or dynamic annotations.

The azimuth of the wall imagery can also be dynamic, i.e., panning the viewpoint is possible without disturbing the feel of being level. If the floor imagery remains static, the effect is like being in a rotating room. If the floor imagery rotates correspondingly in sync, the effect is like “hovering” on the floor as it rotates. If the floor imagery rotates correspondingly in sync and the floor itself physically rotates in sync, the effect is like actually being on the ground (7).

In this landed position, with 2D imagery on the floor as described, the overall effect is similar to the “forced perspective” effect used in cinema and on stage, where the floor appears to be spatially seamless with the vertical backdrop. In cinema production, it is well known that forced perspective only looks perfect from a single point of view, which is where the camera is placed. But in stage production, the same forced perspective usually looks pleasing enough to the audience regardless of where one sits.

It should be noted that various anomalies and artifacts, while not necessarily technically or perceptually “correct,” are often successful, interesting, or economical. For example, the wall imagery cannot be orthoscopically correct for everyone in an audience, but it often appears close enough to be successful. Or, changing the elevation of the wall display (“tilting the camera”), with or without changing the orthogonality of the floor display to match, can produce an intense visceral effect. Or, using a static floor display in sync with a dynamic wall display may appear acceptable, since ground views such as sidewalks are generally static, and a static floor may be considerably cheaper to produce and display.

Finally, the lateral position of the wall and floor imagery can change as well. In this landed position mode, by definition, changing the “X” and “Y” of the viewpoint is like moving around on the ground.

The flying around the planet mode represents changing the “Z” viewpoint to “fly up” while changing the other lateral and angular elements to “fly over” the planet from one location to another, with the goal of creating a unique, impossible-in-the-real-world, “Superman-like” experience. The floor imagery zooms out while the wall imagery “falls under” (sometimes called a “push-off” in video) in sync. Done properly and artfully, the audience feels as if the Earth is literally falling out from below their feet and that they are skyrocketing upward.
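As a rough illustration of how the two displays relate during takeoff, the following sketch computes, for a given altitude, how far the horizon drops below eye level on a level wall view and how much ground a straight-down floor view covers. It assumes a spherical Earth and, for the floor, a flat-ground approximation that is only reasonable at low altitudes. The function names and the 90-degree floor field of view are illustrative assumptions, not part of the disclosed system.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def horizon_dip_deg(altitude_m: float) -> float:
    """Angle (degrees) by which the horizon sits below eye level at this altitude."""
    if altitude_m < 0:
        raise ValueError("altitude must be non-negative")
    return math.degrees(math.acos(EARTH_RADIUS_M / (EARTH_RADIUS_M + altitude_m)))


def floor_footprint_m(altitude_m: float, floor_fov_deg: float = 90.0) -> float:
    """Ground width seen by a straight-down view (flat-ground approximation).

    floor_fov_deg is a hypothetical field of view for the floor display.
    """
    return 2.0 * altitude_m * math.tan(math.radians(floor_fov_deg) / 2.0)


if __name__ == "__main__":
    # As altitude rises, the floor footprint grows (zoom out) while the
    # horizon drops on the wall ("falls under").
    for h in (1.7, 100.0, 10_000.0, 400_000.0):
        print(f"{h:>10.1f} m  dip={horizon_dip_deg(h):6.2f} deg  "
              f"footprint={floor_footprint_m(h):12.1f} m")
```

Driving both quantities from the same altitude value is one simple way to keep the floor zoom and the wall “fall under” in sync.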

The actual altitude of “flying up” depends on two factors: the distance to the next destination and the aesthetic amount of “bounce” desired. For example, if the starting location is Times Square, New York, and the destination location is the Eiffel Tower, Paris, it will be more pleasing to fly up to an elevation high enough to see the whole Earth. But if the starting location is Times Square, New York, and the destination location is Union Square, New York, flying up as high would be too disorienting. Similar altitude determination algorithms are already incorporated in current Earth models such as Google Earth and Microsoft Bing Maps.
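A minimal sketch of one such altitude rule follows. It is not the algorithm used by any existing Earth model; it simply makes the peak altitude proportional to the great-circle distance between the two locations, scaled by a “bounce” factor and clamped to a range. The bounce factor, the clamp limits, and the function names are hypothetical values to be tuned by eye, and the coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def peak_altitude_km(distance_km, bounce=1.0, min_km=0.5, max_km=20000.0):
    """Hypothetical rule: peak altitude grows with distance, clamped to a range."""
    return min(max_km, max(min_km, bounce * distance_km))


# Approximate coordinates (latitude, longitude) in degrees.
TIMES_SQUARE = (40.7580, -73.9855)
UNION_SQUARE = (40.7359, -73.9911)
EIFFEL_TOWER = (48.8584, 2.2945)

if __name__ == "__main__":
    for name, dest in (("Union Square", UNION_SQUARE), ("Eiffel Tower", EIFFEL_TOWER)):
        d = great_circle_km(*TIMES_SQUARE, *dest)
        print(f"Times Square -> {name}: {d:8.1f} km, peak {peak_altitude_km(d):8.1f} km")
```

With these assumed settings, the short hop stays a few kilometers above Manhattan, while the transatlantic flight rises thousands of kilometers, far enough for the planet to read as a globe.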

Unlike the methods current Earth models use for flying from one destination to another, additional care must be taken in determining how the three angular parameters of the viewpoint change. Given two or more synchronized displays, it is desirable to show the Earth partially on multiple displays during this mode, as a single integrated view. Such determinations may be considered “choreography” (think “2001: A Space Odyssey”): it must be fast, smooth, dramatic, and pleasing. A typical duration of an event in this mode is only several seconds, like Superman, allowing for well-known cinematic effects and illusions (such as “swish pans” and dissolves) to be incorporated with little notice. Of course, “flying for the sake of flying” rather than to travel from one destination to another is also desirable for certain applications, e.g., aerial tours, and may have longer durations.

It is important to create the illusion of looking through the displays rather than only at their surfaces. In landed mode, the floor display may look appropriate as a 2D image since the ground is flat under one’s feet, but the wall display looks more appropriate as a stereoscopic image since elements of the imagery may appear at a variety of depths, e.g., people in the foreground and buildings or mountains in the background.

In flying mode, all imagery on both wall and floor displays is best shown as appearing far away, since everything is far away. At faraway distances, both eyes see essentially the same thing from the same perspective, but with two important cues. One cue is that the eyes accommodate or focus on infinity, just as a camera lens would. “Infinity focus” is possible but generally requires optics in the path between the display and the eyes. The other cue is that both eyes gaze parallel rather than converge inward, as they do when triangulating on things not far away. “Parallel gaze” is possible using stereoscopic displays with the same image for each eye but horizontally offset to prevent the eyes from converging on the surface of the display.
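The size of the horizontal offset needed for “parallel gaze” can be estimated from simple similar-triangle geometry. The sketch below assumes a viewer looking straight at the screen and a typical adult interpupillary distance of about 63 mm; both the default value and the function name are illustrative assumptions. For a point meant to appear at a given distance, the left-eye and right-eye images must be separated on the screen by the interpupillary distance times (1 - screen distance / apparent distance), which approaches the full interpupillary distance as the apparent distance goes to infinity.

```python
import math


def screen_parallax_m(screen_distance_m, apparent_distance_m, ipd_m=0.063):
    """On-screen separation between left- and right-eye images of one point.

    0 places the point on the screen surface; a value equal to ipd_m makes the
    two lines of sight parallel, i.e. the point appears at infinity.
    ipd_m (interpupillary distance) defaults to an assumed typical adult value.
    """
    if screen_distance_m <= 0 or apparent_distance_m <= 0:
        raise ValueError("distances must be positive")
    return ipd_m * (1.0 - screen_distance_m / apparent_distance_m)


if __name__ == "__main__":
    for apparent in (2.0, 4.0, 100.0, math.inf):
        offset_mm = 1000.0 * screen_parallax_m(2.0, apparent)
        print(f"apparent distance {apparent:>6} m -> offset {offset_mm:5.1f} mm")
```

An offset larger than the interpupillary distance would force the eyes to diverge, so in practice the separation is capped at that value.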

Preferred Embodiments and Optional Features

Two preferred embodiments are described, a large-scale version on the scale of DeepSpace and a smaller scale version on the scale of Panoscope. The large version is theater-like, capable of accommodating approximately 50 standing people, and is not easily portable; while the smaller version is capable of accommodating approximately 10 people and is relatively easy to transport and set up, for example, for conferences, tradeshows, art shows, and educational applications.

Both versions use a computer with a 3D Earth model database and software, but with two additional capabilities. First, they can display embedded video whose frame is posed in the 3D model to appear as perfectly aligned overlays. These videos are typically shot from a stationary camera so that the pose is also stationary. These videos are preferably shot stereoscopically using two lenses. Such videos are ideal for showing people in the foreground and scenery in the background, for example, for on-location interviews.

The second additional capability for both versions is the ability to transition in and out of a pre-recorded two-dimensional ground image. This image may be camera originated, for example, a photo of grass, sand, water, or sidewalk and it may be either static or dynamic (a short video loop, for example). It may also be created artificially or hybrid, based on altered or repeated photographic material. In practice, photos of the ground can be taken during location production of the stereoscopic embedded video, but generic “stock shots” could be used as well. The reason for this capability is that no current Earth model supports high-resolution ground imagery looking down from human standing heights. During the first and last seconds of the flying around mode, smooth transitions are required between these ground images and the Earth model. A simple preferred embodiment is to simultaneously change the scale of the ground image while dissolving from and to the Earth model.
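The scale-and-dissolve transition described above might be parameterized as in the following sketch for the takeoff case (landing would run the same curve in reverse). It assumes the ground image was shot from a standing eye height of 1.7 m and hands off fully to the Earth model at 50 m; these values, the smoothstep easing, and all function names are hypothetical choices rather than part of the disclosure.

```python
def smoothstep(t):
    """Ease-in/ease-out curve on [0, 1]."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def takeoff_transition(t, eye_height_m=1.7, handoff_altitude_m=50.0):
    """Return (altitude, ground_scale, ground_opacity) for progress t in [0, 1].

    Altitude rises geometrically from eye height to the handoff altitude.
    The ground image shrinks in inverse proportion to altitude while its
    opacity dissolves from 1 (ground photo only) to 0 (Earth model only).
    """
    t = min(1.0, max(0.0, t))
    altitude = eye_height_m * (handoff_altitude_m / eye_height_m) ** t
    ground_scale = eye_height_m / altitude
    ground_opacity = 1.0 - smoothstep(t)
    return altitude, ground_scale, ground_opacity


def blend(ground_px, model_px, ground_opacity):
    """Cross-dissolve one pixel value of the ground image over the Earth model."""
    return ground_opacity * ground_px + (1.0 - ground_opacity) * model_px


if __name__ == "__main__":
    for i in range(5):
        t = i / 4
        alt, scale, opacity = takeoff_transition(t)
        print(f"t={t:.2f}  altitude={alt:6.2f} m  scale={scale:.3f}  opacity={opacity:.3f}")
```

Tying the scale to the same altitude that drives the Earth model's camera keeps the shrinking ground photo registered with the model underneath it during the dissolve.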

The preferred embodiment for the large-scale version is, given sufficient ceiling height, using front projection for both wall and floor displays. The floor itself can be larger than the display portion, so people can “stand off” the display if they wish, best from the side opposite the wall display. Shadows are minimized in proportion to the size of the space and height of the ceiling. Stereoscopic projection is preferred, using either active or passive viewing systems (hence glasses of some form are required). With either viewing system, it is important to keep the audience more-or-less facing forward toward the wall display; if they rotate to the side or look at the floor projection behind them, the stereo disparity will be compromised (a problem so far unsolved without using individually tracked personal head-mounted displays). A less preferred embodiment, but more practical and economical, is to simply use monoscopic, non-stereoscopic, imagery, in which case adding more motion parallax (e.g., a constantly moving viewpoint) partially compensates.

The preferred embodiment for the smaller version is using two matrices of flat-panel displays, one for the wall and one for the floor. The floor display matrix may either be covered with transparent protective material such as glass for the audience to stand on, or it may be immediately adjacent to (e.g., in front of) the audience. Since the displays are closer to the audience, depth cues are even more important, and the use of stereoscopic video is preferred. Adding slight magnifying optics (a positive diopter such as reading glasses) also tricks the eyes into focusing on infinity when they are actually looking at closer distances.
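The strength of such magnifying optics can be estimated with the thin-lens relation, assuming the lens sits close to the eye. The sketch below is an approximation under that assumption, and its function name is illustrative. A display at distance d (in meters) appears at optical infinity through a lens of power 1/d diopters; more generally, making it appear at a virtual distance v requires 1/d - 1/v diopters.

```python
import math


def lens_power_diopters(display_distance_m, apparent_distance_m=math.inf):
    """Thin-lens power that makes a display appear at apparent_distance_m.

    Assumes a thin lens worn close to the eye. With the default (infinity),
    the display sits at the lens's focal point and the eye focuses at infinity.
    """
    if display_distance_m <= 0 or apparent_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / display_distance_m - 1.0 / apparent_distance_m


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0):
        print(f"display at {d:.1f} m -> +{lens_power_diopters(d):.2f} D for infinity focus")
```

For viewing distances of a meter or two, the required power falls in the range of weak off-the-shelf reading glasses.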

Generally speaking, since the larger version is more theater-like, the program is usually designed to be more theater-like: faster-paced and either linear or with limited choices given the size of the audience. For such programs, limited branching options are often adequate, and therefore the material can be pre-rendered and recorded on a storage medium to increase speed and decrease costs. The smaller version, with a more limited audience size, can be driven either by a real-time system to allow unconstrained navigation or by pre-rendered, stored material.

Optional features include more displays; haptics and audio; and motion platforms and rotating floors. Additional wall displays can be added, making the system more CAVE-like, though they are costly, require more space (especially if rear-projection is used), and may lower the contrast of the displays due to light scatter from one display to another. Both haptics and audio can be used to enhance the experience, and in particular, low-frequency audio can be used as a synaesthetic substitute for physical, haptic sensations. For example, a “jolt” from subwoofers at the instant of takeoff or landing may “feel” authentic. Similarly, the use of motion platforms or simple rotating floors for the audience adds a strong visceral effect.

Today’s 3D Earth models are still in development in terms of detail and responsiveness. Drastic increases in both are inevitable. For one thing, the data exists. The system described here can only get richer and cheaper.

References

(1) http://interactive.usc.edu/viewfinder/

(2) http://www.evl.uic.edu/pape/CAVE/

(3) http://www.panoscope360.com/ (pages 1-3)

(4) http://www.icinema.unsw.edu.au/projects/infra_avie.html

(5) http://empac.rpi.edu/commissions/ (pages 1-2)

(6) http://www.aec.at/center_exhibitions_ds_en.php?id=96

(7) US Patent 5,601,353, Naimark, et al., Panoramic display with stationary display device and rotating support structure
