Cognitive Design Principles for Automated Generation of Visualizations

Barbara Tversky

Stanford University, USA

Maneesh Agrawala

Microsoft Research, USA

Julie Heiser

Stanford University, USA

Paul Lee

NASA-Ames, USA

Pat Hanrahan, Doantam Phan, Chris Stolte

Stanford University, USA

Marie-Paule Daniel

LIMSI-CNRS, France

 

 


Before there were written languages, there were visualizations, painted on caves, inscribed in stone, carved on wood. Visualizations of things that are actually visible, such as maps, building plans, people, and wildlife, are ancient and widespread. Visualizations of things that are metaphorically visual, like graphs of economic data or charts of organizations, are a relatively new invention, first appearing in Europe in the late 18th century (Beniger and Robyn, 1978). Like written language, visualizations are cognitive tools, designed to augment the capacity of the human mind (see Tversky, 2001, 2005 for further discussion). Then, as now, they serve myriad purposes: to record information, to lighten the burden of working memory, to convey information to others, to promote discovery, inference, and insight, and to facilitate collaboration. Primary among their roles is communication. Some visualizations, notably maps, have undergone years of informal user testing as communication tools. The consequent refinements provide suggestions for good design, and the process of refinement provides suggestions for uncovering design principles. Other visualizations are the latest products of the latest computer tools, often in need of refinement.

Now, visualizations seem to be everywhere: in automobiles, newspapers, airports, textbooks, and instructions. As users know all too well, many of these visualizations are frustrating; they are complex and cluttered, often with extraneous and distracting decorative details. Like all communication, visualizations must schematize to be effective; that is, they should extract, emphasize, and even distort the information that is important to the task and eliminate the information that is not. Take maps as an example. An aerial photograph, despite its photorealism, is not an effective road map. It portrays detail that is not only irrelevant to the road structure but also hides it. An effective map for driving emphasizes roads, intersections, and other features that aid navigation. In fact, road maps exaggerate the size of roads so that they can be seen in the maps. Maps serve purposes other than driving. A good map for hiking will display other features such as trails and topography. A tourist map may deliberately mix perspectives, showing an overview of the streets to allow people to find their way and frontal views of the tourist sites to allow people to recognize them. In order to facilitate perception and comprehension, maps may violate certain spatial relations, for example, exact distance, size, angle, and perspective.

In the best cases, visualizations have developed in a community of users who produce and comprehend them, refining them and fine-tuning them to suit the circumstances through casual user testing in the wild. This informal process can be systematized in the laboratory. Participants can be asked to produce visualizations; characteristics of those visualizations can be extracted. These characteristics can be systematically varied, and tested for comprehension in other participants. What is important and what is effective in visualizations depends on cognition, on how humans represent the task at hand and how they perceive and represent the information presented in visualizations.

Creating effective visualizations, as found in instructions, textbooks and other media, requires collaboration between graphic designers and domain experts, as well as testing with the target audience. The domain experts know the content to be communicated; the designers know techniques for conveying content. In practice, however, tightly intertwined collaboration between designers and domain experts is not always possible. Even when such collaboration is possible, it is not ideal, as designers may be educated in design but not in the discipline, and domain experts may be educated in the discipline but not in general design principles. In addition, designing is a time-consuming, labor-intensive process that cannot keep up with the increasing demand. Try to imagine, for example, the number of maps that are downloaded daily from websites. Currently computers save labor, but primarily by replacing only low-level tools such as pens, brushes, ink, paint, and paper. To keep up with demand, it is essential that computers provide higher-level design tools that make it easier to produce effective visualizations quickly.

We have undertaken a novel use for computers, to instantiate cognitive design principles in algorithms that automate the process of generating effective visualizations. Our approach combines research in cognitive and computer science in three iterating steps:

1) Revealing the mental representations people have for a given domain and the visual devices they use to convey it, yielding domain-specific cognitive design principles.

2) Developing algorithms that create effective visualizations based on the cognitive design principles.

3) Testing the visualizations to ensure that they adequately convey the desired information.

In the section that follows, we consider the issues involved in this three-step approach, and we illustrate the approach in two domains we have worked on, route maps and assembly instructions.


A Multifaceted Approach

General Cognitive Design Principles

In summarizing a large number of studies comparing static and animated visualizations for teaching a broad range of concepts, we suggested that effective visualizations conform to two general principles (Tversky, Morrison, & Betrancourt, 2002). According to the Principle of Congruence, the structure and content of a visualization should correspond to the structure and content of the desired mental representation. According to the Principle of Apprehension, the structure and content of a visualization should be readily and accurately perceived and comprehended. To illustrate the depth of these principles, consider animations, increasingly popular as tools for creating them become available.

Many animated diagrams, such as those showing how the heart works or how to operate equipment (see http://www.interactivephysics.com/simulations.html for examples), at first appear to fulfill the congruence requirement in that they use change over time to convey change over time. Dozens of experiments, however, have failed to show benefits of animations over equivalent still diagrams in conveying information, from animations illustrating how a bicycle pump works to those illustrating how computer algorithms work. This is surprising, given the premise that animations use change in time to convey change in time, and the conclusion arouses controversy. Some of the failures of animation may arise because the animations violate the Apprehension Principle. They are too complex or too rapid to be accurately perceived and conceived and, unlike static graphics, they cannot be re-inspected at the viewers’ own pace. However, animations also seem to violate the Congruence Principle. Although events in the world are continuous, they are typically understood as a sequence of discrete steps (e.g., Zacks, Tversky, and Iyer, 2001). The steps are the joints, the transitions from action to action. The joints of events performed by hands are often objects or object parts; those performed by feet are turns at landmarks (Tversky, Zacks, and Lee, 2004). The joints of events do not come at regular temporal intervals. If people conceive of animated events as sequences of discrete steps, then it may be more effective to visualize events in steps rather than requiring the user to do the segmentation. A sequence of stills, for example, may actually provide a more compatible cognitive match than an animation (Tversky, 2005). A sequence of stills allows viewers to directly compare the state of the system at each important step.
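The contrast between sampling at regular intervals and sampling at the joints of an event can be made concrete with a minimal sketch. It assumes that each frame of an animation has already been labeled with the action it shows; the function name and the bicycle-pump labels are hypothetical and serve only as an illustration.

```python
from itertools import groupby

def stills_at_joints(frames):
    """Return (frame_index, action) for the first frame of each action.

    `frames` is a list of action labels, one per animation frame.
    A still is taken wherever the action changes, not at fixed intervals.
    """
    stills = []
    index = 0
    for action, run in groupby(frames):
        stills.append((index, action))
        index += len(list(run))  # skip to the start of the next action
    return stills

# Hypothetical labels for a bicycle-pump animation; actions differ in duration.
frames = ["grasp handle"] * 3 + ["pull handle up"] * 5 + ["push handle down"] * 2
print(stills_at_joints(frames))
# → [(0, 'grasp handle'), (3, 'pull handle up'), (8, 'push handle down')]
```

The selected stills fall at frames 0, 3, and 8, which are not evenly spaced in time, in keeping with the observation that the joints of events do not come at regular temporal intervals.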

Individualizing Visualizations

The utility of many visualizations depends on their adaptability. For example, with maps, different schematizations of the same environment are desirable depending on the task and the user. Hikers, bikers, drivers, travelers, and surveyors all need different information, just as those unfamiliar with an environment need different information from those familiar with it. Hand-designed maps can take such needs into account. But creating effective individualized maps by hand for every user in every situation would be too expensive and time-consuming to be practical. Even within the domain of route maps for drivers, the number of possible routes that drivers may want is inconceivably large.

Our vision is to create individualized visualizations automatically, using computer algorithms that instantiate cognitive design principles. To do this we require methods for uncovering cognitive design principles and methods for incorporating these principles into computer algorithms. We describe both, using examples from two domains representative of large classes of visualizations, maps and assembly instructions. Maps are one of the most ancient and pervasive of visualizations, whether drawn on paper or carved in wood or incised in stone (e.g., Brown, 1979). Assembly instructions are representative of a large class of visualizations that includes instructions on how to put something together and how to operate a complex system, as well as explanations of how complex systems, from hearts to corporate structures, function. Underlying such visualizations are the parts of the system and their spatial, temporal, or functional relations. Visualizations of systems, too, are ancient; frescoes in Egyptian tombs show how crops are grown and harvested. Within each domain, we have chosen to explore and develop examples likely to be familiar to the general public: for maps, route maps, and for systems, assembly instructions.

Revealing Cognitive Design Principles

Making visualizations that are congruent with the desired mental representations requires techniques that reveal those internal mental representations. Cognitive psychology has a large bag of tricks for externalizing the internal. Reaction times are commonly used to this end. The reasoning typically goes as follows: If the mental representation has a certain format, then responses for retrieval tasks congruent with that format should be faster and more accurate than responses for tasks that are not congruent with it, as the latter require more onerous and time-consuming mental transformations. For example, maps that are not oriented in the direction of travel must be mentally transformed to correspond with the navigator’s current orientation. Therefore people are generally slower and less accurate when using such maps. Similarly, certain patterns of error, of grouping, and of description follow from certain types of mental representations, elucidating them. Returning to maps, people draw streets that are not parallel as parallel, suggesting that that is how they think about them (e.g., Tversky, 1981).

For complex mental representations, more open-ended techniques may be more revealing. One that we have adopted and will describe here is to ask participants to construct descriptions and depictions for a given domain. This captures the natural way that communication occurs, and provides us with a rich set of data. Comparing the structure of the descriptions and the depictions reveals the commonalities and differences between them. The common structure reveals the underlying representation. The features particular to descriptions and depictions provide insights into the design of visualizations, as visualizations are typically accompanied by language. These insights do not fully determine the visualizations. Guidelines must be drawn from other sources as well, and then tested by users in order to qualify as design principles. Comprehension usually goes beyond production, from babies learning to speak to adults using diagrams, so production sets a lower limit. Comprehension tasks, then, complement and supplement production tasks.

Creating Computer Algorithms

Creating effective visualizations entails numerous design decisions, as illustrated by a few examples. For a visualization of a route, what landmarks should be included, and where should they be included? How much can angle and distance be distorted without confusing the user? For a visualization of a process that extends in time, how should the process be segmented into steps? How should the steps be ordered? For each step, what is the best view or perspective to show? What detail should be included, and what omitted? What details need to be distorted, or shown as insets? What extra-pictorial features need to be added, features such as lines, arrows, and highlighting? The cognitive research will inform these decisions, but it does not completely determine the algorithm. The cognitive principles may be in the form of trade-offs, or simply insufficient, so some aspects of algorithm generation may still rely on the educated sensibilities of the designer. User testing can help overcome these shortcomings, but it may never be possible to test all the possible design variants. Our approach is to build tools that are based on the cognitive design principles, but also allow users to override the automated decisions as necessary. Many of the design issues that arise are not unique to the particular examples we have chosen, so their solutions should generalize to other domains.

Route Maps

Revealing Cognitive Design Principles for Route Maps

The map someone sketches to show a friend how to get to a party doesn’t usually resemble a map from the USGS or Rand McNally. Nonetheless, such sketch maps have been used for hundreds of years, presumably with success. Sketch maps differ from the efforts of geographers in several ways. To find out how, Tversky and Lee (1998) approached students near a campus dorm, asking them if they knew the way to a popular fast food place. If they did, they were asked to either sketch a map or write down directions to the restaurant. Typical instructions appear in Table 1, and typical sketch maps in Figure 1.

------------------------------

Insert Table 1 about here

------------------------------


------------------------------

Insert Figure 1 about here

-------------------------------

The maps shared a number of characteristics. They had an infrastructure of lines formed into paths and turns. The paths included, in this case roads, were primarily those that comprised the route. Other paths intersecting the route were included when they were useful for keeping the traveler on track, for example, intersections just before or after a turn. Paths were simplified, primarily to straight and curved lines. Turns were also simplified, to approximately 90 degrees. Distances were altered; long distances with little action were shortened and short distances with many turns were enlarged.

Note that these simplifications and alterations are actually distortions that violate the metric properties of the environment. However, the topology of the sketched routes – the points of intersection between the roads – corresponded to the true topology of the routes in the real environment. Landmarks were included when they were important for turns or for keeping on track; they were typically expressed as names of streets or blobs representing structures and usually labeled. The text route descriptions shared many of the same properties. Exact angles of turns and distances were typically absent. The shape of paths was dichotomized just as in the depictions; for straight paths, informants wrote, “go down” and for curved ones, they wrote, “follow around.” Despite these simplifications, omissions, and distortions, such schematized depictions and descriptions are usually sufficient to arrive at the destination because the environment provides the missing information about angles of turns, shapes of roads, and distances. What is essential is the sequence of paths and turns. Visualizations are used in contexts, and the contexts can provide missing information and disambiguate. Thus, the sequence of paths and nodes is the skeletal mental representation underlying both depictions and verbal descriptions of routes.

Of course, a highway map could be used for the same purpose. But a highway map, even one with the route marked on it, has disadvantages. It is cluttered with irrelevant information that makes finding the relevant more difficult. It has a single scale, so large portions of the map convey no information, and at the same time, discerning many small turns may be difficult. Because a highway map doesn’t extract the information needed for a mental representation of a route, it demands considerable time-consuming processing before it can be useful.

Applying Cognitive Design Principles to Computer Algorithms for Route Maps

Two basic design principles follow from the cognitive analysis:

1) People think of routes as sequences of paths and turns at landmarks, and therefore the topology of the route (i.e., the turning points) must be depicted accurately.

2) People don’t accurately apprehend or represent distances or angles, and therefore such geometric information can be simplified to increase emphasis on the turning points.
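As an illustration only, the two principles can be sketched on a toy route. The sketch keeps the sequence of roads and the direction of each turn intact (principle 1) while snapping turns to right angles and compressing the range of distances (principle 2). The `Leg` structure, the linear rescaling, and the example route are assumptions made for this sketch; they are not taken from any deployed system.

```python
from dataclasses import dataclass

@dataclass
class Leg:
    road: str
    length_km: float  # true length of this stretch of road
    turn_deg: float   # signed turn onto the next leg (+ left, - right, 0 none)

def snap_turn(turn_deg):
    """Simplify a turn to a right angle, keeping only its direction."""
    if turn_deg > 0:
        return 90.0
    if turn_deg < 0:
        return -90.0
    return 0.0

def rescale_lengths(lengths, shortest=1.0, longest=4.0):
    """Map true lengths linearly into [shortest, longest] drawing units.

    The mapping is monotonic, so a longer road is never drawn shorter
    than a road that is truly shorter.
    """
    lo, hi = min(lengths), max(lengths)
    if hi == lo:
        return [shortest for _ in lengths]
    return [shortest + (x - lo) * (longest - shortest) / (hi - lo) for x in lengths]

def schematize(legs):
    """Return (road, drawn_length, drawn_turn) per leg, in route order."""
    drawn = rescale_lengths([leg.length_km for leg in legs])
    return [(leg.road, round(d, 2), snap_turn(leg.turn_deg))
            for leg, d in zip(legs, drawn)]

# Hypothetical route: a short local road, a long highway, a short street.
route = [Leg("Dorm Rd", 0.2, 80.0), Leg("Highway 1", 30.0, -100.0), Leg("Main St", 0.5, 0.0)]
for row in schematize(route):
    print(row)
```

On the true scale the highway is 150 times longer than the first road; in the schematized version it is only four times longer, so the short roads and the turns between them remain visible.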

We have developed a system called LineDrive that automatically designs route maps for any given origin and destination based on these principles (Agrawala and Stolte, 2001). LineDrive is responsible for choosing graphic attributes, such as position, orientation, and size, for each of the graphic elements in the map, including roads, labels, cross-streets and landmarks. The space of possible map designs encompasses all possible choices of graphic attributes for each of the graphic elements. LineDrive uses search-based optimization over this large multi-dimensional space to find the map that best adheres to the design principles. The design principles are instantiated as layout constraints within the search-based optimization framework.
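The general idea of expressing design principles as constraints and searching for a layout that satisfies them can be sketched as follows. This is not LineDrive's algorithm: it is plain randomized hill climbing over road lengths only, and the constants, penalty weights, and function names are all hypothetical choices made for the illustration.

```python
import random

MIN_LEN = 1.0    # hypothetical: shortest drawn length that stays legible
PAGE_LEN = 12.0  # hypothetical: total drawn length that fits on the page

def cost(drawn, true):
    """Penalty score for a candidate layout; lower is better."""
    penalty = 0.0
    # Constraint 1: every road must be long enough to see and label.
    penalty += 10 * sum(max(0.0, MIN_LEN - d) for d in drawn)
    # Constraint 2: the whole route must fit on the page.
    penalty += 10 * max(0.0, sum(drawn) - PAGE_LEN)
    # Constraint 3: a truly longer road should not be drawn shorter.
    for i in range(len(true)):
        for j in range(len(true)):
            if true[i] > true[j] and drawn[i] < drawn[j]:
                penalty += drawn[j] - drawn[i]
    return penalty

def search(true, steps=5000, seed=0):
    """Randomized hill climbing from a uniformly scaled starting layout."""
    rng = random.Random(seed)
    scale = PAGE_LEN / sum(true)
    best = [t * scale for t in true]  # short roads start out illegible
    best_cost = cost(best, true)
    for _ in range(steps):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        # Perturb one road's drawn length; lengths cannot go negative.
        candidate[i] = max(0.0, candidate[i] + rng.uniform(-0.5, 0.5))
        c = cost(candidate, true)
        if c <= best_cost:  # accept improvements and sideways moves
            best, best_cost = candidate, c
    return best, best_cost

true_lengths = [0.2, 30.0, 0.5]  # hypothetical route, in km
layout, final_cost = search(true_lengths)
print([round(d, 2) for d in layout], round(final_cost, 3))
```

Accepting sideways moves lets the search shrink the long road at no cost, which frees room on the page for the short roads to grow. A fuller treatment would also search over orientation, label placement, and landmarks, as the text describes.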

Because LineDrive maps are based on a cognitive model of how people think about routes, they are far easier to follow while driving than standard highway maps (see Figure 2). LineDrive has been commercially deployed on large internet map service sites (see www.mapblast.com), and the maps have been received enthusiastically by a large community of users.

-------------------------------

Insert Figure 2 about here

--------------------------------

Assembly Instructions

Maps have been used to communicate spatial information within communities over the millennia, much like spoken language. As for spoken and written language, this provides a natural user-testing laboratory, where some construct maps and others use them, with greater or lesser success, an ongoing process that refines and improves. Instructions for assembly or operation have not undergone that refinement, and indeed are frequent recipients of groans, complaints, and slogans. An informal but wide-reaching survey of instructions for assembling or operating the sorts of things people bring into their houses (furniture, cameras, cell phones, computers, and the now proverbial VCR) reveals a number of common difficulties (some of this collection appears in Mijksenaar & Westendorp, 1999). The typical visualization is an exploded view of the parts with lines and arrows. Such diagrams are usually at one scale and from one perspective, which may not be appropriate for all steps of assembly. They are frequently cluttered with so many parts and connections that it is difficult to discern particular components. They all too frequently show the entire assembly or operation at once, so that the sequence of assembly is not given. They often use extra-pictorial features such as arrows in multiple ways, with insufficient context to disambiguate meaning. As we shall see, even the instructions produced by student novices correct many of these problems. One notable exception to these common shortcomings is the widely admired instructions for Lego. Lego instructions are step-by-step; they also change perspective and scale when needed to show how to attach components.

Revealing Cognitive Design Principles for Assembly of Objects

The cognitive structures underlying assembly are multiple: a mental model of the object to be assembled, a mental model of the actions required for assembly, and a model for ordering the actions. People think of objects as a hierarchy of parts (Tversky & Hemenway, 1984). The parts segmented are those that are perceptually salient and functionally significant. In most cases, perceptual salience, that is, contour distinctiveness, and functional significance correlate, as in the wheels of an automobile or the handle of a pump or the legs of a chair. The correspondence between perceptual salience and functional significance promotes inferences from structure to function, especially when the form of the part suggests function, as in wheels, handles, and legs. Thus, objects, though sometimes seamless, are nevertheless perceived as consisting of distinct parts, segmented by appearance and by function or behavior. The same holds for actions, such as making a bed or assembling an object. Though typically continuous in time, actions are thought of as a sequence of discrete steps, distinct in both perceived action and conceived function. Goal-directed action sequences such as assembly are also conceived of hierarchically, with the higher level segmented by actions on separate objects or significant object parts and the lower level segmented by finely articulated actions on the same object or object part (Zacks, Tversky, & Iyer, 2001).

To reveal the mental representations underlying assembly and to reveal graphic preferences at the same time, we followed the same general strategy for uncovering cognitive design principles as we had for route maps. Heiser, Daniel-Ginet, and Tversky (in preparation) asked students to assemble a TV cart using the picture of the assembled cart on the package as a guide (see Figure 3).

-------------------------------

Insert Figure 3 about here

-------------------------------

After assembling the cart, the students were asked to construct instructions for assembling it under one of four conditions: 1) Use sketches and language to create instructions so someone else can easily assemble the TV cart; 2) use sketches and language, but confine yourself to short, concise instructions; 3) use only language; and 4) use only sketches. This is the Instruction Production Study. As expected, the steps corresponded to the major object parts, yielding 5 steps. Of the many possible sequences, participants primarily used two, corresponding to mechanical ease of assembly. The students had been divided by a median split into high and low spatial ability on the basis of spatial tests of mental rotation (Vandenberg & Kuse, 1978) and perspective-taking (Money & Alexander, 1966). There were vast differences in the sketches produced by high and low ability participants, which we describe below. In a second study, the Instruction Rating Study, a subset of instructions from the production study was selected to span a range of sketch manners and techniques. These were given to a new group of participants, split as before into high and low ability, who first assembled the TV cart using one of the sets of instructions and then rated the instructions for quality. As noted, there were large differences in the sketches produced by high and low spatial ability participants. Interestingly, the more sophisticated techniques used in the sketches produced by those with high ability were exactly those preferred by participants of all ability levels in the rating study.
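For readers unfamiliar with the procedure, a median split can be sketched in a few lines. The sketch assumes a single spatial score per participant and assigns scores at or below the median to the low group; how the two spatial tests were combined, and how ties were handled, is not specified here, so both choices are assumptions. The participant labels and scores are invented.

```python
from statistics import median

def median_split(scores):
    """Label each participant 'high' or 'low' relative to the group median.

    `scores` maps a participant label to a single spatial-ability score.
    Scores at or below the median are labeled 'low' (an assumption).
    """
    cut = median(scores.values())
    return {name: ("high" if score > cut else "low")
            for name, score in scores.items()}

# Invented scores for four hypothetical participants; the median is 21.5.
scores = {"p1": 12, "p2": 30, "p3": 18, "p4": 25}
print(median_split(scores))
# → {'p1': 'low', 'p2': 'high', 'p3': 'low', 'p4': 'high'}
```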

-------------------------------

Insert Figure 4 about here

-------------------------------

What were the diagrammatic techniques that the high ability participants used and that received high ratings?