Causal maps and Bayes nets: A cognitive and computational account of theory-formation
Alison Gopnik
University of California at Berkeley
Clark Glymour
Carnegie Mellon University
and
University of California at San Diego
David Sobel
University of California at Berkeley
This paper was presented at the International Congress on Logic, Methodology and Philosophy of Science, Cracow, Poland, August, 1999 and the Rutgers Conference on the Cognitive Basis of Science, New Brunswick, November, 1999. Author’s address for correspondence: Alison Gopnik, Dept. of Psychology, University of California at Berkeley, Berkeley, CA, 94720. Gopnik@socrates.berkeley.edu.
Long Abstract
We outline a more precise and detailed cognitive and computational account of the "theory theory." Many everyday theories and everyday theory changes involve a type of representation we call a "causal map": an abstract, coherent representation of the causal relations among events. Such relations are not, for the most part, directly observable, but they can often be accurately inferred from observations of patterns of contingency and correlation among events and of the effects of experimental interventions. We can think of everyday theories and theory-formation processes as cognitive systems that allow us to recover an accurate causal map of the world.
This kind of knowledge can be perspicuously represented by the formalism of directed graphical causal models, or "Bayes nets". This formalism provides a natural way of representing causal relations, it allows for their use in prediction and experimental intervention, and it provides powerful tools for reliably inferring causal structures from patterns of evidence. Human causal inference and theory formation may involve more heuristic versions of similar computations. Preliminary experimental results suggest that 2- to 4-year-old children may spontaneously construct new causal maps and that they assume one of the fundamental axioms of Bayes net systems, the Markov Condition. We suggest further experiments to test these ideas and we also suggest ways that the formalism itself could be usefully extended in the light of psychological considerations.
Recently cognitive psychologists have argued that much of our adult knowledge, particularly our knowledge of the physical, biological and psychological world, consists of "intuitive" or "naïve" or "folk" theories (Murphy and Medin 1985; Rips 1989). Similarly, cognitive developmentalists argue that children formulate and revise a succession of such intuitive theories (Carey 1985; Gopnik 1988; Keil 1989; Wellman 1990; Wellman and Gelman 1997; Gopnik & Meltzoff, 1997). This idea, which we have called the "theory theory", rests on an analogy between everyday knowledge and scientific theories. Advocates of the theory theory have drawn up lists of features that are shared by these two kinds of knowledge. These include static features of theories, such as their abstract, coherent, causal, counterfactual-supporting character, functional features of theories such as their ability to provide predictions, interpretations and explanations, and dynamic features such as theory changes in the light of new evidence (see Gopnik and Wellman 1994). The assumption behind this work is that there are common cognitive structures and processes, common representations and rules, that underlie both everyday knowledge and scientific knowledge.
If this is true it should be possible to flesh out the nature of those cognitive structures and processes in more detail. Formulating the analogy between science and development has been an important first step, but it is time to try to describe in some detail the representations and rules that could underpin both these types of knowledge. Ideally, such an account should include ideas about the computational character of these representations and rules, and eventually even their neurological instantiation. This is the project that has been so successful in other areas of cognitive science, particularly vision science.
In this paper we will outline a more developed cognitive and computational account of the theory theory. In particular, we will argue that many everyday theories and everyday theory changes involve a type of representation we will call a "causal map". A causal map is an abstract representation of the causal relationships among kinds of objects and events in the world. Such relationships are not, for the most part, directly observable, but they can often be accurately inferred from observations. This includes both observations of patterns of contingency and correlation among events as well as observations of the effects of experimental interventions. We can think of everyday theories and theory-formation processes as cognitive systems that allow us to recover an accurate causal map of the world.
We will argue that this kind of knowledge can be perspicuously represented by the formalism of directed graphical causal models, more commonly known as Bayes nets (Pearl 1988; Spirtes, Glymour, and Scheines 1993). This formalism provides a natural way of representing causal relations, it allows for their use in prediction and experimental intervention, and, most significantly, it provides powerful tools for reliably inferring causal structures from patterns of evidence. In recent work in artificial intelligence, systems using Bayes nets can infer accurate, if often incomplete, accounts of causal structure from suitable correlational data. We will suggest that human causal inference and theory formation may involve more heuristic versions of similar computations. In particular, we will suggest that human beings assume one of the fundamental axioms of Bayes net systems, the Markov Condition. Finally, we will provide some preliminary empirical results which suggest that young children spontaneously construct new causal maps and that they do so in a way that respects the basic assumptions of the Bayes net formalism. We will suggest further experimental investigations that could be used to test and elaborate our hypothesis. And we will suggest ways that the formalism itself could be usefully extended in the light of psychological considerations.
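As a rough illustration of what the Markov Condition asserts, consider the sketch below. In its usual statement, the condition says that each variable in a causal graph is independent of its non-effects, conditional on its direct causes. The three-variable chain A -> B -> C, the variable names, and every probability in the code are hypothetical values chosen only for illustration; they are not taken from any study.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary variables (illustrative numbers).
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """Markov factorization: each variable depends only on its direct cause."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total


def p_c_given(c, **given):
    """Conditional probability P(C = c | given)."""
    return prob(c=c, **given) / prob(**given)


# Once B is known, learning A tells us nothing more about C.
c_given_b_a1 = p_c_given(1, b=1, a=1)
c_given_b_a0 = p_c_given(1, b=1, a=0)

# Without knowledge of B, A is informative about C.
c_given_a1 = p_c_given(1, a=1)
c_given_a0 = p_c_given(1, a=0)
```

In this toy case the first two quantities coincide while the last two differ, which is the pattern of conditional independence the Markov Condition leads one to expect for a causal chain.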
2. Theory-formation and the causal inverse problem.
The most successful theories in cognitive science have come from studies of perception. The visual system, whether human or robotic, has to recover and reconstruct three-dimensional information from the retinal (or fore-optic) image. One aspect of vision science is about how that reconstruction can be done computationally, and about how it is done in humans. Although accounts are very different in detail they share some general assumptions, in particular these: (1) Visual systems, whether human or automated, have an objective problem to solve: they need to discover how three-dimensional moving objects are located in space. (2) The data available are limited in particular ways. Organisms have no direct access to the external world. Rather, the external world causes a flow of information at the senses that is only indirectly related to the properties of the world itself. For example, the information at the retina is 2-dimensional, while the world is 3-dimensional. (3) Solutions must make implicit assumptions about the ways that objects in the world produce particular patterns (and successions of patterns) on the retina. The system can use those assumptions to recover spatial structure from the data. In normal conditions, those assumptions lead to veridical representations of the external world. But these assumptions are also contingent; if the assumptions are violated then the system will generate incorrect representations of the world, as in perceptual illusions (see Palmer 1999).
We propose an analogous problem about discovering the causal structure of the environment. (1) There are causal facts, as objective as facts about objects, locations, and states of relative motion, used and evidenced in prediction, intervention and control, and partially revealed in correlations. (2) The data available are limited in particular ways: children and adults may observe associations they cannot control or manipulate; they may observe features they can only control or manipulate indirectly, through other objects or features; the associations they observe, with or without their own interventions, may involve an enormous number of features, only some of which are causally related. (3) Human beings have a theory-formation system, like the visual system, that recovers causal facts by making implicit assumptions about the causal structure of the environment. Those assumptions are contingent; where they are false, causal inference, whether in learning new causal relationships or in deploying old ones, may fail to get things right.
3. Causal maps
What kinds of representations might be used to solve the causal inverse problem? The visual system seems to use many very different types of representations and rules to solve the spatial problem. In some cases, like the case of translating two-dimensional retinal information to three-dimensional representations, the kinds of representations and the rules that generate them may be relatively fixed and "hard-wired". However, other, more flexible, kinds of spatial representations are also used.
In particular, since Tolman, cognitive scientists have suggested that organisms solve the spatial inverse problem by constructing "cognitive maps" of the spatial environment (Tolman 1932; O'Keefe and Nadel 1978; Gallistel 1990). These cognitive maps provide animals with representations of the spatial relations among objects. Different species of animals, even closely related species, may use different types of cognitive maps. There is some evidence suggesting the sorts of computations that animals use to construct spatial maps, and there is even some evidence about the neurological mechanisms that underpin those computations. In particular, O’Keefe and Nadel (1978) proposed that these mechanisms were located in the rat hippocampus.
There are several distinctive features of cognitive maps. First, such maps provide non-egocentric representations. Animals might navigate through space, and sometimes do, egocentrically, by keeping track of the changing spatial relations between their bodies and objects as they move through the spatial environment. In fact, however, cognitive maps are not egocentric in this way. They allow animals to represent geometric relationships among objects in space, independently of their own relation to those objects. A cognitive map allows an animal who has explored a maze by one route, to navigate through the maze even if it is placed in a different position initially. This aspect of cognitive maps differentiates them from the kinds of cognitive structures proposed by the behaviorists, which depend on associations between external stimuli and the animal’s own responses. This, of course, made Tolman one of the precursors of the cognitive revolution.
Second, cognitive maps are coherent. Rather than just having particular representations of particular spatial relations, cognitive maps allow an animal to represent many different possible spatial relations, in a generative way. An animal who knows the spatial layout of a maze can use that information to make many new inferences about objects in the maze. For example, the animal can conclude that if A is north of B and B is north of C then A will be north of C. The coherence of cognitive maps gives them their predictive power. An animal with a spatial map can make a much wider variety of predictions about where an object will be located than can an animal restricted to egocentric spatial navigation. It also gives cognitive maps a kind of interpretive power; an animal with a spatial map can use the map to resolve ambiguous spatial information.
Third, cognitive maps are learned. Animals with the ability to construct cognitive maps can represent an extremely wide range of new spatial environments, not just one particular environment. A rat moving towards a bait in a familiar maze is in a very different position than, say, a moth moving towards a lamp. The moth appears to be hard-wired to respond in set ways to particular stimuli in the environment. The rat in contrast moves in accordance with a learned representation of the maze. This also means that spatial cognitive maps may be defeasible. As an animal explores its environment and gains more information about it, it will alter and update its map of that environment. In fact, it is interesting that the hippocampus, which, in rats, seems to be particularly involved in spatial map-making, also seems particularly adapted for learning and memory. Of course, this general learning ability depends on innate learning mechanisms that are specialized and probably quite specific to the spatial domain such as, for example, "dead reckoning" mechanisms (see Gallistel, 1990).
Fourth, and connected to the third point, animals who construct cognitive maps often seem driven to do so independently of the particular immediate goals that those maps might serve. Rats placed in a new maze will explore the environment even if there is no immediate reward for doing so. They act in the environment in particular ways that allow them to construct accurate maps. Of course, these exploratory behaviors may lead to rewards in the long run. By expending energy to construct an accurate map of its environment the animal will eventually reap adaptive benefits.
Our hypothesis is that human beings construct similar representations that capture the causal character of their environment. This capacity plays a crucial role in the human solution to the causal inverse problem. These causal maps are what we refer to when we talk about everyday theories. Everyday theories are non-egocentric, abstract, coherent, learned representations of causal relations among events, and kinds of events, that allow causal predictions, interpretations and interventions.
Note that we are not proposing that we actually use spatial maps for the purpose of representing or acquiring causal knowledge, or that we somehow extend spatial representations through processes of metaphor or analogy. Rather we want to propose that there is a separate cognitive system with other procedures devoted to uncovering causal structure, and that this system has some of the same abstract structure as the system of spatial map-making with which it must in many cases interact. We also do not mean that knowledge of causal relations is developed entirely independently of knowledge of spatial facts, but that there are special problems about learning causal relationships, and special types of representations designed to solve those problems.
Just as cognitive maps may be differentiated from other kinds of spatial cognition, causal maps may be differentiated from other kinds of causal cognition. Given the adaptive importance of causal knowledge, we might expect that a wide range of organisms would have a wide range of devices for recovering causal structure. Animals, including human beings, may have some hard-wired representations which automatically specify that particular types of events lead to other events. For example, animals may always conclude that when one object collides with another the second object will move on a particular trajectory. These sorts of specific hard-wired representations could capture particular important parts of the causal structure of the environment. This is precisely the proposal that Michotte (1962) and Heider (1958) made regarding the "perception" of both physical and psychological causality. Animals might also be hard-wired to detect specific kinds of causal relations that involve especially important events, such as the presence of food or pain. Such capacities appear to be involved in phenomena like classical conditioning or the Garcia effect, in which animals avoid food that leads to poisoning (Palmerino, Rusiniak, and Garcia 1980).
Animals could also use a kind of egocentric causal navigation: they might calculate the causal consequences of their own actions on the world and use that information to guide further action. Operant conditioning is precisely a form of such egocentric causal navigation. Operant conditioning allows an animal to calculate the novel causal effects of its own actions on the world, and to take this information into account in future actions. More generally, trial-and-error learning seems to involve similar abilities for egocentric causal navigation.
Causal maps, however, would confer the same sort of advantages as spatial maps (Campbell 1995). With a non-egocentric causal representation of the environment, an animal could predict the causal consequences of an action without actually having to perform it. The animal could produce a new action that would bring about a particular causal consequence, in the same way that an animal with a spatial map can produce a new route to reach a particular location. Similarly, an animal with a causal map could update the information in that map simply by observing causal interactions in the world, for example, by observing the causal consequences of another animal’s actions, or by observing causal phenomena in the environment. The animal could then use that information to guide its own goal-directed actions. The coherence of causal maps allows a wide range of predictions. Just as an animal with a spatial map could make transitive spatial inferences (if A is north of B and B is north of C, then A will be north of C), animals with causal maps could make transitive causal inferences (if A causes B and B causes C, then A will cause C). A causal map would allow for a wide range of causal predictions and also allow a way of interpreting causally ambiguous data.
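The transitive inferences and the "new route" planning described above can be sketched with a very simple data structure. The example treats a causal map as a directed graph and treats causation as plain reachability, which is a deliberate simplification (probabilistic causal relations need not be transitive in this way). The event names and both function names are hypothetical.

```python
# Hypothetical causal map: each key directly causes the events listed for it.
causal_map = {
    "flip_switch": ["current_flows"],
    "current_flows": ["bulb_lights", "fan_spins"],
    "bulb_lights": ["room_brightens"],
    "fan_spins": [],
    "room_brightens": [],
}


def downstream_effects(graph, event):
    """Return every event reachable from `event` along causal links."""
    seen = set()
    stack = list(graph.get(event, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def actions_that_produce(graph, goal):
    """Return every event that has `goal` among its downstream effects."""
    return {event for event in graph if goal in downstream_effects(graph, event)}


# Predict the consequences of an action without performing it.
effects = downstream_effects(causal_map, "flip_switch")

# Find alternative ways of bringing about a desired outcome.
routes = actions_that_produce(causal_map, "room_brightens")
```

Nothing in the map is indexed to the agent's own past actions, so the same structure can be updated by watching another agent flip the switch, which is the sense in which such a representation is non-egocentric.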
Since causal maps are learned they should give animals an opportunity to master new causal relations, not just whatever limited set might be "hard-wired" perceptually. We would expect that animals would perpetually extend, change and update their causal maps just as they update their spatial maps. Finally, we might expect that animals designed to construct causal maps would do so independently of specific rewards. In particular, experimentation would serve the same role in constructing causal maps that exploration serves in constructing spatial maps. Animals that construct causal maps would be driven to systematically explore the causal properties and relations of objects in their environment, independently of the specific functions those objects and events might serve.
4. Theories as causal maps
"Everyday" or "folk" theories seem to have much of the character of causal maps. Such everyday theories represent causal relations among a wide range of objects and events in the world independently of the relation of the observer to those actions. They postulate coherent relations among such objects and events which support a wide range of predictions, interpretations and interventions. Because of their causal character, they support counterfactual reasoning; many philosophers have argued for a close conceptual link between causal and counterfactual reasoning. And both children and adults seem to have a kind of explanatory theory formation drive which leads them to actively experiment with the world to construct new causal accounts of events. In fact, one of us has argued elsewhere that explanation just is the phenomenological mark of the fulfillment of this drive (Gopnik 1998). Moreover, theories, like causal maps, are learned through our experience of and interaction with the world around us. Because of this, the theory theory has been especially prominent as a theory of cognitive development.
These are also features that unite everyday theories and scientific theories. While not all scientific theories are causal, causal claims and inferences do play a central role in most scientific theories (see Salmon 1984, Cartwright 1989). Moreover, when scientific theories are less concerned with causal structure it tends to be because these theories involve formal mathematical structure instead. However, this is also one way in which scientific theories appear to be unlike everyday theories. Scientific theories seem to be acausal chiefly only in so far as they are formulated in explicit formal and mathematical terms. Everyday theories are rarely formulated in such terms.
Moreover, the idea of causal maps seems to capture the scope of "theory theories" very well. The theory theory has been very successfully applied to our everyday knowledge of the physical, biological and psychological worlds. However, the theory theory does not seem to be as naturally applicable to other types of knowledge, for example, purely spatial knowledge, syntactic or phonological knowledge, musical knowledge or mathematical knowledge. Nor does it apply to the much more loosely organized knowledge involved in empirical generalizations, scripts or associations (Gopnik & Meltzoff 1997). But these types of knowledge also do not appear to involve causal claims in the same way. Conversely some kinds of knowledge that do involve causal information, like the kinds of knowledge involved in operant or classical conditioning, do not seem to have the abstract, coherent, non-egocentric character of causal maps, and we would not want to say that this sort of knowledge was theoretical.
Some earlier accounts have proposed that theories should be cognitively represented as connectionist nets (Churchland 1990) or as very generalized schemas (Giere 1992). The difficulty with these proposals is that they seem to be too broad to capture what makes theories special: practically any kind of knowledge can be represented as nets or schemas. On the other hand, more modular accounts, such as that of Keil (1995), have proposed that there are only a few specific explanatory schemas, roughly corresponding to the domains of physics, biology and psychology, that are used in everyday theories. These proposals do not seem to capture the wide range of explanations that develop in everyday life, nor the way that in cognitive development and in science we move back and forth among these domains. The idea of causal maps seems to capture both what is general and what is specific about everyday theories.
5. Learning causal maps
If the causal maps idea is correct, we can rephrase the general causal inverse problem more specifically. How do we recover causal maps from the data of experience? How can we learn a new causal map? The epistemological problems involved in recovering causal information are just as grave as those involved in recovering spatial information. Hume posed the most famous of these problems, that we only directly perceive correlations between events, not their causal relationship. How can we make reliably correct inferences about whether one event caused the other, and how can we dynamically correct errors when we make them? Causation is not just association, or contiguity in space, or priority in time, or all three, but often enough that is our evidence.
It gets worse. In many cases, we make inferences about causes that are themselves unobservable. We not only assume that one event causes another, but we assume that this happens because of unobserved, and sometimes unobservable, intervening causes. Something in a stick makes it ignite, something in a plant makes it grow, something in a person’s mind makes him act. But we only observe the events of the stick igniting, the plant growing, or the other person acting. How do we know what caused those events?
Moreover, causal structures rarely just involve one event causing another. Instead, events involve many different causes interacting in complex ways. A system for recovering causal structure has to untangle the relations among those causes, and discount some possible causes in favor of others. Finally, many causal relations may be probabilistic rather than deterministic. Smoking increases the chances that you will get cancer but it doesn’t ensure that you will get cancer. And, given the "noise" in the data, the evidence for even deterministic causal relations will usually be probabilistic. Even if one event does deterministically cause another it is unlikely that we will always observe the two events co-occur. The system must be able to deal with probabilistic information.
We propose that the cognitive system makes certain assumptions about how patterns of events indicate causal structure, just as the visual system makes assumptions about how retinal patterns indicate spatial structure. These assumptions help solve the causal inverse problem. Broadly speaking, there are three different kinds of assumptions that might be used to help solve the general problem of discovering causal relations.
First, we might propose what we will call substantive assumptions. Some perceptual features of events might be taken to indicate causal structure. The Michottean perceptual causal principles have this character, although, of course, they would only allow a very limited set of causal conclusions. There might also be broader and more general assumptions of this kind, which could underpin a wider range of causal inferences: assumptions about temporal sequence, for example (effects cannot precede causes). Similarly, we might propose that we automatically interpret the relation between intentions and actions as causal. The connection between my decision to raise my arm and my arm going up might automatically be read as causal. In fact, there is some evidence that infants make these sorts of substantive assumptions about causal events (Leslie and Keeble 1987, Oakes and Cohen 1990, 1994).
Alternatively, we might propose what we will call formal causal assumptions, which posit constraining relations between causal dependencies and patterns of associations. These assumptions would say that certain patterns of association or contingency among events reliably indicate causal relations. It is important to realize that this sort of account would NOT reduce causal structure to patterns of association or define causal structure in terms of association or probability. The idea is that theory-formation systems make certain fundamental assumptions about how patterns of contingency are related to causal relations, in much the same way that the visual system makes assumptions about how two-dimensional sensory information is related to three-dimensional space. Those assumptions may turn out to be wrong in individual cases, just as they may turn out to be wrong in the visual case. Overall, and in the long run, however, these assumptions will lead to accurate representations of the causal structure of the world. Again, as in the spatial case, this would explain why they were selected for by evolution. Some time ago Watson provided important evidence suggesting that infants are, in fact, sensitive to contingency information and argued that they use this information to draw causal conclusions (Watson 1967; Watson 1979; Bahrick and Watson 1985).
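One simple way to make "patterns of contingency" concrete is the difference between the probability of an effect when a candidate cause is present and when it is absent, often written ΔP. The sketch below is only an illustration: the text does not commit to this particular measure, the function name and the event tallies are hypothetical, and, in keeping with the point above, the number is treated as evidence about causal structure rather than as a definition of it.

```python
def delta_p(both, cause_only, effect_only, neither):
    """Contingency between a candidate cause and an effect from a 2x2 tally.

    both        -- trials where cause and effect both occurred
    cause_only  -- trials where the cause occurred without the effect
    effect_only -- trials where the effect occurred without the cause
    neither     -- trials where neither occurred
    """
    p_effect_given_cause = both / (both + cause_only)
    p_effect_given_no_cause = effect_only / (effect_only + neither)
    return p_effect_given_cause - p_effect_given_no_cause


# Hypothetical tallies: the effect usually follows the candidate cause.
contingent = delta_p(both=18, cause_only=2, effect_only=4, neither=16)

# Hypothetical tallies: the effect is equally likely with or without the cause.
non_contingent = delta_p(both=10, cause_only=10, effect_only=10, neither=10)
```

A value near zero indicates that the effect is no more likely when the candidate cause is present than when it is absent. A clearly positive value shows an association, which may or may not reflect a direct causal link, since a common cause could produce the same pattern.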
A third type of causal assumption, which might be called a "theory-driven" assumption, has actually had the most attention recently. As a result of our earlier experience, we may develop general ideas about which types of events cause other types of events and which do not. For example, we may believe, in general, that beliefs and desires cause actions or that animals’ colors do not cause their growth. We can use that general prior knowledge to make correct inferences about causal structure in particular cases.
This is the kind of reasoning that has been the focus of attention in "theory theory" accounts of both adults’ and children’s knowledge. Both adults and even very young children use this sort of knowledge extensively in their everyday understanding of the world, and in their everyday categorization and inference. In fact, this has been a major and important discovery of recent research in cognitive development. Earlier theories of cognition, particularly Piaget’s theory, characterized preschoolers as "precausal". More recent work reveals that preschoolers have extensive causal knowledge about everyday physics (Bullock and Gelman 1979), biology (Gelman and Wellman 1991) and psychology (Gopnik and Wellman 1994; Harris, German, and Mills 1996).
This third type of knowledge, however, somehow has to be derived from the other two types of knowledge. Causal knowledge can’t just appear out of the blue. These theory-driven general causal principles might be innately specified, which would make them like the Michottean substantive principles (for accounts such as this see Spelke et al. 1992, Keil 1995). Or they might be learned directly by observing the associations among events in the world. Or they might be developed through a combination of substantive and formal causal assumptions. Recent work in developmental psychology suggests how this last kind of development might take place.
6. Bootstrapping causal maps
In earlier work on the theory theory one of us proposed a general picture of development (Gopnik and Meltzoff 1997). On this view, infants are born with substantive innate theories as well as with mechanisms for revising those theories in the light of new evidence. The existing theories constrain children’s interpretations of new evidence, and provide a starting point for theory revision. We can formulate this general picture more specifically in terms of causal maps. This sort of account would combine substantive and formal assumptions about causal structure. Children might begin with a set of substantive principles about how to divide the world up into possible causes and plausible causal relationships. These ideas would place limits on which patterns of associations they would attend to. In turn, however, information about associations could lead children to revise their initial substantive principles, and to develop new substantive principles. Contingency patterns could eventually trump initial substantive assumptions about causal relations. Children could use patterns of contingency, for example, to learn about causal relations that violate assumptions of spatial or temporal contiguity or that apply to new kinds of data or new objects. On this view, substantive causal assumptions would provide a "starting-state" which would allow theory-formation to get off the ground. However, formal causal assumptions, assumptions about how associations specify causes, would be the principal engine of theory change. Developing and testing the details is a formidable project.
Like causal maps themselves, this kind of causal learning is an interesting kind of half-way point between domain-general and domain-specific mechanisms of cognitive development. Unlike the usual domain-specific mechanisms, causal inference procedures can be applied to input from many domains. Causal learning need not be limited to information about people or plants or objects. Causal learning need not be restricted to postulating particular types of causes for particular effects. We might explain a human action in terms of some combination or interaction among physical, psychological and biological causes. Both children and scientists may postulate genuinely new types of causal entities and mechanisms to explain the data. For example, the "theory of mind" literature suggests that children postulate a new type of causal entity, a mental representation, to explain certain psychological phenomena (Perner 1991; Gopnik & Wellman, 1994).
On the other hand, causal learning of the kind we are describing is more constrained than traditional domain-general learning mechanisms, such as logical inference on the one hand, or associations or connections on the other. Causal inferences are not themselves necessarily deductive; they depend on contingent assumptions about how evidence and causal structure are related. Causal inferences also go beyond mere associations. The process of causal learning we have described is quite different from the process of simply capturing or matching regularities, even high-level regularities, in the input, as classic associationist accounts or contemporary connectionist and dynamic systems accounts typically do.
7. Bayes nets as causal maps.
We propose, then, that children use causal maps: non-egocentric, abstract, coherent representations of causal relations among objects. They can modify or alter these maps, and construct new maps, by considering the patterns of associations among events and by assuming that these patterns of association indicate causal structure in a reliable way. An adequate representation of how such maps work must do three things: (1) it must show how they enable an agent to infer the presence of some features of a system from other features, that is, it must allow for accurate predictions; (2) it must show how an agent is able to infer the consequences of her own or others’ actions, that is, it must allow for appropriate interventions; and (3) it must show how causal knowledge can be learned from the agent’s observations and actions.
There has recently been a great deal of computational work investigating such representations and mechanisms. The representations commonly called Bayes nets can model complex causal structures and generate appropriate predictions and interventions. Moreover, we can use Bayes nets to infer causal structure from patterns of associations, whether from passive observation or from action. A wide range of normatively accurate causal inferences can be made, and, in many circumstances, they can be made in a computationally tractable way. The Bayes net representation and inference algorithms allow one sometimes to uncover hidden unobserved causes, to disentangle complex interactions among causes, and to make inferences about probabilistic causal relations (see Pearl 1988; Spirtes, Glymour, and Scheines 1993; 2000; Jordan 1998; Glymour and Cooper 1999).
This work has largely taken place in computer science, statistics and philosophy of science, and has typically been applied in one-shot "data-mining" in a range of subjects, including space physics, mineralogy, economics, biology, epidemiology and chemistry. In these applications there is typically a large amount of data about many variables that might be related in a number of complex ways. Bayes net systems can automatically determine which underlying causal structures are compatible with the data, and which are ruled out. But these computational theories might also provide important suggestions about how human beings, and particularly young children, recover and represent causal information. Work in computer vision has provided important clues about the nature of the visual system, and this work might provide similar clues about the nature of the theory formation system. Causal maps might be a kind of Bayes net.
Earlier we described how different sets of assumptions could be used to solve the causal inverse problem. In particular, we pointed to "formal assumptions" about how patterns of contingency and correlation among events can be used to infer causal structure. Now we want to talk in more detail about just what those assumptions might be like and how they might be instantiated in a computational system.
Causal relations in the world lead to certain characteristic patterns of events. Hume called this pattern "constant conjunction": if A causes B, then when A occurs, B will follow. But this claim seems to be too narrow. B may not ALWAYS follow A but only generally or usually follow A. More recently, philosophers have suggested a different formulation: if A causes B, the occurrence of A will change the probability that B will occur. We might think that this could provide us with a way of solving the causal inverse problem. When we see that A is correlated with B, that is, that the probability of A is consistently related to the probability of B, we can conclude that A caused B (or vice versa).
But there is a problem. The problem is that other events might also be causally related to B. For example, some other event C might be a common cause of both A and B. A doesn’t cause B but whenever C occurs both A and B will occur together. For example, I may notice that when I drink wine in the evenings, I am likely to have trouble sleeping. It could be that the wine is causing my insomnia. But it could also be that I usually drink wine in the evenings when I go to a party. The excitement of the party might be keeping me awake, independently of the wine. The party might both cause me to drink wine, and cause me to be insomniac, and this might be responsible for the correlation between the two events. A would be correlated with B and yet it would be wrong to conclude that there was a causal relation between them.
Clearly, what we need in these cases is to have some way of considering the probability of A and B relative to the probability of C. Many years ago the philosopher of science Hans Reichenbach proposed one natural way of doing this, which he called "screening off" (Reichenbach 1956). Consider the case of my insomnia. How could I find out which causal hypothesis is correct? I could observe the relative probabilities of the three events. If I observe that I only have insomnia when I drink wine at parties, and not when I drink wine by myself, I could conclude that the parties are the problem. If I observe that I only have insomnia when I drink wine at parties, and not when I abstain at parties, I could conclude that the wine is the problem. Or I might observe that both factors contribute independently to my sleeplessness.
We can represent this sort of reasoning more formally as follows. If A, B and C are the only variables and A is only correlated with B conditional on C, C "screens off" A as a cause of B – C rather than A is the cause. If A is correlated with B independent of C, then C does not screen off A and A causes B.
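In probabilistic notation, screening off is usually written as a conditional independence. The statement below is the standard textbook gloss, offered as an aid to the reader rather than as notation taken from the original text; the symbol P and the positivity condition are assumptions of this sketch.

```latex
% C screens off A from B: once C is known, A carries no further information about B.
P(B \mid A, C) = P(B \mid C) \quad \text{whenever } P(A, C) > 0,
% even though A and B may be correlated when C is ignored:
\qquad P(B \mid A) \neq P(B).
```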
This sort of "screening off" reasoning is ubiquitous in science. In experimental design we control for events that we think might be confounding causes. If I wanted to experimentally test the causes of my insomnia I could deliberately try solitary drinking or sober partying and see what happened. In observational studies we use techniques like partial correlation to control for confounding causes. In effect, what I did in my reasoning about my insomnia was to "partial out" the effects of partying from the wine-insomnia correlation.
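The "partialling out" in the insomnia example can be illustrated with a small exact calculation. The sketch below assumes one of the hypotheses discussed above, namely that the party is a common cause of both wine drinking and insomnia and that wine has no effect of its own. All probabilities and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical common-cause model: party -> wine, party -> insomnia.
# The numbers are illustrative assumptions, not data from any study.
P_PARTY = 0.3
P_WINE_GIVEN_PARTY = {True: 0.8, False: 0.2}
P_INSOMNIA_GIVEN_PARTY = {True: 0.7, False: 0.1}


def joint(party, wine, insomnia):
    """Joint probability of one assignment under the common-cause model."""
    p = P_PARTY if party else 1 - P_PARTY
    p_wine = P_WINE_GIVEN_PARTY[party]
    p_ins = P_INSOMNIA_GIVEN_PARTY[party]
    p *= p_wine if wine else 1 - p_wine
    p *= p_ins if insomnia else 1 - p_ins
    return p


def prob_insomnia(wine, party=None):
    """P(insomnia | wine) or, if party is given, P(insomnia | wine, party)."""
    parties = [True, False] if party is None else [party]
    num = sum(joint(c, wine, True) for c in parties)
    den = sum(joint(c, wine, i) for c in parties for i in (True, False))
    return num / den


# Ignoring the party, wine and insomnia are correlated.
marginal_gap = prob_insomnia(True) - prob_insomnia(False)
# Holding the party fixed, the correlation disappears: the party screens off wine.
conditional_gap = prob_insomnia(True, party=True) - prob_insomnia(False, party=True)
```

Under these assumed numbers the marginal gap is clearly positive while the conditional gap is zero, which is the pattern that would lead one to blame the party rather than the wine.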
The trouble with the reasoning we’ve described so far is that it’s limited to these rather simple cases. But, of course, in real life events may involve causal interactions among dozens or even hundreds of variables rather than just three. And in real life, the relations among variables may be much more complicated than either of the simple structures we described above. In real life, the causal relations among variables may also have a variety of different structures, A might be linearly related to B, or there might be other more complicated functions relating A and B, or A might inhibit B rather than facilitating it. In real life, the causal relations might involve Boolean combinations of causes. A and B together might cause C, though either event by itself would be insufficient. And, finally, in real life, there might be other unobserved hidden variables, ones we don’t know about, that are responsible for patterns of correlation. Is there a way to generalize the "screening off" reasoning we use in the simple cases to these more complicated ones? Would a similar method explain how more complicated causal relations might be learned? The Bayes net formalism provides such a method.
9. Bayes nets and their uses
Bayes nets are directed graphs, like the one shown below. The nodes or vertices of the graph represent variables, whose values are features or properties of the system, or collection of systems, to which the net applies. "Color," for example, might be a variable with many possible values; "weight" might be a variable with two values, heavy and light, or with a continuum of values. When Bayes nets are given a causal interpretation, a directed edge from one node or variable to another, X to Y, for example, says that an intervention that varies the value of X but otherwise does not alter the causal relations among the variables will change the value of Y. In short, changing X will cause Y to change.
For each value of each variable, a probability is assigned, subject to a fundamental rule, the Markov assumption. The Markov assumption is a generalization of the "screening off" property we just described. It says that if the edges of the graphs represent causal relations, then there will only be some patterns of probabilities of the variables, and not others. The Markov assumption constrains the probabilities that can be associated with a network. It says that the various possible values of any variable, X, are independent of the values of any set of variables in the network that does not contain an effect (a descendant of X), conditional on the values of the parents of X. So, for example, applied to the following directed graph
_________________________________________________________________
Insert Fig. 1 Here
__________________________________________________________________
the Markov assumption says that X is independent of {R, Z} conditional on any values of variables in the set {S,Y}.
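Written out, the example statement and the general rule behind it look as follows. The notation pa(V) for the set of parents of a variable V is an assumption of this sketch, and the product form is the standard factorization that the Markov assumption licenses for directed acyclic graphs.

```latex
% The example from the text: the parents of X are S and Y.
P(X \mid S, Y, R, Z) = P(X \mid S, Y)
% General consequence of the Markov assumption for variables V_1, \dots, V_n:
P(V_1, \dots, V_n) = \prod_{i=1}^{n} P\bigl(V_i \mid \mathrm{pa}(V_i)\bigr)
```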
Bayes nets allow causal predictions. Information that a system has some property or properties often changes the probabilities of other features of the system. The information that something moves spontaneously, for example, may change the probability that it is animate. Such changes are represented in Bayes nets by the conditional probability of values of a variable given values for another variable or variables. Bayes net representations simplify such calculations in many cases. In the network above, the probability of a value of X conditional on a value of R may be calculated from the values of p(R | S), p(S) and p(X | S). (See Pearl, 1988.) This allows us to predict the value of X if we know the value of R.
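The calculation of p(X | R) from p(R | S), p(S) and p(X | S) can be sketched for binary variables. The sketch assumes only the fragment of the network in which S influences both R and X, so that R and X are independent given S; the probability tables are hypothetical.

```python
# Hypothetical tables for the fragment S -> R and S -> X (all variables binary).
p_s = {0: 0.6, 1: 0.4}                      # p(S = s)
p_r_given_s = {0: {0: 0.9, 1: 0.1},         # p(R = r | S = s), indexed [s][r]
               1: {0: 0.2, 1: 0.8}}
p_x_given_s = {0: {0: 0.7, 1: 0.3},         # p(X = x | S = s), indexed [s][x]
               1: {0: 0.1, 1: 0.9}}


def p_x_given_r(x, r):
    """p(X = x | R = r) = sum_s p(x | s) p(r | s) p(s) / sum_s p(r | s) p(s)."""
    evidence = sum(p_r_given_s[s][r] * p_s[s] for s in p_s)
    numerator = sum(p_x_given_s[s][x] * p_r_given_s[s][r] * p_s[s] for s in p_s)
    return numerator / evidence


# Observing R = 1 makes S = 1 more likely, which in turn raises the probability of X = 1.
shift = p_x_given_r(1, 1) - p_x_given_r(1, 0)
```

The point of the sketch is that only three small tables are needed, rather than a full joint distribution over every variable in the network.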
In planning we specifically predict the outcome of an action. The probabilities for various outcomes of an action that directly alters a feature are not necessarily the same as the probabilities of those outcomes conditional on that altered feature. Suppose R in the graph above has two values, say red and pink. Because the value of S influences R, the conditional probabilities of values of S given that R = red will be different from the conditional probabilities of values of S given that R = pink. Because S influences X, the probabilities of values of X will also be different on the two values of R. Observing the value of R gives information about the value of X. But R has no influence on S or X, either direct or indirect, so if the causal relations are as depicted, acting from outside the causal relations represented in the diagram to change the value of R will do nothing to change the value of S or X. It is possible to compute over any Bayes network which variables will be indirectly altered by an action or intervention that directly changes the value of another variable. It is also possible to compute the probabilities that indirectly result from the intervention. These calculations are sometimes possible even when the Bayes net is an incomplete representation of the causal relations. (See Spirtes, Glymour, and Scheines 1993; Pearl and Verma 1991; Glymour and Cooper 1999.)
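The difference between observing R and acting on R can be made concrete with a small sketch. It again assumes only the fragment in which S influences R and X, treats an intervention as replacing the dependence of R on S with a fixed value, and uses hypothetical probabilities throughout; the function and parameter names are illustrative.

```python
# Hypothetical tables for the fragment S -> R and S -> X.
P_S = {0: 0.6, 1: 0.4}
P_R_GIVEN_S = {0: {"red": 0.9, "pink": 0.1},
               1: {"red": 0.2, "pink": 0.8}}
P_X1_GIVEN_S = {0: 0.3, 1: 0.9}  # p(X = 1 | S = s)


def joint(s, r, x, do_r=None):
    """Joint probability; if do_r is set, R is fixed from outside and ignores S."""
    p_r = P_R_GIVEN_S[s][r] if do_r is None else float(r == do_r)
    p_x = P_X1_GIVEN_S[s] if x == 1 else 1 - P_X1_GIVEN_S[s]
    return P_S[s] * p_r * p_x


def p_x1(r, intervene=False):
    """p(X = 1 | R = r) when observing, or p(X = 1) after setting R = r by action."""
    do_r = r if intervene else None
    num = sum(joint(s, r, 1, do_r) for s in P_S)
    den = sum(joint(s, r, x, do_r) for s in P_S for x in (0, 1))
    return num / den


observed_gap = p_x1("pink") - p_x1("red")                                # nonzero
intervened_gap = p_x1("pink", intervene=True) - p_x1("red", intervene=True)  # zero
```

Seeing that R is pink changes what we should expect of X, but painting R pink does not, because the action leaves S untouched.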
Bayes nets thus have two of the features that are needed for applying causal maps: they permit prediction from observations, and they permit prediction of the effects of actions. With an accurate causal map, that is, the correct Bayes net representation, we can accurately predict that y will happen when x happens, or that a particular change in x will lead to a particular change in y, even when the causal relations we are considering are quite complex. Similarly, we can accurately predict that if we intervene to change x then we will bring about a change in y. Bayes nets have another feature critical to cognition: they can be learned.
10. Learning Bayes nets.
In "data mining" applications Bayes nets have to be inferred from uncontrolled observations of variables. To do this, the Markov assumption is usually supplemented by further assumptions. The additional assumptions required depend on the learning procedure. (A detailed survey of several learning algorithms is given in essays in Glymour and Cooper, 1999.) One family of algorithms uses Bayes Theorem to learn Bayes nets. Another class of algorithms learns the graphical structure of Bayes nets entirely from independence and conditional independence relations among variables in the data, and requires a single additional assumption. We will describe some features of the latter family of algorithms.
The additional assumption required is faithfulness: the independence and conditional independence relations among the variables whose causal relations are described by a Bayes net must all be consequences of the Markov assumption applied to that network. For example, in the figure above, it is possible to assign probabilities so that S and X are independent, although the Markov assumption implies no such independence. (We can arrange the probabilities so that the association of S and X due to the influence of S on X is exactly canceled by the association of S and X due to the influence of Y on both of them.) The faithfulness assumption rules out such probability arrangements.
The faithfulness assumption is essentially a simplicity assumption. It is at least logically possible that the contingencies among various causes could be randomly arranged in a way that would "fool" a system that used the causal Markov condition. For example, to go back to the earlier example, wine might contain both chemicals that disrupt REM sleep and chemicals that cause an initial sedating effect. The effects of those different chemicals might, just by random coincidence, be perfectly balanced in such a way that one completely offset the other. If this were true we might never observe any correlation between wine and sleep, and we might conclude, incorrectly, that the two were causally unrelated. The faithfulness condition assumes that in the real world such sinister coincidences will not take place.
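A perfectly balanced cancellation of this kind is easy to exhibit in a simple linear model. The sketch below is an illustration under stated assumptions, not a model of real wine: it assumes wine affects sleep directly and also through a sedation variable, with independent noise terms, and all coefficients are hypothetical.

```python
from fractions import Fraction


def cov_wine_sleep(direct, wine_to_sedation, sedation_to_sleep, var_wine=Fraction(1)):
    """Covariance of wine and sleep in the assumed linear model

        sedation = wine_to_sedation * wine + noise
        sleep    = direct * wine + sedation_to_sleep * sedation + noise

    With independent noise terms the covariance is the sum of the direct
    path and the product of the coefficients along the indirect path.
    """
    return (direct + wine_to_sedation * sedation_to_sleep) * var_wine


# Exactly balanced effects: wine influences sleep, yet the covariance vanishes.
balanced = cov_wine_sleep(Fraction(-1, 2), Fraction(1), Fraction(1, 2))
# Any slight imbalance restores a detectable association.
unbalanced = cov_wine_sleep(Fraction(-1, 2), Fraction(1), Fraction(3, 5))
```

Exact rational arithmetic is used so that the cancellation is exact. The balanced case requires the coefficients to satisfy a precise equation, which is why faithfulness treats it as a coincidence that can be set aside.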
The learning algorithms for Bayes nets are designed to be used either with or without background knowledge of specific kinds. In addition to the Markov assumption and the faithfulness assumption we may add other assumptions about how causes are related to events. We may combine the formal assumptions with substantive or theory-driven assumptions. For example, an agent may, and a child typically will, know the time order in which events occurred, and may believe that some causal relations are impossible and others certain. Information of that sort is used by the algorithms. For example, suppose the child, or whatever agent, knows that events of kind A come before events of kind B which come before events of kind C. Suppose the true structure were:
_________________________________________________________________
Insert Fig. 2 Here
__________________________________________________________________
Given data in which A, B and C are all associated, a typical Bayes net learning algorithm such as the TETRAD II "Build" procedure (Scheines, et al., 1994) would use the information that A precedes B and C to test only whether B and C are independent conditional on A. Finding that conditional independence, the algorithm will conjecture the structure in figure 2. No other structure is consistent with the associations, the conditional independence, the time order, and the Markov and faithfulness assumptions.
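The single test described here, whether B and C are independent conditional on A, can be sketched as follows. This is not the TETRAD II procedure, only a minimal illustration on exact probabilities. It assumes binary variables and contrasts a common-cause structure (A influences both B and C), which is what the stated conditional independence and time order imply, with a chain (A influences B, which influences C). All numbers and names are hypothetical.

```python
from itertools import product

P_A = 0.5
P_B_GIVEN_A = {0: 0.2, 1: 0.9}
P_C_GIVEN_PARENT = {0: 0.3, 1: 0.8}  # parent of C is A (common cause) or B (chain)


def bern(p, value):
    """Probability that a binary variable with P(1) = p takes the given value."""
    return p if value else 1 - p


def build_joint(c_parent):
    """Joint over (a, b, c); c_parent is 'a' for common cause, 'b' for chain."""
    joint = {}
    for a, b, c in product((0, 1), repeat=3):
        parent = a if c_parent == "a" else b
        joint[(a, b, c)] = bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_PARENT[parent], c)
    return joint


def b_c_independent_given_a(joint, tol=1e-12):
    """True if P(b, c | a) = P(b | a) P(c | a) for every combination of values."""
    for a in (0, 1):
        p_a = sum(p for (a_, _, _), p in joint.items() if a_ == a)
        for b, c in product((0, 1), repeat=2):
            p_bc = joint[(a, b, c)] / p_a
            p_b = sum(joint[(a, b, c_)] for c_ in (0, 1)) / p_a
            p_c = sum(joint[(a, b_, c)] for b_ in (0, 1)) / p_a
            if abs(p_bc - p_b * p_c) > tol:
                return False
    return True


common_cause_passes = b_c_independent_given_a(build_joint("a"))
chain_passes = b_c_independent_given_a(build_joint("b"))
```

With real data the exact equality check would be replaced by a statistical test, but the logic is the same: knowing the time order reduces the search to one conditional independence question.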
13. Bayes nets and animals.
There is a substantial literature on learning in animals, in the tradition of operant and classical conditioning. The traditional interpretation of that literature was that it reflected the operation of fairly simple associative mechanisms. More recently, however, a number of authors have pointed to a more cognitive, and specifically, causal, interpretation of this literature (see e.g. Shanks and Dickinson 1987). As we pointed out above, operant conditioning works because it recovers veridical information about causal structure. It does so, however, for a very limited range of causal relations. In operant conditioning these relations specifically involve relations between the actions of an animal, the events those actions bring about, and adaptively significant reinforcement events.
In contrast, adult human causal judgments, causal maps and Bayes nets apply to a very wide range of causal relations, indeed causal relations between any two events, whether or not those events are adaptively significant or involve the animal’s own actions. There is some evidence that even chimpanzees contrast with human beings on this dimension. That is, while chimpanzees are extremely adept at trial and error learning, they seem to have much more difficulty learning about causal relations among events that are independent of their own actions (see Tomasello and Call 1997; Povinelli, 1997, in press).
However, there is also evidence that animals might use learning procedures of the Bayes net type, and might respect the Causal Markov condition, for this limited range of egocentric causal relations. In particular, the phenomenon known as "blocking" in classical conditioning (Kamin, 1969) is a form of "screening off". An animal who over several trials experiences a tone followed by a shock, and then experiences a tone and light together followed by a shock, will avoid the tone rather than the light. Similarly, there is also some evidence that animals seem to use a form of "screening off" in "backwards blocking". An animal who experiences a tone and light together followed by the shock, and then simply finds that the light is not followed by the shock, will avoid the tone when it occurs alone (Miller and Matute 1996). Adult humans also show forwards and backwards blocking effects in their causality judgments (Shanks 1985; Wasserman and Berglan 1998).
Note that forwards blocking can also be explained by the simpler associative mechanisms involved in the Rescorla-Wagner rule, but "backwards blocking" cannot be. Moreover, there are other circumstances in which normative Bayes net learning models and the Rescorla-Wagner model of classical conditioning give quite different predictions. The circumstances include those in which a causal factor or "cue" is independent of the occurrence or non-occurrence of an effect, but is not independent conditional on the occurrence of other cues. Unfortunately, the relevant experiments have yet to be performed.
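The contrast between the two blocking designs can be seen in a minimal implementation of the Rescorla-Wagner update, in which every cue present on a trial changes its strength in proportion to the shared prediction error. The sketch below collapses the usual salience and learning-rate parameters into a single assumed rate; the trial counts, rate and function name are hypothetical.

```python
def rescorla_wagner(trials, rate=0.3, max_strength=1.0):
    """Run the Rescorla-Wagner update over a list of (present_cues, outcome) trials.

    Only cues present on a trial are updated; absent cues keep their strength.
    """
    strengths = {}
    for cues, outcome in trials:
        predicted = sum(strengths.get(cue, 0.0) for cue in cues)
        error = (max_strength if outcome else 0.0) - predicted
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + rate * error
    return strengths


tone_shock = [({"tone"}, True)] * 20
compound_shock = [({"tone", "light"}, True)] * 20
light_no_shock = [({"light"}, False)] * 20

# Forward blocking: prior tone training leaves almost no error for the light to absorb.
blocked_light = rescorla_wagner(tone_shock + compound_shock)["light"]
control_light = rescorla_wagner(compound_shock)["light"]

# Backward design: the tone is absent in the second phase, so its strength cannot change.
tone_before = rescorla_wagner(compound_shock)["tone"]
tone_after = rescorla_wagner(compound_shock + light_no_shock)["tone"]
```

In the forward design the light ends up with a strength near zero compared with the control, which matches the blocking result. In the backward design the model leaves the tone exactly where it was, so any retrospective change in responding to the tone has to come from some other mechanism.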
14. Bayes nets and adults
Human adults seem to have causal maps that go beyond the causal representations of classical or operant conditioning. Is there any empirical evidence that these maps also involve Bayes-net-like representations? In fact, there is some evidence that adults make causal judgments in a way that respects the assumptions of a Bayes net formalism. There is a long literature, going back to Kelley in social psychology, about the way that adults perform a kind of causal "discounting" (Kelley 1973). Adults seem to unconsciously consider the relationships among possible causes, that is, to consider alternative causal graphs, when they make causal judgments. In particular, Patricia Cheng’s recent "causal power" theory turns out to be equivalent to a particular common parametrization of causal graphs in Bayes net theories (Cheng 1997). Cheng’s theory, which was empirically motivated and developed independently of the Bayes net work, makes the same assumptions about the relation between causal graphs and probabilities that are made in these AI models (Glymour, in press).
While Bayes nets provide tools for prediction and intervention, they also admit algorithms for learning new causal relations from patterns of correlation. Interestingly, however, there is little work on how adults learn about new causal relations. This is probably because adults rely overwhelmingly on their prior causal knowledge of the world in making causal judgments. They already have rich, powerful, well-confirmed theoretical assumptions about what will cause what. Because of the enormous causal knowledge adults bring with them, experimentation on adult causal learning is virtually forced to use imaginary or artificial scenarios to separate learning from the application of background knowledge. In everyday life, adults may rarely be motivated to revise their earlier causal knowledge or construct new knowledge of a general kind (of course, adults learn new causal particulars every day). The cognitive problem for adults is to apply that knowledge appropriately in particular situations.
As we will see, the situation is very different for children. Interestingly, it is also different in the special conditions in which adult human beings do science. By definition scientific inquiry is precisely about revising old causal knowledge and constructing new causal knowledge; science is quintessentially about learning. It is no coincidence that work on causal inference and in particular the Bayes net formalism has largely been done by philosophers of science, rather than cognitive psychologists. Human capacities for learning new causal facts about the world may be marginal for understanding much everyday adult cognition, but they are central for understanding scientific cognition.
15. Bayes nets and children.
We propose that the best place to look for powerful and generalized causal learning mechanisms, learning of the sort that might be supported by Bayes net algorithms, is in human children. Unlike adults, children cannot just rely on prior knowledge about causal relations; prior knowledge isn’t prior until after you’ve acquired it. And empirically, we have evidence that massive amounts of learning, particularly causal learning, take place in childhood. Indeed, in some respects the cognitive agenda for children is the reverse of the agenda for adults. Children are largely protected from the exigencies of acting swiftly and efficiently on prior knowledge; adults take those actions for them. But children do have to learn a remarkable amount of new information, in a relatively short time, with limited but abundant evidence.
Moreover, unlike non-human animals, children’s learning must extend well beyond the limited set of causal relations that involve adaptively important mechanisms or involve the effects of one’s own actions. Both human adults and children themselves have a large store of information about causal relations that do not involve positive or negative reinforcement and are not the result of their own actions (this, of course, was one of the lessons of the cognitive revolution).
Bayes net representations and computations might play an important role in the acquisition of this type of knowledge. Our hypotheses then are 1) that children construct new causal maps of the world, and 2) that, at least in part, the processes by which they do so can be described by Bayes net representations and learning mechanisms.
There are some important caveats here. Whenever we apply computational work to psychological phenomena we have no guarantee that the human mind will behave in the same way as a computer. We even have important reasons to think that the two will be different. Clearly, the computations we propose would be performed unconsciously both by children and adults. (Since the three-year-olds we are considering are still unable to consciously add 5 and 7, it is rather unlikely that they would be consciously computing exact conditional probabilities.) Moreover, it is likely, indeed almost certain, that human children rely more heavily on prior knowledge and on various heuristics than the current Bayes net "data mining" systems do.
Nevertheless we would again draw an analogy to our understanding of vision and spatial cognition. In this area there has been a thoroughgoing and extremely productive two-way interaction between computational and psychological work. While computational vision systems clearly have different strengths and weaknesses than human vision, they have proved to be surprisingly informative. Moreover, and rather surprisingly, the human visual system often turns out to use close to optimal procedures for solving the inverse problem.
We have prima facie evidence that children do, in fact, learn an almost incredible amount about the causal structure of the world around them. That is the evidence that supports the theory theory in general. The computational models that have been proposed to explain that learning have been either the highly constrained "parameter setting" models of modularity theories (see e.g. Pinker 1984), or the highly unconstrained and domain-general regularity detection of connectionist modelling (see e.g. Elman et al. 1997). Neither of these alternatives has been satisfactory as a way of explaining children’s learning of everyday theories. We know a lot about how much and what children learn, and even know when they learn it. We know rather little, however, about how that learning could take place.
We suggest that Bayes nets may be an important addition to the developmentalists’ explanatory arsenal. In particular, Bayes nets may be an important tool in explaining how children develop and change their everyday theories – their causal maps. Bayes nets give a computational solution to the three aspects of causal maps we have emphasized, prediction, intervention and learning, and to date no other formalism does. From a psychological perspective, Bayes nets provide a formalism for thinking about and investigating how causal maps are learned and used, and they provide psychologists with a coherent framework for interpreting experimental results on causal judgment and inference.
The program we propose is therefore not to theorize that children are optimal data miners, but rather to investigate how infants and children learn causal maps, and how much (and, possibly, how little) their learning processes accord with Bayes net assumptions and heuristics.
16. Preliminary evidence: Causal knowledge in young children and the blicket detector
We have so far completed two series of experiments that test these hypotheses. The first set of experiments (Gopnik and Sobel in press.) demonstrated that children as young as two years old swiftly and spontaneously learn new causal relations. These causal relations are genuinely new – they are not based on prior knowledge and they could not be innately specified. They do not involve the child’s own actions, nor do they involve reinforcement. In fact, these experiments show that children can even learn causal relations involving previously unobserved variables. Moreover, they use information about those causal relations in their naming and categorization. Equally important, children do not behave in the same way when they are presented with non-causal associations.
The second set of experiments (Gopnik, Sobel, and Glymour, submitted) shows that children as young as two years old learn these new causal relations from contingency data in ways that assume the causal Markov condition.
The existing work on the "theory theory" investigates children’s everyday causal understanding of objects, animals and people. While this work demonstrates that very young children have causal knowledge, it does not tell us where that knowledge comes from. The theory theory assumes that this knowledge is learned, but this knowledge also might be innately specified, or it might be directly taught by adults. Could children learn about a brand-new causal relation? To test this, we invented the "blicket detector," a machine that lights up and plays music when certain objects but not others are placed upon it (In fact, the machine is controlled by a human confederate, but neither adults nor children guess this). This apparatus appears to present children with a new causal relation.
In one experiment (Gopnik & Sobel, in press, Experiment 1), children between the ages of 30 months and five years were shown four blocks. Two set the machine off and two did not. After this demonstration, children were told that one of the objects that had set the machine off was a "blicket". Then the experimenter asked the child to give him the other "blicket." Importantly, children were not told that the machine was a blicket detector and had no prior exposure to this novel causal property. Children were given two types of sets of blocks, a neutral set and a conflict set. In the neutral set the blocks were either all identical, or all different. In the conflict set there were two pairs of perceptually identical objects; one member of each pair would set the machine off and the other would not. Here, the perceptual features of the object actually conflicted with the objects’ causal powers. In the neutral tasks, even the 30-month-olds in the study categorized the object on the basis of its causal power. In the conflict tasks, children of all ages were equally likely to choose the causally or perceptually similar object as the "blicket".
In a control condition (Gopnik & Sobel, in press, Experiment 2), the same machine and objects were presented to the children of the same ages with the same procedure. However, in this condition the object did not appear to be causally related to the machine. Instead of placing each object on the machine, the experimenter would hold each object over the machine. For two of the objects, he would simultaneously press a button on the detector, which activated it. For the other two, he simply held his hand on the button, but did not press it and nothing happened. Children were then told that one of the blocks which had been associated with the machine’s activity was a "blicket" and were asked to show the experimenter the other blicket. In contrast to the first experiment, children of all ages chose at chance in the neutral tasks and used the perceptual properties of the object as a basis for categorization in the conflict tasks. Children would not categorize the object as a blicket based on a mere association between that object and the machine.
This experiment suggested that children spontaneously learn new causal relations. In a second experiment we explored whether children would use "screening off" to do this. Gopnik, Sobel, & Glymour (submitted) presented three- and four-year-old children with the blicket detector after a familiarization period and told them specifically that it was a blicket detector and that "blickets made the machine go." Children were then presented with two types of tasks. In the screening tasks the experimenter first put object A on the detector. The detector went off. Then he put object B on the detector. The detector did not go off. Next, he placed both objects on the machine simultaneously twice in a row. The machine went off both times. Finally he asked the children if each object, individually, was a "blicket" or not.
In a control association task, the experimenter placed A on the machine by itself three times. Each time A set the machine off. Then the experimenter placed B on the machine by itself three times. It did not set the machine off the first time, but did set it off the next two times. Again, children were asked if each object individually was a blicket or not.
In both tasks A is associated with the machine’s lighting up three out of three times, and B is associated with the machine’s lighting up two out of three times. However, "screening off" reasoning would lead the children to conclude that B was not a blicket in the screening task. It would make no prediction in the association task.
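The contingencies in the two tasks can be tallied directly from the trial descriptions above. The sketch below encodes each trial as a triple and computes activation rates with and without conditioning on the other object; the data structure and function name are illustrative assumptions, while the trial sequences follow the procedure as described.

```python
# Each trial: (A on machine, B on machine, machine activated).
screening_task = [
    (True, False, True),    # A alone: detector goes off
    (False, True, False),   # B alone: nothing happens
    (True, True, True),     # A and B together, first time
    (True, True, True),     # A and B together, second time
]
association_task = [
    (True, False, True), (True, False, True), (True, False, True),   # A alone, three times
    (False, True, False), (False, True, True), (False, True, True),  # B alone, three times
]


def activation_rate(trials, a=None, b=None):
    """Fraction of matching trials on which the machine activated.

    Pass True or False to require an object's presence or absence; None ignores it.
    """
    outcomes = [on for on_a, on_b, on in trials
                if (a is None or on_a == a) and (b is None or on_b == b)]
    return sum(outcomes) / len(outcomes) if outcomes else None


# Ignoring A, B looks equally good in both tasks: two activations out of three.
b_marginal_screening = activation_rate(screening_task, b=True)
b_marginal_association = activation_rate(association_task, b=True)

# Conditioning on A being absent separates the tasks.
b_without_a_screening = activation_rate(screening_task, a=False, b=True)
b_without_a_association = activation_rate(association_task, a=False, b=True)
```

A learner who tracks only the marginal rates cannot tell the tasks apart, whereas one who conditions on A sees that B never activates the machine on its own in the screening task.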
In the screening task four-year-olds overwhelmingly said that A was a blicket but B was not a blicket. While children said that A was a blicket 91% of the time, they only said B was a blicket 16% of the time. In contrast, in the association task they were equally likely to say that A and B were blickets (97% and 78% respectively, not significantly different). In the screening task, three-year-olds also said that A was a blicket more often than that B was a blicket (100% and 65% respectively, a significant difference). However, they identified B as a blicket more often than the older children did. In the association task, the three-year-olds were also equally likely to say that both objects were blickets (100% and 80% respectively, not significantly different). In a second experiment, 30-month-olds were given a slightly modified procedure to deal with the possibility of a "yes" bias in the younger subjects: they were asked which object was a blicket. The results were similar: children chose A as the blicket significantly more often than B in the screening task, and they chose each object equally often in the association task.
These experiments suggest, in a very preliminary way, that even very young children make causal maps and that they use "screening off" reasoning to do so. This is consistent with the idea that children, in fact, make the causal Markov assumption when they reason about causal events. Dozens of other experimental questions and experiments suggest themselves. The causal structures in these experiments were quite simple, with one effect and two or three possible causes. We could explore whether children will make accurate inferences about more complex causal structures. Experiment 2 suggests that children make causal inferences even in probabilistic cases (they think the object that sets the machine off two out of three times is a blicket). We could explore how children will reason about probabilistic causal relations as well as deterministic relations. Experiment 1 suggests that children will spontaneously postulate unobserved variables to explain causal effects. We could explore just when children will create "theoretical entities" in this way. And we can see how children combine other causal assumptions, like assumptions about time order, and specific kinds of prior knowledge, with this kind of reasoning.
13. Further computational work - Bayes net heuristics.
Just as the Bayes net formalism suggests new experiments on children, applying Bayes nets to children’s cognition suggests new computational questions. The situation of a child learning a new causal map is quite different from that of a typical data-mining program. The Bayes net formalism suggests a variety of heuristics, some explored in the artificial intelligence literature and some not, that might be used by people for concept formation and for learning causal maps. Here is a very incomplete list of suggestions.
By introducing such heuristics we could adapt the Bayes net formalism to solve problems that are more like the problems human learners solve.
14. Conclusion.
In a book published a mere three years ago the first author of this paper expressed pessimistic sentiments about the prospect of a computational account of everyday theory-formation and change. "Far too often in the past psychologists have been willing to abandon their own autonomous theorizing because of some infatuation with the current account of computation and neurology. We wake up one morning and discover that the account that looked so promising and scientific – S-R connections, gestaltist field theory, Hebbian cell assemblies – has vanished and we have spent another couple of decades trying to accommodate psychological theories to it. We should summon up our self-esteem and be more stand-offish in future" (Gopnik & Meltzoff, 1997). We would not entirely eschew that advice. Pessimism may, of course, still turn out to be justified -- what we have presented here is a hypothesis and a research program rather than a detailed and well-confirmed theory. Moreover, we would emphasize that, as in the case of computer vision, we think the computational accounts have as much to learn from the psychological findings as vice-versa. Nevertheless, sometimes it is worth living dangerously. We hope that this set of ideas will eventually lead, not to another infatuation, but to a mutually rewarding relationship between cognition and computation.
Notes
1. Versions of this paper were presented in seminars at the University of Chicago, and the California Institute of Technology and in the Cognitive Science program and Dept. of Statistics at Berkeley and we are grateful to those who commented. Conversations with Steve Palmer, Lucy Jacobs, and Andrew Meltzoff played a major role in shaping these ideas. John Campbell and Peter Godfrey-Smith also read drafts of the paper and made very helpful suggestions.
Reference List
Bahrick, Lorraine E, and John S Watson. 1985. Detection of intermodal proprioceptive-visual contingency as a potential basis of self-perception in infancy. Developmental Psychology 21, no. 6: 963-73.
Bullock, Merry, and Rochel Gelman. 1979. Preschool children's assumptions about cause and effect: Temporal ordering. Child Development 50, no. 1: 89-96.
Campbell, John. 1995. Past, space and self. Cambridge, MA: MIT Press.
Carey, Susan. 1985. Conceptual change in childhood. Cambridge, Mass: MIT Press.
Cartwright, Nancy. 1989. Nature's capacities and their measurement. Oxford. New York: Clarendon Press. Oxford University Press.
Cheng, Patricia W. 1997. From covariation to causation: A causal power theory. Psychological Review 104, no. 2: 367-405.
Churchland, P. M. 1990. On the nature of explanation: A PDP approach. Physica 42, no. 1-3: 281-92.
Elman, Jeffrey L, Elizabeth A Bates, Mark H Johnson, and Annette Karmiloff-Smith. Rethinking innateness: A connectionist perspective on development.
Gallistel, Charles R. 1990. The organization of learning. Cambridge, MA: MIT Press.
Gelman, Susan A, and Henry M Wellman. 1991. Insides and essence: Early understandings of the non-obvious. Cognition 38, no. 3: 213-44.
Giere, Ronald N. 1992. Cognitive models of science. Minnesota Studies in the Philosophy of Science ; V. 15 Minneapolis: University of Minnesota Press.
Glymour, C, and G. Cooper, Eds. 1999. Computation, causation, and discovery. Menlo Park, CA: AAAI/MIT PRESS.
Glymour, Clark. In press. Bayes-Nets as psychological models. In Cognition and explanation. Eds Frank Keil, and Rob Wilson Cambridge, MA: MIT Press.
Gopnik, A. 1988. Conceptual and semantic development as theory change. Mind and Language 3: 163-79.
Gopnik, A., and A. N. Meltzoff. 1997. Words, thoughts and theories. Cambridge, MA: Bradford, MIT Press.
Gopnik, Alison. 1998. Explanation as orgasm. Minds and Machines 8, no. 1: 101-18.
Gopnik, Alison, and David Sobel. In press. Detecting blickets: How young children use information about novel causal powers in categorization and induction. Child Development.
Gopnik, Alison, and Henry M Wellman. 1994. The theory theory. In Mapping the mind: Domain specificity in cognition and culture. Eds L. Hirschfield, and S. Gelman, 257-93. xiv, 516. New York: Cambridge University Press.
Harris, Paul L., Tim German, and Patrick. Mills. 1996. Children's use of counterfactual thinking in causal reasoning. Cognition 61, no. 3: 233-59.
Heider, Fritz. 1958. The psychology of interpersonal relations. New York: Wiley.
Jordan, M., ed. 1998. Learning in graphical models. Cambridge, MA: MIT Press.
Keil, Frank C. 1989. Concepts, kinds, and cognitive development. The MIT Press Series in Learning, Development, and Conceptual Change. Cambridge, MA: MIT Press.
Keil, Frank C. 1995. The growth of causal understandings of natural kinds. Causal cognition: A multidisciplinary debate. Eds Dan Sperber, David Premack, and Anne Premack, 234-67. New York, NY: Oxford University Press.
Kelley, Harold H. 1973. The processes of causal attribution. American Psychologist 28, no. 2: 107-28.
Leslie, Alan M, and Stephanie Keeble. 1987. Do six-month-old infants perceive causality? Cognition 25, no. 3: 265-88.
Michotte, Albert Edouard. 1962. Causalite, permanence et realite phenomenales; etudes de psychologie experimentale. Louvain: Publications universitaires.
Miller, Ralph R, and Helena Matute. 1996. Biological significance in forward and backward blocking: Resolution of a discrepancy between animal conditioning and human causal judgment. Journal of Experimental Psychology: General 125, no. 4.
Murphy, Gregory L, and Douglas L Medin. 1985. The role of theories in conceptual coherence. Psychological Review 92, no. 3: 289-316.
O'Keefe, John, and Lynn Nadel. 1978. The hippocampus as a cognitive map. New York: Oxford University Press.
Oakes, L. M., and L. B. Cohen. 1990. Infant perception of a causal event. Cognitive Development 5: 193-207.
———. 1994. Infant causal perception. In Advances in infancy research, Vol. 9. Eds C. Rovee-Collier, and L. P. Lipsitt. Norwood, NJ: Ablex.
Palmer, S. 1999. Vision science: From photons to phenomenology. Cambridge, Mass.: MIT Press.
Palmerino, Claire C., Kenneth W. Rusiniak, and John Garcia. 1980. Flavor-illness aversions: The peculiar roles of odor and taste in memory for poison. Science 208, no. 4445: 753-55.
Pearl, J., and T. Verma. 1991. A theory of inferred causation. Second annual conference on principles of knowledge representation and reasoning. San Mateo, CA: Morgan Kaufmann.
Pearl, Judea. 1988. Probabilistic reasoning in intelligent systems. San Mateo, CA: Morgan Kaufmann.
Perner, Josef. 1991. Understanding the representational mind. Learning, Development, and Conceptual Change. Cambridge, MA: MIT Press.
Pinker, Steven. 1984. Language learnability and language development. Cognitive Science Series. Cambridge, MA: Harvard University Press.
Povinelli, Daniel. in press. Folk physics for apes? New York, NY: Oxford University Press.
Reichenbach, Hans. 1956. The direction of time. Berkeley, CA.: University of California Press.
Rips, Lance J. 1989. Similarity, typicality, and categorization. In Similarity and analogical reasoning. Ed S. Vosniadou, and A. Ortony, 21-59. New York: Cambridge University Press.
Salmon, Wesley. 1984. Scientific explanation and the causal structure of the world. Princeton: Princeton University Press.
Shanks, D. R. 1985. Forward and backward blocking in human contingency judgment. Quarterly Journal of Experimental Psychology 37b: 1-21.
Shanks, David R, and Anthony Dickinson. Associative accounts of causality judgment. The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 21: vii, 319.
Spelke, Elizabeth S, Karen Breinlinger, Janet Macomber, and Kristen Jacobson. 1992. Origins of knowledge. Psychological Review 99, no. 4: 605-32.
Spirtes, Peter, Glymour, Clark and Scheines, Richard. 1993. Causation, prediction and search. NY: Springer-Verlag.
Spirtes, Peter, Glymour, Clark and Scheines, Richard. 2000. Causation, prediction and search. 2nd revised ed. Cambridge, MA: MIT Press.
Tolman, Edward Chace. 1932. Purposive behavior in animals and men. New York: The Century Co.
Tomasello, Michael, and Josep Call. 1997. Primate cognition. New York: Oxford University Press.
Wasserman, Edward A, and Lyndon R Berglan. 1998. Backward blocking and recovery from overshadowing in human causal judgment: The role of within-compound associations. Quarterly Journal of Experimental Psychology: Comparative & Physiological Psychology 51, no. 2.
Watson, John. 1979. Perception of contingency as a determinant of social responsiveness. In The origins of social responsiveness. Ed E Tohman. Hillsdale, NJ: Erlbaum.
Watson, John S. 1967. Memory and "contingency analysis" in infant learning. Merrill-Palmer Quarterly 13, no. 1: 55-76.
Wellman, Henry. 1990. The child's theory of mind. Cambridge, Mass.: MIT Press.
Wellman, Henry, and Susan Gelman. 1997. Knowledge acquisition in foundational domains. Handbook of child psychology. 5th ed., Ed. D. Kuhn, and R. Siegler. New York: Wiley.

Fig. 1. A causal graph

Fig. 2 - The true structure of a causal graph