Philosophy of Scientific Experimentation III (PSX3)

October 5 - 6, 2012

University of Colorado Boulder Campus

Keynote Speakers:

  • Eric Cornell - Nobel Laureate (Physics, 2001), University of Colorado Boulder
  • Friedrich Steinle, Technical University of Berlin


Schedule of Talks for PSX3

All talks will be held in Room S-110, Koelbel Building, Leeds School of Business, University of Colorado


Friday, October 5

9:00 Questions about a textbook experiment - Eric Cornell, JILA, Physics (University of Colorado), Nobel Laureate (Physics, 2001)

10:00-10:15 Coffee

10:15 Experimentation On Analogues - Susan Sterrett, Carnegie Mellon University

11:15 Analogy in Experimental Practice: The Perplexing Case of Muriatic Acid - Amy Fisher, University of Puget Sound

12:15 Lunch

2:00 The False Dichotomy Between Experiment and Observation: The Case of Comparative Cognition - Irina Meketa, Boston University

3:00 The Theory Ladenness of Scientific Experimentation: Evidence from the History of Science and from Cognitive Psychology - William Brewer, University of Illinois

4:00 Coffee

4:15 Explanation and Prediction in Historical Natural Science - Carol Cleland, University of Colorado


Saturday, October 6

9:00 Exploratory experiments: Situating a concept - Friedrich Steinle, Technical University of Berlin

10:00 Coffee

10:15 Experimental Cosmology - James Mattingly, Georgetown University

11:15 Experiment in Cosmology: Model Selection, Error-Elimination and Robustness - Genco Guralp, Johns Hopkins University

12:15 Lunch

1:15 Modeling Data-Acquisition in Experimentation: The Case of the ATLAS Experiment - Koray Karaca, University of Wuppertal

2:15 Extrapolation Claims in Experimental Evolution - Emily Parke, University of Pennsylvania

3:15 Coffee

3:30 Metabolism of error, standard controls and the growth of experimental techniques - Christopher Diteresi, George Mason University

4:30 Discussion

Lecture Abstracts

Questions about a textbook experiment

Eric Cornell

JILA, Physics (University of Colorado), Nobel Laureate (Physics) 2001

In 1995, Carl Wieman and I got a bunch of atoms much colder than anything had ever been before. At those wildly low temperatures, the atoms underwent Bose-Einstein condensation (BEC), just as Einstein had predicted, 70 years before. BEC has turned out to be kind of a big deal. I’ll talk a little about that experiment, and then ask, and try to answer, various questions, including: Why did we do it? How did we know we had done it? Did we know what we were doing as we did it? Did we see the “big deal” part of it coming?

Exploratory experiments: Situating a concept

Friedrich Steinle

Technical University of Berlin


The concept of Exploratory Experimentation was proposed in the 1990s, independently, by Burian and Steinle, based on historical analyses of episodes from biology and physics. Since then, the concept has been used and further developed in the ongoing discussion on experimentation. At the same time, it can and should be located within a longer, but rarely analyzed, historical tradition of thinking about experiments. In my talk, I will sketch the concept as it was put forward in the 1990s, point to the historical line in which it should be situated, and analyze the shifts in meaning and emphasis it has undergone more recently. Situating the concept in this way sharpens its contours considerably.

The Theory Ladenness of Scientific Experimentation: Evidence from the History of Science and from Cognitive Psychology

William Brewer

University of Illinois

This paper is an example of naturalized philosophy of science. I will discuss the issue of the theory ladenness of scientific experimentation (i.e., the degree to which the theoretical beliefs of the scientists carrying out an experiment, and their knowledge of prior results, impact the data derived from the experiment). I will address this issue with evidence from the history of science and from cognitive psychology.

**Demonstration of the Phenomena**
Thomas Kuhn (1962) and Norwood Hanson (1958) made important arguments for the theory ladenness of scientific observation. However, there have been many attacks on their claims (Lakatos, 1970; Suppe, 1977; Hoyningen-Huene, 1993) and so first we need to establish that there is indeed a phenomenon to be studied.

At the beginning of the 20th century the French physicist Blondlot announced the discovery of a new form of radiation he called N-rays that could be detected by small increases in the brightness of a spark gap (Klotz, 1980; Nye, 1980). However, physicists in England and the USA had trouble reproducing the findings. The American physicist Wood visited the lab and (secretly) carried out a control experiment. He modified the physical apparatus so the N-rays would not be detectable, and yet observers in the laboratory continued to report the occurrence of N-rays.

In the 1920s there was a controversy between labs in Cambridge and Vienna over evidence for the artificial disintegration of the elements (Stuewer, 1985). Rutherford and Chadwick reported that only certain elements emitted protons when hit by alpha particles, whereas the Vienna lab reported that almost all elements showed the effect. Chadwick visited Vienna and found that the Viennese observers continued to report scintillations after he removed the alpha particle source (which was behind a screen).

These two episodes are important because they both show that when an analytic control experiment was carried out, pitting the physical effects against the beliefs of the experimenters, the experimental results tracked the experimenters’ beliefs, not the physical processes, thus providing clear evidence for the theory ladenness of experimentation.

Another type of example occurs in Millikan’s measurement of the charge of the electron. Holton (1978) and Franklin (2002) have analyzed Millikan’s lab notebooks and have shown that in a number of cases Millikan used his theory to select which observations of the charge to include in his reports of his experiments.

Evidence for the theory ladenness of experimentation can also be found in the patterns of experimental results in a field plotted across the dates when the experiments were published. The most well studied sets of data of this type are the results of experiments designed to measure physical constants reported by the Task Group on Fundamental Constants (Birge, 1957; Cohen & DuMond, 1965). In a non-theory-laden world one would expect an initial scatter of means with large error bars, but, with increasing experimental precision, the results would settle down to relatively similar means with small error bars. In actual practice one finds a fair number of cases in which an initial value is followed by a stair step of results (climbing either up or down, each new result usually within the error bar of the previous experiment) that eventually, over decades, settles into something more like the ideal pattern. Most commentators have attributed this pattern to the theory ladenness of experiments, in which knowledge of the previous results has a strong impact on the reported results of the following experiments (“intellectual phase lock,” Cohen & DuMond, 1965; “bandwagon effect,” Franklin, 1984).

Recently there has been much discussion of the “decline effect,” in which an initially very salient finding shows a declining effect size or vanishes altogether in later experiments (Lehrer, 2010). In a recent paper (Brewer, 2012a) I have argued that this pattern of data is also evidence of the theory ladenness of experimentation. Overall, it seems to me that the evidence just reviewed makes a powerful case for the theory ladenness of experimentation.

**Psychological Analysis of Theory Ladenness**
The initial discussions of theory ladenness in philosophy of science by Hanson and Kuhn focused on the theory ladenness of (visual) observation. In a series of papers (Brewer & Lambert, 1993, 2001; Brewer, 2012b) we have carried out a psychological analysis of examples of theory ladenness and argued that the initial focus on visual observation was too narrow: theory ladenness occurs throughout the cognitive processes used in carrying out science. In these papers we review evidence from research in cognitive psychology (with nonscientists) showing that there is theory ladenness in perception, attention, thinking, memory, and communication. While our approach is an improvement over the earlier approaches that focused only on visual observation, it seems to me that we did not devote enough thought to understanding the cognitive processes involved in carrying out experiments in modern science, and in the current paper I will attempt to do a better job of analyzing these cognitive processes.

**Normative Implications**
In my recent chapter on theory ladenness (Brewer, 2012b) I criticize my earlier papers (Brewer & Lambert, 1993, 2001) as focusing too strongly on the inhibitory role of theory driven top-down processes in science. I argue that having a theory is a two-edged sword. It can inhibit the progress of science as in the cases discussed earlier, but clearly having a theory that successfully explains some phenomena is at the core of modern science. Having a theory served the astronomers well in the discovery of Neptune, but led them astray in the “discovery” of Vulcan (a planet thought to be between Mercury and the Sun).

In Brewer (2012b) I assert that historical cases such as the ones discussed earlier suggest that physical scientists tend to be exceptionally careful in controlling confounding physical factors, but less careful in controlling psychological factors, such as the beliefs of the experimenter (see Hetherington, 1983, for a similar argument). However, the recent introduction of blinding techniques in particle physics (Franklin, 2002; Klein & Roodman, 2005) provides a very strong level of control for psychological factors.


Explanation and Prediction in Historical Natural Science

Carol Cleland

University of Colorado

In earlier work (Cleland [2001], [2002]) I identified fundamental differences in the methodology of prototypical historical science and classical experimental science. The target hypotheses of prototypical historical natural science differ from those of classical experimental science in being about long-past, token events, as opposed to regularities among types of events. Hypotheses concerning long-past, token events are typically evaluated in terms of their capacities to explain puzzling associations among traces discovered through fieldwork. In contrast, the acceptance and rejection of hypotheses in classical experimental science depends upon the success or failure of predictions tested in controlled laboratory settings. I argued that these differences in practice could be epistemically justified in terms of a pervasive time asymmetry of causation. ‘The asymmetry of overdetermination’ (as it was dubbed by David Lewis ([1979])) underpins the objectivity and rationality of the methodology of prototypical historical natural science, explaining why the latter is not, as sometimes maintained, inferior to classical experimental science.

This talk explores the intimate connection between explanation and justification in prototypical historical natural science (Cleland [2011]). I begin by briefly reviewing my analysis of the practice of prototypical historical natural science, fleshing out salient details with a case study. Subsequently I address Turner’s ([2004], [2007]) and Jeffares’ ([2008]) charge that successful and failed predictions play a much more central role in the evaluation of prototypical historical hypotheses than I have acknowledged. I argue that the actual practices of historical scientists do not support this claim. Historical hypotheses about particular past events are rarely rejected in the face of failed predictions and they are often accepted in the absence of successful predictions. Indeed, as I discuss, “predictions” that are actually made in prototypical historical research are typically too vague for their success or failure to play central roles in the evaluation of the hypotheses with which they are associated. Someone who is still under the influence of the covering law model of explanation might retort that good explanations have the same logical structure as prediction; a truly adequate explanation is a potential prediction. I briefly consider and reject various attempts to accommodate historical explanation within the basic framework of the covering law model. As I discuss, the evidential warrant for hypotheses in prototypical historical science is founded upon common cause explanation, which is not prediction-like in character. Even narrative explanations, which are common in much of the historical natural sciences, depend upon the identification of common causes for their empirical justification.

Common cause explanation has long been justified in terms of the principle of the common cause. For purposes of this talk, I take the principle of the common cause to (roughly speaking) assert that seemingly improbable associations (correlations or similarities among events or states) are best explained by reference to a shared common cause. The principle of the common cause represents an epistemological conjecture about the conditions under which a certain pattern of causation may be non-deductively inferred: Most seemingly improbable coincidences are produced by common causes.

The problem with the principle of the common cause is its justification. As Sober ([1988], [2001]) and Tucker ([2004], [2011]) argue, it seems to be grounded in either purely methodological or strictly metaphysical considerations. I defend a third possibility: The justification for the principle of the common cause depends upon the truth of the thesis of the asymmetry of overdetermination, which is empirically well grounded in physics, as opposed to logic or a priori metaphysics. According to the thesis of the asymmetry of overdetermination most localized events overdetermine their past causes (because the latter typically leave extensive and diverse traces, or effects) and underdetermine their future effects (because they rarely constitute the total cause of an effect). Put another way, according to the thesis of the asymmetry of overdetermination, most localized cause and effect relations in our universe form many pronged forks opening in the direction from past to future. If the temporal structure of causal relations in our universe were different--if most causal forks opened in the opposite direction (from future to past) or most cause and effect relations were linear (one-to-one) instead of fork-like, or most events were chance (uncaused) occurrences--one would not be justified in inferring the likelihood of a common cause from a seemingly implausible association among traces (of the past). As a consequence of the causal structure of our universe, the present is filled with epistemically overdetermining traces of past events. This means that it is likely (but not certain) that seemingly improbable associations among present-day phenomena are due to a common cause. 

As I discuss, explicating the principle of the common cause in terms of the asymmetry of overdetermination illuminates some otherwise puzzling features of the practices of scientists engaged in prototypical historical research, such as why (contra Sober and Tucker) they exhibit a general preference, all other things being equal (in the absence of empirical or theoretical information suggesting otherwise), for common cause explanations over separate causes explanations.


Metabolism of error, standard controls and the growth of experimental techniques

Christopher Diteresi

George Mason University

William Wimsatt has argued that philosophical analyses of scientific reasoning should focus less on normative idealizations of inference and more on what he has suggestively called “the metabolism of error.” In this paper, I develop Wimsatt’s suggestion by articulating a specific notion of metabolizing error that I take to be characteristic of the growth and development of experimental techniques. I then consider an extended example from developmental biology in order to show how the often elaborate standard procedures for performing common techniques in experimental biology can themselves be understood and analyzed as historical records of past errors metabolized.

I begin by explicating Wimsatt’s ‘metabolism of error’ and the role it plays in his overall stance toward analyzing scientific reasoning. While ‘metabolism of error’ may be an unusual formulation, similar notions of ‘fruitful error’ and of ‘learning from mistakes’ are quite common and have long received attention from philosophers interested in scientific methodology and naturalistic epistemology. The first task of the paper is to distinguish Wimsatt’s stance toward error from other views, such as Peircean self-correction and Deborah Mayo’s ‘argument from error,’ that might be taken to be prima facie similar. I suggest that what distinguishes Wimsatt’s view is its focus on cognitive adaptations, whereby cognitive adaptation is understood broadly to include ‘scaffolding’ cultural institutions and artifacts. On this view, reasoning well is a matter of skillfully selecting, re-engineering, and locally applying the right cognitive tools for the scientific job at hand.

After explicating Wimsatt’s original notion, I then elaborate “metabolism of error” in specifically experimental terms first by articulating a pragmatic understanding of error, and then by extending the metaphor of metabolism into a pathway for coping with errors as they arise in the course of performing an experimental technique. On my elaboration, metabolizing error involves recognizing a kind of procedural failure, determining its species (mistake, artifact, or genuine anomaly) and then re-engineering into the experimental procedure a suitable workaround (checklist, control, or re-design, respectively).

To illustrate how experimental techniques grow by metabolizing errors, I consider as an extended example a common experimental technique – morpholino genetic knockdown – for manipulating gene expression in developing embryos. Morpholinos are synthetic oligonucleotides which when injected into cells hybridize with endogenous target RNA and block translation, thereby “knocking down” expression of a gene of interest. Morpholino knockdown experiments are used extensively across a variety of model organism systems; in zebrafish and frog model systems, this technique is a component of the basic experimental toolkit. The popularity of the technique is due to its success as a partial realization of a long-standing experimental ideal in developmental genetics of being able to selectively manipulate the expression of a particular gene in a particular population of cells at a particular stage of development. In practice, however, morpholino injection is known to be liable to a host of errors. In the face of these known errors, the technique has been made reliable by elaborating a set of best practices or standard procedures for conducting morpholino injections. These standard protocols include a panoply of experimental checks and controls designed to detect any of the known errors should they occur. I argue that such standard protocols and experimental controls are the products of a process of metabolizing error. The upshot of this process is that, as new errors arise and are recognized and addressed, the standard protocols themselves grow, with the result that over time more and more work needs to be done to determine a single experimental result.




Analogy in Experimental Practice: The Perplexing Case of Muriatic Acid

Amy Fisher

University of Puget Sound


I argue that analogy plays an important role in experimentation. It informs practice by generating a set of criteria, i.e. likenesses and/or differences, with which to assess existing experimental results while also opening new areas of scientific inquiry. As C. Kenneth Waters and, more recently, John Norton and Paul Bartha argue, analogy is regarded skeptically within the philosophy of science because it relies on a set of complicated relationships and a form of inductive reasoning that is difficult to accurately and generally describe (Waters 1986; Norton 2003; Bartha 2010). Although the problem of induction seems intractable to logicians, given the ubiquity of analogical reasoning in experimental practice, it is important to try to characterize and understand its usage.

Historians and philosophers of science ask which phenomena formed a basis for comparison and how likenesses were prioritized and privileged in scientific theory and practice (e.g. Hesse 1963; Sterrett 2001; Janssen 2002; Levine 2009; Bartha 2010; Norton 2003). Emphasis, however, has been placed on theory rather than practice. I am interested in developing a more comprehensive understanding of experiment, not just as a function of social and cultural factors, but also as a process with elements that can transcend time period and locality (e.g. Steinle, 2002; Buchwald and Franklin 2005). I contend analogy is one such non-local practice. Many scientists, from Isaac Newton to Neil deGrasse Tyson, have used analogies in research and education. From a theoretical point of view, analogies cannot lead to certain knowledge, but building on John Norton’s “material theory of induction”, analogies can lead to new empirical knowledge and provide a starting point for experiment (Norton 2003). Analyzed in the context of practice rather than formalism, analogy becomes more tractable. Thus, this paper adds to the small but growing number of case studies examining the role of analogy in the design of experimental stratagems.

Experiments can, as Friedrich Steinle has argued, be exploratory, i.e. a fact-finding mission providing new information about a set of phenomena rather than testing a specific hypothesis or theory (Steinle 2005). Or, as Allan Franklin has shown, there may be a negotiation over the ways in which experimental results are interpreted and weighed, stemming from biases in data selection or incongruent results (Franklin 2002). I am interested in examining the ways in which exploratory experiments and commensurable and incommensurable results were used to create practical and useful analytical categories, especially in late eighteenth- and early nineteenth-century electrochemical research. In particular, I argue that the formation and evaluation of analogies between observations and/or measurements provided a starting point for experimental and theoretical analysis, an important stage between exploratory experiments and the development of robust theory. When drawing comparisons, eighteenth- and early nineteenth-century scientists often used the phrases “in analogy to”, “resembles”, “imitates”, or “in the same manner as” to express likeness. As this paper will show, the use of analogy was not primarily a pedagogical or heuristic tool, but rather played a substantive role in experimental practice, as subsequent debates over whether a set of phenomena belonged together, and which properties were most important or characteristic of that group, opened new fields of inquiry.

The use of analogy in scientific inquiry is not confined to the eighteenth century (e.g. Hallyn 2000; Bartha 2010; Norton 2010). It was, however, during this period that quantitative experimental research emerged. Eighteenth-century electrical studies, for example, demonstrated that careful experimentation could lead to the exploration, development, and articulation of laws, mathematical expressions of the physical relationships between dependent variables, as shown by Coulomb’s study of electrical forces with a torsion balance (e.g. Heilbron 1999). Although this is a key aspect of experimental physics, there were other features of conceptual importance to the development of quantitative science emerging in this period, such as measurement, e.g. precision, accuracy, experimental error, knowing how an instrument works, and what it measures, and new emphasis on procedure, e.g. knowing how the order of operations affects experimental results, their reliability, and the conclusions that can be drawn from observation and/or measurement (e.g. Heilbron 1990; Buchwald 2006). Detailed studies of scientific practice, especially electrical experiments, can provide insight into how scientists marshaled evidence to develop and support theoretical claims about objects in nature (e.g. Steinle 2003).

In this vein, my paper analyzes the experimental practices of electrochemists during this period, focusing on the study of acids. Many chemical practitioners, including Humphry Davy and Antoine Lavoisier, emphasized the utility of analogy to the development of electrical and chemical research programs. Analogy provided a means of devising a working experimental hypothesis, based on the likelihood of previously observed events, to be either confirmed or refuted by additional experiments. This approach facilitated research. The results of an experiment could be compared to a set of known experimental outcomes with similar characteristics. If the analogy held, it increased confidence in the validity of the hypothesis. If it failed, it suggested new experimental avenues and research questions to explore. In this manner, “exploratory experiments” took on a life of their own. Identifying similarities, likenesses, and/or differences between sets of phenomena provided a loose, but useful first-order framework that informed additional research.

For example, the name “acid” denoted a compound that manifested an agreed-upon set of behavioral properties: nitrous acid, like sulphuric acid, caused corrosion, reacted with alkalis to form salts, and turned litmus, a plant extract, red. Lavoisier developed a sophisticated research program to further investigate the composition of acids. In addition to manifesting similar effects, chemical analysis revealed that each acid consisted of an inflammable substance and oxygen. Because the form of each acid was the same, Lavoisier argued that the cause of acidity must reside in its common constituent, oxygen or the “acid-generator”. He wrote: “when, from these particular facts, the general induction is made, that oxigene is a principle common to all acids, the consequence is founded on analogy [that oxygen engenders acidity in these substances], and here it is that the theory commences. Experiments which daily become more numerous, afford an increasing probability to this theory” (Lavoisier 1789, 20). Although Lavoisier’s analogy and explanation of acidity was ultimately incorrect (muriatic, i.e. hydrochloric, acid does not contain oxygen), it was nonetheless productive. As William Henry, an influential British chemist, noted, many chemists, like Davy, undertook analyses of complex acids, like fluoric (hydrofluoric) and muriatic acid, to determine whether oxygen was indeed one of their components (Henry 1800). These experiments led to new discoveries, e.g. that chlorine and fluorine were elements, not compounds (e.g. Davy 1810; Le Grand 1974).

More specifically in this presentation, I analyze the role of analogy in Lavoisier’s experiments suggesting that oxygen was the crucial component in acids and Davy’s experimental refutation of Lavoisier’s results. From this case study, I argue that analogical reasoning encouraged either the development of a new operational definition for a group of phenomena, i.e. a working hypothesis and set of criteria by which to compare experimental results, or a more sophisticated and nuanced understanding of their differences. I examine which material properties formed the basis of comparison in each case and why. This analysis shows that the use of analogy promoted research to substantiate why the likenesses should outweigh the differences (or vice versa) when assessing results and designing experiments.


Bartha, Paul. 2010. By Parallel Reasoning: The Construction and Evaluation of Analogical Arguments. Oxford: Oxford University Press.

Buchwald, Jed Z. and Allan Franklin. 2005. “Introduction: Beyond Disunity and Historicism.” In Wrong for the Right Reasons, edited by Jed Z. Buchwald and Allan Franklin, 1-16. Dordrecht: Springer.

Buchwald, Jed Z. 2006. “Discrepant Measurements and Experimental Knowledge in the Early Modern Era.” Archive for History of Exact Sciences 60: 565-649.

Davy, Humphry. 1810. “Researches on the Oxymuriatic Acid, its Nature and Combinations; and on the Elements of the Muriatic Acid. With Some Experiments on Sulphur and Phosphorus, made in the Laboratory of the Royal Institution.” Philosophical Transactions 100: 231-257.

Franklin, Allan. 2002. Selectivity and Discord: Two Problems of Experiment. Pittsburgh: University of Pittsburgh Press.

Hallyn, Fernand, ed. 2000. Metaphor and Analogy in the Sciences. Boston: Kluwer Academic Publishers.

Heilbron, John. 1990. “The Measure of Enlightenment.” In The Quantifying Spirit in the 18th Century, edited by John Heilbron, Tore Frängsmyr and John Lewis, 207-242. Berkeley: University of California Press.

Heilbron, John. 1999. Electricity in the 17th and 18th Centuries: A Study in Early Modern Physics. New York: Dover Publications.

Henry, William. 1800. “Account of a Series of Experiments, Undertaken with the View of Decomposing the Muriatic Acid.” Philosophical Transactions 90: 188-203.

Hesse, Mary. 1966. Models and Analogies in Science. Indiana: University of Notre Dame Press.

Janssen, Michel. 2002. “COI Stories: Explanation and Evidence in the History of Science.” Perspectives in Science 10: 457-522.

Lavoisier, Antoine. 1789. “Note of Mr. Lavoisier Upon the Introduction.” In Richard Kirwan’s An Essay on Phlogiston and the Constitution of Acids to which are added, Notes, Exhibiting and Defending the Antiphlogistic Theory and annexed to the French Edition of this Work by Messrs. De Morveau, Lavoisier, De La Place, Monge, Berthollet, and De Fourcroy, translated by William Nicholson, 10-22. London: J. Johnson.

Le Grand, Homer. 1974. “Ideas on the Composition of Muriatic Acid and their Relevance to the Oxygen Theory of Acidity.” Annals of Science 31: 213-226.

Levine, Alex. 2009. “Partition Epistemology and Arguments from Analogy.” Synthese 166: 593-600.

Norton, John D. 2003. “A Material Theory of Induction.” Philosophy of Science 70: 647–70.

Norton, John D. 2010. “There are no Universal Rules for Induction.” Philosophy of Science 77: 765-777.

Steinle, Friedrich. 2002. “Experiments in History and Philosophy of Science.” Perspectives in Science 10.

Steinle, Friedrich. 2003. “The Practice of Studying Practice: Analyzing Research Records of Ampere and Faraday.” In Reworking the Bench, edited by Frederick L. Holmes, Juergen Renn and Hans-Joerg Rheinberger, 93-118. Dordrecht: Kluwer Academic Publishers.

Steinle, Friedrich. 2005. Explorative Experimente: Ampere, Faraday, und die Ursprunge der

Elektrodynamik. Munich: Franz Steiner Verlag.

Sterrett, Susan. 2001. “Darwin’s Analogy between Artificial and Natural Selection: How Does It Go?” Studies in the History and Philosophy of Biological and Biomedical Sciences 33: 151-168.

Waters, Kenneth C. 1986. “Taking Analogical Inference Seriously: Darwin’s Argument from Artificial Selection.” Proceedings of the Biennial Meeting of the Philosophy of Science Association 1: 502-513

Experiment in Cosmology: Model Selection, Error-Elimination and Robustness

Genco Guralp

Johns Hopkins University


This past year’s Nobel Prize in physics was awarded to two teams which, working independently, confirmed the striking fact that the expansion of the universe is accelerating. For many cosmologists, this prize marks another major point in the chain of successful results cosmology has obtained in its relatively short history as an experimental science. In fact, modern cosmology prides itself on having become a “precision science,” breaking sharply with its “speculative” past. Hence, a critical study of empirical cosmology could both provide us with valuable philosophical lessons concerning empirical knowledge and contribute to our understanding of how this type of knowledge is generated in different scientific domains.

Even though the usual textbook account cites the serendipitous discovery of the Cosmic Microwave Background Radiation in 1964 as the beginning of the precision era in cosmology, many authors still refer to Edwin Hubble’s (1929) discovery of the expansion of the universe, on the basis of his observations of “extra-galactic nebulae,” as the turning point in the history of observational cosmology. It is remarkable that the problem Hubble had to deal with, namely, establishing a reliable method for measuring the distances to astrophysical objects, continues to be one of the central challenges for modern observational cosmologists.

In this paper, I examine the work of two research teams that successfully devised new experimental techniques and methods of analysis to overcome the obstacles that cosmological distance measurements have presented since the time of Hubble. Beginning with an excursion into the history of these two collaborations, namely, the High-z Supernova Search Team (hereafter, HZS) and the Supernova Cosmology Project (hereafter, SCP), I offer a comparative study of the empirical dynamics that eventually led to both teams’ receiving credit for the discovery of the acceleration of the universe. This discovery, which was later interpreted in terms of the still not completely understood notion of “dark energy,” forms an integral part of the modern concordance model in cosmology.

The central question that interests me can be formulated as follows: Given that it is generally accepted that the expansion of the universe is accelerating, how is this conclusion justified empirically? I treat this question via a historical analysis of the discovery, incorporating auxiliary questions such as: What were the research aims of both groups before they embarked on their empirical work? Were they aiming at testing a theory, determining the value of a parameter, or gathering evidence for a particular claim? What was the exact claim the groups endorsed after they conducted their research? And how did they argue for that claim? Combining oral history interviews with the analysis of several key publications of both research teams1, which I refer to as the evidence papers (following [Staley(2011)]), I suggest that the best way to address these questions is to approach the experimental program that governs the measurements of both teams within the context of the model selection problem in statistical inference theory. Situating the study of experimentation in cosmology within this epistemological context, I argue, puts us in a position to use this example to address certain classical problems in the philosophy of experimentation, such as the theory-ladenness of observation. In this setting, it appears that the epistemology of experiment, to use a term introduced by Allan Franklin for the study of “how we come to believe rationally in an experimental result” ([Franklin(1986)], p. 165), that led the researchers to conclude that the expansion of the universe is accelerating depends crucially on an epistemic structure with three main components, all of which are based on the notion of elimination of systematic errors:

  1. Error-elimination in observations through identification of bias and contentious assumptions.
  2. Error-elimination in instrumentation through both methodological and technical means.
  3. Error-elimination in data-analysis through improved statistical inference schemes.

Furthermore, the requirements of each of these epistemological components were carried out with an underlying methodological tool known as robustness analysis. Consequently, the following thesis emerges as the key to the epistemology of the experiments conducted by HZS and SCP: The experimental work that resulted in the discovery of the accelerating universe is best characterized as an experimental model selection effort based on constraining cosmological parameters statistically, which is achieved by the tripartite strategy of error-elimination that employs robustness analyses in various forms.


In line with the fact that the main justificatory argument for the experimental result is based on statistical inference, both teams aim at finding out which cosmological models are ruled out by the data on statistical grounds and which models are consistent with it. For both teams, the empirical validity of their results comes from their ability to contain, through statistical means, the adverse effects of the systematic uncertainties that are present in their data due to various biases and other astrophysical problems. That is to say, even though neither team can eliminate systematic errors fully, both can still argue that they can discriminate among competing models, for the favored model is robustly supported. Here robustness is generally understood as the value of a parameter remaining consistent with the result when the fitting method or the data points included in the analysis are changed.
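The robustness criterion just described can be caricatured in a toy calculation. The sketch below is purely illustrative: the data are synthetic, and the two fitting functions are invented for the example; none of this corresponds to the actual HZS or SCP analysis pipelines. It shows the bare logic of the criterion: estimate a parameter, vary the fitting method and the data points included, and check whether the estimates stay mutually consistent.

```python
# Toy robustness check: does a fitted parameter stay consistent when the
# fitting method and the data subset are varied? (Illustrative only.)
import random

random.seed(0)

# Synthetic data: y = 2.0 * x plus small Gaussian noise (true slope = 2.0).
xs = [0.1 * i for i in range(1, 41)]
ys = [2.0 * x + random.gauss(0, 0.05) for x in xs]

def fit_least_squares(xs, ys):
    """Ordinary least-squares slope for a line through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_median_ratio(xs, ys):
    """A cruder, outlier-resistant alternative: median of pointwise ratios."""
    ratios = sorted(y / x for x, y in zip(xs, ys))
    n = len(ratios)
    return ratios[n // 2] if n % 2 else 0.5 * (ratios[n // 2 - 1] + ratios[n // 2])

# Vary (a) the fitting method and (b) the data points included.
estimates = [
    fit_least_squares(xs, ys),
    fit_median_ratio(xs, ys),
    fit_least_squares(xs[:20], ys[:20]),   # first half of the data only
    fit_least_squares(xs[20:], ys[20:]),   # second half of the data only
]

# The parameter counts as "robust" here if all estimates agree within
# a chosen tolerance.
spread = max(estimates) - min(estimates)
robust = spread < 0.2
print(estimates, robust)
```

In the real analyses the parameter is a cosmological one and the variations are far more elaborate (e.g., the SCP team's 12 alternative fits), but the pattern of the argument is the same: agreement across deliberately varied procedures is what licenses the claim.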

I further claim, on the basis of these observations, that within the context of testing cosmological models, the notion of theory-ladenness of observation is too coarse-grained, and that one should rather think of observation as model-space-laden, where the model-space is generated by the different values of the cosmological parameters. Even though the observations depend on the cosmological model-space in general, they are capable of deciding between models within that space. Thus “theory-ladenness” does not endanger the rational choice between competing cosmological models.

The paper is organized as follows: In the first section, I present the historical background of the problem by expanding on the main elements of a cosmological model and on prior efforts to determine the Hubble constant. I then explain how, within the problem situation of cosmology in the early 1990s, the crucial question became the measurement of the deceleration parameter to determine the “fate of the universe.” I show that the initial motivation for both teams was actually to measure the deceleration of the universe. In the next section, I explain the High-z program and the two methods of light-curve fitting that the team employs, viz., the template-fitting method and the Multicolor Light Curve Shape method (MLCS). The employment of these methods is crucial to HZS’s claim of robust statistics for their results. I then turn to the SCP collaboration and examine their fitting procedures. Interestingly, they employ both Bayesian and frequentist methods and provide 12 different fits, all of which are strongly inconsistent with a non-accelerating universe. These variations in fittings constitute the basis for the robustness claim of the SCP team. Thirdly, I study the error-elimination strategies of both groups and show that they can be understood on the basis of the schema presented above. In the fourth and final section of the paper, I discuss the notion of model-space-ladenness of observation, which I claim to be a better account of what we witness in cosmology than a coarse “theory-ladenness” picture.

1 The two main ones are [Perlmutter et al.(1999)] for the SCP collaboration and [Riess et al.(1998)] for HZS.




[Franklin(1986)] Allan Franklin. The Neglect of Experiment. Cambridge University Press, 1986.

[Perlmutter et al.(1999)] Perlmutter et al. Measurements of Ω and Λ from 42 High-Redshift Supernovae. The Astrophysical Journal, 517:565–586, 1999.

[Riess et al.(1998)] Riess et al. Observational Evidence from Supernovae for an Accelerating Universe and a Cosmological Constant. The Astronomical Journal, 116:1009–1038, 1998.

[Staley(2011)] Kent Staley. The Evidence for the Top Quark. Cambridge University Press.





Modeling Data-Acquisition in Experimentation: The Case of the ATLAS Experiment

Koray Karaca

Interdisciplinary Centre for Science and Technology Studies

Bergische Universität Wuppertal

(Large Hadron Collider (LHC) Epistemology Project)


The “hierarchy of models” account of scientific experimentation, which was originally developed by Suppes (1962) and later elaborated by Mayo (1996) and Harris (2003), offers a modeling framework that accounts not only for how experimenters deal with various problems encountered at different stages of experiment, but also for how, and in what forms, theoretical considerations regarding the phenomena under scrutiny are involved in different stages of experiment. The basic tenet of this account is that the various experimental procedures involved in different stages of experiment are organized by models of different types that interact with each other through a three-layered hierarchy of models, ranging from what are called low-level “data-models” via “experimental models” to high-level “theoretical models”.

In this paper, I seek to extend the hierarchy of models account so that it can also account for the process of data-acquisition in present-day high-collision-rate particle physics experiments. As a representative case, I consider the ATLAS experiment currently underway at CERN.1 The ATLAS experiment is a multi-purpose experiment with a diversified physics program that aims not only to discover the Higgs boson2 and to improve the current understanding of not yet well understood processes of the Standard Model of elementary particles—such as the top quark processes—but also to test the models and theories that offer possible extensions of the Standard Model, e.g., various theories/models based on concepts such as “super-symmetry”, “technicolor”, “extra space-like dimensions”, “heavy gauge bosons” and “dark-matter”.

The collision-events that are considered potentially relevant to the above-mentioned aims of the ATLAS experiment are first roughly selected from the detector output according to certain pre-determined selection criteria, and subsequently they are processed through a series of more stringent and systematic selections and manipulations before they are analyzed and interpreted in the more formal setting of the above-mentioned theories and theoretical models. These selections and manipulations are typically so sophisticated that they are executed through highly advanced software systems called “processor farms”. In order for the ATLAS experiment to be carried out successfully, all experimental procedures involved in the data-acquisition process need to be organized and coordinated into a single coherent system. This system is often referred to as the “data-acquisition system” and constitutes an important part, besides instrumentation, of the overall organization of the ATLAS experiment.3

As I describe in detail in the paper, the data-acquisition process at the ATLAS experiment is essentially a selection process consisting of various experimental procedures, ranging from the initial data-taking from the detector system—including the transport, control and monitoring of various types of data and of various factors affecting the experimental environment—to relatively more complex procedures such as the implementation of the selection (computer) algorithms and their steering. I make use of the ATLAS Technical Design Report4 to show that the various experimental procedures constituting the data-acquisition system of the ATLAS experiment, as well as how these procedures relate to one another, have been designed using various types of diagrams, namely context, sequence, communication and class diagrams. These diagrams are referred to as “diagram models” in the literature of System & Software Engineering (SSE), in that they embody diagrammatic modeling of various “dynamic” and “static” features of information systems.5

I shall draw the following conclusions from the above discussion: (1) the underlying operational details of the data-acquisition system are crucial for understanding how the ATLAS experiment is actually conducted in the laboratory; (2) the nitty-gritty procedures constituting the data-acquisition process at the ATLAS experiment are diagrammatically represented by diagram models of various types borrowed from the literature of SSE. In light of these conclusions, I shall suggest that, in the context of the ATLAS experiment, diagram models serve as what I shall call “data-acquisition models”. Unlike the other types of models constituting the different levels of the hierarchy previously proposed by Suppes and Mayo—namely, “data-models”, “experimental models” and “theoretical models”—data-acquisition models represent the procedures through which experimental data are selected according to certain pre-determined selection criteria and then recorded in databases. The case of the ATLAS experiment illustrates that experimenters utilize model-based reasoning not only to confront experimental data with the theoretical claims under consideration, but also to select and acquire data in the laboratory. Therefore, given that models of data-acquisition serve to acquire experimental data, I shall suggest that they should constitute the bottom level (i.e., below the level of “data models”, which serve to represent experimental data in canonical forms) of the hierarchy of models of experimentation previously proposed by Suppes and Mayo.

Another important lesson of the above discussion is that the design of the data-acquisition system of the ATLAS experiment has been implemented through the use of a visual imagery consisting of a suitable set of diagram models adopted from the literature of SSE. I take this conclusion to indicate that visual representation plays a much broader role in experimentation than previously recognized, in the sense that it bears not only on the production and design of experimental instruments—as previously argued by Rothbart (2003)—and on the production and assessment of experimental data—as previously suggested by Galison (1997)—but also on the organization and coordination of all the experimental procedures required for the selection and acquisition of experimental data. I shall thus suggest that the diagram models used in the design of the ATLAS experiment encapsulate procedural knowledge, in that they involve knowledge of the specific procedures of how to perform data selection and acquisition at the ATLAS experiment. In this sense, they should be seen as constitutive of the design of the ATLAS experiment.


1 ATLAS and CMS are Large Hadron Collider experiments being conducted at CERN.

2 On July 4, 2012, CERN announced that a new particle consistent with the characteristics of the Higgs boson was detected in the ATLAS and CMS experiments; see the press release at:

3 Note that this applies to all present-day particle physics experiments.

4 See “ATLAS Level-1 Trigger Technical Design Report”, CERN/LHCC/98-014.

5 See, e.g., Hoffer et al. 2008 and Booch et al. 2007.


Booch, G., Rumbaugh, J. and Jacobson, I. (2007). Object-Oriented Analysis and Design with Applications, 3rd ed. Addison-Wesley.

Galison, P. (1997). Image and Logic: A Material Culture of Microphysics. The University of Chicago Press.

Harris, T. (2003). “Data Models and the Acquisition and Manipulation of Data.” Philosophy of Science 70: 1508.

Hoffer, J. A., George, J. F. and Valacich, J. S. (2008). Modern Systems Analysis and Design. Pearson International Edition.

Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.

Rothbart, D. (2003). “Designing Instruments and the Design of Nature.” In Scientific Experimentation and its Philosophical Significance, edited by H. Radder, 236–254. University of Pittsburgh Press.

Suppes, P. (1962). “Models of Data.” In Nagel, Suppes, and Tarski (eds.), Logic, Methodology, and Philosophy of Science. Stanford: Stanford University Press, 252–261.

Experimental Cosmology

James Mattingly

Georgetown University

There are many reasons to experiment. When we don't know what laws govern some system, experimentation can generate new phenomena that allow us to characterize it better, and perhaps provide the key to discovering those laws. On the other hand, when we know very well what laws govern some system, but the equations produced from these laws are intractable, experiment can help us to better understand both the laws and the systems they govern. But what do we do when we require experimental input, but cannot experiment on the systems in which we are interested?

In such cases we can turn to analogical experiments, that is, experiments that are carried out on one kind of physical system to learn about another kind of physical system. These are common in experimental practice (for example in animal models of human biological processes) and yet they are situated in a murky conceptual space between standard experiments on the one hand, and thought experiments and simulations on the other. This conceptual space has received relatively little attention from philosophers of experiment, and the purpose of this paper is to contribute to its exploration. I argue that a useful map of this space is provided by characterizing how the different experimental practices control the flow of information in their respective systems.

Analogical experiments are puzzling because the proximal systems---the systems on which we intervene and over which we have control---can be of completely different sorts from the distal systems---the systems about which we are trying to generate experimental knowledge. These experiments work because we have enough information about the connection between what we know of the logic (very often the dynamics) governing the distal system and the logic of the proximal system to convince us that further refinements of the logic of the proximal system will support further refinements in our characterization of the logic of the distal system. However, there is an apparent lack of epistemic security that seems to undermine the possibility of knowledge in this case. I argue on the contrary that while this method may be in practice less secure, it is no less capable, in principle, of providing experimental knowledge. Experiments are more difficult here, but they are still experiments.

A complete analysis of analogical experimentation is beyond my scope here. I therefore restrict myself to a consideration of analogue gravity, an important and growing method for extending our knowledge of gravity into the realm where quantum mechanical effects are important. While the universe at large appears to be well-modeled by General Relativity (GR), there is reason to doubt that GR is the complete story of gravity. There is now near consensus that classical GR fails near the Planck length (~10^{-33} cm), but there is little consensus about what replaces it, except that probably the continuum model of spacetime breaks down there and is replaced by something else. But we don't have much observational evidence to appeal to in forming a view. Something we wish we knew about the cosmos is what basic (or more basic) physics underlies the phenomenology of a classical general relativistic spacetime coupled to c-number expectation values of quantum fields (i.e., semiclassical GR). Theory has made some progress in illuminating the issues, but without guidance and feedback from experiment there appears to be an impasse.

We have only one cosmos, and we are incapable of doing controlled trials on it in order to extend our observational knowledge. However there are a number of systems that stand in more or less strong analogical relations to the cosmos. If we can outline the extent and limits of these analogies we can, I argue, produce experimental cosmological knowledge. Here I will focus on Bose-Einstein Condensates (BECs).

Two generic features of spacetimes roughly like ours, with its quantum matter, present themselves as possible sources of experimental input: Hawking radiation from black holes and cosmological particle pair production caused by the expansion of the universe. These "trans-Planckian modes" are promising as sources of information about the breakdown of our theoretical account of the universe at large because they are the result of transitions between the physics generating the phenomenology of the semiclassical theory and the physics on, so to speak, the other side of the Planck-length barrier. The difficulty with the trans-Planckian modes is to observe them at all, much less in the kind of detail that would give insight into their fundamental characteristics. And certainly experimental access to these modes seems impossible to obtain, because we have no prospects at all for intervening either in the expansion of the universe or in the evaporation of black holes.

However, there is reason to be hopeful. Matter is discrete, and yet much of physics is predicated on its being well-described by continuum models, models that we know break down once we look more closely. We have a lot of practice performing experiments that probe the transition between the emergent, apparently continuum characteristics of discrete systems and the characteristics that reveal their discrete natures. Many solid, liquid, and gaseous systems are suited to this kind of investigation. BECs, however, may allow much more than this; they may give just the access we need to see what happens in the specific case of semiclassical cosmology, because BECs can be constructed to obey phenomenologically, in the continuum region, the same dynamics that the universe does in its continuum region. In particular, BECs can be made to produce phonon pairs using mechanisms whose dynamics appear to be of the same form as the source of cosmological particle pair production in the expanding universe.

By displaying and analyzing the nature of the analogy between BECs and expanding spacetimes with quantum particle production, I will clarify the sense in which BEC experiments can be said to generate experimental cosmological knowledge. This will be my preliminary contribution to understanding the nature of analogical experimentation.

The False Dichotomy Between Experiment and Observation: The Case of Comparative Cognition

Irina Meketa

Boston University

Abstract: The interdisciplinary study of animal cognition known as “comparative cognition” has suffered setbacks due to the perception within some of its subfields, such as comparative psychology, that the methods of other subfields, such as ethology and evolutionary biology, are inadequately experimental (Allen & Bekoff 1997). This reliance on the experiment/observation distinction as a measure of epistemic reliability has obscured similarities among its subfields, forestalling a synthesis of the field and wrongly privileging laboratory-based studies over field studies.

In this paper I use comparative cognition as a case study, drawing on a range of examples such as Robert Seyfarth and Dorothy Cheney’s baboon studies, to challenge the traditional Baconian view that experimentation is the most reliable path to scientific knowledge across all scientific contexts. My strategy involves showing that the received view of the observation/experimentation distinction does not track the epistemic reliability of the studies performed in comparative cognition. I do so by identifying those features of experiment that are alleged to separate it from other kinds of scientific activity, showing that observational studies in comparative cognition retain all of the desiderata of experimentation, except where these obstruct empirical work. Lastly, I argue that in comparative cognition, experiment is saturated with observation and observation is aided by tools to a degree where any boundary between experimental and observational is arbitrary.

I single out intervention and (technological) manipulation, which allow for the isolation of signal from noise, as the primary features of experimental study that have traditionally granted it privileged epistemic status (Bogen 2010). Secondary desiderata of experiment include repeatability and amenability to statistical analysis. I contrast these with the weaknesses imputed to observational research, such as the post-positivist suspicions of observation as theory-laden and of the human perceiver as an unreliable instrument for parsing information from the environment. These weaknesses mirror the strengths imputed to experiment: (i) interventionism is contrasted with passivity; (ii) isolation of the target feature is contrasted with the reception of undifferentiated environmental signals; (iii) repeatability is contrasted with novelty; (iv) theory-laden observation is contrasted with secure knowledge through manipulation of unobservables (Hacking 1983); and finally, (v) the suspicion of the human perceiver in observational studies is contrasted with the fidelity of the technological instrumentation used in experiments.

With respect to interventionism, I argue that not all experiments are interventionist and not all studies require intervention. For example, Eddington's 1919 expedition to test Einstein’s general theory of relativity did not involve intervention (Okasha 2011). In comparative cognition, biologists have questioned the excessive intervention of some studies. For example, Daniel Povinelli’s chimpanzees have been alleged to be developmentally stunted as a result of being raised in socially and ecologically impoverished conditions. Similarly, it has been argued that results from enculturated laboratory animals, such as Irene Pepperberg’s African grey parrot, Alex, cannot be generalized to the wild population because the linguistic scaffolding provided for Alex altered the conceptual capacities being tested.

With respect to distilling information from a noisy environment, the privileging of experiment over observation fails to give credit to the trained observer: While it is true that untrained and unfocused observation is unlikely to produce reliable conclusions about natural phenomena, this is not the sort of observation employed by comparative cognition researchers. Consider, for instance, the primatologists Robert Seyfarth and Dorothy Cheney, who employ both free observation and field-based experiments. Importantly, their “free” observation is not “free” in the sense of being unconstrained or inexpert. Their form of observation required the construction, use, and revision of detailed ethograms, or catalogues of species-typical behaviors, as well as video-recording and frame-by-frame visual analysis of the behavior. The use and refinement of an ethogram requires astute observational skills gained through extensive experience and years of academic training with tools such as the ethogram. These tools structure observation and allowed Seyfarth and Cheney to compile a behavioral profile of their group of baboons, one with predictive and explanatory power. Moreover, laboratory experiments are not divorced from theoretical assumptions, as Povinelli's case shows.

The success of Seyfarth and Cheney’s observations demonstrates that skepticism of the human ability to extract information from the environment undervalues the expertise of scientific observers. The human observer is herself a kind of information-processing instrument. Insofar as experiments gain credibility due to the increased accuracy of their instruments, an observational study conducted by experts is not different in kind from an experimental study. What matters is the fidelity of the instruments and their fit for the given task; the constitution of the instruments only matters insofar as it affects fit and fidelity.

On the point of repeatability and statistical analysis, I argue that many observational studies are both repeatable and statistically analyzable, while some experimental studies are not. For example, longitudinal studies in humans are not repeatable, but are considered epistemically on par with repeatable experimentation because they are statistically evaluable. In comparative cognition, observational studies in natural or semi-natural environments are also often amenable to statistical analysis. One example comes from Orlaith N. Fraser and Thomas Bugnyar (2010), who tested a population of ravens for consolation behavior after aggressive bouts through video-assisted observation and analyzed their results using traditional statistical methods.

Finally, I turn to Seyfarth and Cheney’s use of field-based experiments to argue that the difference between their experiments and their observations is not epistemically salient. I focus on their auditory play-back experiments, which corroborated their observation-driven hypothesis that baboons keep track of the social hierarchy of their tribe. I argue that had Seyfarth and Cheney had infinite time to observe the tribe, their artificially created situation may have occurred naturally, providing exactly the same information as the playback experiment. If the two cases are epistemically indistinguishable, then holding the playback experiment above the hypothetical novel occurrence is a mere prejudice.

In conclusion, the case of comparative cognition suggests that the distinction between observational study and experimentation can break down in practice, suggesting the need for a philosophical synthesis of heretofore fragmentary analyses of observation and experimentation.


Extrapolation Claims in Experimental Evolution

Emily Parke

University of Pennsylvania


This paper addresses the question of when extrapolations from experimental systems to the world outside the laboratory are valid. I focus on examples from experimental evolution, which challenge in revealing ways the traditional treatment of experiment-world relationships in the literature. I argue against the view that experimental systems’ ontological correspondence to their targets in the world is the key to validating extrapolations. Following Parker (2008), I endorse the alternative view that capturing relevant similarities is what matters. I use examples from experimental evolution to offer a framework for thinking about which kinds of similarities matter in different contexts, depending on (i) the kind of scientific question being asked, and (ii) the relative weight of entities, environment, and processes and mechanisms in grounding its exploration via the experimental system.

In experimental evolution, researchers evolve populations of organisms—usually bacteria—in the laboratory to study their dynamics in real time. A particularly impressive case is Richard Lenski's long-term evolution experiment (Lenski 2012). Researchers used a single ancestral genome of Escherichia coli to found twelve genetically identical populations in identical environments, and have been letting them evolve for over 20 years (>50,000 generations), studying an amazing array of features of their evolutionary history, adaptation and diversification.

I focus on two examples from the Lenski experiment where extrapolation claims were made from the experimental system to the world outside the laboratory. In the first example, punctuated evolution (Elena et al. 1996), Lenski and colleagues saw that average cell size in the populations was increasing in a step-like pattern, alternating between periods of stasis and rapid increase; the latter were associated with new mutations rapidly sweeping through the population. They claim that in natural populations characterized by certain features of their evolutionary history and environment that mirror features in the experimental system, beneficial mutation sweeps can explain trends of punctuated equilibrium observed in the fossil record. In the second example, high mutation rates (Sniegowski et al. 1997), they discuss the evolution of surprisingly high mutation rates in 3 of the 12 lineages. Mutator alleles arose spontaneously in these populations and hitchhiked to fixation due to chance association with beneficial mutations. They claim that this finding can help explain observations of high mutation rates in certain cancers and in pathogenic natural populations of E. coli and Salmonella.

These two cases are revealingly different in the focus of their extrapolation claims. The former makes a claim about an open set of populations in the world characterized by their similarity in evolutionary process and environmental constancy to the experimental system; the latter about a clearly defined set of entities in the world characterized by similarity in asexuality and/or phylogeny to the entities comprising the experimental system. I focus on the question: What about the relationship between this experimental system and these two quite different targets in the world could ground the validity of these extrapolation claims?

Morgan (2005) and Guala (2002), in their discussions of experiments’ external validity, have argued that experiments, unlike models and simulations, generate extrapolation power by being designed to correspond ontologically to, or be “made of the same stuff” as, their targets. Morgan in particular suggests that there is a scale of ontological correspondence on which we can assess the relationship of an experimental system to its target. She also suggests that there is something special about being made of the same kind of stuff as targets in the natural world that entails a relationship of replication, rather than representation, of those targets. I respond to both of these points, drawing on the examples from experimental evolution, and then propose an alternative framework for assessing the validity of extrapolation claims.

First, in response to the point about ontological correspondence, I argue that (i) experiments are not always designed with particular targets in mind, and, more importantly, (ii) even when there is a clear target in mind, whether or not an experimental system and its target are “made of the same stuff” is not at all straightforward to evaluate. The punctuated evolution example illustrates this point: If we break the experimental system down into its various components—environment, evolving populations, and evolutionary processes and mechanisms—we can say something quite different about each with respect to their closeness to the corresponding component of the target.

Second, in response to the point about replication versus representation, I argue that experimental systems can replicate or represent parts of the world. Experiments, unlike models and simulations, are in a position to replicate (parts of) their targets in the world. But this is by no means always the intention. Sometimes being more like the “real world” is undesirable; the experimental evolution cases illustrate this. In the high mutation rates example, the extrapolation claim focuses on a kind of entity; here, being materially closer to the target matters more. In the punctuated evolution case, the extrapolation claim focuses on a process; being made of the same sorts of entities as the target matters less than appropriately capturing the relevant evolutionary pattern. While both of these examples are based on the same experimental system, that system’s experimental subjects, environment and evolutionary processes shift from one case to the next in their roles in grounding the extrapolation claims at hand.

I agree with Parker (2008) that capturing relevant similarities should be the key to grounding extrapolations from experimental systems to the world; this might sometimes involve ontological correspondence, but that depends on the context. I conclude by proposing a framework for evaluating which similarities matter most for different kinds of extrapolation claims, depending on factors such as the kind of scientific question being asked, how specifically the target is defined in advance (if at all), and the weight given to different aspects of what the system is “made of” (understood broadly to mean more than just the experimental subjects). This framework is based on the cases from experimental evolution discussed above, but I indicate ways it might be extendable to discussions of experiments in general.


Elena, S. F., Cooper, V. S., & Lenski, R. E. (1996). Punctuated evolution caused by selection of rare beneficial mutations. Science, 272(5269), 1802.
Guala, F. (2002). Models, simulations, and experiments. In Model-based reasoning: Science, technology, values (pp. 59–74). Kluwer.
Lenski, R. E. (2012). The E. coli long-term experimental evolution project site. (accessed February 2012).
Morgan, M. S. (2005). Experiments versus models: New phenomena, inference and surprise. Journal of Economic Methodology, 12(2), 317–329.
Parker, W. S. (2008). Does matter really matter? Computer simulations, experiments, and materiality.

Experimentation On Analogues

Susan Sterrett

Carnegie Mellon University


In his 2008 paper "Dumb Holes: Analogues for Black Holes" Unruh writes that "one of the most exciting possibilities for dumb holes is the possibility of experimental observation. Experiments with application to the classical black holes are easy. . . one can easily create classical analogues to black holes." He and his colleagues have carried out experiments on such hydrodynamical (sonic) analogues. One of the reasons these experiments are so striking, from the standpoint of philosophy of science, is that philosophers of science have in the past often taken for granted that one cannot experiment on such cosmological entities; the closest one could come to experimentation, it was often thought, was the use of mathematical models and computer simulations.

The "dumb hole" model of black holes is one of the most striking and fascinating examples of experimentation on analogues, but it is not a singular example even in cosmology. There are other analogues of spacetimes, in fact; Visser has discussed a variety of analogue spacetimes, and remarks that "[Unruh's analogue model] and related analogue models for curved spacetime are useful in many ways; Analogue spacetimes provide general relativists with extremely concrete models to help focus their thinking. . . " (Visser 2011)

Despite their usefulness, experimentation on analogues is a relatively unexplored topic in philosophy of science, probably because of the conjunction of two situations: (i) Philosophers of science have only recently been examining what is involved in scientific experimentation in ways that attend to actual scientific practice, and (ii) Philosophers of science (save for a very few exceptions) have generally only briefly noted the existence of, and then set aside, analogue models; even when their existence has been noted, not much attention has been paid to the details of how they are used in scientific experimentation.

In this talk, I first discuss various methods of experimentation on analogue models that have been used to date. There is the general case of an analogue computer, in which electrical circuits can be set up to serve as models of an unlimited number of different kinds of non-electrical engineering systems (e.g., mechanical systems, fluid systems); here mathematical equations mediate between the analogue model and the system modeled. There are also many special kinds of analogue models that have been used in engineering (soap bubble films, photoelastic materials), the methodologies for which have been developed to a very high level of sophistication. Some of these methodologies are indispensable in certain fields. The method of dynamic similarity, which is the basis for scale models in a wide variety of fields, is another widely used method: it is applicable as well to experimentation on analogue systems in which the analogue differs in kind from the system modeled.

I will then examine the bases of these various kinds of experimentation on analogue models. I will end by looking at how the basis for drawing conclusions from experiments on analogues might resemble, and how it might differ from, other kinds of scientific experimentation.