Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

Experiment in Physics

First published Mon Oct 5, 1998; substantive revision Tue Jan 6, 2009

Physics, and natural science in general, is a reasonable enterprise based on valid experimental evidence, criticism, and rational discussion. It provides us with knowledge of the physical world, and it is experiment that provides the evidence that grounds this knowledge. Experiment plays many roles in science. One of its important roles is to test theories and to provide the basis for scientific knowledge.[1] It can also call for a new theory, either by showing that an accepted theory is incorrect, or by exhibiting a new phenomenon that is in need of explanation. Experiment can provide hints toward the structure or mathematical form of a theory, and it can provide evidence for the existence of the entities involved in our theories. Finally, it may also have a life of its own, independent of theory. Scientists may investigate a phenomenon just because it looks interesting. Such experiments may provide evidence for a future theory to explain. Examples of these different roles will be presented below, and, as we shall see, a single experiment may play several of these roles at once.

If experiment is to play these important roles in science, then we must have good reasons to believe experimental results, for science is a fallible enterprise. Theoretical calculations, experimental results, or the comparison between experiment and theory may all be wrong. Science is more complex than “The scientist proposes, Nature disposes.” It may not always be clear what the scientist is proposing. Theories often need to be articulated and clarified. It also may not be clear how Nature is disposing. Experiments may not always give clear-cut results, and may even disagree for a time.

In what follows, the reader will find an epistemology of experiment, a set of strategies that provides reasonable belief in experimental results. Scientific knowledge can then be reasonably based on these experimental results.

1. Experimental Results

1.1 The Case For Learning From Experiment

1.1.1 An Epistemology of Experiment

It has been two decades since Ian Hacking asked, “Do we see through a microscope?” (Hacking 1981). Hacking was really asking how we come to believe in an experimental result obtained with a complex experimental apparatus. How do we distinguish between a valid result[2] and an artifact created by that apparatus? If experiment is to play all of the important roles in science mentioned above and to provide the evidential basis for scientific knowledge, then we must have good reasons to believe in those results. Hacking provided an extended answer in the second half of Representing and Intervening (1983). He pointed out that even though an experimental apparatus is laden with, at the very least, the theory of the apparatus, observations remain robust despite changes in the theory of the apparatus or in the theory of the phenomenon. His illustration was the sustained belief in microscope images despite the major change in the theory of the microscope when Abbe pointed out the importance of diffraction in its operation. One reason Hacking gave for this is that in making such observations the experimenters intervened: they manipulated the object under observation. Thus, in looking at a cell through a microscope, one might inject fluid into the cell or stain the specimen. One expects the cell to change shape or color when this is done. Observing the predicted effect strengthens our belief in both the proper operation of the microscope and in the observation. This is true in general. Observing the predicted effect of an intervention strengthens our belief in both the proper operation of the experimental apparatus and in the observations made with it.

Hacking also discussed the strengthening of one's belief in an observation by independent confirmation. The fact that the same pattern of dots (dense bodies in cells) is seen with “different” microscopes (e.g., ordinary, polarizing, phase-contrast, fluorescence, interference, electron, acoustic, etc.) argues for the validity of the observation. One might question whether “different” is a theory-laden term. After all, it is our theory of light and of the microscope that allows us to consider these microscopes as different from each other. Nevertheless, the argument holds. Hacking correctly argues that it would be a preposterous coincidence if the same pattern of dots were produced in two totally different kinds of physical systems. Different apparatuses have different backgrounds and systematic errors, making the coincidence, if it is an artifact, most unlikely. If it is a correct result, and the instruments are working properly, the coincidence of results is understandable.

Hacking's answer is correct as far as it goes. It is, however, incomplete. What happens when one can perform the experiment with only one type of apparatus, such as an electron microscope or a radio telescope, or when intervention is either impossible or extremely difficult? Other strategies are needed to validate the observation.[3] These may include:

  1. Experimental checks and calibration, in which the experimental apparatus reproduces known phenomena. For example, if we wish to argue that the spectrum of a substance obtained with a new type of spectrometer is correct, we might check that this new spectrometer could reproduce the known Balmer series in hydrogen. If we correctly observe the Balmer series then we strengthen our belief that the spectrometer is working properly. This also strengthens our belief in the results obtained with that spectrometer. If the check fails then we have good reason to question the results obtained with that apparatus.
  2. Reproducing artifacts that are known in advance to be present. An example of this comes from experiments to measure the infrared spectra of organic molecules (Randall et al. 1949). It was not always possible to prepare a pure sample of such material. Sometimes the experimenters had to place the substance in an oil paste or in solution. In such cases, one expects to observe the spectrum of the oil or the solvent, superimposed on that of the substance. One can then compare the composite spectrum with the known spectrum of the oil or the solvent. Observing this artifact then gives confidence in other measurements made with the spectrometer.
  3. Elimination of plausible sources of error and alternative explanations of the result (the Sherlock Holmes strategy).[4] Thus, when scientists claimed to have observed electric discharges in the rings of Saturn, they argued for their result by showing that it could not have been caused by defects in the telemetry, interaction with the environment of Saturn, lightning, or dust. The only remaining explanation of their result was that it was due to electric discharges in the rings—there was no other plausible explanation of the observation. (In addition, the same result was observed by both Voyager 1 and Voyager 2. This provided independent confirmation. Often, several epistemological strategies are used in the same experiment.)
  4. Using the results themselves to argue for their validity. Consider the problem of Galileo's telescopic observations of the moons of Jupiter. Although one might very well believe that his primitive, early telescope might have produced spurious spots of light, it is extremely implausible that the telescope would create images that would appear to be eclipses and other phenomena consistent with the motions of a small planetary system. It would have been even more implausible to believe that the created spots would satisfy Kepler's Third Law ($R^3/T^2 = \text{constant}$, where $R$ is a moon's orbital radius and $T$ its period). A similar argument was used by Robert Millikan to support his observation of the quantization of electric charge and his measurement of the charge of the electron. Millikan remarked, “The total number of changes which we have observed would be between one and two thousand, and in not one single instance has there been any change which did not represent the advent upon the drop of one definite invariable quantity of electricity or a very small multiple of that quantity” (Millikan 1911, p. 360). In both of these cases one is arguing that there was no plausible malfunction of the apparatus, or background, that would explain the observations.
  5. Using an independently well-corroborated theory of the phenomena to explain the results. This was illustrated in the discovery of the W±, the charged intermediate vector boson required by the Weinberg-Salam unified theory of electroweak interactions. Although these experiments used very complex apparatuses and used other epistemological strategies (for details see Franklin 1986, pp. 170–72), I believe that the agreement of the observations with the theoretical predictions of the particle properties helped to validate the experimental results. In this case the particle candidates were observed in events that contained an electron with high transverse momentum and in which there were no particle jets, just as predicted by the theory. In addition, the measured particle masses of $81 \pm 5$ GeV/$c^2$ and $80^{+10}_{-6}$ GeV/$c^2$, found in the two experiments (note the independent confirmation also), were in good agreement with the theoretical prediction of $82 \pm 2.4$ GeV/$c^2$. It was very improbable that any background effect, which might mimic the presence of the particle, would be in agreement with theory.
  6. Using an apparatus based on a well-corroborated theory. In this case the support for the theory inspires confidence in the apparatus based on that theory. This is the case with the electron microscope and the radio telescope, whose operations are based on well-supported theories, although other strategies are also used to validate the observations made with these instruments.
  7. Using statistical arguments. An interesting example of this arose in the 1960s when the search for new particles and resonances occupied a substantial fraction of the time and effort of physicists working in experimental high-energy physics. The usual technique was to plot the number of events observed as a function of the invariant mass of the final-state particles and to look for bumps above a smooth background. The usual informal criterion for the presence of a new particle was that it resulted in a three-standard-deviation effect above the background, a result that had a probability of 0.27% of occurring by chance in a single bin. This criterion was later changed to four standard deviations, which had a probability of about 0.0063%, when it was pointed out that the number of graphs plotted each year by high-energy physicists made it rather probable, on statistical grounds, that a three-standard-deviation effect would be observed somewhere even in the absence of any new particle.
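The arithmetic behind the change of criterion can be checked with a short Python sketch. It assumes that background fluctuations are Gaussian and that each mass bin is an independent trial; the figure of 1,000 bins is a hypothetical stand-in for the many plots examined each year, not a historical count. `two_sided_p` gives the chance of a fluctuation of at least n standard deviations in either direction, and `prob_at_least_one` gives the chance that at least one of many bins shows such a fluctuation.

```python
from math import erfc, sqrt


def two_sided_p(n_sigma: float) -> float:
    """Probability that a Gaussian fluctuation lies at least n_sigma
    standard deviations from the mean, in either direction."""
    return erfc(n_sigma / sqrt(2.0))


def prob_at_least_one(p_single: float, n_bins: int) -> float:
    """Probability that at least one of n_bins independent bins shows
    such a fluctuation by chance alone (assumes independence)."""
    return 1.0 - (1.0 - p_single) ** n_bins


p3 = two_sided_p(3.0)  # about 0.0027, i.e. 0.27%
p4 = two_sided_p(4.0)  # about 6.3e-5, i.e. 0.0063%

# Hypothetical number of independent mass bins examined in a year.
N_BINS = 1000

print(f"3 sigma: one bin {p3:.4%}, any of {N_BINS} bins {prob_at_least_one(p3, N_BINS):.1%}")
print(f"4 sigma: one bin {p4:.4%}, any of {N_BINS} bins {prob_at_least_one(p4, N_BINS):.1%}")
```

Under these assumptions a three-standard-deviation bump somewhere among 1,000 bins is very likely (roughly 93%), whereas a four-standard-deviation bump is not (roughly 6%). That contrast is the statistical point behind raising the threshold.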

These strategies, along with Hacking's intervention and independent confirmation, constitute an epistemology of experiment. They provide us with good reasons for belief in experimental results. They do not, however, guarantee that the results are correct. There are many experiments in which these strategies are applied, but whose results are later shown to be incorrect (examples will be presented below). Experiment is fallible. Neither are these strategies exclusive or exhaustive. No single one of them, or fixed combination of them, guarantees the validity of an experimental result. Physicists use as many of the strategies as they can conveniently apply in any given experiment.

1.1.2 Galison's Elaboration

In How Experiments End (1987), Peter Galison extended the discussion of experiment to more complex situations. In his histories of the measurements of the gyromagnetic ratio of the electron, the discovery of the muon, and the discovery of weak neutral currents, he considered a series of experiments measuring a single quantity, a set of different experiments culminating in a discovery, and two high-energy physics experiments performed by large groups with complex experimental apparatus.

Galison's view is that experiments end when the experimenters believe that they have a result that will stand up in court, a result that, I believe, includes the use of the epistemological strategies discussed earlier. Thus, David Cline, one of the weak neutral-current experimenters, remarked, “At present I don't see how to make these effects [the weak neutral current event candidates] go away” (Galison, 1987, p. 235).

Galison emphasizes that, within a large experimental group, different members of the group may find different pieces of evidence most convincing. Thus, in the Gargamelle weak neutral current experiment, several group members found the single photograph of a neutrino-electron scattering event particularly important, whereas for others the difference in spatial distribution between the observed neutral current candidates and the neutron background was decisive. Galison attributes this, in large part, to differences in experimental traditions, in which scientists develop skill in using certain types of instruments or apparatus. In particle physics, for example, there is the tradition of visual detectors, such as the cloud chamber or the bubble chamber, in contrast to the electronic tradition of Geiger and scintillation counters and spark chambers. Scientists within the visual tradition tend to prefer “golden events” that clearly demonstrate the phenomenon in question, whereas those in the electronic tradition tend to find statistical arguments more persuasive and important than individual events. (For further discussion of this issue see Galison (1997)).

Galison points out that major changes in theory and in experimental practice and instruments do not necessarily occur at the same time. This persistence of experimental results provides continuity across these conceptual changes. Thus, the experiments on the gyromagnetic ratio spanned classical electromagnetism, Bohr's old quantum theory, and the new quantum mechanics of Heisenberg and Schrödinger. Robert Ackermann has offered a similar view in his discussion of scientific instruments.

The advantages of a scientific instrument are that it cannot change theories. Instruments embody theories, to be sure, or we wouldn't have any grasp of the significance of their operation….Instruments create an invariant relationship between their operations and the world, at least when we abstract from the expertise involved in their correct use. When our theories change, we may conceive of the significance of the instrument and the world with which it is interacting differently, and the datum of an instrument may change in significance, but the datum can nonetheless stay the same, and will typically be expected to do so. An instrument reads 2 when exposed to some phenomenon. After a change in theory,[5] it will continue to show the same reading, even though we may take the reading to be no longer important, or to tell us something other than what we thought originally (Ackermann 1985, p. 33).

Galison also discusses other aspects of the interaction between experiment and theory. Theory may influence what is considered to be a real effect, demanding explanation, and what is considered background. In his discussion of the discovery of the muon, he argues that the calculation of Oppenheimer and Carlson, which showed that showers were to be expected in the passage of electrons through matter, left the penetrating particles, later shown to be muons, as the unexplained phenomenon. Prior to their work, physicists thought the showering particles were the problem, whereas the penetrating particles seemed to be understood.

The role of theory as an “enabling theory” (i.e., one that allows calculation or estimation of the size of the expected effect and also the size of expected backgrounds) is also discussed by Galison. (See also Franklin 1995b and the discussion of the Stern-Gerlach experiment below.) Such a theory can help to determine whether an experiment is feasible. Galison also emphasizes that elimination of background that might simulate or mask an effect is central to the experimental enterprise, and not a peripheral activity. In the case of the weak neutral current experiments, the existence of the currents depended crucially on showing that the event candidates could not all be due to neutron background.[6]

There is also a danger that the design of an experiment may preclude observation of a phenomenon. Galison points out that the original design of one of the neutral current experiments, which included a muon trigger, would not have allowed the observation of neutral currents. In its original form the experiment was designed to observe charged currents, which produce a high energy muon. Neutral currents do not. Therefore, having a muon trigger precluded their observation. Only after the theoretical importance of the search for neutral currents was emphasized to the experimenters was the trigger changed. Changing the design did not, of course, guarantee that neutral currents would be observed.

Galison also shows that the theoretical presuppositions of the experimenters may enter into the decision to end an experiment and report the result. Einstein and de Haas ended their search for systematic errors when their value for the gyromagnetic ratio of the electron, g = 1, agreed with their theoretical model of orbiting electrons. This effect of presuppositions might cause one to be skeptical of both experimental results and their role in theory evaluation. Galison's history shows, however, that, in this case, the importance of the measurement led to many repetitions of the measurement. This resulted in an agreed-upon result that disagreed with theoretical expectations.

Recently, Galison has modified his views. In Image and Logic, an extended study of instrumentation in 20th-century high-energy physics, Galison (1997) has extended his argument that there are two distinct experimental traditions within that field: the visual (or image) tradition and the electronic (or logic) tradition. The image tradition uses detectors such as cloud chambers or bubble chambers, which provide detailed and extensive information about each individual event. The electronic detectors used by the logic tradition, such as Geiger counters, scintillation counters, and spark chambers, provide less detailed information about individual events, but detect more events. Galison's view is that experimenters working in these two traditions form distinct epistemic and linguistic groups that rely on different forms of argument. The visual tradition emphasizes the single “golden” event. “On the image side resides a deep-seated commitment to the ‘golden event’: the single picture of such clarity and distinctness that it commands acceptance.” (Galison, 1997, p. 22) “The golden event was the exemplar of the image tradition: an individual instance so complete and well defined, so ‘manifestly’ free of distortion and background that no further data had to be involved” (p. 23). Because the individual events provided by the logic detectors contained less detailed information than the pictures of the visual tradition, statistical arguments based on large numbers of events were required.

Kent Staley (1999) disagrees. He argues that the two traditions are not as distinct as Galison believes:

I show that discoveries in both traditions have employed the same statistical [I would add “and/or probabilistic”] form of argument, even when basing discovery claims on single, golden events. Where Galison sees an epistemic divide between two communities that can only be bridged by creole- or pidgin-like ‘interlanguage,’ there is in fact a shared commitment to a statistical form of experimental argument. (p. 96).

Staley believes that although there is certainly epistemic continuity within a given tradition, there is also a continuity between the traditions. This does not, I believe, mean that the shared commitment comprises all of the arguments offered in any particular instance, but rather that the same methods are often used by both communities. Galison does not deny that statistical methods are used in the image tradition, but he thinks that they are relatively unimportant. “While statistics could certainly be used within the image tradition, it was by no means necessary for most applications” (Galison, 1997, p. 451). In contrast, Galison believes that arguments in the logic tradition “were inherently and inalienably statistical. Estimation of probable errors and the statistical excess over background is not a side issue in these detectors—it is central to the possibility of any demonstration at all” (p. 451).

Although a detailed discussion of the disagreement between Staley and Galison would take us too far from the subject of this essay, they both agree that arguments are offered for the correctness of experimental results. Their disagreement concerns the nature of those arguments. (For further discussion see Franklin 2002, pp. 9–17.)

1.2 The Case Against Learning From Experiment

1.2.1 Collins and the Experimenters' Regress

Collins, Pickering, and others have raised objections to the view that experimental results are accepted on the basis of epistemological arguments. They point out that “a sufficiently determined critic can always find a reason to dispute any alleged ‘result’” (MacKenzie 1989, p. 412). Harry Collins, for example, is well known for his skepticism concerning both experimental results and evidence. He develops an argument that he calls the “experimenters' regress” (Collins 1985, chapter 4, pp. 79–111): What scientists take to be a correct result is one obtained with a good, that is, properly functioning, experimental apparatus. But a good experimental apparatus is simply one that gives correct results. Collins claims that there are no formal criteria that one can apply to decide whether or not an experimental apparatus is working properly. In particular, he argues that calibrating an experimental apparatus by using a surrogate signal cannot provide an independent reason for considering the apparatus to be reliable.

In Collins' view the regress is eventually broken by negotiation within the appropriate scientific community, a process driven by factors such as the career, social, and cognitive interests of the scientists, and the perceived utility for future work, but one that is not decided by what we might call epistemological criteria, or reasoned judgment. Thus, Collins concludes that his regress raises serious questions concerning both experimental evidence and its use in the evaluation of scientific hypotheses and theories. Indeed, if no way out of the regress can be found, then he has a point.

Collins's strongest candidate for an example of the experimenters' regress is presented in his history of the early attempts to detect gravitational radiation, or gravity waves. (For more detailed discussion of this episode see Collins 1985; 1994; Franklin 1994; 1997a.) In this case, the physics community was forced to compare Weber's claims that he had observed gravity waves with the reports from six other experiments that failed to detect them. On the one hand, Collins argues that the decision between these conflicting experimental results could not be made on epistemological or methodological grounds; he claims that the six negative experiments could not legitimately be regarded as replications[7] and hence become less impressive. On the other hand, Weber's apparatus, precisely because the experiments used a new type of apparatus to try to detect a hitherto unobserved phenomenon,[8] could not be subjected to standard calibration techniques.

The results presented by Weber's critics were not only more numerous, but they had also been carefully cross-checked. The groups had exchanged both data and analysis programs and confirmed their results. The critics had also investigated whether or not their analysis procedure, the use of a linear algorithm, could account for their failure to observe Weber's reported results. They had used Weber's preferred procedure, a nonlinear algorithm, to analyze their own data, and still found no sign of an effect. They had also calibrated their experimental apparatuses by inserting acoustic pulses of known energy and finding that they could detect a signal. Weber, on the other hand, as well as his critics using his analysis procedure, could not detect such calibration pulses.

There were, in addition, several other serious questions raised about Weber's analysis procedures. These included an admitted programming error that generated spurious coincidences between Weber's two detectors, possible selection bias by Weber, Weber's report of coincidences between two detectors when the data had been taken four hours apart, and whether or not Weber's experimental apparatus could produce the narrow coincidences claimed.

It seems clear that the critics' results were far more credible than Weber's. They had checked their results by independent confirmation, which included the sharing of data and analysis programs. They had also eliminated a plausible source of error, that of the pulses being longer than expected, by analyzing their results using the nonlinear algorithm and by explicitly searching for such long pulses.[9] They had also calibrated their apparatuses by injecting pulses of known energy and observing the output.
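The calibration strategy (inject a signal of known size and see whether the apparatus and its analysis recover it) can be illustrated with a toy simulation in Python. Everything here is hypothetical: Gaussian noise stands in for the detector background, a plain threshold trigger stands in for the analysis (it is neither the linear nor the nonlinear algorithm used by Weber and his critics), and the pulse size, threshold, and sample counts are arbitrary choices.

```python
import random


def simulate_record(n_samples, pulse_times, pulse_amplitude, noise_sigma, seed):
    """Toy detector record: Gaussian noise with pulses of known size
    added at known sample positions."""
    rng = random.Random(seed)
    record = [rng.gauss(0.0, noise_sigma) for _ in range(n_samples)]
    for t in pulse_times:
        record[t] += pulse_amplitude
    return record


def threshold_trigger(record, threshold):
    """Hypothetical analysis: indices of samples above the threshold."""
    return {i for i, x in enumerate(record) if x > threshold}


def calibrate(n_samples=10_000, n_pulses=50, pulse_amplitude=10.0,
              noise_sigma=1.0, threshold=5.0, seed=1):
    """Inject evenly spaced calibration pulses and return
    (fraction of pulses recovered, number of triggers not at a pulse)."""
    step = n_samples // n_pulses
    pulse_times = [step // 2 + k * step for k in range(n_pulses)]
    record = simulate_record(n_samples, pulse_times, pulse_amplitude,
                             noise_sigma, seed)
    triggers = threshold_trigger(record, threshold)
    recovered = sum(1 for t in pulse_times if t in triggers)
    false_triggers = len(triggers - set(pulse_times))
    return recovered / n_pulses, false_triggers


efficiency, false_triggers = calibrate()
print(f"recovered fraction = {efficiency:.2f}, false triggers = {false_triggers}")
```

With these settings essentially all of the injected pulses are recovered and chance noise almost never crosses the threshold. Calling `calibrate(pulse_amplitude=0.0)`, which models a detection chain that does not register the injected signal, drives the recovered fraction to essentially zero; a failure of that kind is what gives grounds for doubting results obtained with the apparatus.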

Contrary to Collins, I believe that the scientific community made a reasoned judgment and rejected Weber's results and accepted those of his critics. Although no formal rules were applied (e.g. if you make four errors, rather than three, your results lack credibility; or if there are five, but not six, conflicting results, your work is still credible) the procedure was reasonable.

Pickering has argued that the reasons for accepting results are the future utility of such results for both theoretical and experimental practice and the agreement of such results with the existing community commitments. In discussing the discovery of weak neutral currents, Pickering states,

Quite simply, particle physicists accepted the existence of the neutral current because they could see how to ply their trade more profitably in a world in which the neutral current was real. (1984b, p. 87)

Scientific communities tend to reject data that conflict with group commitments and, obversely, to adjust their experimental techniques to tune in on phenomena consistent with those commitments. (1981, p. 236)

The emphasis on future utility and existing commitments is clear. These two criteria do not necessarily agree. For example, there are episodes in the history of science in which more opportunity for future work is provided by the overthrow of existing theory. (See, for example, the history of the overthrow of parity conservation and of CP symmetry discussed below and in (Franklin 1986, Ch. 1, 3)).

1.2.2 Pickering on Communal Opportunism and Plastic Resources

Pickering has recently offered a different view of experimental results. In his view the material procedure (including the experimental apparatus itself along with setting it up, running it, and monitoring its operation), the theoretical model of that apparatus, and the theoretical model of the phenomena under investigation are all plastic resources that the investigator brings into relations of mutual support. (Pickering 1987; Pickering 1989). He says:

Achieving such relations of mutual support is, I suggest, the defining characteristic of the successful experiment. (1987, p. 199)

He illustrates this view with Morpurgo's search for free quarks, that is, for fractional charges of 1/3 e or 2/3 e, where e is the charge of the electron. (See also Gooding 1992.) Morpurgo used a modern Millikan-type apparatus and initially found a continuous distribution of charge values. Following some tinkering with the apparatus, Morpurgo found that if he separated the capacitor plates he obtained only integral values of charge. “After some theoretical analysis, Morpurgo concluded that he now had his apparatus working properly, and reported his failure to find any evidence for fractional charges” (Pickering 1987, p. 197).

Pickering goes on to note that Morpurgo did not tinker with the two competing theories of the phenomena then on offer, those of integral and fractional charge:

The initial source of doubt about the adequacy of the early stages of the experiment was precisely the fact that their findings—continuously distributed charges—were consonant with neither of the phenomenal models which Morpurgo was prepared to countenance. And what motivated the search for a new instrumental model was Morpurgo's eventual success in producing findings in accordance with one of the phenomenal models he was willing to accept

The conclusion of Morpurgo's first series of experiments, then, and the production of the observation report which they sustained, was marked by bringing into relations of mutual support of the three elements I have discussed: the material form of the apparatus and the two conceptual models, one instrumental and the other phenomenal. Achieving such relations of mutual support is, I suggest, the defining characteristic of the successful experiment. (p. 199)

Pickering has made several important and valid points concerning experiment. Most importantly, he has emphasized that an experimental apparatus is rarely capable, at first, of producing valid experimental results, and that some adjustment, or tinkering, is required before it does. He has also recognized that both the theory of the apparatus and the theory of the phenomena can enter into the production of a valid experimental result. What one may question, however, is the emphasis he places on these theoretical components. From Millikan onwards, experiments had strongly supported the existence of a fundamental unit of charge and charge quantization. The failure of Morpurgo's apparatus to produce measurements of integral charge indicated that it was not operating properly and that his theoretical understanding of it was faulty. It was the failure to produce measurements in agreement with what was already known (i.e., the failure of an important experimental check) that caused doubts about Morpurgo's measurements. This was true regardless of the theoretical models available, or those that Morpurgo was willing to accept. It was only when Morpurgo's apparatus could reproduce known measurements that it could be trusted and used to search for fractional charge. To be sure, Pickering has allowed a role for the natural world in the production of the experimental result, but it does not seem to be decisive.
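The kind of experimental check at issue can be sketched as a toy consistency test in Python. This is an illustration under stated assumptions, not Morpurgo's actual analysis: the elementary charge is rounded, the tolerance of 0.1 e is an arbitrary hypothetical cut, and both lists of "measured" charges are invented. An apparatus that passes the check returns values clustered at integer multiples of e; one that yields a continuous spread, as Morpurgo's initially did, fails it.

```python
E = 1.602e-19  # elementary charge in coulombs, rounded


def passes_quantization_check(charges, unit=E, tolerance=0.1):
    """Return True if every measured charge lies within `tolerance`
    (a hypothetical cut, as a fraction of `unit`) of an integer
    multiple of `unit`."""
    for q in charges:
        multiple = q / unit
        if abs(multiple - round(multiple)) > tolerance:
            return False
    return True


# Hypothetical readings (coulombs) from an apparatus working properly:
# each is close to 1, 2, 3, or 5 times e.
good_run = [1.60e-19, 3.21e-19, 4.80e-19, 8.02e-19]

# Hypothetical readings from a faulty apparatus: a continuous spread.
faulty_run = [1.60e-19, 2.35e-19, 3.90e-19, 5.10e-19]

for name, run in [("good_run", good_run), ("faulty_run", faulty_run)]:
    print(name, "passes check:", passes_quantization_check(run))
```

In this toy framing, only after the apparatus passes the check on ordinary samples would a reading near 1/3 or 2/3 of e be worth taking seriously as a fractional-charge candidate; such a reading, like `E / 3`, fails the integer-multiple test by design.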

1.2.3 Critical Responses to Pickering

Ackermann has offered a modification of Pickering's view. He suggests that the experimental apparatus itself is a less plastic resource than either the theoretical model of the apparatus or that of the phenomenon.

To repeat, changes in A [the apparatus] can often be seen (in real time, without waiting for accommodation by B [the theoretical model of the apparatus]) as improvements, whereas ‘improvements’ in B don't begin to count unless A is actually altered and realizes the improvements conjectured. It's conceivable that this small asymmetry can account, ultimately, for large scale directions of scientific progress and for the objectivity and rationality of those directions. (Ackermann 1991, p. 456)

Hacking (1992) has also offered a more complex version of Pickering's later view. He suggests that the results of mature laboratory science achieve stability and are self-vindicating when the elements of laboratory science are brought into mutual consistency and support. These are (1) ideas: questions, background knowledge, systematic theory, topical hypotheses, and modeling of the apparatus; (2) things: target, source of modification, detectors, tools, and data generators; and (3) marks and the manipulation of marks: data, data assessment, data reduction, data analysis, and interpretation.

Stable laboratory science arises when theories and laboratory equipment evolve in such a way that they match each other and are mutually self-vindicating. (1992, p. 56)

We invent devices that produce data and isolate or create phenomena, and a network of different levels of theory is true to these phenomena. Conversely we may in the end count them as phenomena only when the data can be interpreted by theory. (pp. 57–8)

One might ask whether such mutual adjustment between theory and experimental results can always be achieved. What happens when an experimental result is produced by an apparatus on which several of the epistemological strategies, discussed earlier, have been successfully applied, and the result is in disagreement with our theory of the phenomenon? Accepted theories can be refuted. Several examples will be presented below.

Hacking himself worries about what happens when a laboratory science that is true to the phenomena generated in the laboratory, thanks to mutual adjustment and self-vindication, is successfully applied to the world outside the laboratory. Does this argue for the truth of the science? In Hacking's view it does not. If laboratory science does produce happy effects in the “untamed world,… it is not the truth of anything that causes or explains the happy effects” (1992, p. 60).

1.2.4 Pickering and the Dance of Agency

Recently Pickering has offered a somewhat revised account of science. “My basic image of science is a performative one, in which the performances, the doings, of human and material agency come to the fore. Scientists are human agents in a field of material agency which they struggle to capture in machines” (Pickering 1995, p. 21). He then discusses the complex interaction between human and material agency, which I interpret as the interaction between experimenters, their apparatus, and the natural world.

The dance of agency, seen asymmetrically from the human end, thus takes the form of a dialectic of resistance and accommodations, where resistance denotes the failure to achieve an intended capture of agency in practice, and accommodation an active human strategy of response to resistance, which can include revisions to goals and intentions as well as to the material form of the machine in question and to the human frame of gestures and social relations that surround it (p. 22).

Pickering's idea of resistance is illustrated by Morpurgo's observation of continuous, rather than integral or fractional, electrical charge, which did not agree with his expectations. Morpurgo's accommodation consisted of changing his experimental apparatus by using a larger separation between his plates, and also by modifying his theoretical account of the apparatus. That being done, integral charges were observed and the result stabilized by the mutual agreement of the apparatus, the theory of the apparatus, and the theory of the phenomenon. Pickering notes that “the outcomes depend on how the world is” (p. 182). “In this way, then, how the material world is leaks into and infects our representations of it in a nontrivial and consequential fashion. My analysis thus displays an intimate and responsive engagement between scientific knowledge and the material world that is integral to scientific practice” (p. 183).

Nevertheless there is something confusing about Pickering's invocation of the natural world. Although Pickering acknowledges the importance of the natural world, his use of the term “infects” seems to indicate that he isn't entirely happy with this. Nor does the natural world seem to have much efficacy. It never seems to be decisive in any of Pickering's case studies. Recall that he argued that physicists accepted the existence of weak neutral currents because “they could ply their trade more profitably in a world in which the neutral current was real.” In his account, Morpurgo's observation of continuous charge is important only because it disagrees with his theoretical models of the phenomenon. The fact that it disagreed with numerous previous observations of integral charge doesn't seem to matter. This is further illustrated by Pickering's discussion of the conflict between Morpurgo and Fairbank. As we have seen, Morpurgo reported that he did not observe fractional electrical charges. On the other hand, in the late 1970s and early 1980s, Fairbank and his collaborators published a series of papers in which they claimed to have observed fractional charges (see, for example, LaRue, Phillips et al. 1981). Faced with this discord Pickering concludes,

In Chapter 3, I traced out Morpurgo's route to his findings in terms of the particular vectors of cultural extension that he pursued, the particular resistances and accommodations thus precipitated, and the particular interactive stabilizations he achieved. The same could be done, I am sure, in respect of Fairbank. And these tracings are all that needs to be said about their divergence. It just happened that the contingencies of resistance and accommodation worked out differently in the two instances. Differences like these are, I think, continually bubbling up in practice, without any special causes behind them (pp. 211–212).

The natural world seems to have disappeared from Pickering's account. There is a real question here as to whether or not fractional charges exist in nature. The conclusions reached by Fairbank and by Morpurgo about their existence cannot both be correct. It seems insufficient to merely state, as Pickering does, that Fairbank and Morpurgo achieved their individual stabilizations and to leave the conflict unresolved. (Pickering does comment that one could follow the subsequent history and see how the conflict was resolved, and he does give some brief statements about it, but its resolution is not important for him.) At the very least one should consider the actions of the scientific community. Scientific knowledge is not determined individually, but communally. Pickering seems to acknowledge this. “One might, therefore, want to set up a metric and say that items of scientific knowledge are more or less objective depending on the extent to which they are threaded into the rest of scientific culture, socially stabilized over time, and so on. I can see nothing wrong with thinking this way….” (p. 196). The fact that Fairbank believed in the existence of fractional electrical charges, or that Weber strongly believed that he had observed gravity waves, does not make them right. These are questions about the natural world that can be resolved. Either fractional charges and gravity waves exist or they don't, or to be more cautious we might say that we have good reasons to support our claims about their existence, or we do not.

Another issue neglected by Pickering is the question of whether a particular mutual adjustment of theory (of the apparatus or of the phenomenon), the experimental apparatus, and the evidence is justified. Pickering seems to believe that any such adjustment that provides stabilization, either for an individual or for the community, is acceptable. Others disagree. They note that experimenters sometimes exclude data and engage in selective analysis procedures in producing experimental results. These practices are, at the very least, questionable, as is the use in science of the results produced by such practices. There are, in fact, procedures in the normal practice of science that provide safeguards against them. (For details see Franklin, 2002, Section 1.)

The difference in attitudes toward the resolution of discord is one of the important distinctions between Pickering's and Franklin's views of science. Franklin remarks that it is insufficient simply to say that the resolution is socially stabilized. The important question is how that resolution was achieved and what reasons were offered for it. If we are faced with discordant experimental results and both experimenters have offered reasonable arguments for their correctness, then clearly more work is needed. It seems reasonable, in such cases, for the physics community to search for an error in one, or both, of the experiments.

Pickering discusses yet another difference between his view and that of Franklin. Pickering sees traditional philosophy of science as regarding objectivity “as stemming from a peculiar kind of mental hygiene or policing of thought. This police function relates specifically to theory choice in science, which,… is usually discussed in terms of the rational rules or methods responsible for closure in theoretical debate” (p. 197). He goes on to remark that,

The most action in recent methodological thought has centered on attempts like Allan Franklin's to extend the methodological approach to experiments by setting up a set of rules for their proper performance. Franklin thus seeks to extend classical discussions of objectivity to the empirical base of science (a topic hitherto neglected in the philosophical tradition but one that, of course, the mangle [Pickering's view] also addresses). For an argument between myself and Franklin on the same lines as that laid out below, see (Franklin 1990, Chapter 8; Franklin 1991); and (Pickering 1991); and for commentaries related to that debate, (Ackermann 1991) and (Lynch 1991) (p. 197).

For further discussion see (Franklin 1993b). Although Franklin's epistemology of experiment is designed to offer good reasons for belief in experimental results, its strategies are not a set of rules. Franklin regards them as a set of strategies, from which physicists choose, in order to argue for the correctness of their results. As noted above, the strategies offered are neither exclusive nor exhaustive.

There is another point of disagreement between Pickering and Franklin. Pickering claims to be dealing with the practice of science, and yet he excludes certain practices from his discussions. One scientific practice is the application of the epistemological strategies outlined above to argue for the correctness of an experimental result. In fact, one of the essential features of an experimental paper is the presentation of such arguments. Writing such papers, a performative act, is also a scientific practice and it would seem reasonable to examine both the structure and content of those papers.

1.2.5 Hacking's The Social Construction of What?

Recently Ian Hacking (1999, chapter 3) has provided an incisive and interesting discussion of the issues that divide the constructivists (Collins, Pickering, etc.) from the rationalists (Stuewer, Franklin, Buchwald, etc.). He sets out three sticking points between the two views: 1) contingency, 2) nominalism, and 3) external explanations of stability.

Contingency is the idea that science is not predetermined, that it could have developed in any one of several successful ways. This is the view adopted by constructivists. Hacking illustrates this with Pickering's account of high-energy physics during the 1970s, the period in which the quark model came to dominate. (See Pickering 1984a.)

The constructionist maintains a contingency thesis. In the case of physics, (a) physics (theoretical, experimental, material) could have developed in, for example, a nonquarky way, and, by the detailed standards that would have evolved with this alternative physics, could have been as successful as recent physics has been by its detailed standards. Moreover, (b) there is no sense in which this imagined physics would be equivalent to present physics. The physicist denies that. (Hacking 1999, pp. 78–79).

To sum up Pickering's doctrine: there could have been a research program as successful (“progressive”) as that of high-energy physics in the 1970s, but with different theories, phenomenology, schematic descriptions of apparatus, and apparatus, and with a different, and progressive, series of robust fits between these ingredients. Moreover, and this is something badly in need of clarification, the “different” physics would not have been equivalent to present physics. Not logically incompatible with, just different.

The constructionist about (the idea) of quarks thus claims that the upshot of this process of accommodation and resistance is not fully predetermined. Laboratory work requires that we get a robust fit between apparatus, beliefs about the apparatus, interpretations and analyses of data, and theories. Before a robust fit has been achieved, it is not determined what that fit will be. Not determined by how the world is, not determined by technology now in existence, not determined by the social practices of scientists, not determined by interests or networks, not determined by genius, not determined by anything (pp. 72-73, emphasis added).

Much depends here on what Hacking means by “determined.” If he means entailed then one must agree with him. It is doubtful that the world, or more properly, what we can learn about it, entails a unique theory. If, as seems more plausible, he means that the way the world is places no restrictions on that successful science, then the rationalists disagree strongly. They argue that the way the world is restricts the kinds of theories that will fit the phenomena, the kinds of apparatus we can build, and the results we can obtain with such apparatuses. To think otherwise seems silly. Consider a homey example. It seems highly unlikely that someone can come up with a successful theory in which objects whose density is greater than that of air fall upwards. This is not a caricature of the view Hacking describes. Describing Pickering's view, he states, “Physics did not need to take a route that involved Maxwell's Equations, the Second Law of Thermodynamics, or the present values of the velocity of light” (p. 70). Although one may have some sympathy for this view as regards Maxwell's Equations or the Second Law of Thermodynamics, one may not agree about the value of the speed of light. That is determined by the way the world is. Any successful theory of light must give that value for its speed.

At the other extreme are the “inevitablists,” among whom Hacking classifies most scientists. He cites Sheldon Glashow, a Nobel Prize winner: “Any intelligent alien anywhere would have come upon the same logical system as we have to explain the structure of protons and the nature of supernovae” (Glashow 1992, p. 28).

Another difference between Pickering and Franklin on contingency concerns not the question of whether an alternative is possible, but rather whether there are reasons why that alternative should be pursued. Pickering seems to identify can with ought.

In the late 1970s there was a disagreement between the results of low-energy experiments on atomic parity violation (the violation of left-right symmetry) performed at the University of Washington and at Oxford University and the result of a high-energy experiment on the scattering of polarized electrons from deuterium (the SLAC E122 experiment). The atomic-parity violation experiments failed to observe the parity-violating effects predicted by the Weinberg-Salam (W-S) unified theory of electroweak interactions, whereas the SLAC experiment observed the predicted effect. These early atomic physics results were quite uncertain in themselves, and that uncertainty was increased by positive results obtained in similar experiments at Berkeley and Novosibirsk. At the time the theory had other evidential support, but was not universally accepted. Pickering and Franklin are in agreement that the W-S theory was accepted on the basis of the SLAC E122 result. They differ dramatically in their discussions of the experiments. Their difference on contingency concerns a particular theoretical alternative that was proposed at the time to explain the discrepancy between the experimental results.

Pickering asked why a theorist might not have attempted to find a variant of electroweak gauge theory that might have reconciled the Washington-Oxford atomic parity results with the positive E122 result. (What such a theorist was supposed to do with the supportive atomic parity results later provided by experiments at Berkeley and at Novosibirsk is never mentioned.) “But though it is true that E122 analysed their data in a way that displayed the improbability [the probability of the fit to the hybrid model was 6 × 10⁻⁴] of a particular class of variant gauge theories, the so-called ‘hybrid models,’ I do not believe that it would have been impossible to devise yet more variants” (Pickering 1991, p. 462). Pickering notes that open-ended recipes for constructing such variants had been written down as early as 1972 (p. 467). It would have been possible to do so, but one may ask whether or not a scientist might have wished to do so. If the scientist agreed with Franklin's view that the SLAC E122 experiment provided considerable evidential weight in support of the W-S theory and that a set of conflicting and uncertain results from atomic parity-violation experiments gave an equivocal answer on that support, what reason would they have had to invent an alternative?

This is not to suggest that scientists do not, or should not, engage in speculation, but rather that there was no necessity to do so in this case. Theorists often do propose alternatives to existing, well-confirmed theories.

Constructivist case studies always seem to result in the support of existing, accepted theory (Pickering 1984a; 1984b; 1991; Collins 1985; Collins and Pinch 1993). One criticism implied in such cases is that alternatives are not considered, that the hypothesis space of acceptable alternatives is either very small or empty. One may seriously question this. Thus, when the experiment of Christenson et al. (1964) detected K₂⁰ decay into two pions, which seemed to show that CP symmetry (combined particle-antiparticle and space inversion symmetry) was violated, no fewer than 10 alternatives were offered. These included (1) the cosmological model resulting from the local dysymmetry of matter and antimatter, (2) external fields, (3) the decay of the K₂⁰ into a K₁⁰ with the subsequent decay of the K₁⁰ into two pions, which was allowed by the symmetry, (4) the emission of another neutral particle, “the paritino,” in the K₂⁰ decay, similar to the emission of the neutrino in beta decay, (5) that one of the pions emitted in the decay was in fact a “spion,” a pion with spin one rather than zero, (6) that the decay was due to another neutral particle, the L, produced coherently with the K⁰, (7) the existence of a “shadow” universe, which interacted with our universe only through the weak interactions, and that the decay seen was the decay of the “shadow K₂⁰,” (8) the failure of the exponential decay law, (9) the failure of the principle of superposition in quantum mechanics, and (10) that the decay pions were not bosons.

As one can see, the limits placed on alternatives were not very stringent. By the end of 1967, all of the alternatives had been tested and found wanting, leaving CP symmetry unprotected. Here the differing judgments of the scientific community about what was worth proposing and pursuing led to a wide variety of alternatives being tested.

Hacking's second sticking point is nominalism, or name-ism. He notes that in its most extreme form nominalism denies that there is anything in common or peculiar to objects selected by a name, such as “Douglas fir,” other than that they are called Douglas fir. Opponents contend that good names, or good accounts of nature, tell us something correct about the world. This is related to the realism-antirealism debate concerning the status of unobservable entities that has plagued philosophers for millennia. For example Bas van Fraassen (1980), an antirealist, holds that we have no grounds for belief in unobservable entities such as the electron and that accepting theories about the electron means only that we believe that the things the theory says about observables are true. A realist claims that electrons really exist and, as Wilfrid Sellars remarked, that “to have good reason for holding a theory is ipso facto to have good reason for holding that the entities postulated by the theory exist” (Sellars 1962, p. 97). In Hacking's view a scientific nominalist is more radical than an antirealist and is just as skeptical about fir trees as they are about electrons. A nominalist further believes that the structures we conceive of are properties of our representations of the world and not of the world itself. Hacking refers to opponents of that view as inherent structuralists.

Hacking also remarks that this point is related to the question of “scientific facts.” Thus, constructivists Latour and Woolgar originally entitled their book Laboratory Life: The Social Construction of Scientific Facts (1979). Andrew Pickering entitled his history of the quark model Constructing Quarks (Pickering 1984a). Physicists argue that this demeans their work. Steven Weinberg, a realist and a physicist, criticized Pickering's title by noting that no mountaineer would ever name a book Constructing Everest. For Weinberg, quarks and Mount Everest have the same ontological status. They are both facts about the world. Hacking argues that constructivists do not, despite appearances, believe that facts do not exist, or that there is no such thing as reality. He cites Latour and Woolgar's claim “that ‘out-there-ness’ is a consequence of scientific work rather than its cause” (Latour and Woolgar 1986, p. 180). Hacking reasonably concludes that,

Latour and Woolgar were surely right. We should not explain why some people believe that p by saying that p is true, or corresponds to a fact, or the facts. For example: someone believes that the universe began with what for brevity we call a big bang. A host of reasons now supports this belief. But after you have listed all the reasons, you should not add, as if it were an additional reason for believing in the big bang, ‘and it is true that the universe began with a big bang.’ Or ‘and it is a fact.’ This observation has nothing peculiarly to do with social construction. It could equally have been advanced by an old-fashioned philosopher of language. It is a remark about the grammar of the verb ‘to explain’ (Hacking 1999, pp. 80–81).

One might add, however, that the reasons Hacking cites as supporting that belief are given to us by valid experimental evidence and not by the social and personal interests of scientists. Latour and Woolgar might not agree. Franklin argues that we have good reasons to believe in facts, and in the entities involved in our theories, always remembering, of course, that science is fallible.

Hacking's third sticking point is the external explanations of stability.

The constructionist holds that explanations for the stability of scientific belief involve, at least in part, elements that are external to the content of science. These elements typically include social factors, interests, networks, or however they be described. Opponents hold that whatever be the context of discovery, the explanation of stability is internal to the science itself (Hacking 1999, p. 92).

Rationalists think that most science proceeds as it does in the light of good reasons produced by research. Some bodies of knowledge become stable because of the wealth of good theoretical and experimental reasons that can be adduced for them. Constructivists think that the reasons are not decisive for the course of science. Nelson (1994) concludes that this issue will never be decided. Rationalists, at least retrospectively, can always adduce reasons that satisfy them. Constructivists, with equal ingenuity, can always find to their own satisfaction an openness where the upshot of research is settled by something other than reason. Something external. That is one way of saying we have found an irresoluble “sticking point” (pp. 91–92).

Thus, there is a rather severe disagreement on the reasons for the acceptance of experimental results. For some, like Staley, Galison and Franklin, it is because of epistemological arguments. For others, like Pickering, the reasons are utility for future practice and agreement with existing theoretical commitments. Although the history of science shows that the overthrow of a well-accepted theory leads to an enormous amount of theoretical and experimental work, proponents of this view seem to accept it as unproblematical that it is always agreement with existing theory that has more future utility. Hacking and Pickering also suggest that experimental results are accepted on the basis of the mutual adjustment of elements that include the theory of the phenomenon.

Nevertheless, everyone seems to agree that a consensus does arise on experimental results.

2. The Roles of Experiment

2.1 A Life of Its Own

Although experiment often takes its importance from its relation to theory, Hacking pointed out that it often has a life of its own, independent of theory. He notes the pristine observations of Caroline Herschel's discovery of comets, William Herschel's work on “radiant heat,” and Davy's observation of the gas emitted by algae and the flaring of a taper in that gas. In none of these cases did the experimenter have any theory of the phenomenon under investigation. One may also note the nineteenth-century measurements of atomic spectra and the work on the masses and properties of elementary particles during the 1960s. Both of these sequences were conducted without any guidance from theory.

In deciding what experimental investigation to pursue, scientists may very well be influenced by the equipment available and their own ability to use that equipment (McKinney 1992). Thus, when the Mann-O'Neill collaboration was doing high-energy physics experiments at the Princeton-Pennsylvania Accelerator during the late 1960s, the sequence of experiments was (1) measurement of the K⁺ decay rates, (2) measurement of the K⁺ₑ₃ branching ratio and decay spectrum, (3) measurement of the K⁺ₑ₂ branching ratio, and (4) measurement of the form factor in K⁺ₑ₃ decay. These experiments were performed with basically the same experimental apparatus, with relatively minor modifications for each particular experiment. By the end of the sequence the experimenters had become quite expert in the use of the apparatus and knowledgeable about the backgrounds and experimental problems. This allowed the group to successfully perform the technically more difficult experiments later in the sequence. We might refer to this as “instrumental loyalty” and the “recycling of expertise” (Franklin 1997b). This meshes nicely with Galison's view of experimental traditions. Scientists, both theorists and experimentalists, tend to pursue experiments and problems in which their training and expertise can be used.

Hacking also remarks on the “noteworthy observations” on Iceland Spar by Bartholin, on diffraction by Hooke and Grimaldi, and on the dispersion of light by Newton. “Now of course Bartholin, Grimaldi, Hooke, and Newton were not mindless empiricists without an ‘idea’ in their heads. They saw what they saw because they were curious, inquisitive, reflective people. They were attempting to form theories. But in all these cases it is clear that the observations preceded any formulation of theory” (Hacking 1983, p. 156). In all of these cases we may say that these were observations waiting for, or perhaps even calling for, a theory. The discovery of any unexpected phenomenon calls for a theoretical explanation.

2.2 Confirmation and Refutation

Nevertheless several of the important roles of experiment involve its relation to theory. Experiment may confirm a theory, refute a theory, or give hints to the mathematical structure of a theory.

2.2.1 The Discovery of Parity Nonconservation: A Crucial Experiment

Let us consider first an episode in which the relation between theory and experiment was clear and straightforward. This was a “crucial” experiment, one that decided unequivocally between two competing theories, or classes of theory. The episode was that of the discovery that parity, mirror-reflection symmetry or left-right symmetry, is not conserved in the weak interactions. (For details of this episode see Franklin (1986, Ch. 1) and Appendix 1.) Experiments showed that in the beta decay of nuclei the number of electrons emitted in the same direction as the nuclear spin was different from the number emitted opposite to the spin direction. This was a clear demonstration of parity violation in the weak interactions.

2.2.2 The Discovery of CP Violation: A Persuasive Experiment

After the discovery of parity and charge conjugation nonconservation, and following a suggestion by Landau, physicists considered CP (combined parity and particle-antiparticle symmetry), which was still conserved in the experiments, as the appropriate symmetry. One consequence of this scheme, if CP were conserved, was that the K₁⁰ meson could decay into two pions, whereas the K₂⁰ meson could not.[10] Thus, observation of the decay of K₂⁰ into two pions would indicate CP violation. The decay was observed by a group at Princeton University. Although several alternative explanations were offered, experiments eliminated each of the alternatives, leaving only CP violation as an explanation of the experimental result. (For details of this episode see Franklin (1986, Ch. 3) and Appendix 2.)

2.2.3 The Discovery of Bose-Einstein Condensation: Confirmation After 70 Years

In both of the episodes discussed previously, those of parity nonconservation and of CP violation, we saw a decision between two competing classes of theories. This episode, the discovery of Bose-Einstein condensation (BEC), illustrates the confirmation of a specific theoretical prediction 70 years after the theoretical prediction was first made. Bose (1924) and Einstein (1924; 1925) predicted that a gas of noninteracting bosonic atoms will, below a certain temperature, suddenly develop a macroscopic population in the lowest energy quantum state.[11] (For details of this episode see Appendix 3.)

2.3 Complications

In the three episodes discussed in the previous section, the relation between experiment and theory was clear. The experiments gave unequivocal results and there was no ambiguity about what theory was predicting. None of the conclusions reached has since been questioned. Parity and CP symmetry are violated in the weak interactions and Bose-Einstein condensation is an accepted phenomenon. In the practice of science things are often more complex. Experimental results may be in conflict, or may even be incorrect. Theoretical calculations may also be in error or a correct theory may be incorrectly applied. There are even cases in which both experiment and theory are wrong. As noted earlier, science is fallible. In this section I will briefly discuss several episodes which illustrate these complexities.

2.3.1 The Fall of the Fifth Force

The episode of the fifth force is the case of a refutation of a hypothesis, but only after a disagreement between experimental results was resolved. The “Fifth Force” was a proposed modification of Newton's Law of Universal Gravitation. The initial experiments gave conflicting results: one supported the existence of the Fifth Force whereas the other argued against it. After numerous repetitions of the experiment, the discord was resolved and a consensus reached that the Fifth Force did not exist. (For details of this episode see Appendix 4.)

2.3.2 Right Experiment, Wrong Theory: The Stern-Gerlach Experiment[12]

The Stern-Gerlach experiment was regarded as crucial at the time it was performed, but, in fact, wasn't. In the view of the physics community it decided the issue between two theories, refuting one and supporting the other. In the light of later work, however, the refutation stood, but the confirmation was questionable. In fact, the experimental result posed problems for the theory it had seemingly confirmed. A new theory was proposed and although the Stern-Gerlach result initially also posed problems for the new theory, after a modification of that new theory, the result confirmed it. In a sense, it was crucial after all. It just took some time.

The Stern-Gerlach experiment provides evidence for the existence of electron spin. These experimental results were first published in 1922, although the idea of electron spin wasn't proposed by Goudsmit and Uhlenbeck until 1925 (1925; 1926). One might say that electron spin was discovered before it was invented. (For details of this episode see Appendix 5.)

2.3.3 Sometimes Refutation Doesn't Work: The Double-Scattering of Electrons

In the last section we saw some of the difficulty inherent in experiment-theory comparison. One is sometimes faced with the question of whether the experimental apparatus satisfies the conditions required by theory, or conversely, whether the appropriate theory is being compared to the experimental result. A case in point is the history of experiments on the double-scattering of electrons by heavy nuclei (Mott scattering) during the 1930s and the relation of these results to Dirac's theory of the electron, an episode in which the question of whether or not the experiment satisfied the conditions of the theoretical calculation was central. Initially, experiments disagreed with Mott's calculation, casting doubt on the underlying Dirac theory. After more than a decade of work, both experimental and theoretical, it was realized that there was a background effect in the experiments that masked the predicted effect. When the background was eliminated, experiment and theory agreed. (For details of this episode see Appendix 6.)

2.4 Other Roles

2.4.1 Evidence for a New Entity: J.J. Thomson and the Electron

Experiment can also provide us with evidence for the existence of the entities involved in our theories. J.J. Thomson's experiments on cathode rays provided grounds for belief in the existence of electrons. (For details of this episode see Appendix 7).

2.4.2 The Articulation of Theory: Weak Interactions

Experiment can also help to articulate a theory. Experiments on beta decay from the 1930s to the 1950s determined the precise mathematical form of Fermi's theory of beta decay. (For details of this episode see Appendix 8.)

2.5 Some Thoughts on Experiment in Biology

2.5.1 Epistemological Strategies and the Peppered Moth Experiment

One comment that has been made concerning the philosophy of experiment is that all of the examples are taken from physics and are therefore limited. In this section arguments will be presented that these discussions also apply to biology.

Although all of the illustrations of the epistemology of experiment come from physics, David Rudge (1998; 2001) has shown that the same strategies are also used in biology. His example is Kettlewell's (1955; 1956; 1958) evolutionary biology experiments on the Peppered Moth, Biston betularia. The typical form of the moth has a pale speckled appearance, and there are two darker forms: f. carbonaria, which is nearly black, and f. insularia, which is intermediate in color. The typical form of the moth was most prevalent in the British Isles and Europe until the middle of the nineteenth century. At that time things began to change. Increasing industrial pollution had both darkened the surfaces of trees and rocks and killed the lichen cover of the forests downwind of pollution sources. Coincident with these changes, naturalists had found that rare, darker forms of several moth species, in particular the Peppered Moth, had become common in areas downwind of pollution sources.

Kettlewell attempted to test a selectionist explanation of this phenomenon. E.B. Ford (1937; 1940) had suggested a two-part explanation of this effect: 1) darker moths had a superior physiology and 2) the spread of the melanic gene was confined to industrial areas because the darker color made carbonaria more conspicuous to avian predators in rural areas and less conspicuous in polluted areas. Kettlewell believed that Ford had established the superior viability of darker moths and he wanted to test the hypothesis that the darker form of the moth was less conspicuous to predators in industrial areas.

Kettlewell's investigations consisted of three parts. In the first part he used human observers to investigate whether his proposed scoring method would accurately assess the relative conspicuousness of different types of moths against different backgrounds. The tests showed that moths on “correct” backgrounds (typical moths on lichen-covered backgrounds and dark moths on soot-blackened backgrounds) were almost always judged inconspicuous, whereas moths on “incorrect” backgrounds were judged conspicuous.

The second step involved releasing birds into a cage containing all three types of moth and both soot-blackened and lichen-covered pieces of bark as resting places. After some difficulties (see Rudge 1998 for details), Kettlewell found that birds prey on moths in an order of conspicuousness similar to that gauged by human observers.

The third step was to investigate whether birds preferentially prey on conspicuous moths in the wild. Kettlewell used a mark-release-recapture experiment, first in a polluted environment (Birmingham) and later in an unpolluted wood. He released 630 marked male moths of all three types in an area near Birmingham, which contained predators and natural boundaries. He then recaptured the moths using two different types of trap, each containing virgin females of all three types to guard against the possibility of pheromone differences.

Kettlewell found that carbonaria was twice as likely to survive in soot-darkened environments (27.5 percent) as was typical (12.7 percent). He worried, however, that his results might be an artifact of his experimental procedures. Perhaps the traps were more attractive to one type of moth, one form of moth was more likely to migrate, or one type simply lived longer. He eliminated the first alternative by showing that the recapture rates were the same for both types of trap. The use of natural boundaries and traps placed beyond those boundaries eliminated the second, and previous experiments had shown no differences in longevity. Further experiments in polluted environments confirmed that carbonaria was twice as likely to survive as typical. An experiment in an unpolluted environment showed that typical was three times as likely to survive as carbonaria. Kettlewell concluded that such selection was the cause of the prevalence of carbonaria in polluted environments.
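The phrase “twice as likely to survive” is a ratio of the two percentages quoted above. In a mark-release-recapture design these are recapture rates, and reading them as survival rates is precisely the inference Kettlewell had to defend against the alternatives just listed. A minimal arithmetic check in Python, offered as an illustration rather than as Kettlewell's own calculation:

```python
def survival_ratio(recapture_a: float, recapture_b: float) -> float:
    """Ratio of two recapture percentages, used as a rough index of how much
    more likely form A was to survive than form B. This assumes recapture rate
    is a fair proxy for survival, i.e. that trap bias, migration and
    longevity differences have been ruled out."""
    if recapture_b <= 0:
        raise ValueError("recapture rate of the comparison form must be positive")
    return recapture_a / recapture_b


# Percentages quoted in the text for the polluted (Birmingham) site.
carbonaria = 27.5
typical = 12.7
ratio = survival_ratio(carbonaria, typical)
print(f"carbonaria/typical = {ratio:.2f}")
```

The ratio comes out a little above 2, which is the basis for the “twice as likely” description.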

Rudge also demonstrates that the strategies used by Kettlewell are those described above in the epistemology of experiment. His examples are given in Table 1. (For more details see Rudge 1998).

Epistemological strategy | Example from Kettlewell
1. Experimental checks and calibration, in which the apparatus reproduces known phenomena. | Use of the scoring experiment to verify that the proposed scoring methods would be feasible and objective.
2. Reproducing artifacts that are known in advance to be present. | Analysis of recapture figures for endemic betularia populations.
3. Elimination of plausible sources of background and alternative explanations of the result. | Use of natural barriers to minimize migration.
4. Using the results themselves to argue for their validity. | Filming the birds preying on the moths.
5. Using an independently well-corroborated theory of the phenomenon to explain the results. | Use of Ford's theory of the spread of industrial melanism.
6. Using an apparatus based on a well-corroborated theory. | Use of Fisher, Ford, and Shepard techniques. [The mark-release-recapture method had been used in several earlier experiments.]
7. Using statistical arguments. | Use and analysis of large numbers of moths.
8. Blind analysis. | Not used.
9. Intervention, in which the experimenter manipulates the object under observation. | Not present.
10. Independent confirmation using different experiments. | Use of two different types of traps to recapture the moths.

Table 1. Examples of epistemological strategies used by experimentalists in evolutionary biology, from H.B.D. Kettlewell's (1955, 1956, 1958) investigations of industrial melanism. (See Rudge 1998).

2.5.2 The Meselson-Stahl Experiment: “The Most Beautiful Experiment in Biology”

The roles that experiment plays in physics are also those it plays in biology. In the previous section we saw that Kettlewell's experiments both test and confirm a theory. I discussed earlier a set of crucial experiments that decided between two competing classes of theories, those that conserved parity and those that did not. In this section I will discuss an experiment that decided among three competing mechanisms for the replication of DNA, the molecule now believed to be responsible for heredity. This is another crucial experiment: it strongly supported one proposed mechanism and argued against the other two. (For details of this episode see Holmes 2001.)

In 1953 Francis Crick and James Watson proposed a three-dimensional structure for deoxyribonucleic acid (DNA) (Watson and Crick 1953a). Their proposed structure consisted of two polynucleotide chains helically wound about a common axis. This was the famous “Double Helix”. The chains were bound together by pairs of nitrogenous bases drawn from four types: adenine, thymine, cytosine, and guanine. Because of structural requirements, only the base pairs adenine-thymine and cytosine-guanine are allowed. Each chain is thus complementary to the other. If there is an adenine base at a location in one chain, there is a thymine base at the same location on the other chain, and vice versa. The same applies to cytosine and guanine. The order of the bases along a chain is not, however, restricted in any way, and it is the precise sequence of bases that carries the genetic information.
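The pairing rule means that either chain completely determines the other, which is what made the structure suggestive of a copying mechanism. As a minimal illustration (a sketch, not part of the original discussion), the Python function below builds the partner chain of a given base sequence position by position. The sample sequence is hypothetical, and the sketch deliberately ignores the fact that the two chains run in opposite directions.

```python
# Watson-Crick pairing rules: adenine with thymine, cytosine with guanine.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the base sequence that pairs, position by position, with the
    given strand (chain orientation is ignored in this sketch)."""
    try:
        return "".join(PAIRS[base] for base in strand.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


template = "ATGCGTTA"  # hypothetical sequence, for illustration only
partner = complementary_strand(template)
print(partner)
# Pairing is symmetric, so complementing twice recovers the original chain.
assert complementary_strand(partner) == template
```

Because each base has exactly one permitted partner, a single separated chain carries all the information needed to rebuild the double helix.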

The significance of the proposed structure was not lost on Watson and Crick when they made their suggestion. They remarked, “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”

Figure 21: Possible mechanisms for DNA replication. (Left) Conservative replication. Each of the two strands of the parent DNA is replicated to yield the unchanged parent DNA and one newly synthesized DNA. The second generation consists of one parent DNA and three new DNAs. (Center) Semiconservative replication. Each first generation DNA molecule contains one strand of the parent DNA and one newly synthesized strand. The second generation consists of two hybrid DNAs and two new DNAs. (Right) Dispersive replication. The parent chains break at intervals, and the parental segments combine with new segments to form the daughter chains. The darker segments are parental DNA and the lighter segments are newly synthesized DNA. From Lehninger (1975).

If DNA were to play this crucial role in genetics, then there had to be a mechanism for the replication of the molecule. Within a short time after the Watson-Crick suggestion, three different mechanisms for the replication of the DNA molecule were proposed (Delbruck and Stent 1957). These are illustrated in Figure 21. The first, proposed by Gunther Stent and known as conservative replication, suggested that each of the two strands of the parent DNA molecule is replicated in new material. This yields a first generation that consists of the original parent DNA molecule and one newly synthesized DNA molecule. The second generation will consist of the parental DNA and three new DNAs.

The second mechanism, known as semiconservative replication, was proposed by Watson and Crick (1953b). In it, each strand of the parental DNA acts as a template for a newly synthesized complementary strand, which then combines with the original strand to form a DNA molecule. The first generation consists of two hybrid molecules, each of which contains one strand of parental DNA and one newly synthesized strand. The second generation consists of two hybrid molecules and two totally new DNAs. The third mechanism, proposed by Max Delbruck, was dispersive replication, in which the parental DNA chains break at intervals and the parental segments combine with new segments to form the daughter strands.
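The three mechanisms make different predictions about how heavy (15N) and ordinary (14N) nitrogen are distributed among DNA molecules after a fully 15N-labeled population is transferred to a 14N medium, and it is this distribution that determines the density bands Meselson and Stahl observed. The Python sketch below tracks, for each molecule, the fraction of its nitrogen that is 15N under an idealized version of each mechanism. In particular, modeling dispersive replication as spreading parental material evenly over all daughter strands is an assumption of this sketch, not a claim about Delbruck's detailed proposal.

```python
from collections import Counter
from fractions import Fraction

# A duplex is a pair of strands; each strand is represented by the fraction
# of its nitrogen that is heavy 15N (1 = fully labeled, 0 = unlabeled).
Duplex = tuple[Fraction, Fraction]
NEW = Fraction(0)  # strands synthesized in the 14N medium carry no 15N


def replicate(duplex: Duplex, model: str) -> list[Duplex]:
    """Return the two daughter duplexes of one parent under an idealized model."""
    a, b = duplex
    if model == "conservative":
        # The parent duplex stays intact; the copy is built entirely from new material.
        return [(a, b), (NEW, NEW)]
    if model == "semiconservative":
        # Each parental strand is paired with a newly synthesized complement.
        return [(a, NEW), (b, NEW)]
    if model == "dispersive":
        # Assumption of this sketch: parental material is spread evenly over
        # all four daughter strands, the rest of each strand being new.
        share = (a + b) / 4
        return [(share, share), (share, share)]
    raise ValueError(f"unknown model: {model}")


def band_profile(model: str, generations: int) -> dict[Fraction, Fraction]:
    """Start from one fully 15N-labeled duplex, replicate for the given number
    of generations, and return {15N fraction of a duplex: share of molecules}.
    Duplexes with the same 15N fraction have the same density, i.e. one band."""
    population: list[Duplex] = [(Fraction(1), Fraction(1))]
    for _ in range(generations):
        population = [child for parent in population for child in replicate(parent, model)]
    counts = Counter((a + b) / 2 for a, b in population)
    return {level: Fraction(n, len(population)) for level, n in sorted(counts.items())}


for model in ("conservative", "semiconservative", "dispersive"):
    for generation in (1, 2):
        bands = band_profile(model, generation)
        print(model, generation, {str(level): str(share) for level, share in bands.items()})
```

Under these idealizations, one generation after the transfer the conservative model gives separate heavy and light bands, while the semiconservative and dispersive models both give a single band of intermediate density. After two generations the semiconservative model gives equal amounts of half-labeled and unlabeled DNA, whereas the dispersive model gives a single band one quarter of the way from light to heavy.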

In this section I discuss the experiment performed by Matthew Meselson and Franklin Stahl (Meselson and Stahl 1958), which has been called “the most beautiful experiment in biology” and which was designed to determine which of the proposed DNA replication mechanisms is correct. Meselson and Stahl described their proposed method: “We anticipated that a label which imparts to the DNA molecule an increased density might permit an analysis of this distribution by sedimentation techniques. To this end a method was developed for the detection of small density differences among macromolecules. By use of this method, we have observed the distribution of the heavy nitrogen isotope 15N among molecules of DNA following the transfer of a uniformly 15N-labeled, exponentially growing bacterial population to a growth medium containing the ordinary nitrogen isotope 14N” (Meselson and Stahl 1958, pp. 671-672).

Figure 22: Schematic representation of the Meselson-Stahl experiment. From Watson (1965).

The experiment is described schematically in Figure 22. Meselson and Stahl placed a sample of DNA in a solution of cesium chloride. When the solution is spun at high speed, the denser material travels farther from the axis of rotation than the less dense material, so that the cesium chloride solution comes to have a density that increases with distance from the axis. The DNA reaches equilibrium at the position where its density equals that of the solution. Meselson and Stahl grew E. coli bacteria in a medium that contained ammonium chloride (NH4Cl) as the sole source of nitrogen. They did this for media that contained either 14N, ordinary nitrogen, or 15N, a heavier isotope. By destroying the cell membranes they could obtain samples of DNA that contained either 14N or 15N. They first showed that they could indeed separate the two kinds of DNA, which differ in density, by centrifugation (Figure 23). The separation of the two types of DNA is clear both in the photograph obtained by ultraviolet absorption and in the graph of signal intensity obtained with a densitometer. In addition, the separation between the two peaks suggested that they would be able to distinguish an intermediate band composed of hybrid DNA from the heavy and light bands. These early results argued both that the experimental apparatus was working properly and that all of the results obtained were correct. It is difficult to imagine either an apparatus malfunction or a source of experimental background that could reproduce those results. This is similar, although certainly not identical, to Galileo's observation of the moons of Jupiter or to Millikan's measurement of the charge of the electron. In both of those episodes it was the results themselves that argued for their correctness.
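The equilibrium condition described above, in which the DNA comes to rest where its density matches that of the surrounding solution, can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the cesium chloride density increases linearly with distance from the axis. The gradient parameters and DNA densities below are hypothetical values chosen only for illustration, not Meselson and Stahl's measurements, and taking the hybrid density as the mean of the light and heavy densities is a further simplifying assumption.

```python
def band_radius(dna_density: float, rho_ref: float, r_ref: float, slope: float) -> float:
    """Distance from the axis at which DNA of the given buoyant density comes
    to rest, assuming a linear gradient rho(r) = rho_ref + slope * (r - r_ref).
    Densities in g/cm^3, distances in cm, slope in g/cm^3 per cm."""
    if slope <= 0:
        raise ValueError("the solution must get denser away from the axis")
    # Solve rho(r) = dna_density for r.
    return r_ref + (dna_density - rho_ref) / slope


# Hypothetical gradient: 1.70 g/cm^3 at 6.0 cm, rising 0.1 g/cm^3 per cm.
RHO_REF, R_REF, SLOPE = 1.70, 6.0, 0.1

# Hypothetical buoyant densities for light (14N) and heavy (15N) DNA.
light, heavy = 1.710, 1.724
hybrid = (light + heavy) / 2  # one 14N strand and one 15N strand

for name, rho in [("light", light), ("hybrid", hybrid), ("heavy", heavy)]:
    print(f"{name:6s} band at r = {band_radius(rho, RHO_REF, R_REF, SLOPE):.3f} cm")
```

With a linear gradient, a molecule whose density lies midway between the light and heavy values also bands midway between them, which is why a hybrid band was expected to be resolvable between the two.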

Figure 23: The separation of 14N DNA from 15N DNA by centrifugation. The band on the left is 14N DNA and that on the right is from 15N DNA. From Meselson and Stahl (1958).

Meselson and Stahl then produced a sample of E. coli bacteria containing only 15N by growing it in a medium containing only ammonium chloride with 15N (15NH4Cl) for fourteen generations. They then abruptly changed the medium to 14N by adding a tenfold excess of 14NH4Cl. Samples were taken just before the addition of 14N and at intervals afterward for several generations. The cell membranes were broken to release the DNA into the solution, the samples were centrifuged, and ultraviolet absorption photographs were taken. In addition, the photographs were scanned with a recording densitometer. The results are shown in Figure 24, showing both the photographs and the densitometer traces. The figure shows that one starts only with heavy (fully labeled) DNA. As time proceeds one sees more and more half-labeled DNA, until at one generation time only half-labeled DNA is present. “Subsequently only half labeled DNA and completely unlabeled DNA are found. When two generation times have elapsed after the addition of 14N half-labeled and unlabeled DNA are present in equal amounts” (p. 676). (This is exactly what the semiconservative replication mechanism predicts.) By four generations the sample consists almost entirely of unlabeled DNA. A test of the conclusion that the DNA in the intermediate density band was half labeled was provided by examination of a sample containing equal amounts of generations 0 and 1.9. If the semiconservative mechanism is correct, then Generation 1.9 should have approximately equal amounts of unlabeled and half-labeled DNA, whereas Generation 0 contains only fully labeled DNA. As one can see, there are three clear density bands, and Meselson and Stahl found that the intermediate band was centered at (50 ± 2) percent of the difference between the 14N and 15N bands, shown in the bottom photograph (Generations 0 and 4.1). This is precisely what one would expect if that DNA were half labeled.
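Locating the intermediate band relative to the 14N and 15N bands amounts to a linear interpolation. The Python sketch below expresses a band's position as a percentage of the distance from the light band to the heavy band. The band positions are hypothetical, the 2-point tolerance merely echoes the ± 2 quoted above rather than reproducing Meselson and Stahl's own error analysis, and reading 50 percent as “half labeled” assumes that band position varies linearly with the fraction of 15N.

```python
def relative_position(light_pos: float, heavy_pos: float, band_pos: float) -> float:
    """Position of a band as a percentage of the distance from the light
    (14N) band, at 0 percent, to the heavy (15N) band, at 100 percent."""
    if heavy_pos == light_pos:
        raise ValueError("light and heavy bands must be separated")
    return 100.0 * (band_pos - light_pos) / (heavy_pos - light_pos)


def consistent_with_half_label(percent: float, tolerance: float = 2.0) -> bool:
    """True if a band lies within `tolerance` percentage points of the
    50 percent position expected for half-labeled DNA."""
    return abs(percent - 50.0) <= tolerance


# Hypothetical band positions read off a densitometer trace (arbitrary units).
light, heavy, intermediate = 10.0, 14.0, 11.96
pct = relative_position(light, heavy, intermediate)
print(f"intermediate band at {pct:.1f} percent;",
      "consistent with half labeling:", consistent_with_half_label(pct))
```

A fully unlabeled or fully labeled molecule would sit at 0 or 100 percent, so a band near 50 percent is the signature of molecules carrying one heavy and one light strand.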

Figure 24: (Left) Ultraviolet absorption photographs showing DNA bands from centrifugation of DNA from E. coli sampled at various times after the addition of an excess of 14N substrates to a growing 15N culture. (Right) Densitometer traces of the photographs. The initial sample is all heavy (15N DNA). As time proceeds a second, intermediate band begins to appear, until at one generation all of the sample is of intermediate mass (hybrid DNA). At longer times a band of light DNA appears, until at 4.1 generations the sample is almost all lighter DNA. This is exactly what is predicted by the Watson-Crick semiconservative mechanism. From Meselson and Stahl (1958).

Meselson and Stahl stated their results as follows, “The nitrogen of DNA is divided equally between two subunits which remain intact through many generations…. Following replication, each daughter molecule has received one parental subunit” (p. 676).

Meselson and Stahl also noted the implications of their work for deciding among the proposed mechanisms for DNA replication. In a section labeled “The Watson-Crick Model” they noted that, “This [the structure of the DNA molecule] suggested to Watson and Crick a definite and structurally plausible hypothesis for the duplication of the DNA molecule. According to this idea, the two chains separate, exposing the hydrogen-bonding sites of the bases. Then, in accord with base-pairing restrictions, each chain serves as a template for the synthesis of its complement. Accordingly, each daughter molecule contains one of the parental chains paired with a newly synthesized chain…. The results of the present experiment are in exact accord with the expectations of the Watson-Crick model for DNA replication” (pp. 677-678).

The experiment also showed that the dispersive replication mechanism proposed by Delbruck, which had smaller subunits, was incorrect. “Since the apparent molecular weight of the subunits so obtained is found to be close to half that of the intact molecule, it may be further concluded that the subunits of the DNA molecule which are conserved at duplication are single, continuous structures. The scheme for DNA duplication proposed by Delbruck is thereby ruled out” (p. 681). Later work by John Cairns and others showed that the subunits of DNA were the entire single polynucleotide chains of the Watson-Crick model of DNA structure.

The Meselson-Stahl experiment is a crucial experiment in biology. It decided among three proposed mechanisms for the replication of DNA, supporting the Watson-Crick semiconservative mechanism and eliminating the conservative and dispersive mechanisms. It played a role in biology similar to the one played in physics by the experiments that demonstrated the nonconservation of parity. Thus, we have seen evidence both that experiment plays similar roles in biology and physics and that the same epistemological strategies are used in both disciplines.

3. Conclusion

In this essay varying views on the nature of experimental results have been presented. Some argue that the acceptance of experimental results is based on epistemological arguments, whereas others base acceptance on future utility, social interests, or agreement with existing community commitments. Everyone agrees, however, that for whatever reasons, a consensus is reached on experimental results. These results then play many important roles in physics, and we have examined several of these roles, although certainly not all of them. We have seen experiment deciding between two competing theories, calling for a new theory, confirming a theory, refuting a theory, providing evidence that determined the mathematical form of a theory, and providing evidence for the existence of an elementary particle involved in an accepted theory. We have also seen that experiment has a life of its own, independent of theory. If, as I believe, epistemological procedures provide grounds for reasonable belief in experimental results, then experiment can legitimately play the roles I have discussed and can provide the basis for scientific knowledge.


Principal Works:

Other Suggested Reading

Related Entries

confirmation | logic: inductive | rationalism vs. empiricism | scientific method | scientific realism


I am grateful to Professor Carl Craver both for his comments on the manuscript and for his suggestions for further reading.