Measurement in Science

First published Mon Jun 15, 2015

Measurement is an integral part of modern science as well as of engineering, commerce, and daily life. Measurement is often considered a hallmark of the scientific enterprise and a privileged source of knowledge relative to qualitative modes of inquiry.[1] Despite its ubiquity and importance, there is little consensus among philosophers as to how to define measurement, what sorts of things are measurable, or which conditions make measurement possible. Most (but not all) contemporary authors agree that measurement is an activity that involves interaction with a concrete system with the aim of representing aspects of that system in abstract terms (e.g., in terms of classes, numbers, vectors, etc.). But this characterization also fits various kinds of perceptual and linguistic activities that are not usually considered measurements, and is therefore too broad to count as a definition of measurement. Moreover, if “concrete” implies “real”, this characterization is also too narrow, as measurement often involves the representation of ideal systems such as the average household or an electron at complete rest.

Philosophers have written on a variety of conceptual, metaphysical, semantic and epistemological issues related to measurement. This entry will survey the central philosophical standpoints on the nature of measurement, the notion of measurable quantity and related epistemological issues. It will refrain from elaborating on the many discipline-specific problems associated with measurement and focus on issues that have a general character.

1. Overview

Modern philosophical discussions about measurement—spanning from the late nineteenth century to the present day—may be divided into several strands of scholarship. These strands reflect different perspectives on the nature of measurement and the conditions that make measurement possible and reliable. The main strands are mathematical theories of measurement, operationalism, conventionalism, realism, information-theoretic accounts and model-based accounts. These strands of scholarship do not, for the most part, constitute directly competing views. Instead, they are best understood as highlighting different and complementary aspects of measurement. The following is a very rough overview of these perspectives:

  1. Mathematical theories of measurement view measurement as the mapping of qualitative empirical relations to relations among numbers (or other mathematical entities).
  2. Operationalists and conventionalists view measurement as a set of operations that shape the meaning and/or regulate the use of a quantity-term.
  3. Realists view measurement as the estimation of mind-independent properties and/or relations.
  4. Information-theoretic accounts view measurement as the gathering and interpretation of information about a system.
  5. Model-based accounts view measurement as the coherent assignment of values to parameters in a theoretical and/or statistical model of a process.

These perspectives are in principle consistent with each other. While mathematical theories of measurement deal with the mathematical foundations of measurement scales, operationalism and conventionalism are primarily concerned with the semantics of quantity terms, realism is concerned with the metaphysical status of measurable quantities, and information-theoretic and model-based accounts are concerned with the epistemological aspects of measuring. Nonetheless, the subject domain is not as neatly divided as the list above suggests. Issues concerning the metaphysics, epistemology, semantics and mathematical foundations of measurement are interconnected and often bear on one another. Hence, for example, operationalists and conventionalists have often adopted anti-realist views, and proponents of model-based accounts have argued against the prevailing empiricist interpretation of mathematical theories of measurement. These subtleties will become clear in the following discussion.

The list of strands of scholarship is neither exclusive nor exhaustive. It reflects the historical trajectory of the philosophical discussion thus far, rather than any principled distinction among different levels of analysis of measurement. Some philosophical works on measurement belong to more than one strand, while many other works do not squarely fit any of them. This is especially the case since the early 2000s, when measurement returned to the forefront of philosophical discussion after several decades of relative neglect. This recent body of scholarship is sometimes called “the epistemology of measurement”, and includes a rich array of works that cannot yet be classified into distinct schools of thought. The last section of this entry will be dedicated to surveying some of these developments.

2. Quantity and Magnitude: A Brief History

Although the philosophy of measurement formed as a distinct area of inquiry only during the second half of the nineteenth century, fundamental concepts of measurement such as magnitude and quantity have been discussed since antiquity. According to Euclid’s Elements, a magnitude—such as a line, a surface or a solid—measures another when the latter is a whole multiple of the former (Book V, def. 1 & 2). Two magnitudes have a common measure when they are both whole multiples of some magnitude, and are incommensurable otherwise (Book X, def. 1). The discovery of incommensurable magnitudes allowed Euclid and his contemporaries to develop the notion of a ratio of magnitudes. Ratios can be either rational or irrational, and therefore the concept of ratio is more general than that of measure (Michell 2003, 2004; Grattan-Guinness 1996).

Aristotle distinguished between quantities and qualities. Examples of quantities are numbers, lines, surfaces, bodies, time and place, whereas examples of qualities are justice, health, hotness and paleness (Categories §6 and §8). According to Aristotle, quantities admit of equality and inequality but not of degrees, as “one thing is not more four-foot than another” (ibid. 6.6a19). Qualities, conversely, do not admit of equality or inequality but do admit of degrees, “for one thing is called more pale or less pale than another” (ibid. 8.10b26). Aristotle did not clearly specify whether degrees of qualities such as paleness correspond to distinct qualities, or whether the same quality, paleness, was capable of different intensities. This topic was at the center of an ongoing debate in the thirteenth and fourteenth centuries (Jung 2011). Duns Scotus supported the “addition theory”, according to which a change in the degree of a quality can be explained by the addition or subtraction of smaller degrees of that quality (2011: 553). This theory was later refined by Nicole Oresme, who used geometrical figures to represent changes in the intensity of qualities such as velocity (Clagett 1968; Sylla 1971). Oresme’s geometrical representations established a subset of qualities that were amenable to quantitative treatment, thereby challenging the strict Aristotelian dichotomy between quantities and qualities. These developments made possible the formulation of quantitative laws of motion during the sixteenth and seventeenth centuries (Grant 1996).

The concept of qualitative intensity was further developed by Leibniz and Kant. Leibniz’s “principle of continuity” stated that all natural change is produced by degrees. Leibniz argued that this principle applies not only to changes in extended magnitudes such as length and duration, but also to intensities of representational states of consciousness, such as sounds (Jorgensen 2009; Diehl 2012). Kant is thought to have relied on Leibniz’s principle of continuity to formulate his distinction between extensive and intensive magnitudes. According to Kant, extensive magnitudes are those “in which the representation of the parts makes possible the representation of the whole” (1787: A162/B203). An example is length: a line can only be mentally represented by a successive synthesis in which parts of the line join to form the whole. For Kant, the possibility of such synthesis was grounded in the forms of intuition, namely space and time. Intensive magnitudes, like warmth or colors, also come in continuous degrees, but their apprehension takes place in an instant rather than through a successive synthesis of parts. The degrees of intensive magnitudes “can only be represented through approximation to negation” (1787: A168/B210), that is, by imagining their gradual diminution until their complete absence.

Scientific developments during the nineteenth century challenged the distinction between extensive and intensive magnitudes. Thermodynamics and wave optics showed that differences in temperature and hue corresponded to differences in spatio-temporal magnitudes such as velocity and wavelength. Electrical magnitudes such as resistance and conductance were shown to be capable of addition and division despite not being extensive in the Kantian sense, i.e., not synthesized from spatial or temporal parts. Moreover, early experiments in psychophysics suggested that intensities of sensation such as brightness and loudness could be represented as sums of “just noticeable differences” among stimuli, and could therefore be thought of as composed of parts (see Section 3.3). These findings, along with advances in the axiomatization of branches of mathematics, motivated some of the leading scientists of the late nineteenth century to attempt to clarify the mathematical foundations of measurement (Maxwell 1873; von Kries 1882; Helmholtz 1887; Mach 1896; Poincaré 1898; Hölder 1901; for historical surveys see Darrigol 2003; Michell 1993, 2003; Cantù and Schlaudt 2013). These works are viewed today as precursors to the body of scholarship known as “measurement theory”.

3. Mathematical Theories of Measurement (“Measurement Theory”)

Mathematical theories of measurement (often referred to collectively as “measurement theory”) concern the conditions under which relations among numbers (and other mathematical entities) can be used to express relations among objects.[2] In order to appreciate the need for mathematical theories of measurement, consider the fact that relations exhibited by numbers—such as equality, sum, difference and ratio—do not always correspond to relations among the objects measured by those numbers. For example, 60 is twice 30, but one would be mistaken in thinking that an object measured at 60 degrees Celsius is twice as hot as an object at 30 degrees Celsius. This is because the zero point of the Celsius scale is arbitrary and does not correspond to an absence of temperature.[3] Similarly, numerical intervals do not always carry empirical information. When subjects are asked to rank on a scale from 1 to 7 how strongly they agree with a given statement, there is no prima facie reason to think that the intervals between 5 and 6 and between 6 and 7 correspond to equal increments of strength of opinion. To provide a third example, equality among numbers is transitive [if (a=b & b=c) then a=c] but empirical comparisons among physical magnitudes reveal only approximate equality, which is not a transitive relation. These examples suggest that not all of the mathematical relations among numbers used in measurement are empirically significant, and that different kinds of measurement scale convey different kinds of empirically significant information.
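The first and third examples can be made concrete with a few lines of arithmetic. The following Python sketch is an illustration only: the function names, the sample readings and the tolerance of one unit are assumptions introduced here, not part of the literature under discussion. It shows that the ratio 60/30 is not preserved when the same two temperatures are re-expressed in Fahrenheit or Kelvin, and that "equality within instrument tolerance" can hold between a and b and between b and c without holding between a and c.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: multiply by 9/5 and shift by 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Standard conversion: shift by 273.15."""
    return c + 273.15


def approx_equal(a, b, tol=1.0):
    """Empirical 'equality': two readings that differ by no more than
    the (hypothetical) resolution `tol` of the instrument."""
    return abs(a - b) <= tol


# The numerical ratio of the same two temperatures depends on the scale,
# so "twice as hot" is not an empirically significant statement in Celsius.
ratio_celsius = 60 / 30
ratio_fahrenheit = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)
ratio_kelvin = celsius_to_kelvin(60) / celsius_to_kelvin(30)

# Approximate equality is not transitive: a ~ b and b ~ c, but not a ~ c.
a, b, c = 10.0, 10.8, 11.6
chain_holds = approx_equal(a, b) and approx_equal(b, c)
ends_match = approx_equal(a, c)

print(ratio_celsius, ratio_fahrenheit, ratio_kelvin)
print(chain_holds, ends_match)
```

Only the Kelvin ratio has a claim to empirical significance here, since the Kelvin zero point is not arbitrary; the Celsius and Fahrenheit ratios differ from it and from each other.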

The study of measurement scales and the empirical information they convey is the main concern of mathematical theories of measurement. In his seminal 1887 essay, “Counting and Measuring”, Hermann von Helmholtz phrased the key question of measurement theory as follows:

[W]hat is the objective meaning of expressing through denominate numbers the relations of real objects as magnitudes, and under what conditions can we do this? (1887: 4)

Broadly speaking, measurement theory sets out to (i) identify the assumptions underlying the use of various mathematical structures for describing aspects of the empirical world, and (ii) draw lessons about the adequacy and limits of using these mathematical structures for describing aspects of the empirical world. Following Otto Hölder (1901), measurement theorists often tackle these goals through formal proofs, with the assumptions in (i) serving as axioms and the lessons in (ii) following as theorems. A key insight of measurement theory is that the empirically significant aspects of a given mathematical structure are those that mirror relevant relations among the objects being measured. For example, the relation “bigger than” among numbers is empirically significant for measuring length insofar as it mirrors the relation “longer than” among objects. This mirroring, or mapping, of relations between objects and mathematical entities constitutes a measurement scale. As will be clarified below, measurement scales are usually thought of as isomorphisms or homomorphisms between objects and mathematical entities.

Other than these broad goals and claims, measurement theory is a highly heterogeneous body of scholarship. It includes works that span from the late nineteenth century to the present day and endorse a wide array of views on the ontology, epistemology and semantics of measurement. Two main differences among mathematical theories of measurement are especially worth mentioning. The first concerns the nature of the relata, or “objects”, whose relations numbers are supposed to mirror. These relata may be understood in at least four different ways: as concrete individual objects, as qualitative observations of concrete individual objects, as abstract representations of individual objects, or as universal properties of objects. Which interpretation is adopted depends in large part on the author’s metaphysical and epistemic commitments. This issue will be especially relevant to the discussion of realist accounts of measurement (Section 5). Second, different measurement theorists have taken different stands on the kind of empirical evidence that is required to establish mappings between objects and numbers. As a result, measurement theorists have come to disagree about the necessary conditions for establishing the measurability of attributes, and specifically about whether psychological attributes are measurable. Debates about measurability have been highly fruitful for the development of measurement theory, and the following subsections will introduce some of these debates and the central concepts developed therein.

3.1 Fundamental and derived measurement

During the late nineteenth and early twentieth centuries several attempts were made to provide a universal definition of measurement. Although accounts of measurement varied, the consensus was that measurement is a method of assigning numbers to magnitudes. For example, Helmholtz (1887: 17) defined measurement as the procedure by which one finds the denominate number that expresses the value of a magnitude, where a “denominate number” is a number together with a unit, e.g., 5 meters, and a magnitude is a quality of objects that is amenable to ordering from smaller to greater, e.g., length. Bertrand Russell similarly stated that measurement is

any method by which a unique and reciprocal correspondence is established between all or some of the magnitudes of a kind and all or some of the numbers, integral, rational or real. (1903: 176)

Norman Campbell defined measurement simply as “the process of assigning numbers to represent qualities”, where a quality is a property that admits of non-arbitrary ordering (1920: 267).

Defining measurement as numerical assignment raises the question: which assignments are adequate, and under what conditions? Early measurement theorists like Helmholtz (1887), Hölder (1901) and Campbell (1920) argued that numbers are adequate for expressing magnitudes insofar as algebraic operations among numbers mirror empirical relations among magnitudes. For example, the qualitative relation “longer than” among rigid rods is (roughly) transitive and asymmetrical, and in this regard shares structural features with the relation “larger than” among numbers. Moreover, the end-to-end concatenation of rigid rods shares structural features—such as associativity and commutativity—with the mathematical operation of addition. A similar situation holds for the measurement of weight with an equal-arms balance. Here deflection of the arms provides ordering among weights and the heaping of weights on one pan constitutes concatenation.

Early measurement theorists formulated axioms that describe these qualitative empirical structures, and used these axioms to prove theorems about the adequacy of assigning numbers to magnitudes that exhibit such structures. Specifically, they proved that ordering and concatenation are together sufficient for the construction of an additive numerical representation of the relevant magnitudes. An additive representation is one in which addition is empirically meaningful, and hence also multiplication, division etc. Campbell called measurement procedures that satisfy the conditions of additivity “fundamental” because they do not involve the measurement of any other magnitude (1920: 277). Kinds of magnitudes for which a fundamental measurement procedure has been found—such as length, area, volume, duration, weight and electrical resistance—Campbell called “fundamental magnitudes”. A hallmark of such magnitudes is that it is possible to generate them by concatenating a standard sequence of equal units, as in the example of a series of equally spaced marks on a ruler.

Although they viewed additivity as the hallmark of measurement, most early measurement theorists acknowledged that additivity is not necessary for measuring. Other magnitudes exist that admit of ordering from smaller to greater, but whose ratios and/or differences cannot currently be determined except through their relations to other, fundamentally measurable magnitudes. Examples are temperature, which may be measured by determining the volume of a mercury column, and density, which may be measured as the ratio of mass and volume. Such indirect determination came to be called “derived” measurement and the relevant magnitudes “derived magnitudes” (Campbell 1920: 275–7).

At first glance, the distinction between fundamental and derived measurement may seem reminiscent of the distinction between extensive and intensive magnitudes, and indeed fundamental measurement is sometimes called “extensive”. Nonetheless, it is important to note that the two distinctions are based on significantly different criteria of measurability. As discussed in Section 2, the extensive-intensive distinction focused on the intrinsic structure of the quantity in question, i.e., whether or not it is composed of spatio-temporal parts. The fundamental-derived distinction, by contrast, focuses on the properties of measurement operations. A fundamentally measurable magnitude is one for which a fundamental measurement operation has been found. Consequently, fundamentality is not an intrinsic property of a magnitude: a derived magnitude can become fundamental with the discovery of new operations for its measurement. Moreover, in fundamental measurement the numerical assignment need not mirror the structure of spatio-temporal parts. Electrical resistance, for example, can be fundamentally measured by connecting resistors in a series (Campbell 1920: 293). This is considered a fundamental measurement operation because it has a shared structure with numerical addition, even though objects with equal resistance are not generally equal in size.

The distinction between fundamental and derived measurement was revised by subsequent authors. Brian Ellis (1966: Ch. 5–8) distinguished among three types of measurement: fundamental, associative and derived. Fundamental measurement requires ordering and concatenation operations satisfying the same conditions specified by Campbell. Associative measurement procedures are based on a correlation of two ordering relationships, e.g., the correlation between the volume of a mercury column and its temperature. Derived measurement procedures consist in the determination of the value of a constant in a physical law. The constant may be local, as in the determination of the specific density of water from mass and volume, or universal, as in the determination of the Newtonian gravitational constant from force, mass and distance. Henry Kyburg (1984: Ch. 5–7) proposed a somewhat different threefold distinction among direct, indirect and systematic measurement, which does not completely overlap with that of Ellis.[4] A more radical revision of the distinction between fundamental and derived measurement was offered by R. Duncan Luce and John Tukey (1964) in their work on conjoint measurement, which will be discussed in Section 3.4.

3.2 The classification of scales

The previous subsection discussed the axiomatization of empirical structures, a line of inquiry that dates back to the early days of measurement theory. A complementary line of inquiry within measurement theory concerns the classification of measurement scales. The psychophysicist S.S. Stevens (1946, 1951) distinguished among four types of scales: nominal, ordinal, interval and ratio. Nominal scales represent objects as belonging to classes that have no particular order, e.g., male and female. Ordinal scales represent order but no further algebraic structure. For example, the Mohs scale of mineral hardness represents minerals with numbers ranging from 1 (softest) to 10 (hardest), but there is no empirical significance to equality among intervals or ratios of those numbers.[5] Celsius and Fahrenheit are examples of interval scales: they represent equality or inequality among intervals of temperature, but not ratios of temperature, because their zero points are arbitrary. The Kelvin scale, by contrast, is a ratio scale, as are the familiar scales representing mass in kilograms, length in meters and duration in seconds. Stevens later refined this classification and distinguished between linear and logarithmic interval scales (1959: 31–34) and between ratio scales with and without a natural unit (1959: 34). Ratio scales with a natural unit, such as those used for counting discrete objects and for representing probabilities, were named “absolute” scales.

As Stevens notes, scale types are individuated by the families of transformations they can undergo without loss of empirical information. Empirical relations represented on ratio scales, for example, are invariant under multiplication by a positive number, e.g., multiplication by 2.54 converts from inches to centimeters. Linear interval scales allow both multiplication by a positive number and a constant shift, e.g., the conversion from Celsius to Fahrenheit in accordance with the formula °C × 9/5 + 32 = °F. Ordinal scales admit of any transformation function as long as it is monotonic and increasing, and nominal scales admit of any one-to-one substitution. Absolute scales admit of no transformation other than identity. Stevens’ classification of scales was later generalized by Louis Narens (1981, 1985: Ch. 2) and Luce et al. (1990: Ch. 20) in terms of the homogeneity and uniqueness of the relevant transformation groups.
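The link between scale types and permissible transformations can be checked numerically. The sketch below is a minimal illustration, with sample values and helper names (`interval_ratio`, `rank_order`) chosen here for convenience. It applies a permissible transformation to each of three scale types and tests which numerical relations survive: ratios under multiplication by a positive constant, ratios of intervals under a linear transformation with a shift, and order alone under an arbitrary increasing function.

```python
import math


def interval_ratio(values):
    """Ratio of the second interval to the first in a three-element list."""
    return (values[2] - values[1]) / (values[1] - values[0])


def rank_order(values):
    """Indices of the items sorted from smallest to greatest."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Ratio scale: inches to centimeters (multiplication by 2.54).
lengths_in = [2.0, 4.0, 10.0]
lengths_cm = [x * 2.54 for x in lengths_in]
ratios_preserved = math.isclose(
    lengths_in[1] / lengths_in[0], lengths_cm[1] / lengths_cm[0]
)

# Linear interval scale: Celsius to Fahrenheit (multiplication plus shift).
temps_c = [10.0, 20.0, 40.0]
temps_f = [c * 9 / 5 + 32 for c in temps_c]
interval_ratios_preserved = math.isclose(
    interval_ratio(temps_c), interval_ratio(temps_f)
)
plain_ratios_preserved = math.isclose(
    temps_c[1] / temps_c[0], temps_f[1] / temps_f[0]
)

# Ordinal scale: any increasing function is permissible, here cubing.
# The hardness numbers are arbitrary sample values on a Mohs-like scale.
hardness = [1, 3, 7]
hardness_cubed = [h ** 3 for h in hardness]
order_preserved = rank_order(hardness) == rank_order(hardness_cubed)
ordinal_intervals_preserved = math.isclose(
    interval_ratio(hardness), interval_ratio(hardness_cubed)
)

print(ratios_preserved, interval_ratios_preserved, plain_ratios_preserved)
print(order_preserved, ordinal_intervals_preserved)
```

A relation that fails to survive a permissible transformation of a scale, such as plain ratios on the Celsius scale or interval ratios on an ordinal scale, carries no empirical information on that scale.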

While Stevens’ classification of scales met with general approval in scientific and philosophical circles, its wider implications for measurement theory became the topic of considerable debate. Two issues were especially contested. The first was whether classification and ordering operations deserve to be called “measurement” operations, and accordingly whether the representation of magnitudes on nominal and ordinal scales should count as measurement. Several physicists, including Campbell, argued that classification and ordering operations did not provide a sufficiently rich structure to warrant the use of numbers, and hence should not count as measurement operations. The second contested issue was whether a concatenation operation had to be found for a magnitude before it could be fundamentally measured on a ratio scale. The debate became especially heated when it re-ignited a longer controversy surrounding the measurability of intensities of sensation. It is to this debate we now turn.

3.3 The measurability of sensation

One of the main catalysts for the development of mathematical theories of measurement was an ongoing debate surrounding measurability in psychology. The debate is often traced back to Gustav Fechner’s (1860) Elements of Psychophysics, in which he described a method of measuring intensities of sensation. Fechner’s method was based on the recording of “just noticeable differences” between sensations associated with pairs of stimuli, e.g., two sounds of different intensity. These differences were assumed to be equal increments of intensity of sensation. As Fechner showed, under this assumption a stable linear relationship is revealed between the intensity of sensation and the logarithm of the intensity of the stimulus, a relation that came to be known as “Fechner’s law” (Heidelberger 1993a: 203; Luce and Suppes 2004: 11–2). This law in turn provides a method for indirectly measuring the intensity of sensation by measuring the intensity of the stimulus, and hence, Fechner argued, provides justification for measuring intensities of sensation on the real numbers.
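Fechner's reasoning can be sketched computationally under one additional assumption that is not stated in the passage above: that each just noticeable difference corresponds to a constant fractional increase in the stimulus (Weber's law). The fraction of 10% and the threshold of 1.0 in the following Python sketch are hypothetical values chosen purely for illustration. Counting just noticeable steps then yields a sensation value that grows by equal amounts whenever the stimulus is multiplied by a fixed factor, which is the linear relation to the logarithm of the stimulus described in the text.

```python
import math

WEBER_FRACTION = 0.1  # hypothetical: a 10% increase is just noticeable
THRESHOLD = 1.0       # hypothetical: weakest detectable stimulus


def jnd_count(stimulus):
    """Number of just noticeable steps between the threshold and the
    given stimulus, each step treated as one equal unit of sensation."""
    steps = 0
    level = THRESHOLD
    while level * (1 + WEBER_FRACTION) <= stimulus:
        level *= 1 + WEBER_FRACTION
        steps += 1
    return steps


def log_prediction(stimulus):
    """Logarithmic law implied by the assumption above:
    sensation = log(stimulus / threshold) / log(1 + Weber fraction)."""
    return math.log(stimulus / THRESHOLD) / math.log(1 + WEBER_FRACTION)


# Each tenfold increase in the stimulus adds the same number of steps.
counts = [jnd_count(s) for s in (10.0, 100.0, 1000.0)]
increments = [counts[1] - counts[0], counts[2] - counts[1]]

print(counts, increments)
```

The step count never differs from the logarithmic prediction by as much as one unit, the remainder being due to rounding down to whole steps. Whether the steps really are equal increments of sensation is exactly the assumption that Fechner's critics disputed.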

Fechner’s claims concerning the measurability of sensation became the subject of a series of debates that lasted nearly a century and proved extremely fruitful for the philosophy of measurement, involving key figures such as Mach, Helmholtz, Campbell and Stevens (Heidelberger 1993a: Ch. 6 and 1993b; Michell 1999: Ch. 6). Those objecting to the measurability of sensation, such as Campbell, stressed the necessity of an empirical concatenation operation for fundamental measurement. Since intensities of sensation cannot be concatenated to each other in the manner afforded by lengths and weights, there could be no fundamental measurement of sensation intensity. Moreover, Campbell claimed that none of the psychophysical regularities discovered thus far are sufficiently universal to count as laws in the sense required for derived measurement (Campbell in Ferguson et al. 1940: 347). All that psychophysicists have shown is that intensities of sensation can be consistently ordered, but order by itself does not yet warrant the use of numerical relations such as sums and ratios to express empirical results.

The central opponent of Campbell in this debate was Stevens, whose distinction between types of measurement scale was discussed above. Stevens defined measurement as the “assignment of numerals to objects or events according to rules” (1951: 1) and claimed that any consistent and non-random assignment counts as measurement in the broad sense (1975: 47). In useful cases of scientific inquiry, Stevens claimed, measurement can be construed somewhat more narrowly as a numerical assignment that is based on the results of matching operations, such as the coupling of temperature to mercury volume or the matching of sensations to each other. Stevens argued against the view that relations among numbers need to mirror qualitative empirical structures, claiming instead that measurement scales should be regarded as arbitrary formal schemas and adopted in accordance with their usefulness for describing empirical data. For example, adopting a ratio scale for measuring the sensations of loudness, volume and density of sounds leads to the formulation of a simple linear relation among the reports of experimental subjects: loudness = volume × density (1975: 57–8). Such assignment of numbers to sensations counts as measurement because it is consistent and non-random, because it is based on the matching operations performed by experimental subjects, and because it captures regularities in the experimental results. According to Stevens, these conditions are together sufficient to justify the use of a ratio scale for measuring sensations, despite the fact that “sensations cannot be separated into component parts, or laid end to end like measuring sticks” (1975: 38; see also Hempel 1952: 68–9).

3.4 Representational Theory of Measurement

In the mid-twentieth century the two main lines of inquiry in measurement theory, the one dedicated to the empirical conditions of quantification and the one concerning the classification of scales, converged in the work of Patrick Suppes (1951; Scott and Suppes 1958; for historical surveys see Savage and Ehrlich 1992; Diez 1997a,b). Suppes’ work laid the basis for the Representational Theory of Measurement (RTM), which remains the most influential mathematical theory of measurement to date (Krantz et al. 1971; Suppes et al. 1989; Luce et al. 1990). RTM defines measurement as the construction of mappings from empirical relational structures into numerical relational structures (Krantz et al. 1971: 9). An empirical relational structure consists of a set of empirical objects (e.g., rigid rods) along with certain qualitative relations among them (e.g., ordering, concatenation), while a numerical relational structure consists of a set of numbers (e.g., real numbers) and specific mathematical relations among them (e.g., “equal to or bigger than”, addition). Simply put, a measurement scale is a many-to-one mapping—a homomorphism—from an empirical to a numerical relational structure, and measurement is the construction of scales.[6] RTM goes into great detail in clarifying the assumptions underlying the construction of different types of measurement scales. Each type of scale is associated with a set of assumptions about the qualitative relations obtaining among objects represented on that type of scale. From these assumptions, or axioms, the authors of RTM derive the representational adequacy of each scale type, as well as the family of permissible transformations making that type of scale unique. In this way RTM provides a conceptual link between the empirical basis of measurement and the typology of scales.[7]
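The notion of a scale as a homomorphism can be illustrated with a toy empirical structure. In the Python sketch below, everything is hypothetical: the four rods, the dictionary `RODS` that stands in for the physical world, and the two candidate scales. The qualitative relations are "at least as long as" and "laid end to end, x and y exactly match z". A candidate assignment counts as a homomorphism if the first relation is mirrored by ≥ and the second by addition. Exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction
from itertools import product

# Hypothetical rods. The integers stand in for the physical extents that
# fix the outcomes of comparisons; they are not themselves the scale.
RODS = {"a": 30, "b": 50, "c": 50, "d": 80}


def at_least_as_long(x, y):
    """Qualitative comparison: rods laid side by side."""
    return RODS[x] >= RODS[y]


def concatenation_matches(x, y, z):
    """Qualitative test: x and y laid end to end exactly match z."""
    return RODS[x] + RODS[y] == RODS[z]


def is_homomorphism(phi):
    """True if phi mirrors ordering by >= and concatenation by addition."""
    names = list(RODS)
    order_ok = all(
        at_least_as_long(x, y) == (phi(x) >= phi(y))
        for x, y in product(names, repeat=2)
    )
    concat_ok = all(
        concatenation_matches(x, y, z) == (phi(x) + phi(y) == phi(z))
        for x, y, z in product(names, repeat=3)
    )
    return order_ok and concat_ok


def centimeters(x):
    """Candidate scale assigning each rod a number of centimeters."""
    return Fraction(RODS[x], 10)


def squared(x):
    """Candidate scale that preserves order but not concatenation."""
    return centimeters(x) ** 2


print(is_homomorphism(centimeters), is_homomorphism(squared))
print(centimeters("b") == centimeters("c"))
```

The mapping is many-to-one because rods b and c, although distinct objects, receive the same number. The squared assignment respects ordering and would therefore serve as an ordinal scale, but it fails as an additive representation because the match of a and b against d is not mirrored by addition.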

On the issue of measurability, the Representational Theory takes a middle path between the liberal approach adopted by Stevens and the strict emphasis on concatenation operations espoused by Campbell. Like Campbell, RTM accepts that rules of quantification must be grounded in known empirical structures and should not be chosen arbitrarily to fit the data. However, RTM rejects the idea that additive scales are adequate only when concatenation operations are available (Luce and Suppes 2004: 15). Instead, RTM argues for the existence of fundamental measurement operations that do not involve concatenation. The central example of this type of operation is known as “additive conjoint measurement” (Luce and Tukey 1964; Krantz et al. 1971: 17–21 and Ch. 6–7). Here, measurements of two or more different types of attribute, such as the temperature and pressure of a gas, are obtained by observing their joint effect, such as the volume of the gas. Luce and Tukey showed that by establishing certain qualitative relations among volumes under variations of temperature and pressure, one can construct additive representations of temperature and pressure, without invoking any antecedent method of measuring volume. This sort of procedure is generalizable to any suitably related triplet of attributes, such as the loudness, intensity and frequency of pure tones, or the preference for a reward, its size and the delay in receiving it (Luce and Suppes 2004: 17). The discovery of additive conjoint measurement led the authors of RTM to divide fundamental measurement into two kinds: traditional measurement procedures based on concatenation operations, which they called “extensive measurement”, and conjoint or “nonextensive” fundamental measurement. Under this new conception of fundamentality, all the traditional physical attributes can be measured fundamentally, as well as many psychological attributes (Krantz et al. 1971: 502–3).
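One of the qualitative relations required for additive conjoint measurement is commonly called independence: the ordering of joint effects induced by varying one attribute must be the same at every level of the other attribute. The Python sketch below checks this single condition and nothing more; independence is necessary but not sufficient for an additive representation. The temperature and pressure levels are hypothetical labels, and the idealized formula inside `gas_volume` serves only to generate an ordering of volumes. The check itself uses comparisons between volumes and never their numerical values.

```python
from itertools import combinations

# Hypothetical levels of the two attributes.
TEMPERATURES = [280.0, 300.0, 350.0]
PRESSURES = [1.0, 2.0, 4.0]


def gas_volume(t, p):
    """Stand-in for an observed volume (idealized gas, arbitrary units)."""
    return t / p


def interacting_effect(t, p):
    """Artificial joint effect in which the two attributes interact."""
    return (t - 300.0) * (p - 2.0)


def satisfies_independence(rows, cols, effect):
    """True if the ordering between any two row levels is the same at
    every column level, and likewise with rows and columns exchanged.
    Only greater-than comparisons between joint effects are used."""
    for r1, r2 in combinations(rows, 2):
        if len({effect(r1, c) > effect(r2, c) for c in cols}) > 1:
            return False
    for c1, c2 in combinations(cols, 2):
        if len({effect(r, c1) > effect(r, c2) for r in rows}) > 1:
            return False
    return True


gas_ok = satisfies_independence(TEMPERATURES, PRESSURES, gas_volume)
interaction_ok = satisfies_independence(
    TEMPERATURES, PRESSURES, interacting_effect
)

print(gas_ok, interaction_ok)
```

The idealized gas passes the check because a hotter sample occupies more volume than a cooler one at every pressure. The artificial effect fails because the ordering between two temperature levels reverses as pressure changes, which rules out any additive representation.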

4. Operationalism and Conventionalism

Above we saw that mathematical theories of measurement are primarily concerned with the mathematical properties of measurement scales and the conditions of their application. A related but distinct strand of scholarship concerns the meaning and use of quantity terms. Scientific theories and models are commonly expressed in terms of quantitative relations among parameters, bearing names such as “length”, “unemployment rate” and “introversion”. A realist about one of these terms would argue that it refers to a set of properties or relations that exist independently of being measured. An operationalist or conventionalist would argue that the way such quantity-terms apply to concrete particulars depends on nontrivial choices made by humans, and specifically on choices that have to do with the way the relevant quantity is measured. Note that under this broad construal, realism is compatible with operationalism and conventionalism. That is, it is conceivable that choices of measurement method regulate the use of a quantity-term and that, given the correct choice, this term succeeds in referring to a mind-independent property or relation. Nonetheless, many operationalists and conventionalists adopted stronger views, according to which there are no facts of the matter as to which of several and nontrivially different operations is correct for applying a given quantity-term. These stronger variants are inconsistent with realism about measurement. This section will be dedicated to operationalism and conventionalism, and the next to realism about measurement.

Operationalism (or “operationism”) about measurement is the view that the meaning of quantity-concepts is determined by the set of operations used for their measurement. The strongest expression of operationalism appears in the early work of Percy Bridgman (1927), who argued that

we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations. (1927: 5)

Length, for example, would be defined as the result of the operation of concatenating rigid rods. According to this extreme version of operationalism, different operations measure different quantities. Length measured by using rulers and by timing electromagnetic pulses should, strictly speaking, be distinguished into two distinct quantity-concepts labeled “length-1” and “length-2” respectively. This conclusion led Bridgman to claim that currently accepted quantity concepts have “joints” where different operations overlap in their domain of application. He warned against dogmatic faith in the unity of quantity concepts across these “joints”, urging instead that unity be checked against experiments whenever the application of a quantity-concept is to be extended into a new domain. Nevertheless, Bridgman conceded that as long as the results of different operations agree within experimental error it is pragmatically justified to label the corresponding quantities with the same name (1927: 16).[8]

Operationalism became influential in psychology, where it was well-received by behaviorists like Edwin Boring (1945) and B.F. Skinner (1945). Indeed, Skinner maintained that behaviorism is “nothing more than a thoroughgoing operational analysis of traditional mentalistic concepts” (1945: 271). Stevens, who was Boring’s student, was a key promoter of operationalism in psychology, and argued that psychological concepts have empirical meaning only if they stand for definite and concrete operations (1935: 517). The idea that concepts are defined by measurement operations is consistent with Stevens’ liberal views on measurability, which were discussed above (Section 3.3). As long as the assignment of numbers to objects is performed in accordance with concrete and consistent rules, Stevens maintained that such assignment has empirical meaning and does not need to satisfy any additional constraints. Nonetheless, Stevens probably did not embrace an anti-realist view about psychological attributes. Instead, there are good reasons to think that he understood operationalism as a methodological attitude that was valuable to the extent that it allowed psychologists to justify the conclusions they drew from experiments (Feest 2005). For example, Stevens did not treat operational definitions as a priori but as amenable to improvement in light of empirical discoveries, implying that he took psychological attributes to exist independently of such definitions (Stevens 1935: 527). This suggests that Stevens’ operationalism was of a more moderate variety than that found in the early writings of Bridgman.[9]

Operationalism was initially met with enthusiasm by logical positivists, who viewed it as akin to verificationism. Nonetheless, it was soon revealed that any attempt to base a theory of meaning on operationalist principles was riddled with problems. Among such problems were the automatic reliability operationalism conferred on measurement operations, the ambiguities surrounding the notion of operation, the overly restrictive operational criterion of meaningfulness, and the fact that many useful theoretical concepts lack clear operational definitions (Chang 2009).[10] In particular, Carl Hempel (1956, 1966) criticized operationalists for being unable to define dispositional terms such as “solubility in water”, and for multiplying the number of scientific concepts in a manner that runs against the need for systematic and simple theories. Accordingly, most writers on the semantics of quantity-terms have avoided espousing an operational analysis.[11]

A more widely advocated approach admitted a conventional element to the use of quantity-terms, while resisting attempts to reduce the meaning of quantity terms to measurement operations. These accounts are classified under the general heading “conventionalism”, though they differ in the particular aspects of measurement they deem conventional and in the degree of arbitrariness they ascribe to such conventions.[12] An early precursor of conventionalism was Ernst Mach, who examined the notion of equality among temperature intervals (1896: 52). Mach noted that different types of thermometric fluid expand at different (and nonlinearly related) rates when heated, raising the question: which fluid expands most uniformly with temperature? According to Mach, there is no fact of the matter as to which fluid expands more uniformly, since the very notion of equality among temperature intervals has no determinate application prior to a conventional choice of standard thermometric fluid. Mach coined the term “principle of coordination” for this sort of conventionally chosen principle for the application of a quantity concept. The concepts of uniformity of time and space received similar treatments by Henri Poincaré (1898, 1902: Part 2). Poincaré argued that procedures used to determine equality among durations stem from scientists’ unconscious preference for descriptive simplicity, rather than from any fact about nature. Similarly, scientists’ choice to represent space with either Euclidean or non-Euclidean geometries is not determined by experience but by considerations of convenience.

Conventionalism with respect to measurement reached its most sophisticated expression in logical positivism. Logical positivists like Hans Reichenbach and Rudolf Carnap proposed “coordinative definitions” or “correspondence rules” as the semantic link between theoretical and observational terms. These a priori, definition-like statements were intended to regulate the use of theoretical terms by connecting them with empirical procedures (Reichenbach 1927: 14–19; Carnap 1966: Ch. 24). An example of a coordinative definition is the statement: “a measuring rod retains its length when transported”. According to Reichenbach, this statement cannot be empirically verified, because a universal and experimentally undetectable force could exist that equally distorts every object’s length when it is transported. In accordance with verificationism, statements that are unverifiable are neither true nor false. Instead, Reichenbach took this statement to express an arbitrary rule for regulating the use of the concept of equality of length, namely, for determining whether particular instances of length are equal (Reichenbach 1927: 16). At the same time, coordinative definitions were not seen as replacements, but rather as necessary additions, to the familiar sort of theoretical definitions of concepts in terms of other concepts (1927: 14). Under the conventionalist viewpoint, then, the specification of measurement operations did not exhaust the meaning of concepts such as length or length-equality, thereby avoiding many of the problems associated with operationalism.[13]

5. Realist Accounts of Measurement

Realists about measurement maintain that measurement is best understood as the empirical estimation of an objective property or relation. A few clarificatory remarks are in order with respect to this characterization of measurement. First, the term “objective” is not meant to exclude mental properties or relations, which are the objects of psychological measurement. Rather, measurable properties or relations are taken to be objective inasmuch as they are independent of the beliefs and conventions of the humans performing the measurement and of the methods used for measuring. For example, a realist would argue that the ratio of the length of a given solid rod to the standard meter has an objective value regardless of whether and how it is measured. Second, the term “estimation” is used by realists to highlight the fact that measurement results are mere approximations of true values (Trout 1998: 46). Third, according to realists, measurement is aimed at obtaining knowledge about properties and relations, rather than at assigning values directly to individual objects. This is significant because observable objects (e.g., levers, chemical solutions, humans) often instantiate measurable properties and relations that are not directly observable (e.g., amount of mechanical work, more acidic than, intelligence). Knowledge claims about such properties and relations must presuppose some background theory. By shifting the emphasis from objects to properties and relations, realists highlight the theory-laden character of measurements.

Realism about measurement should not be confused with realism about entities (e.g., electrons). Nor does realism about measurement necessarily entail realism about properties (e.g., temperature), since one could in principle accept only the reality of relations (e.g., ratios among quantities) without embracing the reality of underlying properties. Nonetheless, most philosophers who have defended realism about measurement have done so by arguing for some form of realism about properties (Byerly and Lazara 1973; Swoyer 1987; Mundy 1987; Trout 1998, 2000). These realists argue that at least some measurable properties exist independently of the beliefs and conventions of the humans who measure them, and that the existence and structure of these properties provide the best explanation for key features of measurement, including the usefulness of numbers in expressing measurement results and the reliability of measuring instruments.

For example, a typical realist about length measurement would argue that the empirical regularities displayed by individual objects’ lengths when they are ordered and concatenated are best explained by assuming that length is an objective property that has an extensive structure (Swoyer 1987: 271–4). That is, relations among lengths such as “longer than” and “sum of” exist independently of whether any objects happen to be ordered and concatenated by humans, and indeed independently of whether objects of some particular length happen to exist at all. The existence of an extensive property structure means that lengths share much of their structure with the positive real numbers, and this explains the usefulness of the positive reals in representing lengths. Moreover, if measurable properties are analyzed in dispositional terms, it becomes easy to explain why some measuring instruments are reliable. For example, if one assumes that a certain amount of electric current in a wire entails a disposition to deflect an ammeter needle by a certain angle, it follows that the ammeter’s indications counterfactually depend on the amount of electric current in the wire, and therefore that the ammeter is reliable (Trout 1998: 65).

A different argument for realism about measurement is due to Joel Michell (1994, 2005), who proposes a realist theory of number based on the Euclidean concept of ratio. According to Michell, numbers are ratios between quantities, and therefore exist in space and time. Specifically, real numbers are ratios between pairs of infinite standard sequences, e.g., the sequence of lengths normally denoted by “1 meter”, “2 meters” etc. and the sequence of whole multiples of the length we are trying to measure. Measurement is the discovery and estimation of such ratios. An interesting consequence of this empirical realism about numbers is that measurement is not a representational activity, but rather the activity of approximating mind-independent numbers (Michell 1994: 400).

Realist accounts of measurement are largely formulated in opposition to strong versions of operationalism and conventionalism, which dominated philosophical discussions of measurement from the 1930s until the 1960s. In addition to the drawbacks of operationalism already discussed in the previous section, realists point out that anti-realism about measurable quantities fails to make sense of scientific practice. If quantities had no real values independently of one’s choice of measurement procedure, it would be difficult to explain what scientists mean by “measurement accuracy” and “measurement error”, and why they try to increase accuracy and diminish error. By contrast, realists can easily make sense of the notions of accuracy and error in terms of the distance between real and measured values (Byerly and Lazara 1973: 17–8; Swoyer 1987: 239; Trout 1998: 57). A closely related point is the fact that newer measurement procedures tend to improve on the accuracy of older ones. If choices of measurement procedure were merely conventional it would be difficult to make sense of such progress. In addition, realism provides an intuitive explanation for why different measurement procedures often yield similar results, namely, because they are sensitive to the same facts (Swoyer 1987: 239; Trout 1998: 56). Finally, realists note that the construction of measurement apparatus and the analysis of measurement results are guided by theoretical assumptions concerning causal relationships among quantities. The ability of such causal assumptions to guide measurement suggests that quantities are ontologically prior to the procedures that measure them.[14]

While their stance towards operationalism and conventionalism is largely critical, realists are more charitable in their assessment of mathematical theories of measurement. Brent Mundy (1987) and Chris Swoyer (1987) both accept the axiomatic treatment of measurement scales, but object to the empiricist interpretation given to the axioms by prominent measurement theorists like Campbell (1920) and Ernest Nagel (1931; Cohen and Nagel 1934: Ch. 15). Rather than interpreting the axioms as pertaining to concrete objects or to observable relations among such objects, Mundy and Swoyer reinterpret the axioms as pertaining to universal magnitudes, e.g., to the universal property of being 5 meters long rather than to the concrete instantiations of that property. This construal preserves the intuition that statements like “the size of x is twice the size of y” are first and foremost about two sizes, and only derivatively about the objects x and y themselves (Mundy 1987: 34).[15] Mundy and Swoyer argue that their interpretation is more general, because it logically entails all the first-order consequences of the empiricist interpretation along with additional, second-order claims about universal magnitudes. Moreover, under their interpretation measurement theory becomes a genuine scientific theory, with explanatory hypotheses and testable predictions. Despite these virtues, the realist interpretation has been largely ignored in the wider literature on measurement theory.

6. Information-Theoretic Accounts of Measurement

Information-theoretic accounts of measurement are based on an analogy between measuring systems and communication systems. In a simple communication system, a message (input) is encoded into a signal at the transmitter’s end, sent to the receiver’s end, and then decoded back (output). The accuracy of the transmission depends on features of the communication system as well as on features of the environment, i.e., the level of background noise. Similarly, measuring instruments can be thought of as “information machines” (Finkelstein 1977) that interact with an object in a given state (input), encode that state into an internal signal, and convert that signal into a reading (output). The accuracy of a measurement similarly depends on the instrument as well as on the level of noise in its environment. Conceived as a special sort of information transmission, measurement becomes analyzable in terms of the conceptual apparatus of information theory (Hartley 1928; Shannon 1948; Shannon and Weaver 1949). For example, the information that reading \(y_i\) conveys about the occurrence of a state \(x_k\) of the object can be quantified as \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\), namely as a function of the decrease of uncertainty about the object’s state (Finkelstein 1975: 222; for alternative formulations see Brillouin 1962: Ch. 15; Kirpatovskii 1974; and Mari 1999: 185).
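
The quantity \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\) can be computed in a short sketch. Everything specific in it is an assumption made for illustration: the four object states, the uniform prior, the noise model of the instrument and the choice of base-2 logarithms (so that information is expressed in bits) are all hypothetical. The posterior \(p(x_k \mid y_i)\) is obtained from the prior and the noise model by Bayes' rule.

```python
import math

def posterior(prior, likelihood, reading):
    """Bayes' rule: p(x | y) for every state x, given p(x) and p(y | x)."""
    joint = {x: prior[x] * likelihood[x][reading] for x in prior}
    total = sum(joint.values())
    return {x: j / total for x, j in joint.items()}

def information(prior, post, state):
    """log2[p(x_k | y_i) / p(x_k)]: information the reading conveys about x_k."""
    return math.log2(post[state] / prior[state])

# Hypothetical object with four equally probable states.
states = [0, 1, 2, 3]
prior = {x: 0.25 for x in states}

# Hypothetical noisy instrument: the reading matches the true state with
# probability 0.7 and is any one of the other three values with probability 0.1.
likelihood = {x: {y: (0.7 if y == x else 0.1) for y in states} for x in states}

post = posterior(prior, likelihood, reading=2)
bits = information(prior, post, state=2)
print(f"information conveyed about state 2: {bits:.3f} bits")
```

Under these assumptions a noiseless instrument would convey the full two bits needed to single out one of four equiprobable states. The noisy instrument conveys less, which reflects the dependence of accuracy on the level of noise.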

Ludwik Finkelstein (1975, 1977) and Luca Mari (1999) suggested the possibility of a synthesis between Shannon-Weaver information theory and measurement theory. As they argue, both theories centrally appeal to the idea of mapping: information theory concerns the mapping between symbols in the input and output messages, while measurement theory concerns the mapping between objects and numbers. If measurement is taken to be analogous to symbol-manipulation, then Shannon-Weaver theory could provide a formalization of the syntax of measurement while measurement theory could provide a formalization of its semantics. Nonetheless, Mari (1999: 185) also warns that the analogy between communication and measurement systems is limited. Whereas a sender’s message can be known with arbitrary precision independently of its transmission, the state of an object cannot be known with arbitrary precision independently of its measurement.

Information-theoretic accounts of measurement were originally developed by metrologists with little involvement from philosophers. Metrology, officially defined as the “science of measurement and its application” (JCGM 2012: 2.2), is a field of study concerned with the design, maintenance and improvement of measuring instruments in the natural sciences and engineering. Metrologists typically work at standardization bureaus or at specialized laboratories that are responsible for the calibration of measurement equipment, the comparison of standards and the evaluation of measurement uncertainties, among other tasks. It is only recently that philosophers have begun to engage with the rich conceptual issues underlying metrological practice, and particularly with the inferences involved in evaluating and improving the accuracy of measurement standards (Chang 2004; Boumans 2005a: Chap. 5, 2005b, 2007a; Frigerio et al. 2010; Tal forthcoming-a; Teller 2013a,b; Riordan 2014). Further philosophical work is required to explore the assumptions and consequences of information-theoretic accounts of measurement, their implications for metrological practice, and their connections with other accounts of measurement.

Independently of developments in metrology, Bas van Fraassen (2008: 141–185) has recently proposed a conception of measurement in which information plays a key role. He views measurement as composed of two levels: on the physical level, the measuring apparatus interacts with an object and produces a reading, e.g., a pointer position.[16] On the abstract level, background theory represents the object’s possible states on a parameter space. Measurement locates an object on a sub-region of this abstract parameter space, thereby reducing the range of possible states (2008: 164 and 172). This reduction of possibilities amounts to the collection of information about the measured object. Van Fraassen’s analysis of measurement differs from information-theoretic accounts developed in metrology in its explicit appeal to background theory, and in the fact that it does not invoke the symbolic conception of information developed by Shannon and Weaver.

7. Model-Based Accounts of Measurement

Since the early 2000s a new wave of philosophical scholarship has emerged that emphasizes the relationships between measurement and theoretical and statistical modeling. According to model-based accounts, measurement consists of two levels: (i) a concrete process involving interactions between an object of interest, an instrument, and the environment; and (ii) a theoretical and/or statistical model of that process, where “model” denotes an abstract and local representation constructed from simplifying assumptions. The central goal of measurement according to this view is to assign values to one or more parameters of interest in the model in a manner that satisfies certain epistemic desiderata, in particular coherence and consistency.

A central motivation for the development of model-based accounts is the attempt to clarify the epistemological principles underlying aspects of measurement practice. For example, metrologists employ a variety of methods for the calibration of measuring instruments, the standardization and tracing of units and the evaluation of uncertainties (for a discussion of metrology, see the previous section). Traditional philosophical accounts such as mathematical theories of measurement do not elaborate on the assumptions, inference patterns, evidential grounds or success criteria associated with such methods. As Frigerio et al. (2010) argue, measurement theory is ill-suited for clarifying these aspects of measurement because it abstracts away from the process of measurement and focuses solely on the mathematical properties of scales. By contrast, model-based accounts take scale construction to be merely one of several tasks involved in measurement, alongside the definition of measured parameters, instrument design and calibration, object sampling and preparation, error detection and uncertainty evaluation, among others (2010: 145–7).

7.1 The roles of models in measurement

According to model-based accounts, measurement involves interaction between an object of interest (the “system under measurement”), an instrument (the “measurement system”) and an environment, which includes the measuring subjects. Other, secondary interactions may also be relevant for the determination of a measurement outcome, such as the interaction between the measuring instrument and the reference standards used for its calibration, and the chain of comparisons that trace the reference standard back to primary measurement standards (Mari 2003: 25). Measurement proceeds by representing these interactions with a set of parameters, and assigning values to a subset of those parameters (known as “measurands”) based on the results of the interactions. When measured parameters are numerical they are called “quantities”. Although measurands need not be quantities, a quantitative measurement scenario will be supposed in what follows.

Two sorts of measurement outputs are distinguished by model-based accounts (JCGM 2012: 2.9 & 4.1; Giordani and Mari 2012: 2146; Tal 2013):

  1. Instrument indications (or “readings”): these are properties of the measuring instrument in its final state after the measurement process is complete. Examples are digits on a display, marks on a multiple-choice questionnaire and bits stored in a device’s memory. Indications may be represented by numbers, but such numbers describe states of the instrument and should not be confused with measurement outcomes, which concern states of the object being measured.
  2. Measurement outcomes (or “results”): these are knowledge claims about the values of one or more quantities attributed to the object being measured, and are typically accompanied by a specification of the measurement unit and scale and an estimate of measurement uncertainty. For example, a measurement outcome may be expressed by the sentence “the mass of object a is 20±1 grams with a probability of 68%”.

As proponents of model-based accounts stress, inferences from instrument indications to measurement outcomes are nontrivial and depend on a host of theoretical and statistical assumptions about the object being measured, the instrument, the environment and the calibration process. Measurement outcomes are often obtained through statistical analysis of multiple indications, thereby involving assumptions about the shape of the distribution of indications and the randomness of environmental effects (Bogen and Woodward 1988: 307–310). Measurement outcomes also incorporate corrections for systematic effects, and such corrections are based on theoretical assumptions concerning the workings of the instrument and its interactions with the object and environment. For example, length measurements need to be corrected for the change of the measuring rod’s length with temperature, a correction which is derived from a theoretical equation of thermal expansion. Systematic corrections involve uncertainties of their own, for example in the determination of the values of constants, and these uncertainties are assessed through secondary experiments involving further theoretical and statistical assumptions. Moreover, the uncertainty associated with a measurement outcome depends on the methods employed for the calibration of the instrument. Calibration involves additional assumptions about the instrument, the calibrating apparatus, the quantity being measured and the properties of measurement standards (Rothbart and Slayden 1994; Franklin 1997; Baird 2004: Ch. 4; Soler et al. 2011). Another component of uncertainty originates from vagueness in the definition of the measurand, and is known as “definitional uncertainty” (Mari and Giordani 2013). Finally, measurement involves background assumptions about the scale type and unit system being used, and these assumptions are often tied to broader theoretical and technological considerations relating to the definition and realization of scales and units.
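
The inference from indications to an outcome can be made concrete with a minimal sketch of the length example. It rests on several simplifying assumptions that are not taken from the sources cited above: the rod is taken to expand linearly with temperature, the measured object's own expansion is ignored, the input quantities are treated as uncorrelated, and uncertainties are combined by a first-order root-sum-of-squares rule of the kind commonly used in metrology. The function name, the readings and the numerical values of the expansion coefficient and its uncertainty are hypothetical.

```python
import math
import statistics

def measurement_outcome(indications, temp_c, alpha, u_alpha, u_temp, ref_temp_c=20.0):
    """Return (corrected length, combined standard uncertainty)."""
    mean_reading = statistics.mean(indications)
    # Statistical component: standard error of the mean of the indications.
    u_reading = statistics.stdev(indications) / math.sqrt(len(indications))

    # Systematic correction: a rod warmer than its reference temperature has
    # expanded, so its graduations under-report the length (linear model).
    delta_t = temp_c - ref_temp_c
    factor = 1.0 + alpha * delta_t
    value = mean_reading * factor

    # First-order propagation; each term is (partial derivative * uncertainty),
    # with the inputs assumed uncorrelated.
    u_combined = math.sqrt(
        (factor * u_reading) ** 2
        + (mean_reading * delta_t * u_alpha) ** 2
        + (mean_reading * alpha * u_temp) ** 2
    )
    return value, u_combined

# Hypothetical indications (mm) read off a rod used at 30 degrees Celsius.
readings_mm = [1000.02, 999.98, 1000.01, 999.99, 1000.00]
value, u = measurement_outcome(
    readings_mm, temp_c=30.0, alpha=1.2e-5, u_alpha=1.0e-6, u_temp=0.5
)
print(f"length = {value:.3f} mm, standard uncertainty = {u:.3f} mm")
```

The same five indications would yield a different outcome if the temperature correction were omitted or if a different expansion coefficient were assumed. On the further assumption that errors are normally distributed, an interval of one standard uncertainty around the value corresponds to a coverage probability of roughly 68%.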

These various theoretical and statistical assumptions form the basis for the construction of one or more models of the measurement process. Unlike mathematical theories of measurement, where the term “model” denotes a set-theoretical structure that interprets a formal language, here the term “model” denotes an abstract and local representation of a target system that is constructed from simplifying assumptions.[17] The relevant target system in this case is a measurement process, that is, a system composed of a measuring instrument, objects or events to be measured, the environment (including human operators), secondary instruments and reference standards, the time-evolution of these components, and their various interactions with each other. Measurement is viewed as a set of procedures whose aim is to coherently assign values to model parameters based on instrument indications. Models are therefore seen as necessary preconditions for the possibility of inferring measurement outcomes from instrument indications, and as crucial for determining the content of measurement outcomes. As proponents of model-based accounts emphasize, the same indications produced by the same measurement process may be used to establish different measurement outcomes depending on how the measurement process is modeled, e.g., depending on which environmental influences are taken into account, which statistical assumptions are used to analyze noise, and which approximations are used in applying background theory. As Luca Mari puts it,

any measurement result reports information that is meaningful only in the context of a metrological model, such a model being required to include a specification for all the entities that explicitly or implicitly appear in the expression of the measurement result. (2003: 25)

Similarly, models are said to provide the necessary context for evaluating various aspects of the goodness of measurement outcomes, including accuracy, precision, error and uncertainty (Boumans 2006, 2007a, 2009, 2012b; Mari 2005b).

Model-based accounts diverge from empiricist interpretations of measurement theory in that they do not require relations among measurement outcomes to be isomorphic or homomorphic to observable relations among the items being measured (Mari 2000). Indeed, according to model-based accounts relations among measured objects need not be observable at all prior to their measurement (Frigerio et al. 2010: 125). Instead, the key normative requirement of model-based accounts is that values be assigned to model parameters in a coherent manner. The coherence criterion may be viewed as a conjunction of two sub-criteria: (i) coherence of model assumptions with relevant background theories or other substantive presuppositions about the quantity being measured; and (ii) objectivity, i.e., the mutual consistency of measurement outcomes across different measuring instruments, environments and models[18] (Frigerio et al. 2010; Teller 2013b; Tal forthcoming-b). The first sub-criterion is meant to ensure that the intended quantity is being measured, while the second sub-criterion is meant to ensure that measurement outcomes can be reasonably attributed to the measured object rather than to some artifact of the measuring instrument, environment or model. Taken together, these two requirements ensure that measurement outcomes remain valid independently of the specific assumptions involved in their production, and hence that the context-dependence of measurement outcomes does not threaten their general applicability.

7.2 Models and measurement in economics

Besides their applicability to physical measurement, model-based analyses also shed light on measurement in economics. Like physical quantities, values of economic variables often cannot be observed directly and must be inferred from observations based on abstract and idealized models. The nineteenth century economist William Jevons, for example, measured changes in the value of gold by postulating certain causal relationships between the value of gold, the supply of gold and the general level of prices (Hoover and Dowell 2001: 155–159; Morgan 2001: 239). As Julian Reiss (2001) shows, Jevons’ measurements were made possible by using two models: a causal-theoretical model of the economy, which is based on the assumption that the quantity of gold has the capacity to raise or lower prices; and a statistical model of the data, which is based on the assumption that local variations in prices are mutually independent and therefore cancel each other out when averaged. Taken together, these models allowed Jevons to infer the change in the value of gold from data concerning the historical prices of various goods.[19]

The ways in which models function in economic measurement have led some philosophers to view certain economic models as measuring instruments in their own right, analogously to rulers and balances (Boumans 1999, 2005c, 2006, 2007a, 2009, 2012a; Morgan 2001). Marcel Boumans explains how macroeconomists are able to isolate a variable of interest from external influences by tuning parameters in a model of the macroeconomic system. This technique frees economists from the impossible task of controlling the actual system. As Boumans argues, macroeconomic models function as measuring instruments insofar as they produce invariant relations between inputs (indications) and outputs (outcomes), and insofar as this invariance can be tested by calibration against known and stable facts.

7.3 Psychometric models and construct validity

Another area where models play a central role in measurement is psychology. The measurement of most psychological attributes, such as intelligence, anxiety and depression, does not rely on homomorphic mappings of the sort espoused by the Representational Theory of Measurement (Wilson 2013: 3766). Instead, psychometric theory relies predominantly on the development of abstract models that are meant to predict subjects’ performance in certain tasks. These models are constructed from substantive and statistical assumptions about the psychological attribute being measured and its relation to each measurement task. For example, Item Response Theory, a popular approach to psychological measurement, employs a variety of models to evaluate the validity of questionnaires. Consider a questionnaire that is meant to assess English language comprehension (the “ability”), by presenting subjects with a series of yes/no questions (the “items”). One of the simplest models used to validate such questionnaires is the Rasch model (Rasch 1960). This model supposes a straightforward algebraic relation—known as the “log of the odds”—between the probability that a subject will answer a given item correctly, the difficulty of that particular item, and the subject’s ability. New questionnaires are calibrated by testing the fit between their indications and the predictions of the Rasch model and assigning difficulty levels to each item accordingly. The model is then used in conjunction with the questionnaire to infer levels of English language comprehension (outcomes) from raw questionnaire scores (indications) (Wilson 2013; Mari and Wilson 2014).
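
The “log of the odds” relation can be made concrete with a short sketch. In its standard form the Rasch model sets the natural logarithm of the odds of a correct answer equal to the subject’s ability minus the item’s difficulty. The function names, item difficulties and ability values below are hypothetical and chosen only for illustration; the ability estimate uses a simple bisection search rather than any particular psychometric software.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the Rasch model.

    The log of the odds of success equals ability minus difficulty,
    so P = 1 / (1 + exp(-(ability - difficulty))).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def estimate_ability(raw_score, difficulties, lo=-10.0, hi=10.0, tol=1e-9):
    """Infer an ability (outcome) from a raw score (indication).

    Finds, by bisection, the ability at which the expected number of
    correct answers equals the raw score. Assumes item difficulties are
    already calibrated and 0 < raw_score < number of items.
    """
    if not 0 < raw_score < len(difficulties):
        raise ValueError("raw_score must lie strictly between 0 and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(rasch_probability(mid, b) for b in difficulties)
        if expected < raw_score:
            lo = mid  # expected score too low: ability must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical item difficulties on the logit scale (illustrative values).
item_difficulties = [-1.0, 0.0, 1.5]
ability = 0.5  # hypothetical subject ability on the same scale

probabilities = [rasch_probability(ability, b) for b in item_difficulties]
for b, p in zip(item_difficulties, probabilities):
    print(f"difficulty={b:+.1f}  P(correct)={p:.3f}  log-odds={log_odds(p):+.3f}")

estimated = estimate_ability(2, item_difficulties)
print(f"ability inferred from a raw score of 2: {estimated:.3f}")
```

The printed log-odds for each item equal the ability minus that item’s difficulty, which is the algebraic relation the model posits.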

The sort of statistical calibration (or “scaling”) provided by Rasch models yields repeatable results, but it is often only a first step towards full-fledged psychological measurement. Psychologists are typically interested in the results of a measure not for its own sake, but for the sake of assessing some underlying and latent psychological attribute. It is therefore desirable to be able to test whether different measures, such as different questionnaires or multiple controlled experiments, all measure the same latent attribute. Such testing is known as “construct validation”. A construct is an abstract representation of the latent attribute intended to be measured, and

reflects a hypothesis […] that a variety of behaviors will correlate with one another in studies of individual differences and/or will be similarly affected by experimental manipulations. (Nunnally & Bernstein 1994: 85)

Constructs are denoted by variables in a model that predicts which correlations would be observed among the indications of different measures if they are indeed measures of the same attribute. Such models involve substantive assumptions about the attribute, including its internal structure and its relations to other attributes, and statistical assumptions about the correlation among different measures (Campbell & Fiske 1959; Nunnally & Bernstein 1994: Ch. 3; Angner 2008).
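
The pattern of correlations that such a model predicts can be illustrated with simulated data. The sketch below is a toy example in the spirit of convergent and discriminant validation; the attribute names, noise levels and sample size are assumptions made for illustration, not values drawn from the cited literature.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
n_subjects = 500

# Two hypothetical, mutually independent latent attributes.
latent_anxiety = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
latent_other = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]

# Two measures assumed to depend on the same latent attribute, plus noise.
questionnaire_a = [a + rng.gauss(0.0, 0.5) for a in latent_anxiety]
questionnaire_b = [a + rng.gauss(0.0, 0.5) for a in latent_anxiety]
# A third measure assumed to depend on the unrelated attribute.
other_test = [o + rng.gauss(0.0, 0.5) for o in latent_other]

convergent_r = pearson(questionnaire_a, questionnaire_b)
discriminant_r = pearson(questionnaire_a, other_test)

print(f"correlation between measures of the same attribute: {convergent_r:.2f}")
print(f"correlation with a measure of another attribute:    {discriminant_r:.2f}")
```

In real validation studies the latent attribute is not available for inspection, so a high correlation between measures supports, but does not establish, the hypothesis that they measure the same attribute.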

Several scholars have pointed out similarities between the ways models are used to standardize measurable quantities in the natural and social sciences. For example, Mark Wilson (2013) argues that psychometric models can be viewed as tools for constructing measurement standards in the same sense of “measurement standard” used by metrologists. Others have raised doubts about the feasibility and desirability of adopting the example of the natural sciences when standardizing constructs in the social sciences. As Anna Alexandrova (2008) points out, ethical considerations bear on questions about construct validity no less than considerations of reproducibility. Such ethical considerations are context sensitive, and can only be applied piecemeal. Nancy Cartwright and Rosa Runhardt (2014) make a similar point about “Ballung” concepts, a term they borrow from Otto Neurath to denote concepts with a fuzzy and context-dependent scope. Examples of Ballung concepts are race, poverty, social exclusion, and the quality of PhD programs. Such concepts are too multifaceted to be measured on a single metric without loss of meaning, and must be represented either by a matrix of indices or by several different measures depending on which goals and values are at play (see also Cartwright and Bradburn 2010). In a similar vein, Leah McClimans (2010) argues that uniformity is not always an appropriate goal for designing questionnaires, as the open-endedness of questions is often both unavoidable and desirable for obtaining relevant information from subjects.[20] These insights highlight the interdependence between epistemic, pragmatic and ethical considerations characteristic of the standardization of constructs in the social sciences.

8. The Epistemology of Measurement

The development of model-based accounts discussed in the previous section is part of a larger “epistemic turn” in the philosophy of measurement that occurred in the early 2000s. Rather than emphasizing the mathematical foundations, metaphysics or semantics of measurement, philosophical work in recent years tends to focus on the presuppositions and inferential patterns involved in concrete practices of measurement, and on the historical, social and material dimensions of measuring. The philosophical study of these topics has been referred to as the “epistemology of measurement” (Mari 2003, 2005a; Leplège 2003; Tal forthcoming-b). In the broadest sense, the epistemology of measurement is the study of the relationships between measurement and knowledge. Central topics that fall under the purview of the epistemology of measurement include the conditions under which measurement produces knowledge; the content, scope, justification and limits of such knowledge; the reasons why particular methodologies of measurement and standardization succeed or fail in supporting particular knowledge claims; and the relationships between measurement and other knowledge-producing activities such as observation, theorizing, experimentation, modelling and calculation. In pursuing these objectives, philosophers are drawing on the work of historians and sociologists of science, who have been investigating measurement practices for a longer period (Wise and Smith 1986; Latour 1987: Ch. 6; Schaffer 1992; Porter 1995, 2007; Wise 1995; Alder 2002; Galison 2003; Gooday 2004; Crease 2011), as well as on the history and philosophy of scientific experimentation (Harré 1981; Hacking 1983; Franklin 1986; Cartwright 1999). The following subsections survey some of the topics discussed in this burgeoning body of literature.

8.1 Standardization and scientific progress

A topic that has attracted considerable philosophical attention in recent years is the selection and improvement of measurement standards. Generally speaking, to standardize a quantity concept is to prescribe a determinate way in which that concept is to be applied to concrete particulars.[21] To standardize a measuring instrument is to assess how well the outcomes of measuring with that instrument fit the prescribed mode of application of the relevant concept.[22] The term “measurement standard” accordingly has at least two meanings: on the one hand, it is commonly used to refer to abstract rules and definitions that regulate the use of quantity concepts, such as the definition of the meter. On the other hand, the term “measurement standard” is also commonly used to refer to the concrete artifacts and procedures that are deemed exemplary of the application of a quantity concept, such as the metallic bar that served as the standard meter until 1960. This duality in meaning reflects the dual nature of standardization, which involves both abstract and concrete aspects.

In Section 4 it was noted that standardization involves choices among nontrivial alternatives, such as the choice among different thermometric fluids or among different ways of marking equal duration. These choices are nontrivial in the sense that they affect whether or not the same temperature (or time) intervals are deemed equal, and hence affect whether or not statements of natural law containing the term “temperature” (or “time”) come out true. Appealing to theory to decide which standard is more accurate would be circular, since the theory cannot be determinately applied to particulars prior to a choice of measurement standard. This circularity has been variously called the “problem of coordination” (van Fraassen 2008: Ch. 5) and the “problem of nomic measurement” (Chang 2004: Ch. 2). As already mentioned, conventionalists attempted to escape the circularity by positing a priori statements, known as “coordinative definitions”, which were supposed to link quantity-terms with specific measurement operations. A drawback of this solution is that it supposes that choices of measurement standard are arbitrary and static, whereas in actual practice measurement standards tend to be chosen based on empirical considerations and are eventually improved or replaced with standards that are deemed more accurate.

A new strand of writing on the problem of coordination has emerged in recent years, consisting most notably of the works of Hasok Chang (2001, 2004, 2007) and Bas van Fraassen (2008: Ch. 5; 2009, 2012). These works take a historical and coherentist approach to the problem. Rather than attempting to avoid the problem of circularity completely, as their predecessors did, they set out to show that the circularity is not vicious. Chang argues that constructing a quantity-concept and standardizing its measurement are co-dependent and iterative tasks. Each “epistemic iteration” in the history of standardization respects existing traditions while at the same time correcting them (Chang 2004: Ch. 5). The pre-scientific concept of temperature, for example, was associated with crude and ambiguous methods of ordering objects from hot to cold. Thermoscopes, and eventually thermometers, helped modify the original concept and made it more precise. With each such iteration the quantity concept was re-coordinated to a more stable set of standards, which in turn allowed theoretical predictions to be tested more precisely, facilitating the subsequent development of theory and the construction of more stable standards, and so on.

How this process avoids vicious circularity becomes clear when we look at it either “from above”, i.e., in retrospect given our current scientific knowledge, or “from within”, by looking at historical developments in their original context (van Fraassen 2008: 122). From either vantage point, coordination succeeds because it increases coherence among elements of theory and instrumentation. The questions “what counts as a measurement of quantity X?” and “what is quantity X?”, though unanswerable independently of each other, are addressed together in a process of mutual refinement. It is only when one adopts a foundationalist view and attempts to find a starting point for coordination free of presupposition that this historical process erroneously appears to lack epistemic justification (van Fraassen 2008: 137).

The new literature on coordination shifts the emphasis of the discussion from the definitions of quantity-terms to the realizations of those definitions. In metrological jargon, a “realization” is a physical instrument or procedure that approximately satisfies a given definition (cf. JCGM 2012: 5.1). Examples of metrological realizations are the official prototypes of the kilogram and the cesium fountain clocks used to standardize the second. Recent studies suggest that the methods used to design, maintain and compare realizations have a direct bearing on the practical application of concepts of quantity, unit and scale, no less than the definitions of those concepts (Tal forthcoming-a; Riordan 2014).

8.2 Theory-ladenness of measurement

As already discussed above (Sections 7 and 8.1), theory and measurement are interdependent both historically and conceptually. On the historical side, the development of theory and measurement proceeds through iterative and mutual refinements. On the conceptual side, the specification of measurement procedures shapes the empirical content of theoretical concepts, while theory provides a systematic interpretation for the indications of measuring instruments. This interdependence of measurement and theory may seem like a threat to the evidential role that measurement is supposed to play in the scientific enterprise. After all, measurement outcomes are thought to be able to test theoretical hypotheses, and this seems to require some degree of independence of measurement from theory. This threat is especially clear when the theoretical hypothesis being tested is already presupposed as part of the model of the measuring instrument. To cite an example from Franklin et al. (1989: 230):

There would seem to be, at first glance, a vicious circularity if one were to use a mercury thermometer to measure the temperature of objects as part of an experiment to test whether or not objects expand as their temperature increases.

Nonetheless, Franklin et al. conclude that the circularity is not vicious. The mercury thermometer could be calibrated against another thermometer whose principle of operation does not presuppose the law of thermal expansion, such as a constant-volume gas thermometer, thereby establishing the reliability of the mercury thermometer on independent grounds. To put the point more generally, in the context of local hypothesis-testing the threat of circularity can usually be avoided by appealing to other kinds of instruments and other parts of theory.

A different sort of worry about the evidential function of measurement arises on the global scale, when the testing of entire theories is concerned. As Thomas Kuhn (1961) argues, scientific theories are usually accepted long before quantitative methods for testing them become available. The reliability of newly introduced measurement methods is typically tested against the predictions of the theory rather than the other way around. In Kuhn’s words, “The road from scientific law to scientific measurement can rarely be traveled in the reverse direction” (1961: 189). For example, Dalton’s Law, which states that the weights of elements in a chemical compound are related to each other in whole-number proportions, initially conflicted with some of the best-known measurements of such proportions. It is only by assuming Dalton’s Law that subsequent experimental chemists were able to correct and improve their measurement techniques (1961: 173). Hence, Kuhn argues, the function of measurement in the physical sciences is not to test the theory but to apply it with increasing scope and precision, and eventually to allow persistent anomalies to surface that would precipitate the next crisis and scientific revolution. Note that Kuhn is not claiming that measurement has no evidential role to play in science. Instead, he argues that measurements cannot test a theory in isolation, but only by comparison to some alternative theory that is proposed in an attempt to account for the anomalies revealed by increasingly precise measurements (for an illuminating discussion of Kuhn’s thesis see Hacking 1983: 243–5).

Traditional discussions of theory-ladenness, like those of Kuhn, were conducted against the background of the logical positivists’ distinction between theoretical and observational language. The theory-ladenness of measurement was correctly perceived as a threat to the possibility of a clear demarcation between the two languages. Contemporary discussions, by contrast, no longer present theory-ladenness as an epistemological threat but take for granted that some level of theory-ladenness is a prerequisite for measurements to have any evidential power. Without some minimal substantive assumptions about the quantity being measured, such as its amenability to manipulation and its relations to other quantities, it would be impossible to interpret the indications of measuring instruments and hence impossible to ascertain the evidential relevance of those indications. This point was already made by Pierre Duhem (1906: 153–6; see also Carrier 1994: 9–19). Moreover, contemporary authors emphasize that theoretical assumptions play crucial roles in correcting for measurement errors and evaluating measurement uncertainties. Indeed, physical measurement procedures become more accurate when the model underlying them is de-idealized, a process which involves increasing the theoretical richness of the model (Tal 2011).

The acknowledgment that theory is crucial for guaranteeing the evidential reliability of measurement draws attention to the “problem of observational grounding”, which is an inverse challenge to the traditional threat of theory-ladenness (Tal 2013: 1168). The challenge is to specify what role observation plays in measurement, and particularly what sort of connection with observation is necessary and/or sufficient to allow measurement to play an evidential role in the sciences. This problem is especially clear when one attempts to account for the increasing use of computational methods for performing tasks that were traditionally accomplished by measuring instruments. As Margaret Morrison (2009) and Wendy Parker (forthcoming) argue, there are cases where reliable quantitative information is gathered about a target system with the aid of a computer simulation, but in a manner that satisfies some of the central desiderata for measurement such as being empirically grounded and backward-looking. Such information does not rely on signals transmitted from the particular object of interest to the instrument, but on the use of theoretical and statistical models to process empirical data about related objects. For example, data assimilation methods are customarily used to estimate past atmospheric temperatures in regions where thermometer readings are not available. Some methods do this by fitting a computational model of the atmosphere’s behavior to a combination of available data from nearby regions and a model-based forecast of conditions at the time of observation (Parker forthcoming). These estimations are then used in various ways, including as data for evaluating forward-looking climate models. Regardless of whether one calls these estimations “measurements”, they challenge the idea that producing reliable quantitative evidence about the state of an object requires observing that object, however loosely one understands the term “observation”.[23]
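
The basic idea of combining a model-based forecast with data can be sketched in a single scalar update. The example below uses the textbook inverse-variance weighting found in introductory treatments of data assimilation; it is not the method discussed by Parker, and the function name, temperatures and variances are hypothetical values chosen for illustration.

```python
def assimilate(forecast, forecast_var, observation, observation_var):
    """Combine a model forecast with an observation (scalar case).

    The gain weights the observation by the relative uncertainty of the
    forecast: the less certain the forecast, the more the estimate moves
    toward the observation.
    """
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Hypothetical inputs: a model-based forecast of a past temperature (deg C)
# and a value derived from thermometer readings in a nearby region.
analysis, analysis_var = assimilate(
    forecast=15.0, forecast_var=4.0,
    observation=13.0, observation_var=1.0,
)

print(f"estimated temperature: {analysis:.2f}")
print(f"estimated variance:    {analysis_var:.2f}")
```

The resulting estimate lies between the forecast and the observation and carries a smaller variance than either input, even though no instrument interacted with the region whose temperature is being estimated.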

8.3 Accuracy and precision

Two key aspects of the reliability of measurement outcomes are accuracy and precision. Consider a series of repeated weight measurements performed on a particular object with an equal-arms balance. From a realist, “error-based” perspective, the outcomes of these measurements are accurate if they are close to the true value of the quantity being measured—in our case, the true ratio of the object’s weight to the chosen unit—and precise if they are close to each other. An analogy often cited to clarify the error-based distinction is that of arrows shot at a target, with accuracy analogous to the closeness of hits to the bull’s eye and precision analogous to the tightness of spread of hits (cf. JCGM 2012: 2.13 & 2.15, Teller 2013a: 192). Though intuitive, the error-based way of carving the distinction raises an epistemological difficulty. It is commonly thought that the exact true values of most quantities of interest to science are unknowable, at least when those quantities are measured on continuous scales. If this assumption is granted, the accuracy with which such quantities are measured cannot be known with exactitude, but only estimated by comparing inaccurate measurements to each other. And yet it is unclear why convergence among inaccurate measurements should be taken as an indication of truth. After all, the measurements could be plagued by a common bias that prevents their individual inaccuracies from cancelling each other out when averaged. In the absence of cognitive access to true values, how is the evaluation of measurement accuracy possible?
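
The worry about a common bias can be illustrated with a simulation in which, unlike in real measurement, the true value is stipulated in advance. The function name, the size of the bias and the noise level below are all illustrative assumptions.

```python
import random
import statistics

def simulate_readings(true_value, common_bias, noise_sd, n, seed=0):
    """Simulate n repeated readings that share one systematic bias.

    Each reading is the true value plus the common bias plus an
    independent random fluctuation.
    """
    rng = random.Random(seed)
    return [true_value + common_bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

true_weight = 10.0  # stipulated for the simulation; unknowable in practice
readings = simulate_readings(true_weight, common_bias=0.3, noise_sd=0.05, n=1000)

mean_reading = statistics.fmean(readings)
spread = statistics.stdev(readings)   # tightness of the readings (precision)
offset = mean_reading - true_weight   # distance from the true value (accuracy)

print(f"mean of readings: {mean_reading:.3f}")
print(f"spread (std dev): {spread:.3f}")
print(f"offset from true: {offset:.3f}")
```

The readings agree closely with one another, yet their average stays near the stipulated true value plus the bias. Averaging removes the independent fluctuations but not the shared error, which is why convergence alone does not indicate accuracy in the error-based sense.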

In answering this question, philosophers have benefited from studying the various senses of the term “measurement accuracy” as used by practicing scientists. At least five different senses have been identified: metaphysical, epistemic, operational, comparative and pragmatic (Tal 2011: 1084–5). In particular, the epistemic or “uncertainty-based” sense of the term is metaphysically neutral and does not presuppose the existence of true values. Instead, the accuracy of a measurement outcome is taken to be the closeness of agreement among values reasonably attributed to a quantity given available empirical data and background knowledge (cf. JCGM 2012: 2.13 Note 3; Giordani & Mari 2012). Thus construed, measurement accuracy can be evaluated by establishing robustness among the consequences of models representing different measurement processes.

Under the uncertainty-based conception, imprecision is a special type of inaccuracy. For example, the inaccuracy of weight measurements is the breadth of spread of values that are reasonably attributed to the object’s weight given the indications of the balance and available background knowledge about the way the balance works and the standard weights used. The imprecision of these measurements is the component of inaccuracy arising from uncontrolled variations in the indications of the balance over repeated trials. Other sources of inaccuracy besides imprecision include imperfect corrections to systematic errors, inaccurately known physical constants, and vague measurand definitions (see Section 7.1).

Paul Teller (2013b) raises a different objection to the error-based conception of measurement accuracy. He argues against an assumption he calls “measurement accuracy realism”, according to which measurable quantities have definite values in reality. Teller argues that this assumption is false insofar as it concerns the quantities habitually measured in physics, because any specification of definite values (or value ranges) for such quantities involves idealization and hence cannot refer to anything in reality. For example, the concept usually understood by the phrase “the velocity of sound in air” involves a host of implicit idealizations concerning the uniformity of the air’s chemical composition, temperature and pressure as well as the stability of units of measurement. Removing these idealizations completely would require adding an infinite amount of detail to each specification. As Teller argues, measurement accuracy should itself be understood as a useful idealization, namely as a concept that allows scientists to assess coherence and consistency among measurement outcomes as if the linguistic expression of these outcomes latched onto anything in the world. Precision is similarly an idealized concept, which is based on an open-ended and indefinite specification of what counts as repetition of measurement under “the same” circumstances (Teller 2013a: 194).

Bibliography


  • Alder, K., 2002, The Measure of All Things: The Seven-Year Odyssey and Hidden Error That Transformed the World, New York: The Free Press.
  • Alexandrova, A., 2008, “First Person Reports and the Measurement of Happiness”, Philosophical Psychology, 21(5): 571–583.
  • Angner, E., 2008, “The Philosophical Foundations of Subjective Measures of Well-Being”, in Capabilities and Happiness, L. Bruni, F. Comim, and M. Pugno (eds.), Oxford: Oxford University Press.
  • –––, 2013, “Is it Possible to Measure Happiness? The argument from measurability”, European Journal for Philosophy of Science, 3: 221–240.
  • Aristotle, Categories, in The Complete Works of Aristotle, Volume I, J. Barnes (ed.), Princeton: Princeton University Press, 1984.
  • Baird, D., 2004, Thing Knowledge: A Philosophy of Scientific Instruments, Berkeley: University of California Press.
  • Bogen, J. and J. Woodward, 1988, “Saving the Phenomena”, The Philosophical Review, 97(3): 303–352.
  • Boring, E.G., 1945, “The use of operational definitions in science”, in Boring et al. 1945: 243–5.
  • Boring, E.G., P.W. Bridgman, H. Feigl, H. Israel, C.C. Pratt, and B.F. Skinner, 1945, “Symposium on Operationism”, The Psychological Review, 52: 241–294.
  • Boumans, M., 1999, “Representation and Stability in Testing and Measuring Rational Expectations”, Journal of Economic Methodology, 6(3): 381–401.
  • –––, 2005a, How Economists Model the World into Numbers, New York: Routledge.
  • –––, 2005b, “Truth versus Precision”, in Logic, Methodology and Philosophy of Science: Proceedings of the Twelfth International Congress, P. Hájek, L. Valdés-Villanueva, and D. Westerstahl (eds.), London: College Publications, pp. 257–269.
  • –––, 2005c, “Measurement outside the laboratory”, Philosophy of Science, 72: 850–863.
  • –––, 2006, “The difference between answering a ‘why’ question and answering a ‘how much’ question”, in Simulation: Pragmatic Construction of Reality, J. Lenhard, G. Küppers, and T. Shinn (eds.), Dordrecht: Springer, pp. 107–124.
  • –––, 2007a, “Invariance and Calibration”, in Boumans 2007b: 231–248.
  • ––– (ed.), 2007b, Measurement in Economics: A Handbook, London: Elsevier.
  • –––, 2009, “Grey-Box Understanding in Economics”, in Scientific Understanding: Philosophical Perspectives, H.W. de Regt, S. Leonelli, and K. Eigner (eds.), Pittsburgh: University of Pittsburgh Press, pp. 210–229.
  • –––, 2012a, “Modeling Strategies for Measuring Phenomena In- and Outside the Laboratory”, in EPSA Philosophy of Science: Amsterdam 2009, H.W. de Regt, S. Hartmann, and S. Okasha (eds.), (The European Philosophy of Science Association Proceedings), Dordrecht: Springer, pp. 1–11.
  • –––, 2012b, “Measurement in Economics”, in Philosophy of Economics (Vol. 13 of Handbook of the Philosophy of Science), U. Mäki (ed.), Oxford: Elsevier, pp. 395–423.
  • Bridgman, P.W., 1927, The Logic of Modern Physics, New York: Macmillan.
  • –––, 1938, “Operational Analysis”, Philosophy of Science, 5: 114–131.
  • –––, 1945, “Some General Principles of Operational Analysis”, in Boring et al. 1945: 246–249.
  • –––, 1956, “The Present State of Operationalism”, in Frank 1956: 74–79.
  • Brillouin, L., 1962, Science and information theory, New York: Academic Press, 2nd edition.
  • Byerly, H.C. and V.A. Lazara, 1973, “Realist Foundations of Measurement”, Philosophy of Science, 40(1): 10–28.
  • Campbell, N.R., 1920, Physics: the Elements, London: Cambridge University Press.
  • Campbell, D.T. and D.W. Fiske, 1959, “Convergent and discriminant validation by the multitrait-multimethod matrix”, Psychological Bulletin, 56(2): 81–105.
  • Cantù, P. and O. Schlaudt (eds.), 2013, “The Epistemological Thought of Otto Hölder”, special issue of Philosophia Scientiæ, 17(1).
  • Carnap, R., 1966, Philosophical foundations of physics, G. Martin (ed.), reprinted as An Introduction to the Philosophy of Science, New York: Dover, 1995.
  • Carrier, M., 1994, The Completeness of Scientific Theories: On the Derivation of Empirical Indicators Within a Theoretical Framework: the Case of Physical Geometry, The University of Western Ontario Series in Philosophy of Science Vol. 53, Dordrecht: Kluwer.
  • Cartwright, N.L., 1999, The Dappled World: A Study of the Boundaries of Science, Cambridge: Cambridge University Press.
  • Cartwright, N.L. and N.M. Bradburn, 2010, “A Theory of Measurement”, URL=<>. (A summary of this paper appears in R.M. Li (ed.), The Importance of Common Metrics for Advancing Social Science Theory and Research: A Workshop Summary, Washington, DC: National Academies Press, 2011, pp. 53–70.)
  • Cartwright, N.L. and R. Runhardt, 2014, “Measurement”, in N.L. Cartwright and E. Montuschi (eds.), Philosophy of Social Science: A New Introduction, Oxford: Oxford University Press, pp. 265–287.
  • Chang, H., 2001, “Spirit, air, and quicksilver: The search for the ‘real’ scale of temperature”, Historical Studies in the Physical and Biological Sciences, 31(2): 249–284.
  • –––, 2004, Inventing Temperature: Measurement and Scientific Progress, Oxford: Oxford University Press.
  • –––, 2007, “Scientific Progress: Beyond Foundationalism and Coherentism”, Royal Institute of Philosophy Supplement, 61: 1–20.
  • –––, 2009, “Operationalism”, The Stanford Encyclopedia of Philosophy (Fall 2009 Edition), E.N. Zalta (ed.), URL= <>
  • Chang, H. and N.L. Cartwright, 2008, “Measurement”, in The Routledge Companion to Philosophy of Science, S. Psillos and M. Curd (eds.), New York: Routledge, pp. 367–375.
  • Clagett, M., 1968, Nicole Oresme and the medieval geometry of qualities and motions, Madison: University of Wisconsin Press.
  • Cohen, M.R. and E. Nagel, 1934, An introduction to logic and scientific method, USA: Harcourt, Brace & World.
  • Crease, R.P., 2011, World in the Balance: The Historic Quest for an Absolute System of Measurement, New York and London: W.W. Norton.
  • Darrigol, O., 2003, “Number and measure: Hermann von Helmholtz at the crossroads of mathematics, physics, and psychology”, Studies in History and Philosophy of Science Part A, 34(3): 515–573.
  • Diehl, C.E., 2012, The Theory of Intensive Magnitudes in Leibniz and Kant, PhD Dissertation, Princeton University. [Diehl 2012 available online]
  • Diez, J.A., 1997a, “A Hundred Years of Numbers. An Historical Introduction to Measurement Theory 1887–1990—Part 1”, Studies in History and Philosophy of Science, 28(1): 167–185.
  • –––, 1997b, “A Hundred Years of Numbers. An Historical Introduction to Measurement Theory 1887–1990—Part 2”, Studies in History and Philosophy of Science, 28(2): 237–265.
  • Dingle, H., 1950, “A Theory of Measurement”, The British Journal for the Philosophy of Science, 1(1): 5–26.
  • Duhem, P., 1906, The Aim and Structure of Physical Theory, P.P. Wiener (trans.), New York: Atheneum, 1962.
  • Ellis, B., 1966, Basic Concepts of Measurement, Cambridge: Cambridge University Press.
  • Euclid, Elements, in The Thirteen Books of Euclid’s Elements, T.L. Heath (trans.), Cambridge: Cambridge University Press, 1908.
  • Fechner, G., 1860, Elements of Psychophysics, H.E. Adler (trans.), New York: Holt, Rinehart & Winston, 1966.
  • Feest, U., 2005, “Operationism in Psychology: What the Debate Is About, What the Debate Should Be About”, Journal of the History of the Behavioral Sciences, 41(2): 131–149.
  • Ferguson, A., C.S. Myers, R.J. Bartlett, H. Banister, F.C. Bartlett, W. Brown, N.R. Campbell, K.J.W. Craik, J. Drever, J. Guild, R.A. Houstoun, J.O. Irwin, G.W.C. Kaye, S.J.F. Philpott, L.F. Richardson, J.H. Shaxby, T. Smith, R.H. Thouless, and W.S. Tucker, 1940, “Quantitative estimates of sensory events”, Advancement of Science, 2: 331–349. (The final report of a committee appointed by the British Association for the Advancement of Science in 1932 to consider the possibility of measuring intensities of sensation. See Michell 1999, Ch. 6, for a detailed discussion.)
  • Finkelstein, L., 1975, “Representation by symbol systems as an extension of the concept of measurement”, Kybernetes, 4(4): 215–223.
  • –––, 1977, “Introductory article”, (instrument science), Journal of Physics E: Scientific Instruments, 10(6): 566–572.
  • Frank, P.G. (ed.), 1956, The Validation of Scientific Theories, Boston: Beacon Press. (Chapter 2, “The Present State of Operationalism”, contains papers by H. Margenau, G. Bergmann, C.G. Hempel, R.B. Lindsay, P.W. Bridgman, R.J. Seeger, and A. Grünbaum.)
  • Franklin, A., 1986, The Neglect of Experiment, Cambridge: Cambridge University Press.
  • –––, 1997, “Calibration”, Perspectives on Science, 5(1): 31–80.
  • Franklin, A., M. Anderson, D. Brock, S. Coleman, J. Downing, A. Gruvander, J. Lilly, J. Neal, D. Peterson, M. Price, R. Rice, L. Smith, S. Speirer, and D. Toering, 1989, “Can a Theory-Laden Observation Test the Theory?”, The British Journal for the Philosophy of Science, 40(2): 229–231.
  • Frigerio, A., A. Giordani, and L. Mari, 2010, “Outline of a general model of measurement”, Synthese, 175(2): 123–149.
  • Galison, P., 2003, Einstein’s Clocks, Poincaré’s Maps: Empires of Time, New York and London: W.W. Norton.
  • Gillies, D.A., 1972, “Operationalism”, Synthese, 25(1): 1–24.
  • Giordani, A., and L. Mari, 2012, “Measurement, models, and uncertainty”, IEEE Transactions on Instrumentation and Measurement, 61(8): 2144–2152.
  • Gooday, G., 2004, The Morals of Measurement: Accuracy, Irony and Trust in Late Victorian Electrical Practice, Cambridge: Cambridge University Press.
  • Grant, E., 1996, The foundations of modern science in the middle ages, Cambridge: Cambridge University Press.
  • Grattan-Guinness, I., 1996, “Numbers, magnitudes, ratios, and proportions in Euclid's Elements: How did he handle them?”, Historia Mathematica, 23: 355–375.
  • Guala, F., 2008, “Paradigmatic Experiments: The Ultimatum Game from Testing to Measurement Device”, Philosophy of Science, 75: 658–669.
  • Hacking, I., 1983, Representing and Intervening, Cambridge: Cambridge University Press.
  • Harré, R., 1981, Great Scientific Experiments: Twenty Experiments that Changed our View of the World, Oxford: Phaidon Press.
  • Hartley, R.V., 1928, “Transmission of information”, Bell System Technical Journal, 7(3): 535–563.
  • Heidelberger, M., 1993a, Nature from Within: Gustav Theodor Fechner and His Psychophysical Worldview, C. Klohr (trans.), Pittsburgh: University of Pittsburgh Press, 2004.
  • –––, 1993b, “Fechner’s impact for measurement theory”, commentary on D.J. Murray, “A perspective for viewing the history of psychophysics”, Behavioral and Brain Sciences, 16(1): 146–148.
  • von Helmholtz, H., 1887, Counting and measuring, C.L. Bryan (trans.), New Jersey: D. Van Nostrand, 1930.
  • Hempel, C.G., 1952, Fundamentals of concept formation in empirical science, International Encyclopedia of Unified Science, Vol. II. No. 7, Chicago and London: University of Chicago Press.
  • –––, 1956, “A logical appraisal of operationalism”, in Frank 1956: 52–67.
  • –––, 1966, Philosophy of Natural Science, Englewood Cliffs, N.J.: Prentice-Hall.
  • Hölder, O., 1901, “Die Axiome der Quantität und die Lehre vom Mass”, Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematische-Physische Klasse, 53: 1–64. (for an excerpt translated into English see Michell and Ernst 1996)
  • Hoover, K. and M. Dowell, 2001, “Measuring Causes: Episodes in the Quantitative Assessment of the Value of Money”, in The Age of Economic Measurement, Annual supplement to vol. 33 of History of Political Economy, J. Klein and M. Morgan (eds.), pp. 137–161.
  • Israel-Jost, V., 2011, “The Epistemological Foundations of Scientific Observation”, South African Journal of Philosophy, 30(1): 29–40.
  • JCGM (Joint Committee for Guides in Metrology), 2012, International Vocabulary of Metrology—Basic and general concepts and associated terms (VIM), 3rd edition with minor corrections. Sèvres: JCGM. [JCGM 2012 available online]
  • Jorgensen, L.M., 2009, “The Principle of Continuity and Leibniz's Theory of Consciousness”, Journal of the History of Philosophy, 47(2): 223–248.
  • Jung, E., 2011, “Intension and Remission of Forms”, in Encyclopedia of Medieval Philosophy, H. Lagerlund (ed.), Netherlands: Springer, pp. 551–555.
  • Kant, I., 1787, Critique of Pure Reason, P. Guyer and A.W. Wood (trans.), Cambridge: Cambridge University Press, 1998.
  • Kirpatovskii, S.I., 1974, “Principles of the information theory of measurements”, Izmeritel'naya Tekhnika, 5: 11–13, English translation in Measurement Techniques, 17(5): 655–659.
  • Krantz, D.H., R.D. Luce, P. Suppes, and A. Tversky, 1971, Foundations of Measurement Vol 1: Additive and Polynomial Representations, San Diego and London: Academic Press. (for references to the two other volumes see Suppes et al. 1989 and Luce et al. 1990)
  • von Kries, J., 1882, “Über die Messung intensiver Grössen und über das sogenannte psychophysische Gesetz”, Vierteljahrschrift für wissenschaftliche Philosophie (Leipzig), 6: 257–294.
  • Kuhn, T.S., 1961, “The Function of Measurement in Modern Physical Sciences”, Isis, 52(2): 161–193.
  • Kyburg, H.H. Jr., 1984, Theory and Measurement, Cambridge: Cambridge University Press.
  • Latour, B., 1987, Science in Action, Cambridge: Harvard University Press.
  • Leplège, A., 2003, “Epistemology of Measurement in the Social Sciences: Historical and Contemporary Perspectives”, Social Science Information, 42: 451–462.
  • Luce, R.D., D.H. Krantz, P. Suppes, and A. Tversky, 1990, Foundations of Measurement Vol 3: Representation, Axiomatization, and Invariance, San Diego and London: Academic Press. (for references to the two other volumes see Krantz et al. 1971 and Suppes et al. 1989)
  • Luce, R.D., and J.W. Tukey, 1964, “Simultaneous conjoint measurement: A new type of fundamental measurement”, Journal of Mathematical Psychology, 1(1): 1–27.
  • Luce, R.D. and P. Suppes, 2004, “Representational Measurement Theory”, in Stevens' Handbook of Experimental Psychology, vol. 4: Methodology in Experimental Psychology, J. Wixted and H. Pashler (eds.), New York: Wiley, 3rd edition, pp. 1–41.
  • Mach, E., 1896, Principles of the Theory of Heat, T.J. McCormack (trans.), Dordrecht: D. Reidel, 1986.
  • Mari, L., 1999, “Notes towards a qualitative analysis of information in measurement results”, Measurement, 25(3): 183–192.
  • –––, 2000, “Beyond the representational viewpoint: a new formalization of measurement”, Measurement, 27: 71–84.
  • –––, 2003, “Epistemology of Measurement”, Measurement, 34: 17–30.
  • –––, 2005a, “The problem of foundations of measurement”, Measurement, 38: 259–266.
  • –––, 2005b, “Models of the Measurement Process”, in Handbook of Measuring Systems Design, vol. 2, P. Sydenham and R. Thorn (eds.), Wiley, Ch. 104.
  • Mari, L., and M. Wilson, 2014, “An introduction to the Rasch measurement approach for metrologists”, Measurement, 51: 315–327.
  • Mari, L. and A. Giordani, 2013, “Modeling measurement: error and uncertainty”, in Error and Uncertainty in Scientific Practice, M. Boumans, G. Hon, and A. Petersen (eds.), Ch. 4.
  • Maxwell, J.C., 1873, A Treatise on Electricity and Magnetism, Oxford: Clarendon Press.
  • McClimans, L., 2010, “A theoretical framework for patient-reported outcome measures”, Theoretical Medicine and Bioethics, 31: 225–240.
  • McClimans, L. and P. Browne, 2012, “Quality of life is a process not an outcome”, Theoretical Medicine and Bioethics, 33: 279–292.
  • Michell, J., 1993, “The origins of the representational theory of measurement: Helmholtz, Hölder, and Russell”, Studies in History and Philosophy of Science Part A, 24(2): 185–206.
  • –––, 1994, “Numbers as Quantitative Relations and the Traditional Theory of Measurement”, British Journal for the Philosophy of Science, 45: 389–406.
  • –––, 1999, Measurement in Psychology: Critical History of a Methodological Concept, Cambridge: Cambridge University Press.
  • –––, 2003, “Epistemology of Measurement: the Relevance of its History for Quantification in the Social Sciences”, Social Science Information, 42(4): 515–534.
  • –––, 2004, “History and philosophy of measurement: A realist view”, in Proceedings of the 10th IMEKO TC7 International symposium on advances of measurement science. [Michell 2004 available online]
  • –––, 2005, “The logic of measurement: A realist overview”, Measurement, 38(4): 285–294.
  • Michell, J. and C. Ernst, 1996, “The Axioms of Quantity and the Theory of Measurement”, Journal of Mathematical Psychology, 40: 235–252. (This article contains a translation into English of a long excerpt from Hölder 1901)
  • Morgan, M., 2001, “Making measuring instruments”, in The Age of Economic Measurement, Annual supplement to vol. 33 of History of Political Economy, J.L. Klein and M. Morgan (eds.), pp. 235–251.
  • Morgan, M. and M. Morrison (eds.), 1999, Models as Mediators: Perspectives on Natural and Social Science, Cambridge: Cambridge University Press.
  • Morrison, M., 1999, “Models as Autonomous Agents”, in Morgan and Morrison 1999: 38–65.
  • –––, 2009, “Models, measurement and computer simulation: the changing face of experimentation”, Philosophical Studies, 143: 33–57.
  • Morrison, M. and M. Morgan, 1999, “Models as Mediating Instruments”, in Morgan and Morrison 1999: 10–37.
  • Mundy, B., 1987, “The metaphysics of quantity”, Philosophical Studies, 51(1): 29–54.
  • Nagel, E., 1931, “Measurement”, Erkenntnis, 2(1): 313–333.
  • Narens, L., 1981, “On the scales of measurement”, Journal of Mathematical Psychology, 24: 249–275.
  • –––, 1985, Abstract Measurement Theory, Cambridge, MA: MIT Press.
  • Nunnally, J.C., and I.H. Bernstein, 1994, Psychometric Theory, New York: McGraw-Hill, 3rd edition.
  • Parker, W., forthcoming, “Computer Simulation, Measurement and Data Assimilation”, British Journal for the Philosophy of Science.
  • Poincaré, H., 1898, “The Measure of Time”, in The Value of Science, New York: Dover, 1958, pp. 26–36.
  • –––, 1902, Science and Hypothesis, W.J. Greenstreet (trans.), New York: Cosimo, 2007.
  • Porter, T.M., 1995, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life, New Jersey: Princeton University Press.
  • –––, 2007, “Precision”, in Boumans 2007b: 343–356.
  • Rasch, G., 1960, Probabilistic Models for Some Intelligence and Achievement Tests, Copenhagen: Danish Institute for Educational Research.
  • Reiss, J., 2001, “Natural Economic Quantities and Their Measurement”, Journal of Economic Methodology, 8(2): 287–311.
  • Riordan, S., 2014, “The Objectivity of Scientific Measures”, Studies in History and Philosophy of Science Part A. [Riordan 2014 available online]
  • Reichenbach, H., 1927, The Philosophy of Space and Time, New York: Dover Publications, 1958.
  • Rothbart, D. and S.W. Slayden, 1994, “The Epistemology of a Spectrometer”, Philosophy of Science, 61: 25–38.
  • Russell, B., 1903, The Principles of Mathematics, New York: W.W. Norton.
  • Savage, C.W. and P. Ehrlich, 1992, “A brief introduction to measurement theory and to the essays”, in Philosophical and Foundational Issues in Measurement Theory, C.W. Savage and P. Ehrlich (eds.), New Jersey: Lawrence Erlbaum, pp. 1–14.
  • Schaffer, S., 1992, “Late Victorian metrology and its instrumentation: a manufactory of Ohms”, in Invisible Connections: Instruments, Institutions, and Science, R. Bud and S.E. Cozzens (eds.), Cardiff: SPIE Optical Engineering, pp. 23–56.
  • Scott, D. and P. Suppes, 1958, “Foundational aspects of theories of measurement”, Journal of Symbolic Logic, 23(2): 113–128.
  • Shannon, C.E., 1948, “A Mathematical Theory of Communication”, The Bell System Technical Journal, 27: 379–423 and 623–656.
  • Shannon, C.E. and W. Weaver, 1949, The Mathematical Theory of Communication, Urbana: The University of Illinois Press.
  • Shapere, D., 1982, “The Concept of Observation in Science and Philosophy”, Philosophy of Science, 49(4): 485–525.
  • Skinner, B.F., 1945, “The operational analysis of psychological terms”, in Boring et al. 1945: 270–277.
  • Soler, L., C. Allamel-Raffin, F. Wieber, and J.L. Gangloff, 2011, “Calibration in everyday scientific practice: a conceptual framework”, paper presented at the 3rd Biennial Conference of the Society for Philosophy of Science in Practice, Exeter, UK. [Soler et al. 2011 available online]
  • Stevens, S.S., 1935, “The operational definition of psychological concepts”, Psychological Review, 42(6): 517–527.
  • –––, 1946, “On the theory of scales of measurement”, Science, 103: 677–680.
  • –––, 1951, “Mathematics, Measurement, Psychophysics”, in Handbook of Experimental Psychology, S.S. Stevens (ed.), New York: Wiley & Sons, pp. 1–49.
  • –––, 1959, “Measurement, psychophysics and utility”, in Measurement: Definitions and Theories, C.W. Churchman and P. Ratoosh (eds.), New York: Wiley & Sons, pp. 18–63.
  • –––, 1975, Psychophysics: Introduction to Its Perceptual, Neural and Social Prospects, New York: Wiley & Sons.
  • Suppes, P., 1951, “A set of independent axioms for extensive quantities”, Portugaliae Mathematica, 10(4): 163–172.
  • –––, 1960, “A Comparison of the Meaning and Uses of Models in Mathematics and the Empirical Sciences”, Synthese, 12(2): 287–301.
  • –––, 1962, “Models of Data”, in Logic, methodology and philosophy of science: proceedings of the 1960 International Congress, E. Nagel (ed.), Stanford: Stanford University Press, pp. 252–261.
  • –––, 1967, “What is a Scientific Theory?”, in Philosophy of Science Today, S. Morgenbesser (ed.), New York: Basic Books, pp. 55–67.
  • Suppes, P., D.H. Krantz, R.D. Luce, and A. Tversky, 1989, Foundations of Measurement Vol 2: Geometrical, Threshold and Probabilistic Representations, San Diego and London: Academic Press. (for references to the two other volumes see Krantz et al. 1971 and Luce et al. 1990)
  • Swoyer, C., 1987, “The Metaphysics of Measurement”, in Measurement, Realism and Objectivity, J. Forge (ed.), Reidel, pp. 235–290.
  • Sylla, E., 1971, “Medieval quantifications of qualities: The ‘Merton School’”, Archive for History of Exact Sciences, 8(1): 9–39.
  • Tabor, D., 1970, “The hardness of solids”, Review of Physics in Technology, 1(3): 145–179.
  • Tal, E., 2011, “How Accurate Is the Standard Second?”, Philosophy of Science, 78(5): 1082–1096.
  • –––, 2013, “Old and New Problems in Philosophy of Measurement”, Philosophy Compass, 8(12): 1159–1173.
  • –––, forthcoming-a, “Making Time: A Study in the Epistemology of Measurement”, British Journal for the Philosophy of Science, doi:10.1093/bjps/axu037.
  • –––, forthcoming-b, “A Model-Based Epistemology of Measurement”, in Reasoning in Measurement, N. Mößner and A. Nordmann (eds.), London: Pickering & Chatto Publishers.
  • Teller, P., 2013a, “The concept of measurement-precision”, Synthese, 190: 189–202.
  • –––, 2013b, “Measurement accuracy realism”, paper presented at Foundations of Physics 2013: The 17th UK and European Meeting on the Foundations of Physics. [Teller 2013b available online]
  • Thomson, W., 1889, “Electrical Units of Measurement”, in Popular Lectures and Addresses, vol. 1, London: MacMillan, pp. 73–136.
  • Trout, J.D., 1998, Measuring the intentional world: Realism, naturalism, and quantitative methods in the behavioral sciences, Oxford: Oxford University Press.
  • –––, 2000, “Measurement”, in A Companion to the Philosophy of Science, W.H. Newton-Smith (ed.), Malden, MA: Blackwell, pp. 265–276.
  • van Fraassen, B.C., 1980, The Scientific Image, Oxford: Clarendon Press.
  • –––, 2008, Scientific Representation: Paradoxes of Perspective, Oxford: Oxford University Press.
  • –––, 2009, “The perils of Perrin, in the hands of philosophers”, Philosophical Studies, 143: 5–24.
  • –––, 2012, “Modeling and Measurement: The Criterion of Empirical Grounding”, Philosophy of Science, 79(5): 773–784.
  • Wilson, M., 2013, “Using the concept of a measurement system to characterize measurement models used in psychometrics”, Measurement, 46(9): 3766–3774.
  • Wise, M.N. (ed.), 1995, The Values of Precision, New Jersey: Princeton University Press.
  • Wise, M.N. and C. Smith, 1986, “Measurement, Work and Industry in Lord Kelvin's Britain”, Historical Studies in the Physical and Biological Sciences, 17(1): 147–173.

The author would like to thank Stephan Hartmann, Wendy Parker, Paul Teller, Alessandra Basso, Sally Riordan, Johanna Wolff, Conrad Heilmann and participants of the History and Philosophy of Physics reading group at the Department of History and Philosophy of Science at the University of Cambridge for helpful feedback on drafts of this entry. The author is also indebted to Joel Michell and Oliver Schliemann for useful bibliographical advice, and to John Wiley and Sons Publishers for permission to reproduce an excerpt from Tal (2013). Work on this entry was supported by an Alexander von Humboldt Postdoctoral Research Fellowship and a Marie Curie Intra-European Fellowship within the 7th European Community Framework Programme.

Copyright © 2015 by
Eran Tal
