# Measurement in Science

*First published Mon Jun 15, 2015*

Measurement is an integral part of modern science as well as of
engineering, commerce, and daily life. Measurement is often
considered a hallmark of the scientific enterprise and a privileged
source of knowledge relative to qualitative modes of
inquiry.^{[1]}
Despite its ubiquity and importance, there is little consensus among
philosophers as to how to define measurement, what sorts of things
are measurable, or which conditions make measurement possible. Most
(but not all) contemporary authors agree that measurement is an
activity that involves interaction with a concrete system with the
aim of representing aspects of that system in abstract terms (e.g.,
in terms of classes, numbers, vectors, etc.). But this characterization
also fits various kinds of perceptual and linguistic activities that
are not usually considered measurements, and is therefore too broad
to count as a definition of measurement. Moreover, if
“concrete” implies “real”, this
characterization is also too narrow, as measurement often involves
the representation of ideal systems such as the average household or
an electron at complete rest.

Philosophers have written on a variety of conceptual, metaphysical, semantic and epistemological issues related to measurement. This entry will survey the central philosophical standpoints on the nature of measurement, the notion of measurable quantity and related epistemological issues. It will refrain from elaborating on the many discipline-specific problems associated with measurement and focus on issues that have a general character.

- 1. Overview
- 2. Quantity and Magnitude: A Brief History
- 3. Mathematical Theories of Measurement (“Measurement Theory”)
- 4. Operationalism and Conventionalism
- 5. Realist Accounts of Measurement
- 6. Information-Theoretic Accounts of Measurement
- 7. Model-Based Accounts of Measurement
- 8. The Epistemology of Measurement
- Bibliography
- Academic Tools
- Other Internet Resources
- Related Entries

## 1. Overview

Modern philosophical discussions about measurement—spanning from the late nineteenth century to the present day—may be divided into several strands of scholarship. These strands reflect different perspectives on the nature of measurement and the conditions that make measurement possible and reliable. The main strands are mathematical theories of measurement, operationalism, conventionalism, realism, information-theoretic accounts and model-based accounts. These strands of scholarship do not, for the most part, constitute directly competing views. Instead, they are best understood as highlighting different and complementary aspects of measurement. The following is a very rough overview of these perspectives:

- **Mathematical theories of measurement** view measurement as the mapping of qualitative empirical relations to relations among numbers (or other mathematical entities).
- **Operationalists and conventionalists** view measurement as a set of operations that shape the meaning and/or regulate the use of a quantity-term.
- **Realists** view measurement as the estimation of mind-independent properties and/or relations.
- **Information-theoretic accounts** view measurement as the gathering and interpretation of information about a system.
- **Model-based accounts** view measurement as the coherent assignment of values to parameters in a theoretical and/or statistical model of a process.

These perspectives are in principle consistent with each other. While mathematical theories of measurement deal with the mathematical foundations of measurement scales, operationalism and conventionalism are primarily concerned with the semantics of quantity terms, realism is concerned with the metaphysical status of measurable quantities, and information-theoretic and model-based accounts are concerned with the epistemological aspects of measuring. Nonetheless, the subject domain is not as neatly divided as the list above suggests. Issues concerning the metaphysics, epistemology, semantics and mathematical foundations of measurement are interconnected and often bear on one another. Hence, for example, operationalists and conventionalists have often adopted anti-realist views, and proponents of model-based accounts have argued against the prevailing empiricist interpretation of mathematical theories of measurement. These subtleties will become clear in the following discussion.

The list of strands of scholarship is neither exclusive nor exhaustive. It reflects the historical trajectory of the philosophical discussion thus far, rather than any principled distinction among different levels of analysis of measurement. Some philosophical works on measurement belong to more than one strand, while many other works do not squarely fit any of them. This is especially the case since the early 2000s, when measurement returned to the forefront of philosophical discussion after several decades of relative neglect. This recent body of scholarship is sometimes called “the epistemology of measurement”, and includes a rich array of works that cannot yet be classified into distinct schools of thought. The last section of this entry will be dedicated to surveying some of these developments.

## 2. Quantity and Magnitude: A Brief History

Although the philosophy of measurement formed as a distinct area of
inquiry only during the second half of the nineteenth century,
fundamental concepts of measurement such as magnitude and quantity
have been discussed since antiquity. According to
Euclid’s *Elements*, a magnitude—such as a line, a
surface or a solid—measures another when the latter is a whole
multiple of the former (Book V, def. 1 & 2). Two magnitudes have a
common measure when they are both whole multiples of some magnitude,
and are incommensurable otherwise (Book X, def. 1). The discovery of
incommensurable magnitudes allowed Euclid and his contemporaries to
develop the notion of a *ratio* of magnitudes. Ratios can be
either rational or irrational, and therefore the concept of ratio is
more general than that of measure (Michell 2003, 2004;
Grattan-Guinness 1996).

Aristotle distinguished between quantities and qualities. Examples
of quantities are numbers, lines, surfaces, bodies, time and place,
whereas examples of qualities are justice, health, hotness and
paleness (*Categories* §6 and §8). According to
Aristotle, quantities admit of equality and inequality but not of
degrees, as “one thing is not more four-foot than another”
(ibid. 6.6a19). Qualities, conversely, do not admit of equality or
inequality but do admit of degrees, “for one thing is called
more pale or less pale than another” (ibid. 8.10b26). Aristotle
did not clearly specify whether degrees of qualities such as paleness
correspond to distinct qualities, or whether the same quality,
paleness, was capable of different intensities. This topic was at the
center of an ongoing debate in the thirteenth and fourteenth centuries
(Jung 2011). Duns Scotus supported the “addition theory”,
according to which a change in the degree of a quality can be
explained by the addition or subtraction of smaller degrees of that
quality (2011: 553). This theory was later refined by Nicole Oresme,
who used geometrical figures to represent changes in the intensity of
qualities such as velocity (Clagett 1968; Sylla
1971). Oresme’s geometrical representations established a
subset of qualities that were amenable to quantitative treatment,
thereby challenging the strict Aristotelian dichotomy between
quantities and qualities. These developments made possible the
formulation of quantitative laws of motion during the sixteenth and
seventeenth centuries (Grant 1996).

The concept of qualitative intensity was further developed by Leibniz and Kant. Leibniz’s “principle of continuity” stated that all natural change is produced by degrees. Leibniz argued that this principle applies not only to changes in extended magnitudes such as length and duration, but also to intensities of representational states of consciousness, such as sounds (Jorgensen 2009; Diehl 2012). Kant is thought to have relied on Leibniz’s principle of continuity to formulate his distinction between extensive and intensive magnitudes. According to Kant, extensive magnitudes are those “in which the representation of the parts makes possible the representation of the whole” (1787: A162/B203). An example is length: a line can only be mentally represented by a successive synthesis in which parts of the line join to form the whole. For Kant, the possibility of such synthesis was grounded in the forms of intuition, namely space and time. Intensive magnitudes, like warmth or colors, also come in continuous degrees, but their apprehension takes place in an instant rather than through a successive synthesis of parts. The degrees of intensive magnitudes “can only be represented through approximation to negation” (1787: A168/B210), that is, by imagining their gradual diminution until their complete absence.

Scientific developments during the nineteenth century challenged the distinction between extensive and intensive magnitudes. Thermodynamics and wave optics showed that differences in temperature and hue corresponded to differences in spatio-temporal magnitudes such as velocity and wavelength. Electrical magnitudes such as resistance and conductance were shown to be capable of addition and division despite not being extensive in the Kantian sense, i.e., not synthesized from spatial or temporal parts. Moreover, early experiments in psychophysics suggested that intensities of sensation such as brightness and loudness could be represented as sums of “just noticeable differences” among stimuli, and could therefore be thought of as composed of parts (see Section 3.3). These findings, along with advances in the axiomatization of branches of mathematics, motivated some of the leading scientists of the late nineteenth century to attempt to clarify the mathematical foundations of measurement (Maxwell 1873; von Kries 1882; Helmholtz 1887; Mach 1896; Poincaré 1898; Hölder 1901; for historical surveys see Darrigol 2003; Michell 1993, 2003; Cantù and Schlaudt 2013). These works are viewed today as precursors to the body of scholarship known as “measurement theory”.

## 3. Mathematical Theories of Measurement (“Measurement Theory”)

Mathematical theories of measurement (often referred to
collectively as “measurement theory”) concern the
conditions under which relations among numbers (and other
mathematical entities) can be used to express relations among
objects.^{[2]} In
order to appreciate the need for mathematical theories of
measurement, consider the fact that relations exhibited by
numbers—such as equality, sum, difference and ratio—do
not always correspond to relations among the objects measured by
those numbers. For example, 60 is twice 30, but one would be mistaken
in thinking that an object measured at 60 degrees Celsius is twice as
hot as an object at 30 degrees Celsius. This is because the zero
point of the Celsius scale is arbitrary and does not correspond to an
absence of
temperature.^{[3]}
Similarly, numerical intervals do not
always carry empirical information. When subjects are asked to rank
on a scale from 1 to 7 how strongly they agree with a given
statement, there is no *prima facie* reason to think that the
intervals between 5 and 6 and between 6 and 7 correspond to equal
increments of strength of opinion. To provide a third example,
equality among numbers is transitive [if (a=b & b=c) then a=c]
but empirical comparisons among physical magnitudes reveal only
approximate equality, which is not a transitive relation. These
examples suggest that not all of the mathematical relations among
numbers used in measurement are empirically significant, and that
different kinds of measurement scale convey different kinds of
empirically significant information.
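
The first and third examples can be checked numerically. The following Python sketch is illustrative only: the offset of 273.15 is the standard Celsius-to-Kelvin conversion, while the tolerance of 1.0 and the three readings `a`, `b` and `c` are hypothetical values chosen for the demonstration.

```python
def celsius_to_kelvin(c):
    """Shift a Celsius reading onto the Kelvin scale, whose zero is not arbitrary."""
    return c + 273.15


# The numerals 60 and 30 stand in a 2:1 ratio on the Celsius scale...
ratio_celsius = 60 / 30
# ...but the same two temperatures are far from a 2:1 ratio in kelvins.
ratio_kelvin = celsius_to_kelvin(60) / celsius_to_kelvin(30)


def approx_equal(x, y, tolerance=1.0):
    """Model an empirical comparison that cannot resolve differences up to `tolerance`."""
    return abs(x - y) <= tolerance


# Hypothetical readings: each neighbouring pair is indistinguishable,
# yet the first and last readings are distinguishable.
a, b, c = 10.0, 10.8, 11.6

print(ratio_celsius, round(ratio_kelvin, 3))
print(approx_equal(a, b), approx_equal(b, c), approx_equal(a, c))
```

The last line shows the failure of transitivity: `a` matches `b` and `b` matches `c` within the tolerance, but `a` does not match `c`.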

The study of measurement scales and the empirical information they convey is the main concern of mathematical theories of measurement. In his seminal 1887 essay, “Counting and Measuring”, Hermann von Helmholtz phrased the key question of measurement theory as follows:

[W]hat is the objective meaning of expressing through denominate numbers the relations of real objects as magnitudes, and under what conditions can we do this? (1887: 4)

Broadly speaking, measurement theory sets out to
(i) identify the assumptions underlying the use of various
mathematical structures for describing aspects of the empirical
world, and (ii) draw lessons about the adequacy and limits of using
these mathematical structures for describing aspects of the empirical
world. Following Otto Hölder (1901), measurement theorists often
tackle these goals through formal proofs, with the assumptions in (i)
serving as axioms and the lessons in (ii) following as theorems. A
key insight of measurement theory is that the empirically significant
aspects of a given mathematical structure are those that *mirror
relevant relations* among the objects being measured. For
example, the relation “bigger than” among numbers is
empirically significant for measuring length insofar as it mirrors
the relation “longer than” among objects. This mirroring,
or mapping, of relations between objects and mathematical entities
constitutes a measurement scale. As will be clarified below,
measurement scales are usually thought of as isomorphisms or
homomorphisms between objects and mathematical entities.

Other than these broad goals and claims, measurement theory is a
highly heterogeneous body of scholarship. It includes works that span
from the late nineteenth century to the present day and endorse a wide
array of views on the ontology, epistemology and semantics of
measurement. Two main differences among mathematical theories of
measurement are especially worth mentioning. The first concerns the
nature of the *relata*, or “objects”, whose
relations numbers are supposed to mirror. These *relata* may be
understood in at least four different ways: as concrete individual
objects, as qualitative observations of concrete individual objects,
as abstract representations of individual objects, or as universal
properties of objects. Which interpretation is adopted depends in
large part on the author’s metaphysical and epistemic
commitments. This issue will be especially relevant to the discussion
of realist accounts of measurement (Section
5). Second, different measurement theorists have taken different
stands on the kind of empirical evidence that is required to establish
mappings between objects and numbers. As a result, measurement
theorists have come to disagree about the necessary conditions for
establishing the measurability of attributes, and specifically about
whether psychological attributes are measurable. Debates about
measurability have been highly fruitful for the development of
measurement theory, and the following subsections will introduce some
of these debates and the central concepts developed therein.

### 3.1 Fundamental and derived measurement

During the late nineteenth and early twentieth centuries several
attempts were made to provide a universal definition of measurement.
Although accounts of measurement varied, the consensus was that
measurement is a method of *assigning numbers to magnitudes*.
For example, Helmholtz (1887: 17) defined measurement as the procedure
by which one finds the denominate number that expresses the value of a
magnitude, where a “denominate number” is a number
together with a unit, e.g., 5 meters, and a magnitude is a quality of
objects that is amenable to ordering from smaller to greater, e.g.,
length. Bertrand Russell similarly stated that measurement is

any method by which a unique and reciprocal correspondence is established between all or some of the magnitudes of a kind and all or some of the numbers, integral, rational or real. (1903: 176)

Norman Campbell defined measurement simply as “the process of assigning numbers to represent qualities”, where a quality is a property that admits of non-arbitrary ordering (1920: 267).

Defining measurement as numerical assignment raises the question: which assignments are adequate, and under what conditions? Early measurement theorists like Helmholtz (1887), Hölder (1901) and Campbell (1920) argued that numbers are adequate for expressing magnitudes insofar as algebraic operations among numbers mirror empirical relations among magnitudes. For example, the qualitative relation “longer than” among rigid rods is (roughly) transitive and asymmetrical, and in this regard shares structural features with the relation “larger than” among numbers. Moreover, the end-to-end concatenation of rigid rods shares structural features—such as associativity and commutativity—with the mathematical operation of addition. A similar situation holds for the measurement of weight with an equal-arms balance. Here deflection of the arms provides ordering among weights and the heaping of weights on one pan constitutes concatenation.

Early measurement theorists formulated axioms that describe these
qualitative empirical structures, and used these axioms to prove
theorems about the adequacy of assigning numbers to magnitudes that
exhibit such structures. Specifically, they proved that ordering and
concatenation are together sufficient for the construction of
an *additive* numerical representation of the relevant
magnitudes. An additive representation is one in which addition is
empirically meaningful, and hence also multiplication, division
etc. Campbell called measurement procedures that satisfy the
conditions of additivity “fundamental” because they do not
involve the measurement of any other magnitude (1920: 277). Kinds of
magnitudes for which a fundamental measurement procedure has been
found—such as length, area, volume, duration, weight and
electrical resistance—Campbell called “fundamental
magnitudes”. A hallmark of such magnitudes is that it is
possible to generate them by concatenating a standard sequence of
equal units, as in the example of a series of equally spaced marks on
a ruler.

Although they viewed additivity as the hallmark of measurement, most early measurement theorists acknowledged that additivity is not necessary for measuring. Other magnitudes exist that admit of ordering from smaller to greater, but whose ratios and/or differences cannot currently be determined except through their relations to other, fundamentally measurable magnitudes. Examples are temperature, which may be measured by determining the volume of a mercury column, and density, which may be measured as the ratio of mass and volume. Such indirect determination came to be called “derived” measurement and the relevant magnitudes “derived magnitudes” (Campbell 1920: 275–7).

At first glance, the distinction between fundamental and derived
measurement may seem reminiscent of the distinction between extensive
and intensive magnitudes, and indeed fundamental measurement is
sometimes called “extensive”. Nonetheless, it is important
to note that the two distinctions are based on significantly different
criteria of measurability. As discussed
in Section 2, the extensive-intensive
distinction focused on the intrinsic structure of the quantity in
question, i.e., whether or not it is composed of spatio-temporal
parts. The fundamental-derived distinction, by contrast, focuses on
the properties of measurement *operations*. A fundamentally
measurable magnitude is one for which a fundamental measurement
operation has been found. Consequently, fundamentality is not an
intrinsic property of a magnitude: a derived magnitude can become
fundamental with the discovery of new operations for its
measurement. Moreover, in fundamental measurement the numerical
assignment need not mirror the structure of spatio-temporal parts.
Electrical resistance, for example, can be fundamentally measured by
connecting resistors in a series (Campbell 1920: 293). This is
considered a fundamental measurement operation because it has a shared
structure with numerical addition, even though objects with equal
resistance are not generally equal in size.

The distinction between fundamental and derived measurement was
revised by subsequent authors. Brian Ellis (1966: Ch. 5–8)
distinguished among three types of measurement: fundamental,
associative and derived. Fundamental measurement requires ordering
and concatenation operations satisfying the same conditions specified
by Campbell. Associative measurement procedures are based on a
correlation of two ordering relationships, e.g., the correlation
between the volume of a mercury column and its temperature. Derived
measurement procedures consist in the determination of the value of a
constant in a physical law. The constant may be local, as in the
determination of the specific density of water from mass and volume,
or universal, as in the determination of the Newtonian gravitational
constant from force, mass and distance. Henry Kyburg (1984:
Ch. 5–7) proposed a somewhat different threefold distinction
among direct, indirect and systematic measurement, which does not
completely overlap with that of
Ellis.^{[4]}
A more radical revision of the distinction
between fundamental and derived measurement was offered by R. Duncan
Luce and John Tukey (1964) in their work on conjoint measurement,
which will be discussed in Section 3.4.

### 3.2 The classification of scales

The previous subsection discussed the axiomatization of empirical
structures, a line of inquiry that dates back to the early days of
measurement theory. A complementary line of inquiry within
measurement theory concerns the classification of measurement
scales. The psychophysicist S.S. Stevens (1946, 1951) distinguished
among four types of scales: nominal, ordinal, interval and
ratio. Nominal scales represent objects as belonging to classes that
have no particular order, e.g., male and female. Ordinal scales
represent order but no further algebraic structure. For example, the
Mohs scale of mineral hardness represents minerals with numbers
ranging from 1 (softest) to 10 (hardest), but there is no empirical
significance to equality among intervals or ratios of those
numbers.^{[5]}
Celsius and Fahrenheit are examples of interval scales: they
represent equality or inequality among intervals of temperature, but
not ratios of temperature, because their zero points are
arbitrary. The Kelvin scale, by contrast, is a ratio scale, as are
the familiar scales representing mass in kilograms, length in meters
and duration in seconds. Stevens later refined this classification
and distinguished between linear and logarithmic interval scales
(1959: 31–34) and
between ratio scales with and without a natural unit (1959:
34). Ratio scales with a
natural unit, such as those used for counting discrete objects and
for representing probabilities, were named “absolute”
scales.

As Stevens notes, scale types are individuated by the families of transformations they can undergo without loss of empirical information. Empirical relations represented on ratio scales, for example, are invariant under multiplication by a positive number, e.g., multiplication by 2.54 converts from inches to centimeters. Linear interval scales allow both multiplication by a positive number and a constant shift, e.g., the conversion from Celsius to Fahrenheit in accordance with the formula °C × 9/5 + 32 = °F. Ordinal scales admit of any transformation function as long as it is monotonic and increasing, and nominal scales admit of any one-to-one substitution. Absolute scales admit of no transformation other than identity. Stevens’ classification of scales was later generalized by Louis Narens (1981, 1985: Ch. 2) and Luce et al. (1990: Ch. 20) in terms of the homogeneity and uniqueness of the relevant transformation groups.
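
The link between scale types and admissible transformations can be illustrated with a short Python sketch. The two conversion formulas are those given above. The helper names, the sample values and the choice of cubing as a monotonic increasing function are assumptions made for illustration.

```python
import math


def inches_to_cm(x):
    # Ratio-scale transformation: multiplication by a positive number.
    return 2.54 * x


def celsius_to_fahrenheit(c):
    # Linear interval-scale transformation: multiplication plus a constant shift.
    return c * 9 / 5 + 32


def cube(x):
    # Ordinal-scale transformation: one arbitrary monotonic increasing function.
    return x ** 3


def ratio(values):
    """Ratio of the second value to the first."""
    return values[1] / values[0]


def interval_ratio(values):
    """Ratio of the second interval to the first interval."""
    return (values[2] - values[1]) / (values[1] - values[0])


def ranking(values):
    """Indices of the values, ordered from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


lengths_in = [2.0, 4.0, 10.0]
temps_c = [10.0, 20.0, 40.0]
hardness = [3, 1, 7]  # hypothetical ordinal ranks

lengths_cm = [inches_to_cm(x) for x in lengths_in]
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]
hardness_cubed = [cube(h) for h in hardness]

# Ratio scale: ratios of values survive the change of unit.
print(math.isclose(ratio(lengths_in), ratio(lengths_cm)))
# Interval scale: ratios of values do not survive, ratios of intervals do.
print(math.isclose(ratio(temps_c), ratio(temps_f)))
print(math.isclose(interval_ratio(temps_c), interval_ratio(temps_f)))
# Ordinal scale: only the ordering survives.
print(ranking(hardness) == ranking(hardness_cubed))
print(math.isclose(interval_ratio(hardness), interval_ratio(hardness_cubed)))
```

Each scale type is thus characterized by what remains unchanged under its own family of transformations: ratios for ratio scales, ratios of intervals for linear interval scales, and order alone for ordinal scales.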

While Stevens’ classification of scales met with general approval in scientific and philosophical circles, its wider implications for measurement theory became the topic of considerable debate. Two issues were especially contested. The first was whether classification and ordering operations deserve to be called “measurement” operations, and accordingly whether the representation of magnitudes on nominal and ordinal scales should count as measurement. Several physicists, including Campbell, argued that classification and ordering operations did not provide a sufficiently rich structure to warrant the use of numbers, and hence should not count as measurement operations. The second contested issue was whether a concatenation operation had to be found for a magnitude before it could be fundamentally measured on a ratio scale. The debate became especially heated when it re-ignited a longer controversy surrounding the measurability of intensities of sensation. It is to this debate we now turn.

### 3.3 The measurability of sensation

One of the main catalysts for the development of mathematical
theories of measurement was an ongoing debate surrounding
measurability in psychology. The debate is often traced back to Gustav
Fechner’s (1860) *Elements of Psychophysics*, in which he
described a method of measuring intensities of sensation.
Fechner’s method was based on the recording of “just
noticeable differences” between sensations associated with pairs
of stimuli, e.g., two sounds of different intensity. These differences
were assumed to be equal increments of intensity of sensation. As
Fechner showed, under this assumption a stable linear relationship is
revealed between the intensity of sensation and the logarithm of the
intensity of the stimulus, a relation that came to be known as
“Fechner’s law” (Heidelberger 1993a: 203; Luce and
Suppes 2004: 11–2). This law in turn provides a method for
indirectly measuring the intensity of sensation by measuring the
intensity of the stimulus, and hence, Fechner argued, provides
justification for measuring intensities of sensation on the real
numbers.
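
The reasoning behind the logarithmic relation can be sketched in Python under two assumptions. The first is the one stated above, that every just noticeable difference counts as one equal increment of sensation. The second, which the passage does not state, is that a just noticeable difference always corresponds to the same proportional increase of the stimulus (a constant "Weber fraction"). The values of `WEBER_FRACTION` and `THRESHOLD` are hypothetical.

```python
import math

WEBER_FRACTION = 0.1  # hypothetical: a 10% increase in the stimulus is just noticeable
THRESHOLD = 1.0       # hypothetical: weakest detectable stimulus, in arbitrary units


def stimulus_after_jnds(n, threshold=THRESHOLD, w=WEBER_FRACTION):
    """Stimulus intensity reached after n just noticeable steps above threshold."""
    return threshold * (1 + w) ** n


def sensation(stimulus, threshold=THRESHOLD, w=WEBER_FRACTION):
    """Sensation intensity counted in just noticeable differences.

    Under the two assumptions this is a linear function of log(stimulus).
    """
    return math.log(stimulus / threshold) / math.log(1 + w)


# Counting steps and taking the logarithm agree.
for n in range(5):
    print(n, round(sensation(stimulus_after_jnds(n)), 6))

# Doubling the stimulus adds the same amount of sensation at any starting level.
for start in (1.0, 5.0, 50.0):
    print(round(sensation(2 * start) - sensation(start), 6))
```

On these assumptions, measuring the stimulus and taking its logarithm yields the sensation value indirectly, which is the role the passage assigns to Fechner's law.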

Fechner’s claims concerning the measurability of sensation became the subject of a series of debates that lasted nearly a century and proved extremely fruitful for the philosophy of measurement, involving key figures such as Mach, Helmholtz, Campbell and Stevens (Heidelberger 1993a: Ch. 6 and 1993b; Michell 1999: Ch. 6). Those objecting to the measurability of sensation, such as Campbell, stressed the necessity of an empirical concatenation operation for fundamental measurement. Since intensities of sensation cannot be concatenated to each other in the manner afforded by lengths and weights, there could be no fundamental measurement of sensation intensity. Moreover, Campbell claimed that none of the psychophysical regularities discovered thus far are sufficiently universal to count as laws in the sense required for derived measurement (Campbell in Ferguson et al. 1940: 347). All that psychophysicists have shown is that intensities of sensation can be consistently ordered, but order by itself does not yet warrant the use of numerical relations such as sums and ratios to express empirical results.

The central opponent of Campbell in this debate was Stevens, whose
distinction between types of measurement scale was discussed above.
Stevens defined measurement as the “assignment of numerals to
objects or events according to rules” (1951: 1) and claimed that
any consistent and non-random assignment counts as measurement in the
broad sense (1975: 47). In useful cases of scientific inquiry, Stevens
claimed, measurement can be construed somewhat more narrowly as a
numerical assignment that is based on the results of *matching*
operations, such as the coupling of temperature to mercury volume or
the matching of sensations to each other. Stevens argued against the
view that relations among numbers need to mirror qualitative empirical
structures, claiming instead that measurement scales should be
regarded as arbitrary formal schemas and adopted in accordance with
their usefulness for describing empirical data. For example, adopting
a ratio scale for measuring the sensations of loudness, volume and
density of sounds leads to the formulation of a simple linear relation
among the reports of experimental subjects: loudness = volume ×
density (1975: 57–8). Such assignment of numbers to sensations
counts as measurement because it is consistent and non-random, because
it is based on the matching operations performed by experimental
subjects, and because it captures regularities in the experimental
results. According to Stevens, these conditions are together
sufficient to justify the use of a ratio scale for measuring
sensations, despite the fact that “sensations cannot be
separated into component parts, or laid end to end like measuring
sticks” (1975: 38; see also Hempel 1952: 68–9).

### 3.4 Representational Theory of Measurement

In the mid-twentieth century the two main lines of inquiry in
measurement theory, the one dedicated to the empirical conditions of
quantification and the one concerning the classification of scales,
converged in the work of Patrick Suppes (1951; Scott and Suppes 1958;
for historical surveys see Savage and Ehrlich 1992; Diez 1997a,b).
Suppes’ work laid the basis for the Representational Theory of
Measurement (RTM), which remains the most influential mathematical
theory of measurement to date (Krantz et al. 1971; Suppes et
al. 1989; Luce et al. 1990). RTM defines measurement as the
construction of mappings from empirical relational structures into
numerical relational structures (Krantz et al. 1971: 9). An empirical
relational structure consists of a set of empirical objects (e.g.,
rigid rods) along with certain qualitative relations among them
(e.g., ordering, concatenation), while a numerical relational
structure consists of a set of numbers (e.g., real numbers) and
specific mathematical relations among them (e.g., “equal to or
bigger than”, addition). Simply put, a measurement scale is a
many-to-one mapping—a homomorphism—from an empirical to a
numerical relational structure, and measurement is the construction
of scales.^{[6]} RTM
goes into great detail in clarifying the assumptions underlying the
construction of different types of measurement scales. Each type of
scale is associated with a set of assumptions about the qualitative
relations obtaining among objects represented on that type of
scale. From these assumptions, or axioms, the authors of RTM derive
the representational adequacy of each scale type, as well as the
family of permissible transformations making that type of scale
unique. In this way RTM provides a conceptual link between the
empirical basis of measurement and the typology of
scales.^{[7]}
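
A toy example may help to fix ideas. The Python sketch below encodes a small empirical relational structure of rigid rods as explicit tables of qualitative observations and checks whether a candidate numerical assignment is a homomorphism in the sense just described. The rod names, the observation tables and the function `is_homomorphism` are all hypothetical. The sketch is not drawn from the RTM literature and covers only a finite fragment of what its axioms require.

```python
# Empirical relational structure: four rods and two kinds of qualitative observation.
RODS = ["a", "a_prime", "b", "c"]

# (x, y) is listed when rod x was observed to be at least as long as rod y.
AT_LEAST_AS_LONG = {
    ("a", "a"), ("a", "a_prime"), ("a_prime", "a"), ("a_prime", "a_prime"),
    ("b", "a"), ("b", "a_prime"), ("b", "b"),
    ("c", "a"), ("c", "a_prime"), ("c", "b"), ("c", "c"),
}

# (x, y) -> z records that x laid end to end with y was observed to match z.
CONCATENATION = {
    ("a", "a_prime"): "b", ("a_prime", "a"): "b",
    ("a", "b"): "c", ("b", "a"): "c",
    ("a_prime", "b"): "c", ("b", "a_prime"): "c",
}


def is_homomorphism(scale):
    """Check that `scale` maps the ordering to >= and concatenation to +."""
    order_ok = all(
        ((x, y) in AT_LEAST_AS_LONG) == (scale[x] >= scale[y])
        for x in RODS
        for y in RODS
    )
    concat_ok = all(
        scale[x] + scale[y] == scale[z] for (x, y), z in CONCATENATION.items()
    )
    return order_ok and concat_ok


# Many-to-one: the distinct rods "a" and "a_prime" receive the same number.
scale_units = {"a": 1.0, "a_prime": 1.0, "b": 2.0, "c": 3.0}
# Multiplying by a positive constant is a permissible transformation.
scale_rescaled = {rod: 100.0 * value for rod, value in scale_units.items()}
# Adding a constant keeps the order but no longer mirrors concatenation.
scale_shifted = {rod: value + 5.0 for rod, value in scale_units.items()}

print(is_homomorphism(scale_units))
print(is_homomorphism(scale_rescaled))
print(is_homomorphism(scale_shifted))
```

The rescaled assignment still passes while the shifted one fails, which illustrates in miniature why a structure with both ordering and concatenation is represented uniquely only up to multiplication by a positive constant.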

On the issue of measurability, the Representational Theory takes a middle path between the liberal approach adopted by Stevens and the strict emphasis on concatenation operations espoused by Campbell. Like Campbell, RTM accepts that rules of quantification must be grounded in known empirical structures and should not be chosen arbitrarily to fit the data. However, RTM rejects the idea that additive scales are adequate only when concatenation operations are available (Luce and Suppes 2004: 15). Instead, RTM argues for the existence of fundamental measurement operations that do not involve concatenation. The central example of this type of operation is known as “additive conjoint measurement” (Luce and Tukey 1964; Krantz et al. 1971: 17–21 and Ch. 6–7). Here, measurements of two or more different types of attribute, such as the temperature and pressure of a gas, are obtained by observing their joint effect, such as the volume of the gas. Luce and Tukey showed that by establishing certain qualitative relations among volumes under variations of temperature and pressure, one can construct additive representations of temperature and pressure, without invoking any antecedent method of measuring volume. This sort of procedure is generalizable to any suitably related triplet of attributes, such as the loudness, intensity and frequency of pure tones, or the preference for a reward, its size and the delay in receiving it (Luce and Suppes 2004: 17). The discovery of additive conjoint measurement led the authors of RTM to divide fundamental measurement into two kinds: traditional measurement procedures based on concatenation operations, which they called “extensive measurement”, and conjoint or “nonextensive” fundamental measurement. Under this new conception of fundamentality, all the traditional physical attributes can be measured fundamentally, as well as many psychological attributes (Krantz et al. 1971: 502–3).
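
The gas example can be sketched in Python. The sketch rests on several assumptions that go beyond the text. The qualitative observations are simulated by supposing that volume is ordered like temperature divided by pressure, as for an ideal gas, though the check itself uses only comparisons of volume. The condition tested is the one usually called "double cancellation" in the conjoint measurement literature, which is only one of the qualitative relations involved. The factor levels and all function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

# Hypothetical factor levels, chosen so that distinct cells never tie in volume.
TEMPERATURES = [200, 300, 500]
PRESSURES = [1, 4, 7]


def volume_at_least(cell_1, cell_2):
    """Qualitative observation: is the volume in cell_1 at least that in cell_2?

    Simulated by comparing temperature/pressure exactly. Only the resulting
    True/False verdicts are used below, never the numerical volumes.
    """
    (t1, p1), (t2, p2) = cell_1, cell_2
    return Fraction(t1, p1) >= Fraction(t2, p2)


def double_cancellation_holds():
    """If (a, y) >= (b, x) and (b, z) >= (c, y), then (a, z) >= (c, x)."""
    for a, b, c in product(TEMPERATURES, repeat=3):
        for x, y, z in product(PRESSURES, repeat=3):
            premise = volume_at_least((a, y), (b, x)) and volume_at_least((b, z), (c, y))
            if premise and not volume_at_least((a, z), (c, x)):
                return False
    return True


def additive_score(cell):
    """Candidate additive representation: one term per attribute."""
    temperature, pressure = cell
    return math.log(temperature) + (-math.log(pressure))


def additive_scores_preserve_order():
    """Check that sums of the two per-attribute terms reproduce the volume ordering."""
    cells = list(product(TEMPERATURES, PRESSURES))
    return all(
        volume_at_least(c1, c2) == (additive_score(c1) >= additive_score(c2))
        for c1 in cells
        for c2 in cells
    )


print(double_cancellation_holds())
print(additive_scores_preserve_order())
```

In the sketch the additive scores are supplied in advance in order to show what an additive representation looks like. The point of Luce and Tukey's result, as described above, is that such a representation can be constructed from the qualitative comparisons alone.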

## 4. Operationalism and Conventionalism

Above we saw that mathematical theories of measurement are
primarily concerned with the mathematical properties of measurement
scales and the conditions of their application. A related but distinct
strand of scholarship concerns the meaning and use of quantity
terms. Scientific theories and models are commonly expressed in terms
of quantitative relations among parameters, bearing names such as
“length”, “unemployment rate” and
“introversion”. A realist about one of these terms would
argue that it refers to a set of properties or relations that exist
independently of being measured. An operationalist or conventionalist
would argue that the way such quantity-terms apply to concrete
particulars depends on nontrivial choices made by humans, and
specifically on choices that have to do with the way the relevant
quantity is measured. Note that under this broad construal, realism is
compatible with operationalism and conventionalism. That is, it is
conceivable that choices of measurement method regulate the use of a
quantity-term and that, given the *correct* choice, this term
succeeds in referring to a mind-independent property or
relation. Nonetheless, many operationalists and conventionalists
adopted stronger views, according to which there are no facts of the
matter as to which of several and nontrivially different operations is
correct for applying a given quantity-term. These stronger variants
are inconsistent with realism about measurement. This section will be
dedicated to operationalism and conventionalism, and the next to
realism about measurement.

Operationalism (or “operationism”) about measurement is the view that the meaning of quantity-concepts is determined by the set of operations used for their measurement. The strongest expression of operationalism appears in the early work of Percy Bridgman (1927), who argued that

we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations. (1927: 5)

Length, for example, would be defined as the
result of the operation of concatenating rigid rods. According to
this extreme version of operationalism, different operations measure
different quantities. Length measured by using rulers and by timing
electromagnetic pulses should, strictly speaking, be distinguished
into two distinct quantity-concepts labeled “length-1”
and “length-2” respectively. This conclusion led Bridgman
to claim that currently accepted quantity concepts have
“joints” where different operations overlap in their
domain of application. He warned against dogmatic faith in the unity
of quantity concepts across these “joints”, urging
instead that unity be checked against experiments whenever the
application of a quantity-concept is to be extended into a new
domain. Nevertheless, Bridgman conceded that as long as the results
of different operations agree within experimental error it is
pragmatically justified to label the corresponding quantities with
the same name (1927:
16).^{[8]}

Operationalism became influential in psychology, where it was
well received by behaviorists like Edwin Boring (1945) and
B.F. Skinner (1945). Indeed, Skinner maintained that behaviorism is
“nothing more than a thoroughgoing operational analysis of
traditional mentalistic concepts” (1945: 271). Stevens, who was
Boring’s student, was a key promoter of operationalism in
psychology, and argued that psychological concepts have empirical
meaning only if they stand for definite and concrete operations
(1935: 517). The idea that concepts are defined by measurement
operations is consistent with Stevens’ liberal views on
measurability, which were discussed above (Section
3.3). As long as the assignment of numbers to objects is
performed in accordance with concrete and consistent rules, Stevens
maintained that such assignment has empirical meaning and does not
need to satisfy any additional constraints. Nonetheless, Stevens
probably did not embrace an anti-realist view about psychological
attributes. Instead, there are good reasons to think that he
understood operationalism as a methodological attitude that was
valuable to the extent that it allowed psychologists to justify the
conclusions they drew from experiments (Feest 2005). For example,
Stevens did not treat operational definitions as *a priori*
but as amenable to improvement in light of empirical discoveries,
implying that he took psychological attributes to exist independently
of such definitions (Stevens 1935: 527). This suggests that
Stevens’ operationalism was of a more moderate variety than
that found in the early writings of
Bridgman.^{[9]}

Operationalism was initially met with enthusiasm by logical positivists,
who viewed it as akin to verificationism. Nonetheless, it was soon
revealed that any attempt to base a theory of meaning on
operationalist principles was riddled with problems. Among such
problems were the automatic reliability operationalism conferred on
measurement operations, the ambiguities surrounding the notion of
operation, the overly restrictive operational criterion of
meaningfulness, and the fact that many useful theoretical concepts
lack clear operational definitions (Chang
2009).^{[10]} In
particular, Carl Hempel (1956, 1966) criticized operationalists for
being unable to define dispositional terms such as “solubility
in water”, and for multiplying the number of scientific
concepts in a manner that runs against the need for systematic and
simple theories. Accordingly, most writers on the semantics of
quantity-terms have avoided espousing an operational
analysis.^{[11]}

A more widely advocated approach admitted a conventional element to
the use of quantity-terms, while resisting attempts to reduce the
meaning of quantity terms to measurement operations. These accounts
are classified under the general heading
“conventionalism”, though they differ in the particular
aspects of measurement they deem conventional and in the degree of
arbitrariness they ascribe to such
conventions.^{[12]}
An early precursor of conventionalism
was Ernst Mach, who examined the notion of equality among temperature
intervals (1896: 52). Mach noted that different types of thermometric
fluid expand at different (and nonlinearly related) rates when
heated, raising the question: which fluid expands most uniformly with
temperature? According to Mach, there is no fact of the matter as to
which fluid expands more uniformly, since the very notion of equality
among temperature intervals has no determinate application prior to a
conventional choice of standard thermometric fluid. Mach coined the
term “principle of coordination” for this sort of
conventionally chosen principle for the application of a quantity
concept. The concepts of uniformity of time and space received
similar treatments by Henri Poincaré (1898, 1902: Part
2). Poincaré argued that procedures used to determine equality
among durations stem from scientists’ unconscious preference
for descriptive simplicity, rather than from any fact about
nature. Similarly, scientists’ choice to represent space with
either Euclidean or non-Euclidean geometries is not determined by
experience but by considerations of convenience.

Conventionalism with respect to measurement reached its most
sophisticated expression in logical positivism. Logical positivists
like Hans Reichenbach and Rudolf Carnap proposed “coordinative
definitions” or “correspondence rules” as the
semantic link between theoretical and observational
terms. These *a priori*, definition-like statements were
intended to regulate the use of theoretical terms by connecting them
with empirical procedures (Reichenbach 1927: 14–19; Carnap
1966: Ch. 24). An example of a coordinative definition is the
statement: “a measuring rod retains its length when
transported”. According to Reichenbach, this statement cannot
be empirically verified, because a universal and experimentally
undetectable force could exist that equally distorts every
object’s length when it is transported. In accordance with
verificationism, statements that are unverifiable are neither true
nor false. Instead, Reichenbach took this statement to express an
arbitrary rule for regulating the use of the concept of equality of
length, namely, for determining whether particular instances of
length are equal (Reichenbach 1927: 16). At the same time,
coordinative definitions were not seen as replacements, but rather as
necessary additions, to the familiar sort of theoretical definitions
of concepts in terms of other concepts (1927: 14). Under the
conventionalist viewpoint, then, the specification of measurement
operations did not exhaust the meaning of concepts such as length or
length-equality, thereby avoiding many of the problems associated
with operationalism.^{[13]}

## 5. Realist Accounts of Measurement

Realists about measurement maintain that measurement is best
understood as the empirical estimation of an objective property or
relation. A few clarificatory remarks are in order with respect to
this characterization of measurement. First, the term
“objective” is not meant to exclude mental properties or
relations, which are the objects of psychological measurement. Rather,
measurable properties or relations are taken to be objective inasmuch
as they are independent of the beliefs and conventions of the humans
performing the measurement and of the methods used for measuring. For
example, a realist would argue that the ratio of the length of a given
solid rod to the standard meter has an objective value regardless of
whether and how it is measured. Second, the term
“estimation” is used by realists to highlight the fact
that measurement results are mere *approximations* of true
values (Trout 1998: 46). Third, according to realists, measurement is
aimed at obtaining knowledge about properties and relations, rather
than at assigning values directly to individual objects. This is
significant because observable objects (e.g., levers, chemical
solutions, humans) often instantiate measurable properties and
relations that are not directly observable (e.g., amount of mechanical
work, more acidic than, intelligence). Knowledge claims about such
properties and relations must presuppose some background theory. By
shifting the emphasis from objects to properties and relations,
realists highlight the theory-laden character of measurements.

Realism about measurement should not be confused with realism about entities (e.g., electrons). Nor does realism about measurement necessarily entail realism about properties (e.g., temperature), since one could in principle accept only the reality of relations (e.g., ratios among quantities) without embracing the reality of underlying properties. Nonetheless, most philosophers who have defended realism about measurement have done so by arguing for some form of realism about properties (Byerly and Lazara 1973; Swoyer 1987; Mundy 1987; Trout 1998, 2000). These realists argue that at least some measurable properties exist independently of the beliefs and conventions of the humans who measure them, and that the existence and structure of these properties provides the best explanation for key features of measurement, including the usefulness of numbers in expressing measurement results and the reliability of measuring instruments.

For example, a typical realist about length measurement would argue that the empirical regularities displayed by individual objects’ lengths when they are ordered and concatenated are best explained by assuming that length is an objective property that has an extensive structure (Swoyer 1987: 271–4). That is, relations among lengths such as “longer than” and “sum of” exist independently of whether any objects happen to be ordered and concatenated by humans, and indeed independently of whether objects of some particular length happen to exist at all. The existence of an extensive property structure means that lengths share much of their structure with the positive real numbers, and this explains the usefulness of the positive reals in representing lengths. Moreover, if measurable properties are analyzed in dispositional terms, it becomes easy to explain why some measuring instruments are reliable. For example, if one assumes that a certain amount of electric current in a wire entails a disposition to deflect an ammeter needle by a certain angle, it follows that the ammeter’s indications counterfactually depend on the amount of electric current in the wire, and therefore that the ammeter is reliable (Trout 1998: 65).

A different argument for realism about measurement is due to Joel
Michell (1994, 2005), who proposes a realist theory of number based on
the Euclidean concept of ratio. According to Michell, numbers are
ratios between quantities, and therefore exist in space and time.
Specifically, *real* numbers are ratios between pairs of
infinite standard sequences, e.g., the sequence of lengths normally
denoted by “1 meter”, “2 meters” etc. and the
sequence of whole multiples of the length we are trying to measure.
Measurement is the discovery and estimation of such ratios. An
interesting consequence of this empirical realism about numbers is
that measurement is not a representational activity, but rather the
activity of approximating mind-independent numbers (Michell 1994:
400).

Realist accounts of measurement are largely formulated in
opposition to strong versions of operationalism and conventionalism,
which dominated philosophical discussions of measurement from the
1930s until the 1960s. In addition to the drawbacks of operationalism
already discussed in the previous section, realists point out that
anti-realism about measurable quantities fails to make sense of
scientific practice. If quantities had no real values independently
of one’s choice of measurement procedure, it would be difficult
to explain what scientists mean by “measurement accuracy”
and “measurement error”, and why they try to increase
accuracy and diminish error. By contrast, realists can easily make
sense of the notions of accuracy and error in terms of the distance
between real and measured values (Byerly and Lazara 1973: 17–8;
Swoyer 1987: 239; Trout 1998: 57). A closely related point is the
fact that newer measurement procedures tend to improve on the
accuracy of older ones. If choices of measurement procedure were
merely conventional it would be difficult to make sense of such
progress. In addition, realism provides an intuitive explanation for
why different measurement procedures often yield similar results,
namely, because they are sensitive to the same facts (Swoyer 1987:
239; Trout 1998: 56). Finally, realists note that the construction of
measurement apparatus and the analysis of measurement results are
guided by theoretical assumptions concerning causal relationships
among quantities. The ability of such causal assumptions to guide
measurement suggests that quantities are ontologically prior to the
procedures that measure
them.^{[14]}

While their stance towards operationalism and conventionalism is
largely critical, realists are more charitable in their assessment of
mathematical theories of measurement. Brent Mundy (1987) and Chris
Swoyer (1987) both accept the axiomatic treatment of measurement
scales, but object to the empiricist interpretation given to the
axioms by prominent measurement theorists like Campbell (1920) and
Ernest Nagel (1931; Cohen and Nagel 1934: Ch. 15). Rather than
interpreting the axioms as pertaining to concrete objects or to
observable relations among such objects, Mundy and Swoyer reinterpret
the axioms as pertaining to universal magnitudes, e.g., to the
universal property of being 5 meters long rather than to the concrete
instantiations of that property. This construal preserves the
intuition that statements like “the size of *x* is twice
the size of *y*” are first and foremost about
two *sizes*, and only derivatively about the
objects *x* and *y* themselves (Mundy 1987:
34).^{[15]} Mundy
and Swoyer argue that their interpretation is more general, because
it logically entails all the first-order consequences of the
empiricist interpretation along with additional, second-order claims
about universal magnitudes. Moreover, under their interpretation
measurement theory becomes a genuine scientific theory, with
explanatory hypotheses and testable predictions. Despite these
virtues, the realist interpretation has been largely ignored in the
wider literature on measurement theory.

## 6. Information-Theoretic Accounts of Measurement

Information-theoretic accounts of measurement are based on an analogy between measuring systems and communication systems. In a simple communication system, a message (input) is encoded into a signal at the transmitter’s end, sent to the receiver’s end, and then decoded back (output). The accuracy of the transmission depends on features of the communication system as well as on features of the environment, i.e., the level of background noise. Similarly, measuring instruments can be thought of as “information machines” (Finkelstein 1977) that interact with an object in a given state (input), encode that state into an internal signal, and convert that signal into a reading (output). The accuracy of a measurement similarly depends on the instrument as well as on the level of noise in its environment. Conceived as a special sort of information transmission, measurement becomes analyzable in terms of the conceptual apparatus of information theory (Hartley 1928; Shannon 1948; Shannon and Weaver 1949). For example, the information that reading \(y_i\) conveys about the occurrence of a state \(x_k\) of the object can be quantified as \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\), namely as a function of the decrease of uncertainty about the object’s state (Finkelstein 1975: 222; for alternative formulations see Brillouin 1962: Ch. 15; Kirpatovskii 1974; and Mari 1999: 185).
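
The quantity \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\) can be illustrated with a small numerical sketch. The two-state object, the noisy two-valued instrument, the probabilities and the function name `information_gain` below are all hypothetical and chosen only for illustration; the base-2 logarithm (giving bits) is likewise an assumption, since the formula above leaves the base unspecified.

```python
import math

def information_gain(prior, likelihood, reading):
    """Return log2[p(x_k | y_i) / p(x_k)] for each object state x_k.

    prior:      dict mapping state x_k -> p(x_k)
    likelihood: dict mapping state x_k -> dict of reading y_i -> p(y_i | x_k)
    reading:    the observed instrument reading y_i
    """
    # p(y_i), obtained by summing over all object states.
    p_reading = sum(prior[x] * likelihood[x][reading] for x in prior)
    gains = {}
    for x in prior:
        # Bayes' rule gives the probability of the state after the reading.
        posterior = prior[x] * likelihood[x][reading] / p_reading
        gains[x] = math.log2(posterior / prior[x]) if posterior > 0 else float("-inf")
    return gains

# Hypothetical example: two equally likely states and a noisy instrument.
prior = {"low": 0.5, "high": 0.5}
likelihood = {
    "low": {"0": 0.9, "1": 0.1},
    "high": {"0": 0.2, "1": 0.8},
}
gains = information_gain(prior, likelihood, "1")
```

In this sketch the reading "1" yields a positive value for the state "high" and a negative value for the state "low": the reading makes the former more probable and the latter less probable than they were beforehand.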

Ludwik Finkelstein (1975, 1977) and Luca Mari (1999) suggested the possibility of a synthesis between Shannon-Weaver information theory and measurement theory. As they argue, both theories centrally appeal to the idea of mapping: information theory concerns the mapping between symbols in the input and output messages, while measurement theory concerns the mapping between objects and numbers. If measurement is taken to be analogous to symbol-manipulation, then Shannon-Weaver theory could provide a formalization of the syntax of measurement while measurement theory could provide a formalization of its semantics. Nonetheless, Mari (1999: 185) also warns that the analogy between communication and measurement systems is limited. Whereas a sender’s message can be known with arbitrary precision independently of its transmission, the state of an object cannot be known with arbitrary precision independently of its measurement.

Information-theoretic accounts of measurement were originally developed by metrologists with little involvement from philosophers. Metrology, officially defined as the “science of measurement and its application” (JCGM 2012: 2.2), is a field of study concerned with the design, maintenance and improvement of measuring instruments in the natural sciences and engineering. Metrologists typically work at standardization bureaus or at specialized laboratories that are responsible for the calibration of measurement equipment, the comparison of standards and the evaluation of measurement uncertainties, among other tasks. It is only recently that philosophers have begun to engage with the rich conceptual issues underlying metrological practice, and particularly with the inferences involved in evaluating and improving the accuracy of measurement standards (Chang 2004; Boumans 2005a: Chap. 5, 2005b, 2007a; Frigerio et al. 2010; Tal forthcoming-a; Teller 2013a,b; Riordan 2014). Further philosophical work is required to explore the assumptions and consequences of information-theoretic accounts of measurement, their implications for metrological practice, and their connections with other accounts of measurement.

Independently of developments in metrology, Bas van Fraassen (2008:
141–185) has recently proposed a conception of measurement in
which information plays a key role. He views measurement as composed
of two levels: on the physical level, the measuring apparatus
interacts with an object and produces a reading, e.g., a pointer
position.^{[16]}
On the abstract level, background theory represents the
object’s possible states on a parameter space. Measurement
locates an object on a sub-region of this abstract parameter space,
thereby reducing the range of possible states (2008: 164 and
172). This reduction of possibilities amounts to the collection of
information about the measured object. Van Fraassen’s analysis
of measurement differs from information-theoretic accounts developed
in metrology in its explicit appeal to background theory, and in the
fact that it does not invoke the symbolic conception of information
developed by Shannon and Weaver.

## 7. Model-Based Accounts of Measurement

Since the early 2000s a new wave of philosophical scholarship has emerged that emphasizes the relationships between measurement and theoretical and statistical modeling. According to model-based accounts, measurement consists of two levels: (i) a concrete process involving interactions between an object of interest, an instrument, and the environment; and (ii) a theoretical and/or statistical model of that process, where “model” denotes an abstract and local representation constructed from simplifying assumptions. The central goal of measurement according to this view is to assign values to one or more parameters of interest in the model in a manner that satisfies certain epistemic desiderata, in particular coherence and consistency.

A central motivation for the development of model-based accounts is the attempt to clarify the epistemological principles underlying aspects of measurement practice. For example, metrologists employ a variety of methods for the calibration of measuring instruments, the standardization and tracing of units and the evaluation of uncertainties (for a discussion of metrology, see the previous section). Traditional philosophical accounts such as mathematical theories of measurement do not elaborate on the assumptions, inference patterns, evidential grounds or success criteria associated with such methods. As Frigerio et al. (2010) argue, measurement theory is ill-suited for clarifying these aspects of measurement because it abstracts away from the process of measurement and focuses solely on the mathematical properties of scales. By contrast, model-based accounts take scale construction to be merely one of several tasks involved in measurement, alongside the definition of measured parameters, instrument design and calibration, object sampling and preparation, error detection and uncertainty evaluation, among others (2010: 145–7).

### 7.1 The roles of models in measurement

According to model-based accounts, measurement involves interaction between an object of interest (the “system under measurement”), an instrument (the “measurement system”) and an environment, which includes the measuring subjects. Other, secondary interactions may also be relevant for the determination of a measurement outcome, such as the interaction between the measuring instrument and the reference standards used for its calibration, and the chain of comparisons that trace the reference standard back to primary measurement standards (Mari 2003: 25). Measurement proceeds by representing these interactions with a set of parameters, and assigning values to a subset of those parameters (known as “measurands”) based on the results of the interactions. When measured parameters are numerical they are called “quantities”. Although measurands need not be quantities, a quantitative measurement scenario will be supposed in what follows.

Two sorts of measurement outputs are distinguished by model-based accounts (JCGM 2012: 2.9 & 4.1; Giordani and Mari 2012: 2146; Tal 2013):

- **Instrument indications** (or “readings”): these are properties of the measuring instrument in its final state after the measurement process is complete. Examples are digits on a display, marks on a multiple-choice questionnaire and bits stored in a device’s memory. Indications may be represented by numbers, but such numbers describe states of the instrument and should not be confused with measurement outcomes, which concern states of the object being measured.
- **Measurement outcomes** (or “results”): these are knowledge claims about the values of one or more quantities attributed to the object being measured, and are typically accompanied by a specification of the measurement unit and scale and an estimate of measurement uncertainty. For example, a measurement outcome may be expressed by the sentence “the mass of object *a* is 20±1 grams with a probability of 68%”.

As proponents of model-based accounts stress, inferences from instrument indications to measurement outcomes are nontrivial and depend on a host of theoretical and statistical assumptions about the object being measured, the instrument, the environment and the calibration process. Measurement outcomes are often obtained through statistical analysis of multiple indications, thereby involving assumptions about the shape of the distribution of indications and the randomness of environmental effects (Bogen and Woodward 1988: 307–310). Measurement outcomes also incorporate corrections for systematic effects, and such corrections are based on theoretical assumptions concerning the workings of the instrument and its interactions with the object and environment. For example, length measurements need to be corrected for the change of the measuring rod’s length with temperature, a correction which is derived from a theoretical equation of thermal expansion. Systematic corrections involve uncertainties of their own, for example in the determination of the values of constants, and these uncertainties are assessed through secondary experiments involving further theoretical and statistical assumptions. Moreover, the uncertainty associated with a measurement outcome depends on the methods employed for the calibration of the instrument. Calibration involves additional assumptions about the instrument, the calibrating apparatus, the quantity being measured and the properties of measurement standards (Rothbart and Slayden 1994; Franklin 1997; Baird 2004: Ch. 4; Soler et al. 2011). Another component of uncertainty originates from vagueness in the definition of the measurand, and is known as “definitional uncertainty” (Mari and Giordani 2013). 
Finally, measurement involves background assumptions about the scale type and unit system being used, and these assumptions are often tied to broader theoretical and technological considerations relating to the definition and realization of scales and units.
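
The inference from indications to an outcome can be sketched for the length example mentioned above. This is a deliberately simplified illustration, not a metrological procedure: the function name `length_outcome`, the readings, the reference temperature and the expansion coefficient are hypothetical, the indications are assumed to be independent and identically distributed, and the uncertainties of the correction itself (in the coefficient and in the temperature) are ignored, although a fuller analysis would have to include them.

```python
import math
import statistics

def length_outcome(indications_mm, rod_temp_c, ref_temp_c=20.0, alpha_per_c=1.2e-5):
    """Infer a length estimate and its standard uncertainty from rod readings.

    indications_mm: repeated readings taken off the measuring rod (indications)
    rod_temp_c:     temperature of the rod during the measurement
    ref_temp_c:     temperature at which the rod's graduations are assumed correct
    alpha_per_c:    assumed linear thermal expansion coefficient of the rod
    """
    n = len(indications_mm)
    # Statistical model: readings scatter randomly around a common value.
    mean_indication = statistics.mean(indications_mm)
    u_mean = statistics.stdev(indications_mm) / math.sqrt(n)
    # Theoretical model: a warmer rod is longer, so its marks lie further apart
    # and it under-reads; the linear expansion law supplies the correction.
    correction = 1.0 + alpha_per_c * (rod_temp_c - ref_temp_c)
    estimate = mean_indication * correction
    uncertainty = u_mean * correction
    return estimate, uncertainty

# Hypothetical readings taken with a rod that is 5 degrees above reference.
readings = [1000.02, 999.98, 1000.01, 999.99, 1000.00]
estimate, uncertainty = length_outcome(readings, rod_temp_c=25.0)
```

Changing either the statistical assumption or the expansion model changes the outcome inferred from the same five indications, which is the point that proponents of model-based accounts stress.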

These various theoretical and statistical assumptions form the
basis for the construction of one or more models of the measurement
process. Unlike mathematical theories of measurement, where the term
“model” denotes a set-theoretical structure that
interprets a formal language, here the term “model”
denotes an abstract and local representation of a target system that
is constructed from simplifying
assumptions.^{[17]}
The relevant target system in this case
is a measurement process, that is, a system composed of a measuring
instrument, objects or events to be measured, the environment
(including human operators), secondary instruments and reference
standards, the time-evolution of these components, and their various
interactions with each other. Measurement is viewed as a set of
procedures whose aim is to coherently assign values to model
parameters based on instrument indications. Models are therefore seen
as necessary preconditions for the possibility of inferring
measurement outcomes from instrument indications, and as crucial for
determining the content of measurement outcomes. As proponents of
model-based accounts emphasize, the same indications produced by the
same measurement process may be used to establish different
measurement outcomes depending on how the measurement process is
modeled, e.g., depending on which environmental influences are taken
into account, which statistical assumptions are used to analyze
noise, and which approximations are used in applying background
theory. As Luca Mari puts it,

any measurement result reports information that is meaningful only in the context of a metrological model, such a model being required to include a specification for all the entities that explicitly or implicitly appear in the expression of the measurement result. (2003: 25)

Similarly, models are said to provide the necessary context for evaluating various aspects of the goodness of measurement outcomes, including accuracy, precision, error and uncertainty (Boumans 2006, 2007a, 2009, 2012b; Mari 2005b).

Model-based accounts diverge from empiricist interpretations of
measurement theory in that they do not require relations among
measurement outcomes to be isomorphic or homomorphic to observable
relations among the items being measured (Mari 2000). Indeed,
according to model-based accounts relations among measured objects
need not be observable at all prior to their measurement (Frigerio et
al. 2010: 125). Instead, the key normative requirement of model-based
accounts is that values be assigned to model parameters in a coherent
manner. The coherence criterion may be viewed as a conjunction of two
sub-criteria: (i) coherence of model assumptions with relevant
background theories or other substantive presuppositions about the
quantity being measured; and (ii) objectivity, i.e., the mutual
consistency of measurement outcomes across different measuring
instruments, environments and
models^{[18]}
(Frigerio et al. 2010; Teller 2013b;
Tal forthcoming-b). The first sub-criterion is meant to ensure that
the *intended* quantity is being measured, while the second
sub-criterion is meant to ensure that measurement outcomes can be
reasonably attributed to the measured *object* rather than to
some artifact of the measuring instrument, environment or
model. Taken together, these two requirements ensure that measurement
outcomes remain valid independently of the specific assumptions
involved in their production, and hence that the context-dependence
of measurement outcomes does not threaten their general
applicability.

### 7.2 Models and measurement in economics

Besides their applicability to physical measurement, model-based
analyses also shed light on measurement in economics. Like physical
quantities, values of economic variables often cannot be observed
directly and must be inferred from observations based on abstract and
idealized models. The nineteenth century economist William Jevons,
for example, measured changes in the value of gold by postulating
certain causal relationships between the value of gold, the supply of
gold and the general level of prices (Hoover and Dowell 2001:
155–159; Morgan 2001: 239). As Julian Reiss (2001) shows,
Jevons’ measurements were made possible by using two models: a
causal-theoretical model of the economy, which is based on the
assumption that the quantity of gold has the capacity to raise or
lower prices; and a statistical model of the data, which is based on
the assumption that local variations in prices are mutually
independent and therefore cancel each other out when averaged. Taken
together, these models allowed Jevons to infer the change in the
value of gold from data concerning the historical prices of various
goods.^{[19]}

The ways in which models function in economic measurement have led some philosophers to view certain economic models as measuring instruments in their own right, analogously to rulers and balances (Boumans 1999, 2005c, 2006, 2007a, 2009, 2012a; Morgan 2001). Marcel Boumans explains how macroeconomists are able to isolate a variable of interest from external influences by tuning parameters in a model of the macroeconomic system. This technique frees economists from the impossible task of controlling the actual system. As Boumans argues, macroeconomic models function as measuring instruments insofar as they produce invariant relations between inputs (indications) and outputs (outcomes), and insofar as this invariance can be tested by calibration against known and stable facts.

### 7.3 Psychometric models and construct validity

Another area where models play a central role in measurement is psychology. The measurement of most psychological attributes, such as intelligence, anxiety and depression, does not rely on homomorphic mappings of the sort espoused by the Representational Theory of Measurement (Wilson 2013: 3766). Instead, psychometric theory relies predominantly on the development of abstract models that are meant to predict subjects’ performance in certain tasks. These models are constructed from substantive and statistical assumptions about the psychological attribute being measured and its relation to each measurement task. For example, Item Response Theory, a popular approach to psychological measurement, employs a variety of models to evaluate the validity of questionnaires. Consider a questionnaire that is meant to assess English language comprehension (the “ability”), by presenting subjects with a series of yes/no questions (the “items”). One of the simplest models used to validate such questionnaires is the Rasch model (Rasch 1960). This model supposes a straightforward algebraic relation—known as the “log of the odds”—between the probability that a subject will answer a given item correctly, the difficulty of that particular item, and the subject’s ability. New questionnaires are calibrated by testing the fit between their indications and the predictions of the Rasch model and assigning difficulty levels to each item accordingly. The model is then used in conjunction with the questionnaire to infer levels of English language comprehension (outcomes) from raw questionnaire scores (indications) (Wilson 2013; Mari and Wilson 2014).
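The “log of the odds” relation can be written out explicitly: under the Rasch model, the natural logarithm of the odds of a correct answer equals the subject’s ability minus the item’s difficulty. The following is a minimal sketch of that relation and of how an ability level might be inferred from responses once item difficulties have been calibrated. The item difficulties and the response pattern are hypothetical, and the use of Newton’s method for the maximum-likelihood estimate is an illustrative choice rather than the procedure of any particular psychometric package.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate for known item difficulties.

    responses: list of 0/1 answers, one per item.
    Uses Newton's method on the Rasch log-likelihood.
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("no finite estimate for all-correct or all-incorrect responses")
    ability = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(ability, b) for b in difficulties]
        gradient = raw_score - sum(probs)                 # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)   # curvature of the log-likelihood
        ability += gradient / information
    return ability

# Hypothetical calibrated difficulties for five yes/no items.
item_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 0, 0]  # raw score of 3 out of 5

ability_estimate = estimate_ability(responses, item_difficulties)
print(f"estimated ability: {ability_estimate:.3f}")
```

In this sketch the estimate depends on the responses only through the raw score, which mirrors the description above of inferring outcomes from raw questionnaire scores: any response pattern with three correct answers on these items yields the same ability estimate.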

The sort of statistical calibration (or “scaling”)
provided by Rasch models yields repeatable results, but it is often
only a first step towards full-fledged psychological measurement.
Psychologists are typically interested in the results of a measure not
for its own sake, but for the sake of assessing some underlying and
latent psychological attribute. It is therefore desirable to be able
to test whether different measures, such as different questionnaires
or multiple controlled experiments, all measure the *same*
latent attribute. Such testing is known as “construct
validation”. A construct is an abstract representation of the
latent attribute intended to be measured, and

> reflects a hypothesis […] that a variety of behaviors will correlate with one another in studies of individual differences and/or will be similarly affected by experimental manipulations. (Nunnally & Bernstein 1994: 85)

Constructs are denoted by variables in a model that predicts which correlations would be observed among the indications of different measures if they are indeed measures of the same attribute. Such models involve substantive assumptions about the attribute, including its internal structure and its relations to other attributes, and statistical assumptions about the correlation among different measures (Campbell & Fiske 1959; Nunnally & Bernstein 1994: Ch. 3; Angner 2008).
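The kind of prediction such a model makes can be illustrated with simulated data. The sketch below is loosely in the spirit of convergent and discriminant validation and is not a procedure taken from the works cited: the attribute names, sample size and noise levels are all hypothetical. Two measures are generated from one latent attribute and a third from an unrelated attribute; the model then predicts a high correlation between the first two and a correlation near zero between measures of different attributes.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
n_subjects = 1000

# Two hypothetical, mutually independent latent attributes.
anxiety = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
vocabulary = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]

# Each measure is the latent attribute plus independent measurement noise.
questionnaire = [a + rng.gauss(0.0, 0.5) for a in anxiety]
behavioral_task = [a + rng.gauss(0.0, 0.5) for a in anxiety]
word_test = [v + rng.gauss(0.0, 0.5) for v in vocabulary]

convergent = pearson(questionnaire, behavioral_task)   # same attribute, different measures
discriminant = pearson(questionnaire, word_test)       # different attributes

print(f"same attribute:       r = {convergent:.2f}")
print(f"different attributes: r = {discriminant:.2f}")
```

In real validation studies the latent attributes are of course not available for inspection; only the pattern of correlations among indications is observed, and it is compared with the pattern the model predicts.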

Several scholars have pointed out similarities between the ways
models are used to standardize measurable quantities in the natural
and social sciences. For example, Mark Wilson (2013) argues that
psychometric models can be viewed as tools for constructing
measurement standards in the same sense of “measurement
standard” used by metrologists. Others have raised doubts about
the feasibility and desirability of adopting the example of the
natural sciences when standardizing constructs in the social
sciences. As Anna Alexandrova (2008) points out, ethical
considerations bear on questions about construct validity no less
than considerations of reproducibility. Such ethical considerations
are context sensitive, and can only be applied piecemeal. Nancy
Cartwright and Rosa Runhardt (2014) make a similar point about
“Ballung” concepts, a term they borrow from Otto Neurath
to denote concepts with a fuzzy and context-dependent scope. Examples
of Ballung concepts are race, poverty, social exclusion, and the
quality of PhD programs. Such concepts are too multifaceted to be
measured on a single metric without loss of meaning, and must be
represented either by a matrix of indices or by several different
measures depending on which goals and values are at play (see also
Cartwright and Bradburn 2010). In a similar vein, Leah McClimans
(2010) argues that uniformity is not always an appropriate goal for
designing questionnaires, as the open-endedness of questions is often
both unavoidable and desirable for obtaining relevant information
from subjects.^{[20]}
These insights highlight the
interdependence between epistemic, pragmatic and ethical
considerations characteristic of the standardization of constructs in
the social sciences.

## 8. The Epistemology of Measurement

The development of model-based accounts discussed in the previous section is part of a larger “epistemic turn” in the philosophy of measurement that occurred in the early 2000s. Rather than emphasizing the mathematical foundations, metaphysics or semantics of measurement, philosophical work in recent years tends to focus on the presuppositions and inferential patterns involved in concrete practices of measurement, and on the historical, social and material dimensions of measuring. The philosophical study of these topics has been referred to as the “epistemology of measurement” (Mari 2003, 2005a; Leplège 2003; Tal forthcoming-b). In the broadest sense, the epistemology of measurement is the study of the relationships between measurement and knowledge. Central topics that fall under the purview of the epistemology of measurement include the conditions under which measurement produces knowledge; the content, scope, justification and limits of such knowledge; the reasons why particular methodologies of measurement and standardization succeed or fail in supporting particular knowledge claims; and the relationships between measurement and other knowledge-producing activities such as observation, theorizing, experimentation, modelling and calculation. In pursuing these objectives, philosophers are drawing on the work of historians and sociologists of science, who have been investigating measurement practices for a longer period (Wise and Smith 1986; Latour 1987: Ch. 6; Schaffer 1992; Porter 1995, 2007; Wise 1995; Alder 2002; Galison 2003; Gooday 2004; Crease 2011), as well as on the history and philosophy of scientific experimentation (Harré 1981; Hacking 1983; Franklin 1986; Cartwright 1999). The following subsections survey some of the topics discussed in this burgeoning body of literature.

### 8.1 Standardization and scientific progress

A topic that has attracted considerable philosophical attention in
recent years is the selection and improvement of measurement
standards. Generally speaking, to standardize a quantity concept is
to prescribe a determinate way in which that concept is to be applied
to concrete
particulars.^{[21]}
To standardize a measuring instrument is
to assess how well the outcomes of measuring with that instrument fit
the prescribed mode of application of the relevant concept.^{[22]}
The term “measurement standard” accordingly has at least
two meanings: on the one hand, it is commonly used to refer to
abstract rules and definitions that regulate the use of quantity
concepts, such as the definition of the meter. On the other hand, the
term “measurement standard” is also commonly used to refer
to the concrete artifacts and procedures that are deemed exemplary of
the application of a quantity concept, such as the metallic bar that
served as the standard meter until 1960. This duality in meaning
reflects the dual nature of standardization, which involves both
abstract and concrete aspects.

In Section 4 it was noted that
standardization involves choices among nontrivial alternatives, such
as the choice among different thermometric fluids or among different
ways of marking equal duration. These choices are nontrivial in the
sense that they affect whether or not the same temperature (or time)
intervals are deemed equal, and hence affect whether or not statements
of natural law containing the term “temperature” (or
“time”) come out true. Appealing to theory to decide
which standard is more accurate would be circular, since the theory
cannot be determinately applied to particulars prior to a choice of
measurement standard. This circularity has been variously called the
“problem of coordination” (van Fraassen 2008: Ch. 5) and
the “problem of nomic measurement” (Chang 2004: Ch. 2). As
already mentioned, conventionalists attempted to escape the
circularity by positing *a priori* statements, known as
“coordinative definitions”, which were supposed to link
quantity-terms with specific measurement operations. A drawback of
this solution is that it supposes that choices of measurement standard
are arbitrary and static, whereas in actual practice measurement
standards tend to be chosen based on empirical considerations and are
eventually improved or replaced with standards that are deemed more
accurate.

A new strand of writing on the problem of coordination has emerged in recent years, consisting most notably of the works of Hasok Chang (2001, 2004, 2007) and Bas van Fraassen (2008: Ch. 5; 2009, 2012). These works take a historical and coherentist approach to the problem. Rather than attempting to avoid the problem of circularity completely, as their predecessors did, they set out to show that the circularity is not vicious. Chang argues that constructing a quantity-concept and standardizing its measurement are co-dependent and iterative tasks. Each “epistemic iteration” in the history of standardization respects existing traditions while at the same time correcting them (Chang 2004: Ch. 5). The pre-scientific concept of temperature, for example, was associated with crude and ambiguous methods of ordering objects from hot to cold. Thermoscopes, and eventually thermometers, helped modify the original concept and made it more precise. With each such iteration the quantity concept was re-coordinated to a more stable set of standards, which in turn allowed theoretical predictions to be tested more precisely, facilitating the subsequent development of theory and the construction of more stable standards, and so on.

How this process avoids vicious circularity becomes clear when we
look at it either “from above”, i.e., in retrospect given
our current scientific knowledge, or “from within”, by
looking at historical developments in their original context (van
Fraassen 2008: 122). From either vantage point, coordination succeeds
because it increases coherence among elements of theory and
instrumentation. The questions “what counts as a measurement of
quantity *X*?” and “what is
quantity *X*?”, though unanswerable independently of each
other, are addressed together in a process of mutual refinement. It is
only when one adopts a foundationalist view and attempts to find a
starting point for coordination free of presupposition that this
historical process erroneously appears to lack epistemic justification
(2008: 137).

The new literature on coordination shifts the emphasis of the
discussion from the definitions of quantity-terms to
the *realizations* of those definitions. In metrological
jargon, a “realization” is a physical instrument or
procedure that approximately satisfies a given definition (cf. JCGM
2012: 5.1). Examples of metrological realizations are the official
prototypes of the kilogram and the cesium fountain clocks used to
standardize the second. Recent studies suggest that the methods used
to design, maintain and compare realizations have a direct bearing on
the practical application of concepts of quantity, unit and scale, no
less than the definitions of those concepts (Tal
forthcoming-a; Riordan 2014).

### 8.2 Theory-ladenness of measurement

As already discussed above (Sections 7 and 8.1), theory and measurement are interdependent both historically and conceptually. On the historical side, the development of theory and measurement proceeds through iterative and mutual refinements. On the conceptual side, the specification of measurement procedures shapes the empirical content of theoretical concepts, while theory provides a systematic interpretation for the indications of measuring instruments. This interdependence of measurement and theory may seem like a threat to the evidential role that measurement is supposed to play in the scientific enterprise. After all, measurement outcomes are thought to be able to test theoretical hypotheses, and this seems to require some degree of independence of measurement from theory. This threat is especially clear when the theoretical hypothesis being tested is already presupposed as part of the model of the measuring instrument. To cite an example from Franklin et al. (1989: 230):

> There would seem to be, at first glance, a vicious circularity if one were to use a mercury thermometer to measure the temperature of objects as part of an experiment to test whether or not objects expand as their temperature increases.

Nonetheless, Franklin et al. conclude that the circularity is not vicious. The mercury thermometer could be calibrated against another thermometer whose principle of operation does not presuppose the law of thermal expansion, such as a constant-volume gas thermometer, thereby establishing the reliability of the mercury thermometer on independent grounds. To put the point more generally, in the context of local hypothesis-testing the threat of circularity can usually be avoided by appealing to other kinds of instruments and other parts of theory.

A different sort of worry about the evidential function of measurement arises on the global scale, when the testing of entire theories is concerned. As Thomas Kuhn (1961) argues, scientific theories are usually accepted long before quantitative methods for testing them become available. The reliability of newly introduced measurement methods is typically tested against the predictions of the theory rather than the other way around. In Kuhn’s words, “The road from scientific law to scientific measurement can rarely be traveled in the reverse direction” (1961: 189). For example, Dalton’s Law, which states that the weights of elements in a chemical compound are related to each other in whole-number proportions, initially conflicted with some of the best known measurements of such proportions. It is only by assuming Dalton’s Law that subsequent experimental chemists were able to correct and improve their measurement techniques (1961: 173). Hence, Kuhn argues, the function of measurement in the physical sciences is not to test the theory but to apply it with increasing scope and precision, and eventually to allow persistent anomalies to surface that would precipitate the next crisis and scientific revolution. Note that Kuhn is not claiming that measurement has no evidential role to play in science. Instead, he argues that measurements cannot test a theory in isolation, but only by comparison to some alternative theory that is proposed in an attempt to account for the anomalies revealed by increasingly precise measurements (for an illuminating discussion of Kuhn’s thesis see Hacking 1983: 243–5).

Traditional discussions of theory-ladenness, like those of Kuhn,
were conducted against the background of the logical
positivists’ distinction between theoretical and observational
language. The theory-ladenness of measurement was correctly perceived
as a threat to the possibility of a clear demarcation between the two
languages. Contemporary discussions, by contrast, no longer present
theory-ladenness as an epistemological threat but take for granted
that some level of theory-ladenness is a prerequisite for measurements
to have any evidential power. Without some minimal substantive
assumptions about the quantity being measured, such as its amenability
to manipulation and its relations to other quantities, it would be
impossible to interpret the indications of measuring instruments and
hence impossible to ascertain the evidential relevance of those
indications. This point was already made by Pierre Duhem (1906:
153–6; see also Carrier 1994: 9–19). Moreover,
contemporary authors emphasize that theoretical assumptions play
crucial roles in correcting for measurement errors and evaluating
measurement uncertainties. Indeed, physical measurement procedures
become *more* accurate when the model underlying them is
de-idealized, a process which involves increasing the theoretical
richness of the model (Tal 2011).

The acknowledgment that theory is crucial for guaranteeing the
evidential reliability of measurement draws attention to the
“problem of observational grounding”, which is an inverse
challenge to the traditional threat of theory-ladenness (Tal 2013:
1168). The challenge is to specify what role *observation*
plays in measurement, and particularly what sort of connection with
observation is necessary and/or sufficient to allow measurement to
play an evidential role in the sciences. This problem is especially
clear when one attempts to account for the increasing use of
computational methods for performing tasks that were traditionally
accomplished by measuring instruments. As Margaret Morrison (2009)
and Wendy Parker (forthcoming) argue, there are cases where reliable
quantitative information is gathered about a target system with the
aid of a computer simulation, but in a manner that satisfies some of
the central desiderata for measurement such as being empirically
grounded and backward-looking. Such information does not rely on
signals transmitted from the particular object of interest to the
instrument, but on the use of theoretical and statistical models to
process empirical data about related objects. For example, data
assimilation methods are customarily used to estimate past
atmospheric temperatures in regions where thermometer readings are
not available. Some methods do this by fitting a computational model
of the atmosphere’s behavior to a combination of available data
from nearby regions and a model-based forecast of conditions at the
time of observation (Parker forthcoming). These estimations are then
used in various ways, including as data for evaluating
forward-looking climate models. Regardless of whether one calls
these estimations “measurements”, they challenge the idea
that producing reliable quantitative evidence about the state of an
object requires observing that object, however loosely one
understands the term
“observation”.^{[23]}

### 8.3 Accuracy and precision

Two key aspects of the reliability of measurement outcomes are
accuracy and precision. Consider a series of repeated weight
measurements performed on a particular object with an equal-arms
balance. From a realist, “error-based” perspective, the
outcomes of these measurements are *accurate* if they are close
to the true value of the quantity being measured—in our case,
the true ratio of the object’s weight to the chosen
unit—and *precise* if they are close to each other. An
analogy often cited to clarify the error-based distinction is that of
arrows shot at a target, with accuracy analogous to the closeness of
hits to the bull’s eye and precision analogous to the tightness
of spread of hits (cf. JCGM 2012: 2.13 & 2.15; Teller 2013a:
192). Though intuitive, the error-based way of carving the distinction
raises an epistemological difficulty. It is commonly thought that the
exact true values of most quantities of interest to science are
unknowable, at least when those quantities are measured on continuous
scales. If this assumption is granted, the accuracy with which such
quantities are measured cannot be known with exactitude, but only
estimated by comparing inaccurate measurements to each other. And yet
it is unclear why convergence among inaccurate measurements should be
taken as an indication of truth. After all, the measurements could be
plagued by a common bias that prevents their individual inaccuracies
from cancelling each other out when averaged. In the absence of
cognitive access to true values, how is the evaluation of measurement
accuracy possible?

In answering this question, philosophers have benefited from studying the various senses of the term “measurement accuracy” as used by practicing scientists. At least five different senses have been identified: metaphysical, epistemic, operational, comparative and pragmatic (Tal 2011: 1084–5). In particular, the epistemic or “uncertainty-based” sense of the term is metaphysically neutral and does not presuppose the existence of true values. Instead, the accuracy of a measurement outcome is taken to be the closeness of agreement among values reasonably attributed to a quantity given available empirical data and background knowledge (cf. JCGM 2012: 2.13 Note 3; Giordani & Mari 2012). Thus construed, measurement accuracy can be evaluated by establishing robustness among the consequences of models representing different measurement processes.

Under the uncertainty-based conception, imprecision is a special type of inaccuracy. For example, the inaccuracy of weight measurements is the breadth of spread of values that are reasonably attributed to the object’s weight given the indications of the balance and available background knowledge about the way the balance works and the standard weights used. The imprecision of these measurements is the component of inaccuracy arising from uncontrolled variations to the indications of the balance over repeated trials. Other sources of inaccuracy besides imprecision include imperfect corrections to systematic errors, inaccurately known physical constants, and vague measurand definitions, among others (see Section 7.1).
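The difference between imprecision and other components of inaccuracy, and the earlier worry about a common bias, can be illustrated with simulated balance readings. This is a sketch under stated assumptions: the “true” weight, the size of the systematic offset and the noise level are stipulated for the simulation, whereas in actual measurement the true value is precisely what is not available.

```python
import random
import statistics

def simulate_readings(true_value, bias, noise_sd, n, seed):
    """Repeated readings sharing one systematic offset plus independent random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

TRUE_WEIGHT = 100.0  # stipulated for the simulation only

readings = simulate_readings(TRUE_WEIGHT, bias=0.5, noise_sd=0.2, n=400, seed=2)

# Imprecision: scatter of the indications over repeated trials.
spread = statistics.stdev(readings)

# Averaging shrinks the random scatter of the mean ...
standard_error = spread / len(readings) ** 0.5

# ... but leaves the shared offset untouched.
mean_error = statistics.mean(readings) - TRUE_WEIGHT

print(f"spread of readings:      {spread:.3f}")
print(f"standard error of mean:  {standard_error:.3f}")
print(f"error of the mean:       {mean_error:.3f}")
```

The readings agree closely with one another, and their mean is stable, yet the mean remains displaced by roughly the stipulated offset. Agreement among repeated readings therefore bounds only the imprecision component; the systematic component has to be assessed from background knowledge about the balance and the standard weights, as the uncertainty-based conception requires.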

Paul Teller (2013b) raises a different objection to the error-based
conception of measurement accuracy. He argues against an assumption he
calls “measurement accuracy realism”, according to which
measurable quantities have definite values in reality. Teller argues
that this assumption is false insofar as it concerns the quantities
habitually measured in physics, because any specification of definite
values (or value ranges) for such quantities involves idealization and
hence cannot refer to anything in reality. For example, the concept
usually understood by the phrase “the velocity of sound in
air” involves a host of implicit idealizations concerning the
uniformity of the air’s chemical composition, temperature and
pressure as well as the stability of units of measurement. Removing
these idealizations completely would require adding an infinite amount of
detail to each specification. As Teller argues, measurement accuracy
should itself be understood as a useful idealization, namely as a
concept that allows scientists to assess coherence and consistency
among measurement outcomes *as if* the linguistic expression of
these outcomes latched onto anything in the world. Precision is
similarly an idealized concept, which is based on an open-ended and
indefinite specification of what counts as repetition of measurement
under “the same” circumstances (Teller 2013a: 194).

## Bibliography

- Alder, K., 2002, *The Measure of All Things: The Seven-Year Odyssey and Hidden Error That Transformed the World*, New York: The Free Press.
- Alexandrova, A., 2008, “First Person Reports and the Measurement of Happiness”, *Philosophical Psychology*, 21(5): 571–583.
- Angner, E., 2008, “The Philosophical Foundations of Subjective Measures of Well-Being”, in *Capabilities and Happiness*, L. Bruni, F. Comim, and M. Pugno (eds.), Oxford: Oxford University Press.
- –––, 2013, “Is it Possible to Measure Happiness? The argument from measurability”, *European Journal for Philosophy of Science*, 3: 221–240.
- Aristotle, *Categories*, in *The Complete Works of Aristotle*, Volume I, J. Barnes (ed.), Princeton: Princeton University Press, 1984.
- Baird, D., 2004, *Thing Knowledge: A Philosophy of Scientific Instruments*, Berkeley: University of California Press.
- Bogen, J. and J. Woodward, 1988, “Saving the Phenomena”, *The Philosophical Review*, 97(3): 303–352.
- Boring, E.G., 1945, “The use of operational definitions in science”, in Boring et al. 1945: 243–5.
- Boring, E.G., P.W. Bridgman, H. Feigl, H. Israel, C.C. Pratt, and B.F. Skinner, 1945, “Symposium on Operationism”, *The Psychological Review*, 52: 241–294.
- Boumans, M., 1999, “Representation and Stability in Testing and Measuring Rational Expectations”, *Journal of Economic Methodology*, 6(3): 381–401.
- –––, 2005a, *How Economists Model the World into Numbers*, New York: Routledge.
- –––, 2005b, “Truth versus Precision”, in *Logic, Methodology and Philosophy of Science: Proceedings of the Twelfth International Congress*, P. Hájek, L. Valdés-Villanueva, and D. Westerstahl (eds.), London: College Publications, pp. 257–269.
- –––, 2005c, “Measurement outside the laboratory”, *Philosophy of Science*, 72: 850–863.
- –––, 2006, “The difference between answering a ‘why’ question and answering a ‘how much’ question”, in *Simulation: Pragmatic Construction of Reality*, J. Lenhard, G. Küppers, and T. Shinn (eds.), Dordrecht: Springer, pp. 107–124.
- –––, 2007a, “Invariance and Calibration”, in Boumans 2007b: 231–248.
- ––– (ed.), 2007b, *Measurement in Economics: A Handbook*, London: Elsevier.
- –––, 2009, “Grey-Box Understanding in Economics”, in *Scientific Understanding: Philosophical Perspectives*, H.W. de Regt, S. Leonelli, and K. Eigner (eds.), Pittsburgh: University of Pittsburgh Press, pp. 210–229.
- –––, 2012a, “Modeling Strategies for Measuring Phenomena In- and Outside the Laboratory”, in *EPSA Philosophy of Science: Amsterdam 2009*, H.W. de Regt, S. Hartmann, and S. Okasha (eds.), (The European Philosophy of Science Association Proceedings), Dordrecht: Springer, pp. 1–11.
- –––, 2012b, “Measurement in Economics”, in *Philosophy of Economics* (Vol. 13 of Handbook of the Philosophy of Science), U. Mäki (ed.), Oxford: Elsevier, pp. 395–423.
- Bridgman, P.W., 1927, *The Logic of Modern Physics*, New York: Macmillan.
- –––, 1938, “Operational Analysis”, *Philosophy of Science*, 5: 114–131.
- –––, 1945, “Some General Principles of Operational Analysis”, in Boring et al. 1945: 246–249.
- –––, 1956, “The Present State of Operationalism”, in Frank 1956: 74–79.
- Brillouin, L., 1962,
*Science and information theory*, New York: Academic Press, 2nd edition. - Byerly, H.C. and V.A. Lazara, 1973, “Realist Foundations of
Measurement”,
*Philosophy of Science*, 40(1): 10–28. - Campbell, N.R., 1920,
*Physics: the Elements*, London: Cambridge University Press. - Campbell, D.T. and D.W. Fiske, 1959, “Convergent and
discriminant validation by the multitrait-multimethod
matrix”,
*Psychological Bulletin*, 56(2): 81–105. - Cantù, P. and O. Schlaudt (eds.), 2013, “The
Epistemological Thought of Otto Hölder”, special issue
of
*Philosophia Scientiæ*, 17(1). - Carnap, R., 1966,
*Philosophical foundations of physics*, G. Martin (ed.), reprinted as*An Introduction to the Philosophy of Science*, NY: Dover, 1995. - Carrier, M., 1994,
*The Completeness of Scientific Theories: On the Derivation of Empirical Indicators Within a Theoretical Framework: the Case of Physical Geometry*, The University of Western Ontario Series in Philosophy of Science Vol. 53, Dordrecht: Kluwer. - Cartwright, N.L., 1999,
*The Dappled World: A Study of the Boundaries of Science*, Cambridge: Cambridge University Press. - Cartwright, N.L. and N.M. Bradburn, 2010, “A Theory of
Measurement”,
URL=<https://www.dur.ac.uk/resources/philosophy/BradburnCartwrightThMeaspostSSFINAL.pdf>.
(A summary of this paper appears in R.M. Li (ed),
*The Importance of Common Metrics for Advancing Social Science Theory and Research: A Workshop Summary*, Washington, DC: National Academies Press, 2011, pp. 53–70.) - Cartwright, N.L. and R. Runhardt, 2014, “Measurement”,
in N.L. Cartwright and E. Montuschi (eds.),
*Philosophy of Social Science: A New Introduction*, Oxford: Oxford University Press, pp. 265–287. - Chang, H., 2001, “Spirit, air, and quicksilver: The search
for the ‘real’ scale of temperature”,
*Historical Studies in the Physical and Biological Sciences*, 31(2): 249–284. - –––, 2004,
*Inventing Temperature: Measurement and Scientific Progress*, Oxford: Oxford University Press. - –––, 2007, “Scientific Progress: Beyond
Foundationalism and Coherentism”,
*Royal Institute of Philosophy Supplement*, 61: 1–20. - –––, 2009, “Operationalism”,
*The Stanford Encyclopedia of Philosophy*(Fall 2009 Edition), E.N. Zalta (ed.), URL= <http://plato.stanford.edu/archives/fall2009/entries/operationalism/> - Chang, H. and N.L. Cartwright, 2008, “Measurement”,
in
*The Routledge Companion to Philosophy of Science*, S. Psillos and M. Curd (eds.), New York: Routledge, pp. 367–375. - Clagett, M., 1968,
*Nicole Oresme and the medieval geometry of qualities and motions*, Madison: University of Wisconsin Press. - Cohen, M.R. and E. Nagel, 1934,
*An introduction to logic and scientific method*, USA: Harcourt, Brace & World. - Crease, R.P., 2011,
*World in the Balance: The Historic Quest for an Absolute System of Measurement*, New York and London: W.W. Norton. - Darrigol, O., 2003, “Number and measure: Hermann von
Helmholtz at the crossroads of mathematics, physics, and
psychology”,
*Studies in History and Philosophy of Science Part A*, 34(3): 515–573. - Diehl, C.E., 2012,
*The Theory of Intensive Magnitudes in Leibniz and Kant*, PhD Dissertation, Princeton University. [Diehl 2012 available online] - Diez, J.A., 1997a, “A Hundred Years of Numbers. An
Historical Introduction to Measurement Theory
1887–1990—Part 1”,
*Studies in History and Philosophy of Science*, 28(1): 167–185. - –––, 1997b, “A Hundred Years of
Numbers. An Historical Introduction to Measurement Theory
1887–1990—Part 2”,
*Studies in History and Philosophy of Science*, 28(2): 237–265. - Dingle, H., 1950, “A Theory of Measurement”,
*The British Journal for the Philosophy of Science*, 1(1): 5–26. - Duhem, P., 1906,
*The Aim and Structure of Physical Theory*, P.P. Wiener (trans.), New York: Atheneum, 1962. - Ellis, B., 1966,
*Basic Concepts of Measurement*, Cambridge: Cambridge University Press. - Euclid,
*Elements*, in*The Thirteen Books of Euclid’s Elements*, T.L. Heath (trans.), Cambridge: Cambridge University Press, 1908. - Fechner, G., 1860,
*Elements of Psychophysics*, H.E. Adler (trans.), New York: Holt, Reinhart & Winston, 1966. - Feest, U., 2005, “Operationism in Psychology: What the
Debate Is About, What the Debate Should Be About”,
*Journal of the History of the Behavioral Sciences*, 41(2): 131–149. - Ferguson, A., C.S. Myers, R.J. Bartlett, H. Banister, F.C.
Bartlett, W. Brown, N.R. Campbell, K.J.W. Craik, J. Drever, J. Guild,
R.A. Houstoun, J.O. Irwin, G.W.C. Kaye, S.J.F. Philpott, L.F.
Richardson, J.H. Shaxby, T. Smith, R.H. Thouless, and W.S. Tucker,
1940, “Quantitative estimates of sensory
events”,
*Advancement of Science*, 2: 331–349. (The final report of a committee appointed by the British Association for the Advancement of Science in 1932 to consider the possibility of measuring intensities of sensation. See Michell 1999, Ch 6. for a detailed discussion.) - Finkelstein, L., 1975, “Representation by symbol systems as
an extension of the concept of
measurement”,
*Kybernetes*, 4(4): 215–223. - –––, 1977, “Introductory article”,
(instrument science),
*Journal of Physics E: Scientific Instruments*, 10(6): 566–572. - Frank, P.G. (ed.), 1956,
*The Validation of Scientific Theories*. Boston: Beacon Press. (Chapter 2, “The Present State of Operationalism” contains papers by H. Margenau, G. Bergmann, C.G. Hempel, R.B. Lindsay, P.W. Bridgman, R.J. Seeger, and A. Grünbaum) - Franklin, A., 1986,
*The Neglect of Experiment*, Cambridge: Cambridge University Press. - –––, 1997, “Calibration”,
*Perspectives on Science*, 5(1): 31–80. - Franklin, A., M. Anderson, D. Brock, S. Coleman, J. Downing, A.
Gruvander, J. Lilly, J. Neal, D. Peterson, M. Price, R. Rice,
L. Smith, S. Speirer, and D. Toering, 1989, “Can a Theory-Laden
Observation Test the Theory?”,
*The British Journal for the Philosophy of Science*, 40(2): 229–231. - Frigerio, A., A. Giordani, and L. Mari, 2010, “Outline of a
general model of measurement”,
*Synthese*, 175(2): 123–149. - Galison, P., 2003,
*Einstein’s Clocks, Poincaré’s Maps: Empires of Time*, New York and London: W.W. Norton. - Gillies, D.A., 1972,
“Operationalism”,
*Synthese*, 25(1): 1–24. - Giordani, A., and L. Mari, 2012, “Measurement, models, and
uncertainty”,
*IEEE Transactions on Instrumentation and Measurement*, 61(8): 2144–2152. - Gooday, G., 2004,
*The Morals of Measurement: Accuracy, Irony and Trust in Late Victorian Electrical Practice*, Cambridge: Cambridge University Press. - Grant, E., 1996,
*The Foundations of Modern Science in the Middle Ages*, Cambridge: Cambridge University Press. - Grattan-Guinness, I., 1996, “Numbers, magnitudes, ratios,
and proportions in Euclid's Elements: How did he handle
them?”,
*Historia Mathematica*, 23: 355–375. - Guala, F., 2008, “Paradigmatic Experiments: The Ultimatum
Game from Testing to Measurement Device”,
*Philosophy of Science*, 75: 658–669. - Hacking, I., 1983,
*Representing and Intervening*, Cambridge: Cambridge University Press. - Harré, R., 1981,
*Great Scientific Experiments: Twenty Experiments that Changed our View of the World*, Oxford: Phaidon Press. - Hartley, R.V., 1928, “Transmission of
information”,
*Bell System Technical Journal*, 7(3): 535–563. - Heidelberger, M., 1993a,
*Nature from Within: Gustav Theodor Fechner and His Psychophysical Worldview*, C. Klohr (trans.), Pittsburgh: University of Pittsburgh Press, 2004. - –––, 1993b, “Fechner’s impact for
measurement theory”, commentary on D.J. Murray, “A
perspective for viewing the history of
psychophysics”,
*Behavioral and Brain Sciences*, 16(1): 146–148. - von Helmholtz, H., 1887,
*Counting and measuring*, C.L. Bryan (trans.), New Jersey: D. Van Nostrand, 1930. - Hempel, C.G., 1952,
*Fundamentals of concept formation in empirical science*, International Encyclopedia of Unified Science, Vol. II. No. 7, Chicago and London: University of Chicago Press. - –––, 1956, “A logical appraisal of operationalism”, in Frank 1956: 52–67.
- –––, 1966,
*Philosophy of Natural Science*, Englewood Cliffs, N.J.: Prentice-Hall. - Hölder, O., 1901, “Die Axiome der Quantität und
die Lehre vom Mass”,
*Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematische-Physische Klasse*, 53: 1–64. (for an excerpt translated into English see Michell and Ernst 1996) - Hoover, K. and M. Dowell, 2001, “Measuring Causes: Episodes
in the Quantitative Assessment of the Value of Money”,
in
*The Age of Economic Measurement*, Annual supplement to vol. 33 of *History of Political Economy*, J.L. Klein and M. Morgan (eds.), pp. 137–161. - Israel-Jost, V., 2011, “The Epistemological Foundations of
Scientific Observation”,
*South African Journal of Philosophy*, 30(1): 29–40. - JCGM (Joint Committee for Guides in Metrology),
2012,
*International Vocabulary of Metrology – Basic and general concepts and associated terms* (VIM), 3rd edition with minor corrections, Sèvres: JCGM. [JCGM 2012 available online] - Jorgensen, L.M., 2009, “The Principle of Continuity and
Leibniz's Theory of Consciousness”,
*Journal of the History of Philosophy*, 47(2): 223–248. - Jung, E., 2011, “Intension and Remission of Forms”,
in
*Encyclopedia of Medieval Philosophy*, H. Lagerlund (ed.), Netherlands: Springer, pp. 551–555. - Kant, I., 1787,
*Critique of Pure Reason*, P. Guyer and A.W. Wood (trans.), Cambridge: Cambridge University Press, 1998. - Kirpatovskii, S.I., 1974, “Principles of the information
theory of measurements”,
*Izmeritel'naya Tekhnika*, 5: 11–13, English translation in *Measurement Techniques*, 17(5): 655–659. - Krantz, D.H., R.D. Luce, P. Suppes, and A. Tversky,
1971,
*Foundations of Measurement Vol 1: Additive and Polynomial Representations*, San Diego and London: Academic Press. (for references to the two other volumes see Suppes et al. 1989 and Luce et al. 1990) - von Kries, J., 1882, “Über die Messung intensiver
Grössen und über das sogenannte psychophysische
Gesetz”,
*Vierteljahrschrift für wissenschaftliche Philosophie (Leipzig)*, 6: 257–294. - Kuhn, T.S., 1961, “The Function of Measurement in Modern
Physical Sciences”,
*Isis*, 52(2): 161–193. - Kyburg, H.H. Jr., 1984,
*Theory and Measurement*, Cambridge: Cambridge University Press. - Latour, B., 1987,
*Science in Action*, Cambridge: Harvard University Press. - Leplège, A., 2003, “Epistemology of Measurement in
the Social Sciences: Historical and Contemporary
Perspectives”,
*Social Science Information*, 42: 451–462. - Luce, R.D., D.H. Krantz, P. Suppes, and A. Tversky,
1990,
*Foundations of Measurement Vol 3: Representation, Axiomatization, and Invariance*, San Diego and London: Academic Press. (for references to the two other volumes see Krantz et al. 1971 and Suppes et al. 1989) - Luce, R.D., and J.W. Tukey, 1964, “Simultaneous conjoint
measurement: A new type of fundamental measurement”,
*Journal of Mathematical Psychology*, 1(1): 1–27. - Luce, R.D. and P. Suppes, 2004, “Representational
Measurement Theory”, in
*Stevens' Handbook of Experimental Psychology*, vol. 4: Methodology in Experimental Psychology, J. Wixted and H. Pashler (eds.), New York: Wiley, 3rd edition, pp. 1–41. - Mach, E., 1896,
*Principles of the Theory of Heat*, T.J. McCormack (trans.), Dordrecht: D. Reidel, 1986. - Mari, L., 1999, “Notes towards a qualitative analysis of
information in measurement results”,
*Measurement*, 25(3): 183–192. - –––, 2000, “Beyond the representational
viewpoint: a new formalization of
measurement”,
*Measurement*, 27: 71–84. - –––, 2003, “Epistemology of
Measurement”,
*Measurement*, 34: 17–30. - –––, 2005a, “The problem of foundations of
measurement”,
*Measurement*, 38: 259–266. - –––, 2005b, “Models of the Measurement
Process”, in
*Handbook of Measuring System Design*, vol. 2, P. Sydenham and R. Thorn (eds.), Wiley, Ch. 104. - Mari, L., and M. Wilson, 2014, “An introduction to the Rasch
measurement approach for metrologists”,
*Measurement*, 51: 315–327. - Mari, L. and A. Giordani, 2013, “Modeling measurement: error
and uncertainty”, in
*Error and Uncertainty in Scientific Practice*, M. Boumans, G. Hon, and A. Petersen (eds.), Ch. 4. - Maxwell, J.C., 1873,
*A Treatise on Electricity and Magnetism*, Oxford: Clarendon Press. - McClimans, L., 2010, “A theoretical framework for
patient-reported outcome measures”,
*Theoretical Medicine and Bioethics*, 31: 225–240. - McClimans, L. and P. Browne, 2012, “Quality of life is a
process not an outcome”,
*Theoretical Medicine and Bioethics*, 33: 279–292. - Michell, J., 1993, “The origins of the representational
theory of measurement: Helmholtz, Hölder, and
Russell”,
*Studies in History and Philosophy of Science Part A*, 24(2): 185–206. - –––, 1994, “Numbers as Quantitative
Relations and the Traditional Theory of
Measurement”,
*British Journal for the Philosophy of Science*, 45: 389–406. - –––, 1999,
*Measurement in Psychology: Critical History of a Methodological Concept*, Cambridge: Cambridge University Press. - –––, 2003, “Epistemology of Measurement:
the Relevance of its History for Quantification in the Social
Sciences”,
*Social Science Information*, 42(4): 515–534. - –––, 2004, “History and philosophy of
measurement: A realist view”, in
*Proceedings of the 10th IMEKO TC7 International symposium on advances of measurement science*, [Michell 2004 available online] - –––, 2005, “The logic of measurement: A
realist overview”,
*Measurement*, 38(4): 285–294. - Michell, J. and C. Ernst, 1996, “The Axioms of Quantity and
the Theory of Measurement”,
*Journal of Mathematical Psychology*, 40: 235–252. (This article contains a translation into English of a long excerpt from Hölder 1901) - Morgan, M., 2001, “Making measuring instruments”,
in
*The Age of Economic Measurement*, Annual supplement to vol. 33 of *History of Political Economy*, J.L. Klein and M. Morgan (eds.), pp. 235–251. - Morgan, M. and M. Morrison (eds.), 1999,
*Models as Mediators: Perspectives on Natural and Social Science*, Cambridge: Cambridge University Press. - Morrison, M., 1999, “Models as Autonomous Agents”, in Morgan and Morrison 1999: 38–65.
- –––, 2009, “Models, measurement and
computer simulation: the changing face of
experimentation”,
*Philosophical Studies*, 143: 33–57. - Morrison, M. and M. Morgan, 1999, “Models as Mediating Instruments”, in Morgan and Morrison 1999: 10–37.
- Mundy, B., 1987, “The metaphysics of
quantity”,
*Philosophical Studies*, 51(1): 29–54. - Nagel, E., 1931, “Measurement”,
*Erkenntnis*, 2(1): 313–333. - Narens, L., 1981, “On the scales of
measurement”,
*Journal of Mathematical Psychology*, 24: 249–275. - –––, 1985,
*Abstract Measurement Theory*, Cambridge, MA: MIT Press. - Nunnally, J.C., and I.H. Bernstein, 1994,
*Psychometric Theory*, New York: McGraw-Hill, 3rd edition. - Parker, W., forthcoming, “Computer Simulation, Measurement
and Data Assimilation”,
*British Journal for the Philosophy of Science*. - Poincaré, H., 1898, “The Measure of Time”,
in
*The Value of Science*, New York: Dover, 1958, pp. 26–36. - –––, 1902,
*Science and Hypothesis*, W.J. Greenstreet (trans.), New York: Cosimo, 2007. - Porter, T.M., 1995,
*Trust in Numbers: The Pursuit of Objectivity in Science and Public Life*, New Jersey: Princeton University Press. - –––, 2007, “Precision”, in Boumans 2007b: 343–356.
- Rasch, G., 1960,
*Probabilistic Models for Some Intelligence and Achievement Tests*, Copenhagen: Danish Institute for Educational Research. - Reiss, J., 2001, “Natural Economic Quantities and Their
Measurement”,
*Journal of Economic Methodology*, 8(2): 287–311. - Riordan, S., 2014, “The Objectivity of Scientific
Measures”,
*Studies in History and Philosophy of Science Part A*. [Riordan 2014 available online] - Reichenbach, H., 1927,
*The Philosophy of Space and Time*, New York: Dover Publications, 1958. - Rothbart, D. and S.W. Slayden, 1994, “The Epistemology of a
Spectrometer”,
*Philosophy of Science*, 61: 25–38. - Russell, B., 1903,
*The Principles of Mathematics*, New York: W.W. Norton. - Savage, C.W. and P. Ehrlich, 1992, “A brief introduction to
measurement theory and to the essays”, in
*Philosophical and Foundational Issues in Measurement Theory*, C.W. Savage and P. Ehrlich (eds.), New Jersey: Lawrence Erlbaum, pp. 1–14. - Schaffer, S., 1992, “Late Victorian metrology and its
instrumentation: a manufactory of Ohms”, in
*Invisible Connections: Instruments, Institutions, and Science*, R. Bud and S.E. Cozzens (eds.), Cardiff: SPIE Optical Engineering, pp. 23–56. - Scott, D. and P. Suppes, 1958, “Foundational aspects of
theories of measurement”,
*Journal of Symbolic Logic*, 23(2): 113–128. - Shannon, C.E., 1948, “A Mathematical Theory of
Communication”,
*The Bell System Technical Journal*, 27: 379–423 and 623–656. - Shannon, C.E. and W. Weaver, 1949,
*The Mathematical Theory of Communication*, Urbana: The University of Illinois Press. - Shapere, D., 1982, “The Concept of Observation in Science
and Philosophy”,
*Philosophy of Science*, 49(4): 485–525. - Skinner, B.F., 1945, “The operational analysis of psychological terms”, in Boring et al. 1945: 270–277.
- Soler, L., C. Allamel-Raffin, F. Wieber, and J.L. Gangloff, 2011, “Calibration in everyday scientific practice: a conceptual framework”, paper presented at the 3rd Biennial Conference of the Society for Philosophy of Science in Practice, Exeter, UK. [Soler et al. 2011 available online]
- Stevens, S.S., 1935, “The operational definition of
psychological concepts”,
*Psychological Review*, 42(6): 517–527. - –––, 1946, “On the theory of scales of
measurement”,
*Science*, 103: 677–680. - –––, 1951, “Mathematics, Measurement,
Psychophysics”, in
*Handbook of Experimental Psychology*, S.S. Stevens (ed.), New York: Wiley & Sons, pp. 1–49. - –––, 1959, “Measurement, psychophysics and
utility”, in
*Measurement: Definitions and Theories*, C.W. Churchman and P. Ratoosh (eds.), New York: Wiley & Sons, pp. 18–63. - –––, 1975,
*Psychophysics: Introduction to Its Perceptual, Neural and Social Prospects*, New York: Wiley & Sons. - Suppes, P., 1951, “A set of independent axioms for extensive
quantities”,
*Portugaliae Mathematica*, 10(4): 163–172. - –––, 1960, “A Comparison of the Meaning
and Uses of Models in Mathematics and the Empirical
Sciences”,
*Synthese*, 12(2): 287–301. - –––, 1962, “Models of Data”,
in
*Logic, methodology and philosophy of science: proceedings of the 1960 International Congress*, E. Nagel (ed.), Stanford: Stanford University Press, pp. 252–261. - –––, 1967, “What is a Scientific
Theory?”, in
*Philosophy of Science Today*, S. Morgenbesser (ed.), New York: Basic Books, pp. 55–67. - Suppes, P., D.H. Krantz, R.D. Luce, and A. Tversky,
1989,
*Foundations of Measurement Vol 2: Geometrical, Threshold and Probabilistic Representations*, San Diego and London: Academic Press. (for references to the two other volumes see Krantz et al. 1971 and Luce et al. 1990) - Swoyer, C., 1987, “The Metaphysics of Measurement”,
in
*Measurement, Realism and Objectivity*, J. Forge (ed.), Reidel, pp. 235–290. - Sylla, E., 1971, “Medieval quantifications of qualities: The
‘Merton School’”,
*Archive for History of Exact Sciences*, 8(1): 9–39. - Tabor, D., 1970, “The hardness of solids”,
*Review of Physics in Technology*, 1(3): 145–179. - Tal, E., 2011, “How Accurate Is the Standard
Second?”,
*Philosophy of Science*, 78(5): 1082–1096. - –––, 2013, “Old and New Problems in
Philosophy of Measurement”,
*Philosophy Compass*, 8(12): 1159–1173. - –––, forthcoming-a, “Making Time: A Study
in the Epistemology of Measurement”,
*British Journal for the Philosophy of Science*, doi:10.1093/bjps/axu037. - –––, forthcoming-b, “A Model-Based
Epistemology of Measurement”, in
*Reasoning in Measurement*, N. Mößner and A. Nordmann (eds.), London: Pickering & Chatto Publishers. - Teller, P., 2013a, “The concept of
measurement-precision”,
*Synthese*, 190: 189–202. - –––, 2013b, “Measurement accuracy realism”, paper presented at Foundations of Physics 2013: The 17th UK and European Meeting on the Foundations of Physics. [Teller 2013b available online]
- Thomson, W., 1889, “Electrical Units of Measurement”,
in
*Popular Lectures and Addresses*, vol. 1, London: Macmillan, pp. 73–136. - Trout, J.D., 1998,
*Measuring the intentional world: Realism, naturalism, and quantitative methods in the behavioral sciences*, Oxford: Oxford University Press. - –––, 2000, “Measurement”, in
*A Companion to the Philosophy of Science*, W.H. Newton-Smith (ed.), Malden, MA: Blackwell, pp. 265–276. - van Fraassen, B.C., 1980,
*The Scientific Image*, Oxford: Clarendon Press. - –––, 2008,
*Scientific Representation: Paradoxes of Perspective*, Oxford: Oxford University Press. - –––, 2009, “The perils of Perrin, in the
hands of philosophers”,
*Philosophical Studies*, 143: 5–24. - –––, 2012, “Modeling and Measurement: The
Criterion of Empirical Grounding”,
*Philosophy of Science*, 79(5): 773–784. - Wilson, M., 2013, “Using the concept of a measurement system
to characterize measurement models used in
psychometrics”,
*Measurement*, 46(9): 3766–3774. - Wise, M.N. (ed.), 1995,
*The Values of Precision*, New Jersey: Princeton University Press. - Wise, M.N. and C. Smith, 1986, “Measurement, Work and
Industry in Lord Kelvin's Britain”,
*Historical Studies in the Physical and Biological Sciences*, 17(1): 147–173.

## Other Internet Resources

- Openly accessible guides to metrological terms and methods by the International Bureau of Weights and Measures (BIPM)
- Bibliography on measurement in science at PhilPapers.

### Acknowledgments

The author would like to thank Stephan Hartmann, Wendy Parker, Paul
Teller, Alessandra Basso, Sally Riordan, Johanna Wolff, Conrad
Heilmann and participants of the History and Philosophy of Physics
reading group at the Department of History and Philosophy of Science
at the University of Cambridge for helpful feedback on drafts of this
entry. The author is also indebted to Joel Michell and Oliver
Schliemann for useful bibliographical advice, and to John Wiley and
Sons Publishers for permission to reproduce an excerpt from Tal
(2013). Work on this entry was supported by an Alexander von Humboldt
Postdoctoral Research Fellowship and a Marie Curie Intra-European
Fellowship within the 7^{th} European Community Framework
Programme.