# Measurement in Science

*First published Mon Jun 15, 2015; substantive revision Fri Aug 7, 2020*

Measurement is an integral part of modern science as well as of
engineering, commerce, and daily life. Measurement is often considered
a hallmark of the scientific enterprise and a privileged source of
knowledge relative to qualitative modes of
inquiry.^{[1]}
Despite its ubiquity and importance, there is little consensus among
philosophers as to how to define measurement, what sorts of things are
measurable, or which conditions make measurement possible. Most (but
not all) contemporary authors agree that measurement is an activity
that involves interaction with a concrete system with the aim of
representing aspects of that system in abstract terms (e.g., in terms
of classes, numbers, vectors etc.). But this characterization also fits
various kinds of perceptual and linguistic activities that are not
usually considered measurements, and is therefore too broad to count
as a definition of measurement. Moreover, if “concrete”
implies “real”, this characterization is also too narrow,
as measurement often involves the representation of ideal systems such
as the average household or an electron at complete rest.

Philosophers have written on a variety of conceptual, metaphysical, semantic and epistemological issues related to measurement. This entry will survey the central philosophical standpoints on the nature of measurement, the notion of measurable quantity and related epistemological issues. It will refrain from elaborating on the many discipline-specific problems associated with measurement and focus on issues that have a general character.

- 1. Overview
- 2. Quantity and Magnitude: A Brief History
- 3. Mathematical Theories of Measurement (“Measurement Theory”)
- 4. Operationalism and Conventionalism
- 5. Realist Accounts of Measurement
- 6. Information-Theoretic Accounts of Measurement
- 7. Model-Based Accounts of Measurement
- 8. The Epistemology of Measurement
- Bibliography
- Academic Tools
- Other Internet Resources
- Related Entries

## 1. Overview

Modern philosophical discussions about measurement—spanning from the late nineteenth century to the present day—may be divided into several strands of scholarship. These strands reflect different perspectives on the nature of measurement and the conditions that make measurement possible and reliable. The main strands are mathematical theories of measurement, operationalism, conventionalism, realism, information-theoretic accounts and model-based accounts. These strands of scholarship do not, for the most part, constitute directly competing views. Instead, they are best understood as highlighting different and complementary aspects of measurement. The following is a very rough overview of these perspectives:

- **Mathematical theories of measurement** view measurement as the mapping of qualitative empirical relations to relations among numbers (or other mathematical entities).
- **Operationalists and conventionalists** view measurement as a set of operations that shape the meaning and/or regulate the use of a quantity-term.
- **Realists** view measurement as the estimation of mind-independent properties and/or relations.
- **Information-theoretic accounts** view measurement as the gathering and interpretation of information about a system.
- **Model-based accounts** view measurement as the coherent assignment of values to parameters in a theoretical and/or statistical model of a process.

These perspectives are in principle consistent with each other. While mathematical theories of measurement deal with the mathematical foundations of measurement scales, operationalism and conventionalism are primarily concerned with the semantics of quantity terms, realism is concerned with the metaphysical status of measurable quantities, and information-theoretic and model-based accounts are concerned with the epistemological aspects of measuring. Nonetheless, the subject domain is not as neatly divided as the list above suggests. Issues concerning the metaphysics, epistemology, semantics and mathematical foundations of measurement are interconnected and often bear on one another. Hence, for example, operationalists and conventionalists have often adopted anti-realist views, and proponents of model-based accounts have argued against the prevailing empiricist interpretation of mathematical theories of measurement. These subtleties will become clear in the following discussion.

The list of strands of scholarship is neither exclusive nor exhaustive. It reflects the historical trajectory of the philosophical discussion thus far, rather than any principled distinction among different levels of analysis of measurement. Some philosophical works on measurement belong to more than one strand, while many other works do not squarely fit any of them. This is especially the case since the early 2000s, when measurement returned to the forefront of philosophical discussion after several decades of relative neglect. This recent body of scholarship is sometimes called “the epistemology of measurement”, and includes a rich array of works that cannot yet be classified into distinct schools of thought. The last section of this entry will be dedicated to surveying some of these developments.

## 2. Quantity and Magnitude: A Brief History

Although the philosophy of measurement formed as a distinct area of
inquiry only during the second half of the nineteenth century,
fundamental concepts of measurement such as magnitude and quantity
have been discussed since antiquity. According to Euclid’s
*Elements*, a magnitude—such as a line, a surface or a
solid—measures another when the latter is a whole multiple of
the former (Book V, def. 1 & 2). Two magnitudes have a common
measure when they are both whole multiples of some magnitude, and are
incommensurable otherwise (Book X, def. 1). The discovery of
incommensurable magnitudes allowed Euclid and his contemporaries to
develop the notion of a *ratio* of magnitudes. Ratios can be
either rational or irrational, and therefore the concept of ratio is
more general than that of measure (Michell 2003, 2004a;
Grattan-Guinness 1996).

Aristotle distinguished between quantities and qualities. Examples of
quantities are numbers, lines, surfaces, bodies, time and place,
whereas examples of qualities are justice, health, hotness and
paleness (*Categories* §6 and §8). According to
Aristotle, quantities admit of equality and inequality but not of
degrees, as “one thing is not more four-foot than another”
(ibid. 6.6a19). Qualities, conversely, do not admit of equality or
inequality but do admit of degrees, “for one thing is called
more pale or less pale than another” (ibid. 8.10b26). Aristotle
did not clearly specify whether degrees of qualities such as paleness
correspond to distinct qualities, or whether the same quality,
paleness, was capable of different intensities. This topic was at the
center of an ongoing debate in the thirteenth and fourteenth centuries
(Jung 2011). Duns Scotus supported the “addition theory”,
according to which a change in the degree of a quality can be
explained by the addition or subtraction of smaller degrees of that
quality (2011: 553). This theory was later refined by Nicole Oresme,
who used geometrical figures to represent changes in the intensity of
qualities such as velocity (Clagett 1968; Sylla 1971). Oresme’s
geometrical representations established a subset of qualities that
were amenable to quantitative treatment, thereby challenging the
strict Aristotelian dichotomy between quantities and qualities. These
developments made possible the formulation of quantitative laws of
motion during the sixteenth and seventeenth centuries (Grant
1996).

The concept of qualitative intensity was further developed by Leibniz and Kant. Leibniz’s “principle of continuity” stated that all natural change is produced by degrees. Leibniz argued that this principle applies not only to changes in extended magnitudes such as length and duration, but also to intensities of representational states of consciousness, such as sounds (Jorgensen 2009; Diehl 2012). Kant is thought to have relied on Leibniz’s principle of continuity to formulate his distinction between extensive and intensive magnitudes. According to Kant, extensive magnitudes are those “in which the representation of the parts makes possible the representation of the whole” (1787: A162/B203). An example is length: a line can only be mentally represented by a successive synthesis in which parts of the line join to form the whole. For Kant, the possibility of such synthesis was grounded in the forms of intuition, namely space and time. Intensive magnitudes, like warmth or colors, also come in continuous degrees, but their apprehension takes place in an instant rather than through a successive synthesis of parts. The degrees of intensive magnitudes “can only be represented through approximation to negation” (1787: A168/B210), that is, by imagining their gradual diminution until their complete absence.

Scientific developments during the nineteenth century challenged the distinction between extensive and intensive magnitudes. Thermodynamics and wave optics showed that differences in temperature and hue corresponded to differences in spatio-temporal magnitudes such as velocity and wavelength. Electrical magnitudes such as resistance and conductance were shown to be capable of addition and division despite not being extensive in the Kantian sense, i.e., not synthesized from spatial or temporal parts. Moreover, early experiments in psychophysics suggested that intensities of sensation such as brightness and loudness could be represented as sums of “just noticeable differences” among stimuli, and could therefore be thought of as composed of parts (see Section 3.3). These findings, along with advances in the axiomatization of branches of mathematics, motivated some of the leading scientists of the late nineteenth century to attempt to clarify the mathematical foundations of measurement (Maxwell 1873; von Kries 1882; Helmholtz 1887; Mach 1896; Poincaré 1898; Hölder 1901; for historical surveys see Darrigol 2003; Michell 1993, 2003; Cantù and Schlaudt 2013; Biagioli 2016: Ch. 4, 2018). These works are viewed today as precursors to the body of scholarship known as “measurement theory”.

## 3. Mathematical Theories of Measurement (“Measurement Theory”)

Mathematical theories of measurement (often referred to collectively
as “measurement theory”) concern the conditions under
which relations among numbers (and other mathematical entities) can be
used to express relations among
objects.^{[2]}
In order to appreciate the need for mathematical theories of
measurement, consider the fact that relations exhibited by
numbers—such as equality, sum, difference and ratio—do not
always correspond to relations among the objects measured by those
numbers. For example, 60 is twice 30, but one would be mistaken in
thinking that an object measured at 60 degrees Celsius is twice as hot
as an object at 30 degrees Celsius. This is because the zero point of
the Celsius scale is arbitrary and does not correspond to an absence
of
temperature.^{[3]}
Similarly, numerical intervals do not always carry empirical
information. When subjects are asked to rank on a scale from 1 to 7
how strongly they agree with a given statement, there is no *prima
facie* reason to think that the intervals between 5 and 6 and
between 6 and 7 correspond to equal increments of strength of opinion.
To provide a third example, equality among numbers is transitive [if
(a=b & b=c) then a=c] but empirical comparisons among physical
magnitudes reveal only approximate equality, which is not a transitive
relation. These examples suggest that not all of the mathematical
relations among numbers used in measurement are empirically
significant, and that different kinds of measurement scale convey
different kinds of empirically significant information.
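
Two of these examples can be checked numerically. The following is only an illustrative sketch: the function names, the rod lengths and the 1 mm tolerance are assumptions introduced here, not part of any measurement-theoretic formalism. It compares the ratio of two Celsius numerals with the ratio of the same temperatures in kelvins, and shows that equality within a fixed tolerance can fail to be transitive.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius reading to kelvins, whose zero is absolute zero."""
    return c + 273.15


def approx_equal(a: float, b: float, tolerance: float) -> bool:
    """Empirical 'equality': two readings differ by no more than the tolerance."""
    return abs(a - b) <= tolerance


# Ratio of the numerals on the Celsius scale: exactly 2.
celsius_ratio = 60 / 30
# Ratio of the same two temperatures measured from absolute zero: roughly 1.1.
kelvin_ratio = celsius_to_kelvin(60) / celsius_to_kelvin(30)

# Three hypothetical rod lengths (mm), compared with an assumed resolution of 1 mm.
a, b, c = 100.0, 100.8, 101.6
a_matches_b = approx_equal(a, b, tolerance=1.0)  # within tolerance
b_matches_c = approx_equal(b, c, tolerance=1.0)  # within tolerance
a_matches_c = approx_equal(a, c, tolerance=1.0)  # outside tolerance
```

Here `a` matches `b` and `b` matches `c`, yet `a` does not match `c`, so the empirical matching relation lacks the transitivity that numerical equality has.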

The study of measurement scales and the empirical information they convey is the main concern of mathematical theories of measurement. In his seminal 1887 essay, “Counting and Measuring”, Hermann von Helmholtz phrased the key question of measurement theory as follows:

[W]hat is the objective meaning of expressing through denominate numbers the relations of real objects as magnitudes, and under what conditions can we do this? (1887: 4)

Broadly speaking, measurement theory sets out to (i) identify the
assumptions underlying the use of various mathematical structures for
describing aspects of the empirical world, and (ii) draw lessons about
the adequacy and limits of using these mathematical structures for
describing aspects of the empirical world. Following Otto Hölder
(1901), measurement theorists often tackle these goals through formal
proofs, with the assumptions in (i) serving as axioms and the lessons
in (ii) following as theorems. A key insight of measurement theory is
that the empirically significant aspects of a given mathematical
structure are those that *mirror relevant relations* among the
objects being measured. For example, the relation “bigger
than” among numbers is empirically significant for measuring
length insofar as it mirrors the relation “longer than”
among objects. This mirroring, or mapping, of relations between
objects and mathematical entities constitutes a measurement scale. As
will be clarified below, measurement scales are usually thought of as
isomorphisms or homomorphisms between objects and mathematical
entities.

Other than these broad goals and claims, measurement theory is a
highly heterogeneous body of scholarship. It includes works that span
from the late nineteenth century to the present day and endorse a wide
array of views on the ontology, epistemology and semantics of
measurement. Two main differences among mathematical theories of
measurement are especially worth mentioning. The first concerns the
nature of the *relata*, or “objects”, whose
relations numbers are supposed to mirror. These *relata* may be
understood in at least four different ways: as concrete individual
objects, as qualitative observations of concrete individual objects,
as abstract representations of individual objects, or as universal
properties of objects. Which interpretation is adopted depends in
large part on the author’s metaphysical and epistemic
commitments. This issue will be especially relevant to the discussion
of realist accounts of measurement
(Section 5).
Second, different measurement theorists have taken different stands
on the kind of empirical evidence that is required to establish
mappings between objects and numbers. As a result, measurement
theorists have come to disagree about the necessary conditions for
establishing the measurability of attributes, and specifically about
whether psychological attributes are measurable. Debates about
measurability have been highly fruitful for the development of
measurement theory, and the following subsections will introduce some
of these debates and the central concepts developed therein.

### 3.1 Fundamental and derived measurement

During the late nineteenth and early twentieth centuries several
attempts were made to provide a universal definition of measurement.
Although accounts of measurement varied, the consensus was that
measurement is a method of *assigning numbers to magnitudes*.
For example, Helmholtz (1887: 17) defined measurement as the procedure
by which one finds the denominate number that expresses the value of a
magnitude, where a “denominate number” is a number
together with a unit, e.g., 5 meters, and a magnitude is a quality of
objects that is amenable to ordering from smaller to greater, e.g.,
length. Bertrand Russell similarly stated that measurement is

any method by which a unique and reciprocal correspondence is established between all or some of the magnitudes of a kind and all or some of the numbers, integral, rational or real. (1903: 176)

Norman Campbell defined measurement simply as “the process of assigning numbers to represent qualities”, where a quality is a property that admits of non-arbitrary ordering (1920: 267).

Defining measurement as numerical assignment raises the question: which assignments are adequate, and under what conditions? Early measurement theorists like Helmholtz (1887), Hölder (1901) and Campbell (1920) argued that numbers are adequate for expressing magnitudes insofar as algebraic operations among numbers mirror empirical relations among magnitudes. For example, the qualitative relation “longer than” among rigid rods is (roughly) transitive and asymmetrical, and in this regard shares structural features with the relation “larger than” among numbers. Moreover, the end-to-end concatenation of rigid rods shares structural features—such as associativity and commutativity—with the mathematical operation of addition. A similar situation holds for the measurement of weight with an equal-arms balance. Here deflection of the arms provides ordering among weights and the heaping of weights on one pan constitutes concatenation.

Early measurement theorists formulated axioms that describe these
qualitative empirical structures, and used these axioms to prove
theorems about the adequacy of assigning numbers to magnitudes that
exhibit such structures. Specifically, they proved that ordering and
concatenation are together sufficient for the construction of an
*additive* numerical representation of the relevant magnitudes.
An additive representation is one in which addition is empirically
meaningful, and hence also multiplication, division etc. Campbell
called measurement procedures that satisfy the conditions of
additivity “fundamental” because they do not involve the
measurement of any other magnitude (1920: 277). Kinds of magnitudes
for which a fundamental measurement procedure has been
found—such as length, area, volume, duration, weight and
electrical resistance—Campbell called “fundamental
magnitudes”. A hallmark of such magnitudes is that it is
possible to generate them by concatenating a standard sequence of
equal units, as in the example of a series of equally spaced marks on
a ruler.
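
The idea of a standard sequence can be illustrated with a toy model. This is a sketch under stated assumptions rather than a reconstruction of Campbell's own procedure: the names `Rod`, `at_least_as_long`, `concatenate` and `measure` are hypothetical, and the hidden `_extent` field merely stands in for the physical rod, which the measurer can only compare and concatenate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rod:
    # Hidden physical extent; the measurer never reads it directly,
    # only through the two empirical operations below.
    _extent: float


def at_least_as_long(a: Rod, b: Rod) -> bool:
    """Empirical ordering: lay the rods side by side and compare."""
    return a._extent >= b._extent


def concatenate(a: Rod, b: Rod) -> Rod:
    """Empirical concatenation: lay the rods end to end."""
    return Rod(a._extent + b._extent)


def measure(rod: Rod, unit: Rod) -> int:
    """Count how many unit copies fit end to end without exceeding the rod."""
    count = 0
    sequence = unit  # standard sequence: unit, unit+unit, ...
    while at_least_as_long(rod, sequence):
        count += 1
        sequence = concatenate(sequence, unit)
    return count


unit = Rod(1.0)
a, b = Rod(3.0), Rod(4.0)
length_a = measure(a, unit)                    # 3
length_b = measure(b, unit)                    # 4
length_ab = measure(concatenate(a, b), unit)   # 7
```

In this toy case the rods are whole multiples of the unit, so the number assigned to the concatenated rod equals the sum of the numbers assigned to its parts. Rods that are not whole multiples would require a finer unit to approximate additivity.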

Although they viewed additivity as the hallmark of measurement, most early measurement theorists acknowledged that additivity is not necessary for measuring. Other magnitudes exist that admit of ordering from smaller to greater, but whose ratios and/or differences cannot currently be determined except through their relations to other, fundamentally measurable magnitudes. Examples are temperature, which may be measured by determining the volume of a mercury column, and density, which may be measured as the ratio of mass and volume. Such indirect determination came to be called “derived” measurement and the relevant magnitudes “derived magnitudes” (Campbell 1920: 275–7).

At first glance, the distinction between fundamental and derived
measurement may seem reminiscent of the distinction between extensive
and intensive magnitudes, and indeed fundamental measurement is
sometimes called “extensive”. Nonetheless, it is important
to note that the two distinctions are based on significantly different
criteria of measurability. As discussed in
Section 2,
the extensive-intensive distinction focused on the intrinsic
structure of the quantity in question, i.e., whether or not it is
composed of spatio-temporal parts. The fundamental-derived
distinction, by contrast, focuses on the properties of measurement
*operations*. A fundamentally measurable magnitude is one for
which a fundamental measurement operation has been found.
Consequently, fundamentality is not an intrinsic property of a
magnitude: a derived magnitude can become fundamental with the
discovery of new operations for its measurement. Moreover, in
fundamental measurement the numerical assignment need not mirror the
structure of spatio-temporal parts. Electrical resistance, for
example, can be fundamentally measured by connecting resistors in a
series (Campbell 1920: 293). This is considered a fundamental
measurement operation because it has a shared structure with numerical
addition, even though objects with equal resistance are not generally
equal in size.

The distinction between fundamental and derived measurement was
revised by subsequent authors. Brian Ellis (1966: Ch. 5–8)
distinguished among three types of measurement: fundamental,
associative and derived. Fundamental measurement requires ordering and
concatenation operations satisfying the same conditions specified by
Campbell. Associative measurement procedures are based on a
correlation of two ordering relationships, e.g., the correlation
between the volume of a mercury column and its temperature. Derived
measurement procedures consist in the determination of the value of a
constant in a physical law. The constant may be local, as in the
determination of the specific density of water from mass and volume,
or universal, as in the determination of the Newtonian gravitational
constant from force, mass and distance. Henry Kyburg (1984: Ch.
5–7) proposed a somewhat different threefold distinction among
direct, indirect and systematic measurement, which does not completely
overlap with that of
Ellis.^{[4]}
A more radical revision of the distinction between fundamental and
derived measurement was offered by R. Duncan Luce and John Tukey
(1964) in their work on conjoint measurement, which will be discussed
in
Section 3.4.

### 3.2 The classification of scales

The previous subsection discussed the axiomatization of empirical
structures, a line of inquiry that dates back to the early days of
measurement theory. A complementary line of inquiry within measurement
theory concerns the classification of measurement scales. The
psychophysicist S.S. Stevens (1946, 1951) distinguished among four
types of scales: nominal, ordinal, interval and ratio. Nominal scales
represent objects as belonging to classes that have no particular
order, e.g., male and female. Ordinal scales represent order but no
further algebraic structure. For example, the Mohs scale of mineral
hardness represents minerals with numbers ranging from 1 (softest) to
10 (hardest), but there is no empirical significance to equality among
intervals or ratios of those
numbers.^{[5]}
Celsius and Fahrenheit are examples of interval scales: they
represent equality or inequality among intervals of temperature, but
not ratios of temperature, because their zero points are arbitrary.
The Kelvin scale, by contrast, is a ratio scale, as are the familiar
scales representing mass in kilograms, length in meters and duration
in seconds. Stevens later refined this classification and
distinguished between linear and logarithmic interval scales (1959:
31–34) and between ratio scales with and without a natural unit
(1959: 34). Ratio scales with a natural unit, such as those used for
counting discrete objects and for representing probabilities, were
named “absolute” scales.

As Stevens notes, scale types are individuated by the families of transformations they can undergo without loss of empirical information. Empirical relations represented on ratio scales, for example, are invariant under multiplication by a positive number, e.g., multiplication by 2.54 converts from inches to centimeters. Linear interval scales allow both multiplication by a positive number and a constant shift, e.g., the conversion from Celsius to Fahrenheit in accordance with the formula °C × 9/5 + 32 = °F. Ordinal scales admit of any transformation function as long as it is monotonic and increasing, and nominal scales admit of any one-to-one substitution. Absolute scales admit of no transformation other than identity. Stevens’ classification of scales was later generalized by Louis Narens (1981, 1985: Ch. 2) and Luce et al. (1990: Ch. 20) in terms of the homogeneity and uniqueness of the relevant transformation groups.
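
The link between scale types and permissible transformations can be checked on small examples. The sketch below is illustrative only: the function names and sample values are assumptions, and the cube function stands in for an arbitrary increasing monotonic transformation.

```python
import math


def inches_to_cm(x: float) -> float:
    """Ratio-scale transformation: multiplication by a positive number."""
    return x * 2.54


def celsius_to_fahrenheit(c: float) -> float:
    """Linear interval-scale transformation: positive multiple plus a shift."""
    return c * 9 / 5 + 32


def cube(x: float) -> float:
    """An increasing monotonic transformation that is not linear."""
    return x ** 3


# Ratio scale: ratios of values survive a change of unit.
long_in, short_in = 10.0, 4.0
ratio_preserved = math.isclose(
    long_in / short_in, inches_to_cm(long_in) / inches_to_cm(short_in)
)

# Interval scale: ratios of intervals survive, ratios of values do not.
t1, t2, t3 = 10.0, 20.0, 40.0
f1, f2, f3 = (celsius_to_fahrenheit(t) for t in (t1, t2, t3))
interval_ratio_preserved = math.isclose((t3 - t2) / (t2 - t1), (f3 - f2) / (f2 - f1))
value_ratio_preserved = math.isclose(t3 / t2, f3 / f2)

# Ordinal scale: order survives any increasing transformation, equal spacing does not.
ranks = [1, 2, 3]
cubed = [cube(r) for r in ranks]
order_preserved = all(cubed[i] < cubed[i + 1] for i in range(len(cubed) - 1))
equal_intervals_preserved = (cubed[1] - cubed[0]) == (cubed[2] - cubed[1])
```

Only the relations that remain unchanged under every permissible transformation of a scale type carry empirical information on that type of scale.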

While Stevens’ classification of scales met with general approval in scientific and philosophical circles, its wider implications for measurement theory became the topic of considerable debate. Two issues were especially contested. The first was whether classification and ordering operations deserve to be called “measurement” operations, and accordingly whether the representation of magnitudes on nominal and ordinal scales should count as measurement. Several physicists, including Campbell, argued that classification and ordering operations did not provide a sufficiently rich structure to warrant the use of numbers, and hence should not count as measurement operations. The second contested issue was whether a concatenation operation had to be found for a magnitude before it could be fundamentally measured on a ratio scale. The debate became especially heated when it re-ignited a longer controversy surrounding the measurability of intensities of sensation. It is to this debate we now turn.

### 3.3 The measurability of sensation

One of the main catalysts for the development of mathematical theories
of measurement was an ongoing debate surrounding measurability in
psychology. The debate is often traced back to Gustav Fechner’s
(1860) *Elements of Psychophysics*, in which he described a
method of measuring intensities of sensation. Fechner’s method
was based on the recording of “just noticeable
differences” between sensations associated with pairs of
stimuli, e.g., two sounds of different intensity. These differences
were assumed to be equal increments of intensity of sensation. As
Fechner showed, under this assumption a stable linear relationship is
revealed between the intensity of sensation and the logarithm of the
intensity of the stimulus, a relation that came to be known as
“Fechner’s law” (Heidelberger 1993a: 203; Luce and
Suppes 2004: 11–2). This law in turn provides a method for
indirectly measuring the intensity of sensation by measuring the
intensity of the stimulus, and hence, Fechner argued, provides
justification for measuring intensities of sensation on the real
numbers.
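
The step from equal just noticeable differences to a logarithmic law can be sketched as follows. The notation is an assumption introduced here for illustration: $I$ is the stimulus intensity, $S$ the sensation intensity, $I_0$ the threshold stimulus at which $S = 0$, and $c$ and $k$ are constants. The sketch also relies on Weber's law, according to which the just noticeable increment in a stimulus is proportional to the stimulus, a premise the summary above does not state explicitly.

```latex
\begin{align}
\Delta I_{\mathrm{jnd}} &= c\, I
  && \text{(Weber's law, assumed)} \\
dS &= k\, \frac{dI}{I}
  && \text{(each just noticeable difference is an equal increment of sensation)} \\
S &= k \ln \frac{I}{I_0}
  && \text{(integrating from the threshold } I_0 \text{, where } S = 0 \text{)}
\end{align}
```

On this reading, the contested step is the second line, which treats just noticeable differences as equal units of sensation.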

Fechner’s claims concerning the measurability of sensation became the subject of a series of debates that lasted nearly a century and proved extremely fruitful for the philosophy of measurement, involving key figures such as Mach, Helmholtz, Campbell and Stevens (Heidelberger 1993a: Ch. 6 and 1993b; Michell 1999: Ch. 6). Those objecting to the measurability of sensation, such as Campbell, stressed the necessity of an empirical concatenation operation for fundamental measurement. Since intensities of sensation cannot be concatenated to each other in the manner afforded by lengths and weights, there could be no fundamental measurement of sensation intensity. Moreover, Campbell claimed that none of the psychophysical regularities discovered thus far are sufficiently universal to count as laws in the sense required for derived measurement (Campbell in Ferguson et al. 1940: 347). All that psychophysicists have shown is that intensities of sensation can be consistently ordered, but order by itself does not yet warrant the use of numerical relations such as sums and ratios to express empirical results.

The central opponent of Campbell in this debate was Stevens, whose
distinction between types of measurement scale was discussed above.
Stevens defined measurement as the “assignment of numerals to
objects or events according to rules” (1951: 1) and claimed that
any consistent and non-random assignment counts as measurement in the
broad sense (1975: 47). In useful cases of scientific inquiry, Stevens
claimed, measurement can be construed somewhat more narrowly as a
numerical assignment that is based on the results of *matching*
operations, such as the coupling of temperature to mercury volume or
the matching of sensations to each other. Stevens argued against the
view that relations among numbers need to mirror qualitative empirical
structures, claiming instead that measurement scales should be
regarded as arbitrary formal schemas and adopted in accordance with
their usefulness for describing empirical data. For example, adopting
a ratio scale for measuring the sensations of loudness, volume and
density of sounds leads to the formulation of a simple linear relation
among the reports of experimental subjects: loudness = volume ×
density (1975: 57–8). Such assignment of numbers to sensations
counts as measurement because it is consistent and non-random, because
it is based on the matching operations performed by experimental
subjects, and because it captures regularities in the experimental
results. According to Stevens, these conditions are together
sufficient to justify the use of a ratio scale for measuring
sensations, despite the fact that “sensations cannot be
separated into component parts, or laid end to end like measuring
sticks” (1975: 38; see also Hempel 1952: 68–9).

### 3.4 Representational Theory of Measurement

In the mid-twentieth century the two main lines of inquiry in
measurement theory, the one dedicated to the empirical conditions of
quantification and the one concerning the classification of scales,
converged in the work of Patrick Suppes (1951; Scott and Suppes 1958;
for historical surveys see Savage and Ehrlich 1992; Diez 1997a,b).
Suppes’ work laid the basis for the Representational Theory of
Measurement (RTM), which remains the most influential mathematical
theory of measurement to date (Krantz et al. 1971; Suppes et al. 1989;
Luce et al. 1990). RTM defines measurement as the construction of
mappings from empirical relational structures into numerical
relational structures (Krantz et al. 1971: 9). An empirical relational
structure consists of a set of empirical objects (e.g., rigid rods)
along with certain qualitative relations among them (e.g., ordering,
concatenation), while a numerical relational structure consists of a
set of numbers (e.g., real numbers) and specific mathematical
relations among them (e.g., “equal to or bigger than”,
addition). Simply put, a measurement scale is a many-to-one
mapping—a homomorphism—from an empirical to a numerical
relational structure, and measurement is the construction of
scales.^{[6]}
RTM goes into great detail in clarifying the assumptions underlying
the construction of different types of measurement scales. Each type
of scale is associated with a set of assumptions about the qualitative
relations obtaining among objects represented on that type of scale.
From these assumptions, or axioms, the authors of RTM derive the
representational adequacy of each scale type, as well as the family of
permissible transformations making that type of scale unique. In this
way RTM provides a conceptual link between the empirical basis of
measurement and the typology of
scales.^{[7]}
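
For the case of extensive measurement, the homomorphism condition can be written out in a few lines. The symbols are assumptions chosen here for illustration: $A$ is a set of objects, $\succsim$ a qualitative ordering (e.g., "at least as long as"), $\circ$ a concatenation operation, and $\phi$ the scale.

```latex
\begin{align}
\phi &: \langle A, \succsim, \circ \rangle \to \langle \mathbb{R}, \geq, + \rangle \\
a \succsim b &\iff \phi(a) \geq \phi(b) \\
\phi(a \circ b) &= \phi(a) + \phi(b) \\
\phi' &= \alpha\, \phi, \quad \alpha > 0
  && \text{(permissible transformations: a ratio scale)}
\end{align}
```

The second and third lines express representational adequacy, and the last line expresses uniqueness: any two scales satisfying the first three conditions differ only by a choice of unit.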

On the issue of measurability, the Representational Theory takes a middle path between the liberal approach adopted by Stevens and the strict emphasis on concatenation operations espoused by Campbell. Like Campbell, RTM accepts that rules of quantification must be grounded in known empirical structures and should not be chosen arbitrarily to fit the data. However, RTM rejects the idea that additive scales are adequate only when concatenation operations are available (Luce and Suppes 2004: 15). Instead, RTM argues for the existence of fundamental measurement operations that do not involve concatenation. The central example of this type of operation is known as “additive conjoint measurement” (Luce and Tukey 1964; Krantz et al. 1971: 17–21 and Ch. 6–7). Here, measurements of two or more different types of attribute, such as the temperature and pressure of a gas, are obtained by observing their joint effect, such as the volume of the gas. Luce and Tukey showed that by establishing certain qualitative relations among volumes under variations of temperature and pressure, one can construct additive representations of temperature and pressure, without invoking any antecedent method of measuring volume. This sort of procedure is generalizable to any suitably related triplet of attributes, such as the loudness, intensity and frequency of pure tones, or the preference for a reward, its size and the delay in receiving it (Luce and Suppes 2004: 17). The discovery of additive conjoint measurement led the authors of RTM to divide fundamental measurement into two kinds: traditional measurement procedures based on concatenation operations, which they called “extensive measurement”, and conjoint or “nonextensive” fundamental measurement. Under this new conception of fundamentality, all the traditional physical attributes can be measured fundamentally, as well as many psychological attributes (Krantz et al. 1971: 502–3).
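
The form of an additive conjoint representation can be stated compactly. The notation is an assumption introduced for illustration: $A$ and $P$ are the sets of levels of the two attributes, $\succsim$ orders pairs $(a, p)$ by their joint effect, and $\phi$ and $\psi$ are the scales constructed for the two attributes.

```latex
\begin{align}
(a, p) \succsim (b, q) &\iff \phi(a) + \psi(p) \geq \phi(b) + \psi(q) \\
\phi' &= \alpha\, \phi + \beta_1, \quad \psi' = \alpha\, \psi + \beta_2, \quad \alpha > 0
\end{align}
```

As a purely illustrative case, suppose the gas obeys the ideal gas law with a fixed amount of gas, so that $V \propto T / P$. Ordering temperature-pressure pairs by the resulting volume then admits the additive representation $\phi(T) = \ln T$ and $\psi(P) = -\ln P$, since $\ln V = \ln T - \ln P + \text{const}$. The point of the conjoint approach is that the existence of such scales is to be established from qualitative comparisons of volumes alone, not from a prior numerical law.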

## 4. Operationalism and Conventionalism

Above we saw that mathematical theories of measurement are primarily
concerned with the mathematical properties of measurement scales and
the conditions of their application. A related but distinct strand of
scholarship concerns the meaning and use of quantity terms. Scientific
theories and models are commonly expressed in terms of quantitative
relations among parameters, bearing names such as
“length”, “unemployment rate” and
“introversion”. A realist about one of these terms would
argue that it refers to a set of properties or relations that exist
independently of being measured. An operationalist or conventionalist
would argue that the way such quantity-terms apply to concrete
particulars depends on nontrivial choices made by humans, and
specifically on choices that have to do with the way the relevant
quantity is measured. Note that under this broad construal, realism is
compatible with operationalism and conventionalism. That is, it is
conceivable that choices of measurement method regulate the use of a
quantity-term and that, given the *correct* choice, this term
succeeds in referring to a mind-independent property or relation.
Nonetheless, many operationalists and conventionalists adopted
stronger views, according to which there are no facts of the matter as
to which of several and nontrivially different operations is correct
for applying a given quantity-term. These stronger variants are
inconsistent with realism about measurement. This section will be
dedicated to operationalism and conventionalism, and the next to
realism about measurement.

Operationalism (or “operationism”) about measurement is the view that the meaning of quantity-concepts is determined by the set of operations used for their measurement. The strongest expression of operationalism appears in the early work of Percy Bridgman (1927), who argued that

we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations. (1927: 5)

Length, for example, would be defined as the result of the operation
of concatenating rigid rods. According to this extreme version of
operationalism, different operations measure different quantities.
Length measured by using rulers and by timing electromagnetic pulses
should, strictly speaking, be distinguished into two distinct
quantity-concepts labeled “length-1” and
“length-2” respectively. This conclusion led Bridgman to
claim that currently accepted quantity concepts have
“joints” where different operations overlap in their
domain of application. He warned against dogmatic faith in the unity
of quantity concepts across these “joints”, urging instead
that unity be checked against experiments whenever the application of
a quantity-concept is to be extended into a new domain. Nevertheless,
Bridgman conceded that as long as the results of different operations
agree within experimental error it is pragmatically justified to label
the corresponding quantities with the same name (1927:
16).^{[8]}

Operationalism became influential in psychology, where it was
well-received by behaviorists like Edwin Boring (1945) and B.F.
Skinner (1945). Indeed, Skinner maintained that behaviorism is
“nothing more than a thoroughgoing operational analysis of
traditional mentalistic concepts” (1945: 271). Stevens, who was
Boring’s student, was a key promoter of operationalism in
psychology, and argued that psychological concepts have empirical
meaning only if they stand for definite and concrete operations (1935:
517; see also Isaac 2017). The idea that concepts are defined by
measurement operations is consistent with Stevens’ liberal views
on measurability, which were discussed above
(Section 3.3).
As long as the assignment of numbers to objects is performed in
accordance with concrete and consistent rules, Stevens maintained that
such assignment has empirical meaning and does not need to satisfy any
additional constraints. Nonetheless, Stevens probably did not embrace
an anti-realist view about psychological attributes. Instead, there
are good reasons to think that he understood operationalism as a
methodological attitude that was valuable to the extent that it
allowed psychologists to justify the conclusions they drew from
experiments (Feest 2005). For example, Stevens did not treat
operational definitions as *a priori* but as amenable to
improvement in light of empirical discoveries, implying that he took
psychological attributes to exist independently of such definitions
(Stevens 1935: 527). This suggests that Stevens’ operationalism
was of a more moderate variety than that found in the early writings
of
Bridgman.^{[9]}

Operationalism met with initial enthusiasm from logical positivists, who
viewed it as akin to verificationism. Nonetheless, it was soon
revealed that any attempt to base a theory of meaning on
operationalist principles was riddled with problems. Among such
problems were the automatic reliability operationalism conferred on
measurement operations, the ambiguities surrounding the notion of
operation, the overly restrictive operational criterion of
meaningfulness, and the fact that many useful theoretical concepts
lack clear operational definitions (Chang
2009).^{[10]}
In particular, Carl Hempel (1956, 1966) criticized operationalists
for being unable to define dispositional terms such as
“solubility in water”, and for multiplying the number of
scientific concepts in a manner that runs against the need for
systematic and simple theories. Accordingly, most writers on the
semantics of quantity-terms have avoided espousing an operational
analysis.^{[11]}

A more widely advocated approach admitted a conventional element to
the use of quantity-terms, while resisting attempts to reduce the
meaning of quantity terms to measurement operations. These accounts
are classified under the general heading
“conventionalism”, though they differ in the particular
aspects of measurement they deem conventional and in the degree of
arbitrariness they ascribe to such
conventions.^{[12]}
An early precursor of conventionalism was Ernst Mach, who examined
the notion of equality among temperature intervals (1896: 52). Mach
noted that different types of thermometric fluid expand at different
(and nonlinearly related) rates when heated, raising the question:
which fluid expands most uniformly with temperature? According to
Mach, there is no fact of the matter as to which fluid expands more
uniformly, since the very notion of equality among temperature
intervals has no determinate application prior to a conventional
choice of standard thermometric fluid. Mach coined the term
“principle of coordination” for this sort of
conventionally chosen principle for the application of a quantity
concept. The concepts of uniformity of time and space received similar
treatments by Henri Poincaré (1898, 1902: Part 2).
Poincaré argued that procedures used to determine equality
among durations stem from scientists’ unconscious preference for
descriptive simplicity, rather than from any fact about nature.
Similarly, scientists’ choice to represent space with either
Euclidean or non-Euclidean geometries is not determined by experience
but by considerations of convenience.

Conventionalism with respect to measurement reached its most
sophisticated expression in logical positivism. Logical positivists
like Hans Reichenbach and Rudolf Carnap proposed “coordinative
definitions” or “correspondence rules” as the
semantic link between theoretical and observational terms. These *a
priori*, definition-like statements were intended to regulate the
use of theoretical terms by connecting them with empirical procedures
(Reichenbach 1927: 14–19; Carnap 1966: Ch. 24). An example of a
coordinative definition is the statement: “a measuring rod
retains its length when transported”. According to Reichenbach,
this statement cannot be empirically verified, because a universal and
experimentally undetectable force could exist that equally distorts
every object’s length when it is transported. In accordance with
verificationism, statements that are unverifiable are neither true nor
false. Instead, Reichenbach took this statement to express an
arbitrary rule for regulating the use of the concept of equality of
length, namely, for determining whether particular instances of length
are equal (Reichenbach 1927: 16). At the same time, coordinative
definitions were not seen as replacements, but rather as necessary
additions, to the familiar sort of theoretical definitions of concepts
in terms of other concepts (1927: 14). Under the conventionalist
viewpoint, then, the specification of measurement operations did not
exhaust the meaning of concepts such as length or length-equality,
thereby avoiding many of the problems associated with
operationalism.^{[13]}

## 5. Realist Accounts of Measurement

Realists about measurement maintain that measurement is best
understood as the empirical estimation of an objective property or
relation. A few clarificatory remarks are in order with respect to
this characterization of measurement. First, the term
“objective” is not meant to exclude mental properties or
relations, which are the objects of psychological measurement. Rather,
measurable properties or relations are taken to be objective inasmuch
as they are independent of the beliefs and conventions of the humans
performing the measurement and of the methods used for measuring. For
example, a realist would argue that the ratio of the length of a given
solid rod to the standard meter has an objective value regardless of
whether and how it is measured. Second, the term
“estimation” is used by realists to highlight the fact
that measurement results are mere *approximations* of true
values (Trout 1998: 46). Third, according to realists, measurement is
aimed at obtaining knowledge about properties and relations, rather
than at assigning values directly to individual objects. This is
significant because observable objects (e.g., levers, chemical
solutions, humans) often instantiate measurable properties and
relations that are not directly observable (e.g., amount of mechanical
work, more acidic than, intelligence). Knowledge claims about such
properties and relations must presuppose some background theory. By
shifting the emphasis from objects to properties and relations,
realists highlight the theory-laden character of measurements.

Realism about measurement should not be confused with realism about entities (e.g., electrons). Nor does realism about measurement necessarily entail realism about properties (e.g., temperature), since one could in principle accept only the reality of relations (e.g., ratios among quantities) without embracing the reality of underlying properties. Nonetheless, most philosophers who have defended realism about measurement have done so by arguing for some form of realism about properties (Byerly and Lazara 1973; Swoyer 1987; Mundy 1987; Trout 1998, 2000). These realists argue that at least some measurable properties exist independently of the beliefs and conventions of the humans who measure them, and that the existence and structure of these properties provides the best explanation for key features of measurement, including the usefulness of numbers in expressing measurement results and the reliability of measuring instruments.

For example, a typical realist about length measurement would argue that the empirical regularities displayed by individual objects’ lengths when they are ordered and concatenated are best explained by assuming that length is an objective property that has an extensive structure (Swoyer 1987: 271–4). That is, relations among lengths such as “longer than” and “sum of” exist independently of whether any objects happen to be ordered and concatenated by humans, and indeed independently of whether objects of some particular length happen to exist at all. The existence of an extensive property structure means that lengths share much of their structure with the positive real numbers, and this explains the usefulness of the positive reals in representing lengths. Moreover, if measurable properties are analyzed in dispositional terms, it becomes easy to explain why some measuring instruments are reliable. For example, if one assumes that a certain amount of electric current in a wire entails a disposition to deflect an ammeter needle by a certain angle, it follows that the ammeter’s indications counterfactually depend on the amount of electric current in the wire, and therefore that the ammeter is reliable (Trout 1998: 65).

A different argument for realism about measurement is due to Joel
Michell (1994, 2005), who proposes a realist theory of number based on
the Euclidean concept of ratio. According to Michell, numbers are
ratios between quantities, and therefore exist in space and time.
Specifically, *real* numbers are ratios between pairs of
infinite standard sequences, e.g., the sequence of lengths normally
denoted by “1 meter”, “2 meters” etc. and the
sequence of whole multiples of the length we are trying to measure.
Measurement is the discovery and estimation of such ratios. An
interesting consequence of this empirical realism about numbers is
that measurement is not a representational activity, but rather the
activity of approximating mind-independent numbers (Michell 1994:
400).

Realist accounts of measurement are largely formulated in opposition
to strong versions of operationalism and conventionalism, which
dominated philosophical discussions of measurement from the 1930s
until the 1960s. In addition to the drawbacks of operationalism
already discussed in the previous section, realists point out that
anti-realism about measurable quantities fails to make sense of
scientific practice. If quantities had no real values independently of
one’s choice of measurement procedure, it would be difficult to
explain what scientists mean by “measurement accuracy” and
“measurement error”, and why they try to increase accuracy
and diminish error. By contrast, realists can easily make sense of the
notions of accuracy and error in terms of the distance between real
and measured values (Byerly and Lazara 1973: 17–8; Swoyer 1987:
239; Trout 1998: 57). A closely related point is the fact that newer
measurement procedures tend to improve on the accuracy of older ones.
If choices of measurement procedure were merely conventional it would
be difficult to make sense of such progress. In addition, realism
provides an intuitive explanation for why different measurement
procedures often yield similar results, namely, because they are
sensitive to the same facts (Swoyer 1987: 239; Trout 1998: 56).
Finally, realists note that the construction of measurement apparatus
and the analysis of measurement results are guided by theoretical
assumptions concerning causal relationships among quantities. The
ability of such causal assumptions to guide measurement suggests that
quantities are ontologically prior to the procedures that measure
them.^{[14]}

While their stance towards operationalism and conventionalism is
largely critical, realists are more charitable in their assessment of
mathematical theories of measurement. Brent Mundy (1987) and Chris
Swoyer (1987) both accept the axiomatic treatment of measurement
scales, but object to the empiricist interpretation given to the
axioms by prominent measurement theorists like Campbell (1920) and
Ernest Nagel (1931; Cohen and Nagel 1934: Ch. 15). Rather than
interpreting the axioms as pertaining to concrete objects or to
observable relations among such objects, Mundy and Swoyer reinterpret
the axioms as pertaining to universal magnitudes, e.g., to the
universal property of being 5 meters long rather than to the concrete
instantiations of that property. This construal preserves the
intuition that statements like “the size of *x* is twice
the size of *y*” are first and foremost about two
*sizes*, and only derivatively about the objects *x* and
*y* themselves (Mundy 1987:
34).^{[15]}
Mundy and Swoyer argue that their interpretation is more general,
because it logically entails all the first-order consequences of the
empiricist interpretation along with additional, second-order claims
about universal magnitudes. Moreover, under their interpretation
measurement theory becomes a genuine scientific theory, with
explanatory hypotheses and testable predictions. Building on this
work, Jo Wolff (2020a) has recently proposed a novel realist account
of quantities that relies on the Representational Theory of
Measurement. According to Wolff’s structuralist theory of
quantity, quantitative attributes are relational structures.
Specifically, an attribute is quantitative if its structure has
translations that form an Archimedean ordered group. Wolff’s
focus on translations, rather than on specific relations such as
concatenation and ordering, means that quantitativeness can be
realized in multiple ways and is not restricted to extensive
structures. It also means that being a quantity does not have anything
special to do with numbers, as both numerical and non-numerical
structures can be quantitative.

## 6. Information-Theoretic Accounts of Measurement

Information-theoretic accounts of measurement are based on an analogy between measuring systems and communication systems. In a simple communication system, a message (input) is encoded into a signal at the transmitter’s end, sent to the receiver’s end, and then decoded back (output). The accuracy of the transmission depends on features of the communication system as well as on features of the environment, i.e., the level of background noise. Similarly, measuring instruments can be thought of as “information machines” (Finkelstein 1977) that interact with an object in a given state (input), encode that state into an internal signal, and convert that signal into a reading (output). The accuracy of a measurement similarly depends on the instrument as well as on the level of noise in its environment. Conceived as a special sort of information transmission, measurement becomes analyzable in terms of the conceptual apparatus of information theory (Hartley 1928; Shannon 1948; Shannon and Weaver 1949). For example, the information that reading \(y_i\) conveys about the occurrence of a state \(x_k\) of the object can be quantified as \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\), namely as a function of the decrease of uncertainty about the object’s state (Finkelstein 1975: 222; for alternative formulations see Brillouin 1962: Ch. 15; Kirpatovskii 1974; and Mari 1999: 185).
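As a worked illustration of the quantity just mentioned, the following minimal sketch computes \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\) for a toy instrument. The posterior is obtained from a prior over object states and a table of reading probabilities by Bayes' theorem. The function name, the two-state example, the 90% reliability figure and the choice of base-2 logarithms (so that the result is in bits) are all assumptions made for illustration and are not taken from the cited works.

```python
import math


def information_gain(prior, likelihood, k, i, base=2.0):
    """Return log[p(x_k | y_i) / p(x_k)] for state index k and reading index i.

    prior[j]         = p(x_j), probability of object state j before measuring
    likelihood[j][i] = p(y_i | x_j), probability of reading i given state j
    """
    # Total probability of observing reading y_i.
    p_reading = sum(prior[j] * likelihood[j][i] for j in range(len(prior)))
    # Bayes' theorem gives the probability of state x_k after seeing y_i.
    posterior = prior[k] * likelihood[k][i] / p_reading
    return math.log(posterior / prior[k], base)


# Hypothetical instrument: two equally likely states, reading correct 90% of the time.
prior = [0.5, 0.5]
noisy = [[0.9, 0.1],
         [0.1, 0.9]]
noiseless = [[1.0, 0.0],
             [0.0, 1.0]]

gain_noisy = information_gain(prior, noisy, k=0, i=0)
gain_noiseless = information_gain(prior, noiseless, k=0, i=0)
gain_misleading = information_gain(prior, noisy, k=1, i=0)
print(gain_noisy, gain_noiseless, gain_misleading)
```

Under these assumptions a noiseless reading conveys one full bit about the state, a noisy reading conveys less, and the value is negative for a state that the reading makes less probable than it was beforehand.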

Ludwik Finkelstein (1975, 1977) and Luca Mari (1999) suggested the possibility of a synthesis between Shannon-Weaver information theory and measurement theory. As they argue, both theories centrally appeal to the idea of mapping: information theory concerns the mapping between symbols in the input and output messages, while measurement theory concerns the mapping between objects and numbers. If measurement is taken to be analogous to symbol-manipulation, then Shannon-Weaver theory could provide a formalization of the syntax of measurement while measurement theory could provide a formalization of its semantics. Nonetheless, Mari (1999: 185) also warns that the analogy between communication and measurement systems is limited. Whereas a sender’s message can be known with arbitrary precision independently of its transmission, the state of an object cannot be known with arbitrary precision independently of its measurement.

Information-theoretic accounts of measurement were originally
developed by metrologists — experts in physical measurement and
standardization — with little involvement from philosophers.
Independently of developments in metrology, Bas van Fraassen (2008:
141–185) has recently proposed a conception of measurement in
which information plays a key role. He views measurement as composed
of two levels: on the physical level, the measuring apparatus
interacts with an object and produces a reading, e.g., a pointer
position.^{[16]}
On the abstract level, background theory represents the
object’s possible states on a parameter space. Measurement
locates an object on a sub-region of this abstract parameter space,
thereby reducing the range of possible states (2008: 164 and 172).
This reduction of possibilities amounts to the collection of
information about the measured object. Van Fraassen’s analysis
of measurement differs from information-theoretic accounts developed
in metrology in its explicit appeal to background theory, and in the
fact that it does not invoke the symbolic conception of information
developed by Shannon and Weaver.

## 7. Model-Based Accounts of Measurement

Since the early 2000s a new wave of philosophical scholarship has emerged that emphasizes the relationships between measurement and theoretical and statistical modeling (Morgan 2001; Boumans 2005a, 2015; Mari 2005b; Mari and Giordani 2013; Tal 2016, 2017; Parker 2017; Miyake 2017). According to model-based accounts, measurement consists of two levels: (i) a concrete process involving interactions between an object of interest, an instrument, and the environment; and (ii) a theoretical and/or statistical model of that process, where “model” denotes an abstract and local representation constructed from simplifying assumptions. The central goal of measurement according to this view is to assign values to one or more parameters of interest in the model in a manner that satisfies certain epistemic desiderata, in particular coherence and consistency.

Model-based accounts have been developed by studying measurement practices in the sciences, and particularly in metrology. Metrology, officially defined as the “science of measurement and its application” (JCGM 2012: 2.2), is a field of study concerned with the design, maintenance and improvement of measuring instruments in the natural sciences and engineering. Metrologists typically work at standardization bureaus or at specialized laboratories that are responsible for the calibration of measurement equipment, the comparison of standards and the evaluation of measurement uncertainties, among other tasks. It is only recently that philosophers have begun to engage with the rich conceptual issues underlying metrological practice, and particularly with the inferences involved in evaluating and improving the accuracy of measurement standards (Chang 2004; Boumans 2005a: Chap. 5, 2005b, 2007a; Frigerio et al. 2010; Teller 2013, 2018; Riordan 2015; Schlaudt and Huber 2015; Tal 2016a, 2018; Mitchell et al. 2017; Mößner and Nordmann 2017; de Courtenay et al. 2019).

A central motivation for the development of model-based accounts is the attempt to clarify the epistemological principles underlying aspects of measurement practice. For example, metrologists employ a variety of methods for the calibration of measuring instruments, the standardization and tracing of units and the evaluation of uncertainties (for a discussion of metrology, see the preceding paragraph). Traditional philosophical accounts such as mathematical theories of measurement do not elaborate on the assumptions, inference patterns, evidential grounds or success criteria associated with such methods. As Frigerio et al. (2010) argue, measurement theory is ill-suited for clarifying these aspects of measurement because it abstracts away from the process of measurement and focuses solely on the mathematical properties of scales. By contrast, model-based accounts take scale construction to be merely one of several tasks involved in measurement, alongside the definition of measured parameters, instrument design and calibration, object sampling and preparation, error detection and uncertainty evaluation, among others (2010: 145–7).

### 7.1 The roles of models in measurement

According to model-based accounts, measurement involves interaction between an object of interest (the “system under measurement”), an instrument (the “measurement system”) and an environment, which includes the measuring subjects. Other, secondary interactions may also be relevant for the determination of a measurement outcome, such as the interaction between the measuring instrument and the reference standards used for its calibration, and the chain of comparisons that trace the reference standard back to primary measurement standards (Mari 2003: 25). Measurement proceeds by representing these interactions with a set of parameters, and assigning values to a subset of those parameters (known as “measurands”) based on the results of the interactions. When measured parameters are numerical they are called “quantities”. Although measurands need not be quantities, a quantitative measurement scenario will be supposed in what follows.

Two sorts of measurement outputs are distinguished by model-based accounts (JCGM 2012: 2.9 & 4.1; Giordani and Mari 2012: 2146; Tal 2013):

- **Instrument indications** (or “readings”): these are properties of the measuring instrument in its final state after the measurement process is complete. Examples are digits on a display, marks on a multiple-choice questionnaire and bits stored in a device’s memory. Indications may be represented by numbers, but such numbers describe states of the instrument and should not be confused with measurement outcomes, which concern states of the object being measured.
- **Measurement outcomes** (or “results”): these are knowledge claims about the values of one or more quantities attributed to the object being measured, and are typically accompanied by a specification of the measurement unit and scale and an estimate of measurement uncertainty. For example, a measurement outcome may be expressed by the sentence “the mass of object *a* is 20±1 grams with a probability of 68%”.

As proponents of model-based accounts stress, inferences from instrument indications to measurement outcomes are nontrivial and depend on a host of theoretical and statistical assumptions about the object being measured, the instrument, the environment and the calibration process. Measurement outcomes are often obtained through statistical analysis of multiple indications, thereby involving assumptions about the shape of the distribution of indications and the randomness of environmental effects (Bogen and Woodward 1988: 307–310). Measurement outcomes also incorporate corrections for systematic effects, and such corrections are based on theoretical assumptions concerning the workings of the instrument and its interactions with the object and environment. For example, length measurements need to be corrected for the change of the measuring rod’s length with temperature, a correction which is derived from a theoretical equation of thermal expansion. Systematic corrections involve uncertainties of their own, for example in the determination of the values of constants, and these uncertainties are assessed through secondary experiments involving further theoretical and statistical assumptions. Moreover, the uncertainty associated with a measurement outcome depends on the methods employed for the calibration of the instrument. Calibration involves additional assumptions about the instrument, the calibrating apparatus, the quantity being measured and the properties of measurement standards (Rothbart and Slayden 1994; Franklin 1997; Baird 2004: Ch. 4; Soler et al. 2013). Another component of uncertainty originates from vagueness in the definition of the measurand, and is known as “definitional uncertainty” (Mari and Giordani 2013; Grégis 2015). 
Finally, measurement involves background assumptions about the scale type and unit system being used, and these assumptions are often tied to broader theoretical and technological considerations relating to the definition and realization of scales and units.
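The inference from indications to an outcome described above can be sketched in a few lines, under strong simplifying assumptions that are not drawn from the cited literature. The sketch assumes a measuring rod calibrated at a reference temperature, a linear thermal expansion model for the rod, independent random scatter among repeated readings, and two uncorrelated uncertainty components combined in quadrature. The function name and all numerical values, including the expansion coefficient, are hypothetical.

```python
import math
import statistics


def measurement_outcome(indications_mm, temp_c, alpha_per_c, u_alpha_per_c,
                        ref_temp_c=20.0):
    """Infer a length and a standard uncertainty from repeated rod readings.

    Assumed model: the rod expands linearly with temperature, so above the
    reference temperature its marks are further apart and it under-reads.
    """
    n = len(indications_mm)
    mean_indication = statistics.mean(indications_mm)
    # Statistical component: standard error of the mean of the readings.
    u_statistical = statistics.stdev(indications_mm) / math.sqrt(n)

    # Systematic correction derived from the assumed expansion equation.
    delta_t = temp_c - ref_temp_c
    correction_factor = 1.0 + alpha_per_c * delta_t
    value = mean_indication * correction_factor

    # Uncertainty contributed by imperfect knowledge of the coefficient.
    u_correction = mean_indication * abs(delta_t) * u_alpha_per_c
    # Combine the two components, assumed uncorrelated, in quadrature.
    u_combined = math.hypot(u_statistical * correction_factor, u_correction)
    return value, u_combined


# Hypothetical indications (states of the instrument), in millimetres.
readings = [1000.02, 999.98, 1000.01, 999.99, 1000.00]

value, uncertainty = measurement_outcome(
    readings, temp_c=30.0, alpha_per_c=1.2e-5, u_alpha_per_c=1.0e-6)
uncorrected_value, _ = measurement_outcome(
    readings, temp_c=20.0, alpha_per_c=1.2e-5, u_alpha_per_c=1.0e-6)
print(f"corrected: {value:.3f} mm, standard uncertainty {uncertainty:.3f} mm")
print(f"without thermal correction: {uncorrected_value:.3f} mm")
```

The two calls use identical indications but different assumptions about the rod's temperature, and they yield different values for the length. The numbers stored in `readings` describe the instrument, while the returned pair is a claim about the object that holds only relative to the assumed model.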

These various theoretical and statistical assumptions form the basis
for the construction of one or more models of the measurement process.
Unlike mathematical theories of measurement, where the term
“model” denotes a set-theoretical structure that
interprets a formal language, here the term “model”
denotes an abstract and local representation of a target system that
is constructed from simplifying
assumptions.^{[17]}
The relevant target system in this case is a measurement process,
that is, a system composed of a measuring instrument, objects or
events to be measured, the environment (including human operators),
secondary instruments and reference standards, the time-evolution of
these components, and their various interactions with each other.
Measurement is viewed as a set of procedures whose aim is to
coherently assign values to model parameters based on instrument
indications. Models are therefore seen as necessary preconditions for
the possibility of inferring measurement outcomes from instrument
indications, and as crucial for determining the content of measurement
outcomes. As proponents of model-based accounts emphasize, the same
indications produced by the same measurement process may be used to
establish different measurement outcomes depending on how the
measurement process is modeled, e.g., depending on which environmental
influences are taken into account, which statistical assumptions are
used to analyze noise, and which approximations are used in applying
background theory. As Luca Mari puts it,

any measurement result reports information that is meaningful only in the context of a metrological model, such a model being required to include a specification for all the entities that explicitly or implicitly appear in the expression of the measurement result. (2003: 25)

Similarly, models are said to provide the necessary context for evaluating various aspects of the goodness of measurement outcomes, including accuracy, precision, error and uncertainty (Boumans 2006, 2007a, 2009, 2012b; Mari 2005b).

Model-based accounts diverge from empiricist interpretations of
measurement theory in that they do not require relations among
measurement outcomes to be isomorphic or homomorphic to observable
relations among the items being measured (Mari 2000). Indeed,
according to model-based accounts relations among measured objects
need not be observable at all prior to their measurement (Frigerio et
al. 2010: 125). Instead, the key normative requirement of model-based
accounts is that values be assigned to model parameters in a coherent
manner. The coherence criterion may be viewed as a conjunction of two
sub-criteria: (i) coherence of model assumptions with relevant
background theories or other substantive presuppositions about the
quantity being measured; and (ii) objectivity, i.e., the mutual
consistency of measurement outcomes across different measuring
instruments, environments and
models^{[18]}
(Frigerio et al. 2010; Tal 2017a; Teller 2018). The first
sub-criterion is meant to ensure that the *intended* quantity
is being measured, while the second sub-criterion is meant to ensure
that measurement outcomes can be reasonably attributed to the measured
*object* rather than to some artifact of the measuring
instrument, environment or model. Taken together, these two
requirements ensure that measurement outcomes remain valid
independently of the specific assumptions involved in their
production, and hence that the context-dependence of measurement
outcomes does not threaten their general applicability.

### 7.2 Models and measurement in economics

Besides their applicability to physical measurement, model-based
analyses also shed light on measurement in economics. Like physical
quantities, values of economic variables often cannot be observed
directly and must be inferred from observations based on abstract and
idealized models. The nineteenth century economist William Jevons, for
example, measured changes in the value of gold by postulating certain
causal relationships between the value of gold, the supply of gold and
the general level of prices (Hoover and Dowell 2001: 155–159;
Morgan 2001: 239). As Julian Reiss (2001) shows, Jevons’
measurements were made possible by using two models: a
causal-theoretical model of the economy, which is based on the
assumption that the quantity of gold has the capacity to raise or
lower prices; and a statistical model of the data, which is based on
the assumption that local variations in prices are mutually
independent and therefore cancel each other out when averaged. Taken
together, these models allowed Jevons to infer the change in the value
of gold from data concerning the historical prices of various
goods.^{[19]}
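
The role of the statistical assumption can be illustrated with a minimal sketch. This is not a reconstruction of Jevons' own calculation: the data are simulated, the function name `mean_price_change` is hypothetical, and the use of a geometric mean of price ratios is a choice made for this sketch. The sketch assumes that every price shares a common change (standing in for a change in the value of gold) and is also subject to an independent local disturbance.

```python
import math
import random

def mean_price_change(price_ratios):
    """Geometric mean of price ratios (new price / old price)."""
    log_sum = sum(math.log(r) for r in price_ratios)
    return math.exp(log_sum / len(price_ratios))

# Hypothetical data: a common 10% rise in all prices, multiplied by
# independent local disturbances that differ from good to good.
rng = random.Random(0)
common_factor = 1.10
ratios = [common_factor * math.exp(rng.gauss(0.0, 0.2)) for _ in range(2000)]

# Averaging over many goods largely cancels the local disturbances,
# leaving an estimate close to the common factor.
estimate = mean_price_change(ratios)
print(f"estimated common price change: {estimate:.3f}")
```

If the disturbances were not independent, for instance if they shared a common cause other than gold, the average would no longer isolate the quantity of interest. This is why the inference depends on the causal model as well as the statistical one.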

The ways in which models function in economic measurement have led some philosophers to view certain economic models as measuring instruments in their own right, analogously to rulers and balances (Boumans 1999, 2005c, 2006, 2007a, 2009, 2012a, 2015; Morgan 2001). Marcel Boumans explains how macroeconomists are able to isolate a variable of interest from external influences by tuning parameters in a model of the macroeconomic system. This technique frees economists from the impossible task of controlling the actual system. As Boumans argues, macroeconomic models function as measuring instruments insofar as they produce invariant relations between inputs (indications) and outputs (outcomes), and insofar as this invariance can be tested by calibration against known and stable facts. When such model-based procedures are combined with expert judgment, they can produce reliable measurements of economic phenomena even outside controlled laboratory settings (Boumans 2015: Chap. 5).

### 7.3 Psychometric models and construct validity

Another area where models play a central role in measurement is psychology. The measurement of most psychological attributes, such as intelligence, anxiety and depression, does not rely on homomorphic mappings of the sort espoused by the Representational Theory of Measurement (Wilson 2013: 3766). Instead, psychometric theory relies predominantly on the development of abstract models that are meant to predict subjects’ performance in certain tasks. These models are constructed from substantive and statistical assumptions about the psychological attribute being measured and its relation to each measurement task. For example, Item Response Theory, a popular approach to psychological measurement, employs a variety of models to evaluate the reliability and validity of questionnaires. Consider a questionnaire that is meant to assess English language comprehension (the “ability”), by presenting subjects with a series of yes/no questions (the “items”). One of the simplest models used to calibrate such questionnaires is the Rasch model (Rasch 1960). This model supposes a straightforward algebraic relation—known as the “log of the odds”—between the probability that a subject will answer a given item correctly, the difficulty of that particular item, and the subject’s ability. New questionnaires are calibrated by testing the fit between their indications and the predictions of the Rasch model and assigning difficulty levels to each item accordingly. The model is then used in conjunction with the questionnaire to infer levels of English language comprehension (outcomes) from raw questionnaire scores (indications) (Wilson 2013; Mari and Wilson 2014).
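
The relation posited by the Rasch model can be made concrete with a short sketch. It uses the standard dichotomous form of the model, in which the log of the odds of a correct answer equals the subject's ability minus the item's difficulty. The function names and the numerical values are illustrative assumptions, not drawn from the works cited above.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical subject and item, both expressed on the same (logit) scale.
p = rasch_probability(ability=1.5, difficulty=0.5)

# The log-odds recover the difference between ability and difficulty.
print(f"P(correct) = {p:.3f}, log-odds = {log_odds(p):.3f}")
```

When ability equals difficulty the model predicts a probability of one half. Calibrating a questionnaire amounts to finding difficulty values under which predictions of this kind fit the observed responses.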

The sort of statistical calibration (or “scaling”) provided by Rasch models yields repeatable results, but it is often only a first step towards full-fledged psychological measurement. Psychologists are typically interested in the results of a measure not for its own sake, but for the sake of assessing some underlying and latent psychological attribute, e.g., English language comprehension. A good fit between item responses and a statistical model does not yet determine what the questionnaire is measuring. The process of establishing that a procedure measures the intended psychological attribute is known as “validation”. One way of validating a psychometric instrument is to test whether different procedures that are intended to measure the same latent attribute provide consistent results. Such testing belongs to a family of validation techniques known as “construct validation”. A construct is an abstract representation of the latent attribute intended to be measured, and

reflects a hypothesis […] that a variety of behaviors will correlate with one another in studies of individual differences and/or will be similarly affected by experimental manipulations. (Nunnally & Bernstein 1994: 85)

Constructs are denoted by variables in a model that predicts which correlations would be observed among the indications of different measures if they are indeed measures of the same attribute. Such models involve substantive assumptions about the attribute, including its internal structure and its relations to other attributes, and statistical assumptions about the correlation among different measures (Campbell & Fiske 1959; Nunnally & Bernstein 1994: Ch. 3; Angner 2008).
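
The kind of prediction involved can be sketched with simulated data. The example below is a toy illustration under stated assumptions, not a validation procedure from the literature cited: it posits a hypothetical latent attribute, two noisy measures that depend on it, and a third measure that does not, and then checks whether the correlations come out as the construct hypothesis predicts.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(42)

# Hypothetical latent attribute and three simulated measures.
latent = [rng.gauss(0.0, 1.0) for _ in range(1000)]
measure_a = [t + rng.gauss(0.0, 0.5) for t in latent]  # depends on the attribute
measure_b = [t + rng.gauss(0.0, 0.5) for t in latent]  # depends on the attribute
unrelated = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # does not

# Measures of the same attribute should correlate strongly with each other
# and only weakly with a measure of something else.
convergent_r = pearson(measure_a, measure_b)
discriminant_r = pearson(measure_a, unrelated)
print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

In real validation the latent attribute is not available for inspection, so the observed pattern of correlations supports the construct hypothesis only together with the substantive assumptions that generated the prediction.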

In recent years, philosophers of science have become increasingly interested in psychometrics and the concept of validity. One debate concerns the ontological status of latent psychological attributes. Denny Borsboom has argued against operationalism about latent attributes, and in favour of defining validity in a manner that embraces realism: “a test is valid for measuring an attribute if and only if a) the attribute exists, and b) variations in the attribute causally produce variations in the outcomes of the measurement procedure” (2005: 150; see also Hood 2009, 2013; Feest 2020). Elina Vessonen has defended a moderate form of operationalism about psychological attributes, and argued that moderate operationalism is compatible with a cautious type of realism (2019). Another recent discussion focuses on the justification for construct validation procedures. According to Anna Alexandrova, construct validation is in principle a justified methodology, insofar as it establishes coherence with theoretical assumptions and background knowledge about the latent attribute. However, Alexandrova notes that in practice psychometricians who intend to measure happiness and well-being often avoid theorizing about these constructs, and instead appeal to respondents’ folk beliefs. This defeats the purpose of construct validation and turns it into a narrow, technical exercise (Alexandrova and Haybron 2016; Alexandrova 2017; see also McClimans et al. 2017).

A more fundamental criticism leveled against psychometrics is that it dogmatically presupposes that psychological attributes can be quantified. Michell (2000, 2004b) argues that psychometricians have not made serious attempts to test whether the attributes they purport to measure have quantitative structure, and instead adopted an overly loose conception of measurement that disguises this neglect. In response, Borsboom and Mellenbergh (2004) argue that Item Response Theory provides probabilistic tests of the quantifiability of attributes. Psychometricians who construct a statistical model initially hypothesize that an attribute is quantitative, and then subject the model to empirical tests. When successful, such tests provide indirect confirmation of the initial hypothesis, e.g. by showing that the attribute has an additive conjoint structure (see also Vessonen 2020).

Several scholars have pointed out similarities between the ways models
are used to standardize measurable quantities in the natural and
social sciences. For example, Mark Wilson (2013) argues that
psychometric models can be viewed as tools for constructing
measurement standards in the same sense of “measurement
standard” used by metrologists. Others have raised doubts about
the feasibility and desirability of adopting the example of the
natural sciences when standardizing constructs in the social sciences.
Nancy Cartwright and Rosa Runhardt (2014) discuss
“Ballung” concepts, a term they borrow from Otto Neurath
to denote concepts with a fuzzy and context-dependent scope. Examples
of Ballung concepts are race, poverty, social exclusion, and the
quality of PhD programs. Such concepts are too multifaceted to be
measured on a single metric without loss of meaning, and must be
represented either by a matrix of indices or by several different
measures depending on which goals and values are at play (see also
Bradburn, Cartwright, & Fuller 2016, Other Internet Resources).
Alexandrova (2008) points out that ethical considerations bear on
questions about the validity of measures of well-being no less than
considerations of reproducibility. Such ethical considerations are
context sensitive, and can only be applied piecemeal. In a similar
vein, Leah McClimans (2010) argues that uniformity is not always an
appropriate goal for designing questionnaires, as the open-endedness
of questions is often both unavoidable and desirable for obtaining
relevant information from
subjects.^{[20]}
The intertwining of ethical and epistemic considerations is
especially clear when psychometric questionnaires are used in medical
contexts to evaluate patient well-being and mental health. In such
cases, small changes to the design of a questionnaire or the analysis
of its results may result in significant harms or benefits to patients
(McClimans 2017; Stegenga 2018, Chap. 8). These insights highlight the
value-laden and contextual nature of the measurement of mental and
social phenomena.

## 8. The Epistemology of Measurement

The development of model-based accounts discussed in the previous section is part of a larger, “epistemic turn” in the philosophy of measurement that occurred in the early 2000s. Rather than emphasizing the mathematical foundations, metaphysics or semantics of measurement, philosophical work in recent years tends to focus on the presuppositions and inferential patterns involved in concrete practices of measurement, and on the historical, social and material dimensions of measuring. The philosophical study of these topics has been referred to as the “epistemology of measurement” (Mari 2003, 2005a; Leplège 2003; Tal 2017a). In the broadest sense, the epistemology of measurement is the study of the relationships between measurement and knowledge. Central topics that fall under the purview of the epistemology of measurement include the conditions under which measurement produces knowledge; the content, scope, justification and limits of such knowledge; the reasons why particular methodologies of measurement and standardization succeed or fail in supporting particular knowledge claims, and the relationships between measurement and other knowledge-producing activities such as observation, theorizing, experimentation, modelling and calculation. In pursuing these objectives, philosophers are drawing on the work of historians and sociologists of science, who have been investigating measurement practices for a longer period (Wise and Smith 1986; Latour 1987: Ch. 6; Schaffer 1992; Porter 1995, 2007; Wise 1995; Alder 2002; Galison 2003; Gooday 2004; Crease 2011), as well as on the history and philosophy of scientific experimentation (Harré 1981; Hacking 1983; Franklin 1986; Cartwright 1999). The following subsections survey some of the topics discussed in this burgeoning body of literature.

### 8.1 Standardization and scientific progress

A topic that has attracted considerable philosophical attention in
recent years is the selection and improvement of measurement
standards. Generally speaking, to standardize a quantity concept is to
prescribe a determinate way in which that concept is to be applied to
concrete
particulars.^{[21]}
To standardize a measuring instrument is to assess how well the
outcomes of measuring with that instrument fit the prescribed mode of
application of the relevant concept.^{[22]}
The term “measurement standard” accordingly has at least
two meanings: on the one hand, it is commonly used to refer to
abstract rules and definitions that regulate the use of quantity
concepts, such as the definition of the meter. On the other hand, the
term “measurement standard” is also commonly used to refer
to the concrete artifacts and procedures that are deemed exemplary of
the application of a quantity concept, such as the metallic bar that
served as the standard meter until 1960. This duality in meaning
reflects the dual nature of standardization, which involves both
abstract and concrete aspects.

In Section 4 it was noted that standardization involves choices among nontrivial
alternatives, such as the choice among different thermometric fluids
or among different ways of marking equal duration. These choices are
nontrivial in the sense that they affect whether or not the same
temperature (or time) intervals are deemed equal, and hence affect
whether or not statements of natural law containing the term
“temperature” (or “time”) come out true.
Appealing to theory to decide which standard is more accurate would be
circular, since the theory cannot be determinately applied to
particulars prior to a choice of measurement standard. This
circularity has been variously called the “problem of
coordination” (van Fraassen 2008: Ch. 5) and the “problem
of nomic measurement” (Chang 2004: Ch. 2). As already mentioned,
conventionalists attempted to escape the circularity by positing *a
priori* statements, known as “coordinative
definitions”, which were supposed to link quantity-terms with
specific measurement operations. A drawback of this solution is that
it supposes that choices of measurement standard are arbitrary and
static, whereas in actual practice measurement standards tend to be
chosen based on empirical considerations and are eventually improved
or replaced with standards that are deemed more accurate.

A new strand of writing on the problem of coordination has emerged in recent years, consisting most notably of the works of Hasok Chang (2001, 2004, 2007; Barwich and Chang 2015) and Bas van Fraassen (2008: Ch. 5; 2009, 2012; see also Padovani 2015, 2017; Michel 2019). These works take a historical and coherentist approach to the problem. Rather than attempting to avoid the problem of circularity completely, as their predecessors did, they set out to show that the circularity is not vicious. Chang argues that constructing a quantity-concept and standardizing its measurement are co-dependent and iterative tasks. Each “epistemic iteration” in the history of standardization respects existing traditions while at the same time correcting them (Chang 2004: Ch. 5). The pre-scientific concept of temperature, for example, was associated with crude and ambiguous methods of ordering objects from hot to cold. Thermoscopes, and eventually thermometers, helped modify the original concept and made it more precise. With each such iteration the quantity concept was re-coordinated to a more stable set of standards, which in turn allowed theoretical predictions to be tested more precisely, facilitating the subsequent development of theory and the construction of more stable standards, and so on.

How this process avoids vicious circularity becomes clear when we look
at it either “from above”, i.e., in retrospect given our
current scientific knowledge, or “from within”, by looking
at historical developments in their original context (van Fraassen
2008: 122). From either vantage point, coordination succeeds because
it increases coherence among elements of theory and instrumentation.
The questions “what counts as a measurement of quantity
*X*?” and “what is quantity *X*?”,
though unanswerable independently of each other, are addressed
together in a process of mutual refinement. It is only when one adopts
a foundationalist view and attempts to find a starting point for
coordination free of presupposition that this historical process
erroneously appears to lack epistemic justification (2008: 137).

The new literature on coordination shifts the emphasis of the
discussion from the definitions of quantity-terms to the
*realizations* of those definitions. In metrological jargon, a
“realization” is a physical instrument or procedure that
approximately satisfies a given definition (cf. JCGM 2012: 5.1).
Examples of metrological realizations are the official prototypes of
the kilogram and the cesium fountain clocks used to standardize the
second. Recent studies suggest that the methods used to design,
maintain and compare realizations have a direct bearing on the
practical application of concepts of quantity, unit and scale, no less
than the definitions of those concepts (Riordan 2015; Tal 2016). The
relationship between the definition and realizations of a unit becomes
especially complex when the definition is stated in theoretical terms.
Several of the base units of the International System (SI) —
including the meter, kilogram, ampere, kelvin and mole — are no
longer defined by reference to any specific kind of physical system,
but by fixing the numerical value of a fundamental physical constant.
The kilogram, for example, was redefined in 2019 as the unit of mass
such that the numerical value of the Planck constant is exactly
6.62607015 × 10^{-34} kg m^{2} s^{-1}
(BIPM 2019: 131). Realizing the kilogram under this definition is a
highly theory-laden task. The study of the practical realization of
such units has shed new light on the evolving relationships between
measurement and theory (Tal 2018; de Courtenay et al. 2019; Wolff
2020b).
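
A short calculation shows how a unit of mass follows from fixed constants alone. The sketch below uses the exact defining values of the Planck constant, the speed of light and the caesium hyperfine frequency, which together fix the second, the meter and the kilogram. The variable names are illustrative, and the calculation is only a dimensional illustration, not a description of how the kilogram is realized in practice.

```python
# Exact defining constants of the SI (fixed numerical values in SI units).
PLANCK_CONSTANT = 6.62607015e-34   # kg m^2 s^-1
SPEED_OF_LIGHT = 299_792_458       # m s^-1
CAESIUM_FREQUENCY = 9_192_631_770  # s^-1

# The combination h * delta_nu_Cs / c^2 has the dimension of mass.
mass_from_constants_kg = PLANCK_CONSTANT * CAESIUM_FREQUENCY / SPEED_OF_LIGHT**2

# One kilogram expressed as a multiple of that combination.
kilogram_in_constant_units = 1.0 / mass_from_constants_kg
print(f"1 kg is about {kilogram_in_constant_units:.7e} (h * delta_nu_Cs / c^2)")
```

The number produced here is fixed by definition. What remains empirical, and theory-laden, is building an apparatus whose behavior is tied to these constants closely enough to weigh an actual object.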

### 8.2 Theory-ladenness of measurement

As already discussed above (Sections 7 and 8.1), theory and measurement are interdependent both historically and conceptually. On the historical side, the development of theory and measurement proceeds through iterative and mutual refinements. On the conceptual side, the specification of measurement procedures shapes the empirical content of theoretical concepts, while theory provides a systematic interpretation for the indications of measuring instruments. This interdependence of measurement and theory may seem like a threat to the evidential role that measurement is supposed to play in the scientific enterprise. After all, measurement outcomes are thought to be able to test theoretical hypotheses, and this seems to require some degree of independence of measurement from theory. This threat is especially clear when the theoretical hypothesis being tested is already presupposed as part of the model of the measuring instrument. To cite an example from Franklin et al. (1989: 230):

There would seem to be, at first glance, a vicious circularity if one were to use a mercury thermometer to measure the temperature of objects as part of an experiment to test whether or not objects expand as their temperature increases.

Nonetheless, Franklin et al. conclude that the circularity is not vicious. The mercury thermometer could be calibrated against another thermometer whose principle of operation does not presuppose the law of thermal expansion, such as a constant-volume gas thermometer, thereby establishing the reliability of the mercury thermometer on independent grounds. To put the point more generally, in the context of local hypothesis-testing the threat of circularity can usually be avoided by appealing to other kinds of instruments and other parts of theory.

A different sort of worry about the evidential function of measurement arises on the global scale, when the testing of entire theories is concerned. As Thomas Kuhn (1961) argues, scientific theories are usually accepted long before quantitative methods for testing them become available. The reliability of newly introduced measurement methods is typically tested against the predictions of the theory rather than the other way around. In Kuhn’s words, “The road from scientific law to scientific measurement can rarely be traveled in the reverse direction” (1961: 189). For example, Dalton’s Law, which states that the weights of elements in a chemical compound are related to each other in whole-number proportions, initially conflicted with some of the best known measurements of such proportions. It is only by assuming Dalton’s Law that subsequent experimental chemists were able to correct and improve their measurement techniques (1961: 173). Hence, Kuhn argues, the function of measurement in the physical sciences is not to test the theory but to apply it with increasing scope and precision, and eventually to allow persistent anomalies to surface that would precipitate the next crisis and scientific revolution. Note that Kuhn is not claiming that measurement has no evidential role to play in science. Instead, he argues that measurements cannot test a theory in isolation, but only by comparison to some alternative theory that is proposed in an attempt to account for the anomalies revealed by increasingly precise measurements (for an illuminating discussion of Kuhn’s thesis see Hacking 1983: 243–5).

Traditional discussions of theory-ladenness, like those of Kuhn, were
conducted against the background of the logical positivists’
distinction between theoretical and observational language. The
theory-ladenness of measurement was correctly perceived as a threat to
the possibility of a clear demarcation between the two languages.
Contemporary discussions, by contrast, no longer present
theory-ladenness as an epistemological threat but take for granted
that some level of theory-ladenness is a prerequisite for measurements
to have any evidential power. Without some minimal substantive
assumptions about the quantity being measured, such as its amenability
to manipulation and its relations to other quantities, it would be
impossible to interpret the indications of measuring instruments and
hence impossible to ascertain the evidential relevance of those
indications. This point was already made by Pierre Duhem (1906:
153–6; see also Carrier 1994: 9–19). Moreover,
contemporary authors emphasize that theoretical assumptions play
crucial roles in correcting for measurement errors and evaluating
measurement uncertainties. Indeed, physical measurement procedures
become *more* accurate when the model underlying them is
de-idealized, a process which involves increasing the theoretical
richness of the model (Tal 2011).

The acknowledgment that theory is crucial for guaranteeing the
evidential reliability of measurement draws attention to the
“problem of observational grounding”, which is an inverse
challenge to the traditional threat of theory-ladenness (Tal 2016b).
The challenge is to specify what role *observation* plays in
measurement, and particularly what sort of connection with observation
is necessary and/or sufficient to allow measurement to play an
evidential role in the sciences. This problem is especially clear when
one attempts to account for the increasing use of computational
methods for performing tasks that were traditionally accomplished by
measuring instruments. As Margaret Morrison (2009) and Wendy Parker
(2017) argue, there are cases where reliable quantitative information
is gathered about a target system with the aid of a computer
simulation, but in a manner that satisfies some of the central
desiderata for measurement such as being empirically grounded and
backward-looking (see also Lusk 2016). Such information does not rely
on signals transmitted from the particular object of interest to the
instrument, but on the use of theoretical and statistical models to
process empirical data about related objects. For example, data
assimilation methods are customarily used to estimate past atmospheric
temperatures in regions where thermometer readings are not available.
Some methods do this by fitting a computational model of the
atmosphere’s behavior to a combination of available data from
nearby regions and a model-based forecast of conditions at the time of
observation (Parker 2017). These estimations are then used in various
ways, including as data for evaluating forward-looking climate models.
Regardless of whether one calls these estimations
“measurements”, they challenge the idea that producing
reliable quantitative evidence about the state of an object requires
observing that object, however loosely one understands the term
“observation”.^{[23]}

### 8.3 Accuracy and precision

Two key aspects of the reliability of measurement outcomes are
accuracy and precision. Consider a series of repeated weight
measurements performed on a particular object with an equal-arms
balance. From a realist, “error-based” perspective, the
outcomes of these measurements are *accurate* if they are close
to the true value of the quantity being measured—in our case,
the true ratio of the object’s weight to the chosen
unit—and *precise* if they are close to each other. An
analogy often cited to clarify the error-based distinction is that of
arrows shot at a target, with accuracy analogous to the closeness of
hits to the bull’s eye and precision analogous to the tightness
of spread of hits (cf. JCGM 2012: 2.13 & 2.15; Teller 2013: 192).
Though intuitive, the error-based way of carving the distinction
raises an epistemological difficulty. It is commonly thought that the
exact true values of most quantities of interest to science are
unknowable, at least when those quantities are measured on continuous
scales. If this assumption is granted, the accuracy with which such
quantities are measured cannot be known with exactitude, but only
estimated by comparing inaccurate measurements to each other. And yet
it is unclear why convergence among inaccurate measurements should be
taken as an indication of truth. After all, the measurements could be
plagued by a common bias that prevents their individual inaccuracies
from cancelling each other out when averaged. In the absence of
cognitive access to true values, how is the evaluation of measurement
accuracy possible?
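
The difficulty can be made vivid with a small simulation. The sketch below is purely illustrative: the "true" weight is stipulated so that the error can be computed, whereas in an actual measurement it would be unavailable, and the bias and noise levels are invented.

```python
import random
import statistics

def simulate_readings(true_value, bias, noise_sd, n, seed):
    """Simulated readings sharing a common bias plus independent noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

TRUE_WEIGHT = 100.0  # stipulated for the simulation only
readings = simulate_readings(TRUE_WEIGHT, bias=0.5, noise_sd=0.05, n=1000, seed=1)

# Precision: how closely the readings agree with each other.
spread = statistics.stdev(readings)

# Accuracy (error-based): how far the readings lie from the true value.
offset = statistics.mean(readings) - TRUE_WEIGHT

print(f"spread = {spread:.3f}, offset from true value = {offset:.3f}")
```

The readings agree closely, yet their average is displaced by the common bias, and averaging more readings does not reduce that displacement. An observer who has only the readings can compute the spread but not the offset.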

In answering this question, philosophers have benefited from studying the various senses of the term “measurement accuracy” as used by practicing scientists. At least five different senses have been identified: metaphysical, epistemic, operational, comparative and pragmatic (Tal 2011: 1084–5). In particular, the epistemic or “uncertainty-based” sense of the term is metaphysically neutral and does not presuppose the existence of true values. Instead, the accuracy of a measurement outcome is taken to be the closeness of agreement among values reasonably attributed to a quantity given available empirical data and background knowledge (cf. JCGM 2012: 2.13 Note 3; Giordani & Mari 2012; de Courtenay and Grégis 2017). Thus construed, measurement accuracy can be evaluated by establishing robustness among the consequences of models representing different measurement processes (Basso 2017; Tal 2017b; Bokulich 2020; Staley 2020).

Under the uncertainty-based conception, imprecision is a special type of inaccuracy. For example, the inaccuracy of weight measurements is the breadth of spread of values that are reasonably attributed to the object’s weight given the indications of the balance and available background knowledge about the way the balance works and the standard weights used. The imprecision of these measurements is the component of inaccuracy arising from uncontrolled variations to the indications of the balance over repeated trials. Other sources of inaccuracy besides imprecision include imperfect corrections to systematic errors, inaccurately known physical constants, and vague measurand definitions, among others (see Section 7.1).
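
The part-whole relation between imprecision and inaccuracy can be sketched numerically. The example assumes, as a simplification, that the sources of uncertainty are uncorrelated and contribute directly to the result, so that their standard uncertainties combine as a root sum of squares. The component names and values are invented for illustration.

```python
import math

def combined_standard_uncertainty(components):
    """Root-sum-of-squares combination of uncorrelated components."""
    return math.sqrt(sum(u ** 2 for u in components.values()))

# Hypothetical uncertainty budget for a weighing, in grams.
budget = {
    "repeatability (imprecision)": 0.03,
    "correction for systematic error": 0.04,
}

total = combined_standard_uncertainty(budget)
print(f"combined standard uncertainty: {total:.3f} g")
```

On this picture, reducing the scatter of repeated readings shrinks only one term of the budget. The overall uncertainty cannot fall below the contribution of the remaining sources.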

Paul Teller (2018) raises a different objection to the error-based
conception of measurement accuracy. He argues against an assumption he
calls “measurement accuracy realism”, according to which
measurable quantities have definite values in reality. Teller argues
that this assumption is false insofar as it concerns the quantities
habitually measured in physics, because any specification of definite
values (or value ranges) for such quantities involves idealization and
hence cannot refer to anything in reality. For example, the concept
usually understood by the phrase “the velocity of sound in
air” involves a host of implicit idealizations concerning the
uniformity of the air’s chemical composition, temperature and
pressure as well as the stability of units of measurement. Removing
these idealizations completely would require adding an infinite amount of
detail to each specification. As Teller argues, measurement accuracy
should itself be understood as a useful idealization, namely as a
concept that allows scientists to assess coherence and consistency
among measurement outcomes *as if* the linguistic expression of
these outcomes latched onto anything in the world. Precision is
similarly an idealized concept, which is based on an open-ended and
indefinite specification of what counts as repetition of measurement
under “the same” circumstances (Teller 2013: 194).

## Bibliography

- Alder, K., 2002, *The Measure of All Things: The Seven-Year Odyssey and Hidden Error That Transformed the World*, New York: The Free Press.
- Alexandrova, A., 2008, “First Person Reports and the Measurement of Happiness”, *Philosophical Psychology*, 21(5): 571–583.
- –––, 2017, *A Philosophy for the Science of Well-Being*, Oxford: Oxford University Press.
- Alexandrova, A. and D.M. Haybron, 2016, “Is Construct Validation Valid?” *Philosophy of Science*, 83(5): 1098–1109.
- Angner, E., 2008, “The Philosophical Foundations of Subjective Measures of Well-Being”, in *Capabilities and Happiness*, L. Bruni, F. Comim, and M. Pugno (eds.), Oxford: Oxford University Press.
- –––, 2013, “Is it Possible to Measure Happiness? The argument from measurability”, *European Journal for Philosophy of Science*, 3: 221–240.
- Aristotle, *Categories*, in *The Complete Works of Aristotle*, Volume I, J. Barnes (ed.), Princeton: Princeton University Press, 1984.
- Baird, D., 2004, *Thing Knowledge: A Philosophy of Scientific Instruments*, Berkeley: University of California Press.
- Barwich, A.S., and H. Chang, 2015, “Sensory Measurements: Coordination and Standardization”, *Biological Theory*, 10(3): 200–211.
- Basso, A., 2017, “The Appeal to Robustness in Measurement Practice”, *Studies in History and Philosophy of Science Part A*, 65: 57–66.
- Biagioli, F., 2016, *Space, Number, and Geometry from Helmholtz to Cassirer*, Dordrecht: Springer.
- –––, 2018, “Cohen and Helmholtz on the Foundations of Measurement”, in C. Damböck (ed.), *Philosophie Und Wissenschaft Bei Hermann Cohen – Philosophy and Science in Hermann Cohen*, Dordrecht: Springer, 77–100.
- BIPM (Bureau International des Poids et Mesures), 2019, *The International System of Units* (SI Brochure), 9th Edition. [BIPM 2019 available online]
- Bogen, J. and J. Woodward, 1988, “Saving the Phenomena”, *The Philosophical Review*, 97(3): 303–352.
- Bokulich, A., 2020, “Calibration, Coherence, and Consilience in Radiometric Measures of Geologic Time”, *Philosophy of Science*, 87(3): 425–56.
- Boring, E.G., 1945, “The use of operational definitions in science”, in Boring et al. 1945: 243–5.
- Boring, E.G., P.W. Bridgman, H. Feigl, H. Israel, C.C Pratt, and
B.F. Skinner, 1945, “Symposium on Operationism”,
*The Psychological Review*, 52: 241–294. - Borsboom, D., 2005,
*Measuring the Mind: Conceptual Issues in Contemporary Psychometrics*, Cambridge: Cambridge University Press. - Borsboom, D., and G.J. Mellenbergh, 2004, “Why psychometrics
is not pathological: A comment on Michell”,
*Theory & Psychology*, 14: 105–120. - Boumans, M., 1999, “Representation and Stability in Testing
and Measuring Rational Expectations”,
*Journal of Economic Methodology*, 6(3): 381–401. - –––, 2005a,
*How Economists Model the World into Numbers*, New York: Routledge. - –––, 2005b, “Truth versus
Precision”, in
*Logic, Methodology and Philosophy of Science: Proceedings of the Twelfth International Congress*, P. Hájek, L. Valdés-Villanueva, and D. Westerstahl (eds.), London: College Publications, pp. 257–269. - –––, 2005c, “Measurement outside the
laboratory”,
*Philosophy of Science*, 72: 850–863. - –––, 2006, “The difference between
answering a ‘why’ question and answering a ‘how
much’ question”, in
*Simulation: Pragmatic Construction of Reality*, J. Lenhard, G. Küppers, and T. Shinn (eds.), Dordrecht: Springer, pp. 107–124. - –––, 2007a, “Invariance and Calibration”, in Boumans 2007b: 231–248.
- ––– (ed.), 2007b,
*Measurement in Economics: A Handbook*, London: Elsevier. - –––, 2009, “Grey-Box Understanding in
Economics”, in
*Scientific Understanding: Philosophical Perspectives*, H.W. de Regt, S. Leonelli, and K. Eigner (eds.), Pittsburgh: University of Pittsburgh Press, pp. 210–229. - –––, 2012a, “Modeling Strategies for
Measuring Phenomena In- and Outside the Laboratory”, in
*EPSA Philosophy of Science: Amsterdam 2009* (The European Philosophy of Science Association Proceedings), H.W. de Regt, S. Hartmann, and S. Okasha (eds.), Dordrecht: Springer, pp. 1–11. - –––, 2012b, “Measurement in
Economics”, in
*Philosophy of Economics* (Handbook of the Philosophy of Science: Volume 13), U. Mäki (ed.), Oxford: Elsevier, pp. 395–423. - –––, 2015,
*Science Outside the Laboratory: Measurement in Field Science and Economics*, Oxford: Oxford University Press. - Bridgman, P.W., 1927,
*The Logic of Modern Physics*, New York: Macmillan. - –––, 1938, “Operational Analysis”,
*Philosophy of Science*, 5: 114–131. - –––, 1945, “Some General Principles of Operational Analysis”, in Boring et al. 1945: 246–249.
- –––, 1956, “The Present State of Operationalism”, in Frank 1956: 74–79.
- Brillouin, L., 1962,
*Science and information theory*, New York: Academic Press, 2nd edition. - Byerly, H.C. and V.A. Lazara, 1973, “Realist Foundations of
Measurement”,
*Philosophy of Science*, 40(1): 10–28. - Campbell, N.R., 1920,
*Physics: the Elements*, London: Cambridge University Press. - Campbell, D.T. and D.W. Fiske, 1959, “Convergent and
discriminant validation by the multitrait-multimethod matrix”,
*Psychological Bulletin*, 56(2): 81–105. - Cantù, P. and O. Schlaudt (eds.), 2013, “The
Epistemological Thought of Otto Hölder”, special issue of
*Philosophia Scientiæ*, 17(1). - Carnap, R., 1966,
*Philosophical foundations of physics*, M. Gardner (ed.), reprinted as *An Introduction to the Philosophy of Science*, New York: Dover, 1995. - Carrier, M., 1994,
*The Completeness of Scientific Theories: On the Derivation of Empirical Indicators Within a Theoretical Framework: the Case of Physical Geometry*, The University of Western Ontario Series in Philosophy of Science Vol. 53, Dordrecht: Kluwer. - Cartwright, N.L., 1999,
*The Dappled World: A Study of the Boundaries of Science*, Cambridge: Cambridge University Press. - Cartwright, N.L. and R. Runhardt, 2014, “Measurement”,
in N.L. Cartwright and E. Montuschi (eds.),
*Philosophy of Social Science: A New Introduction*, Oxford: Oxford University Press, pp. 265–287. - Chang, H., 2001, “Spirit, air, and quicksilver: The search
for the ‘real’ scale of temperature”,
*Historical Studies in the Physical and Biological Sciences*, 31(2): 249–284. - –––, 2004,
*Inventing Temperature: Measurement and Scientific Progress*, Oxford: Oxford University Press. - –––, 2007, “Scientific Progress: Beyond
Foundationalism and Coherentism”,
*Royal Institute of Philosophy Supplement*, 61: 1–20. - –––, 2009, “Operationalism”,
*The Stanford Encyclopedia of Philosophy* (Fall 2009 Edition), E.N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/fall2009/entries/operationalism/>. - Chang, H. and N.L. Cartwright, 2008, “Measurement”, in
*The Routledge Companion to Philosophy of Science*, S. Psillos and M. Curd (eds.), New York: Routledge, pp. 367–375. - Clagett, M., 1968,
*Nicole Oresme and the medieval geometry of qualities and motions*, Madison: University of Wisconsin Press. - Cohen, M.R. and E. Nagel, 1934,
*An introduction to logic and scientific method*, New York: Harcourt, Brace & World. - Crease, R.P., 2011,
*World in the Balance: The Historic Quest for an Absolute System of Measurement*, New York and London: W.W. Norton. - Darrigol, O., 2003, “Number and measure: Hermann von
Helmholtz at the crossroads of mathematics, physics, and
psychology”,
*Studies in History and Philosophy of Science Part A*, 34(3): 515–573. - de Courtenay, N., O. Darrigol, and O. Schlaudt (eds.), 2019,
*The Reform of the International System of Units (SI): Philosophical, Historical and Sociological Issues*, London and New York: Routledge. - de Courtenay, N. and F. Grégis, 2017, “The evaluation
of measurement uncertainties and its epistemological
ramifications”,
*Studies in History and Philosophy of Science* (Part A), 65: 21–32. - Diehl, C.E., 2012,
*The Theory of Intensive Magnitudes in Leibniz and Kant*, Ph.D. Dissertation, Princeton University. [Diehl 2012 available online] - Diez, J.A., 1997a, “A Hundred Years of Numbers. An
Historical Introduction to Measurement Theory
1887–1990—Part 1”,
*Studies in History and Philosophy of Science*, 28(1): 167–185. - –––, 1997b, “A Hundred Years of Numbers.
An Historical Introduction to Measurement Theory
1887–1990—Part 2”,
*Studies in History and Philosophy of Science*, 28(2): 237–265. - Dingle, H., 1950, “A Theory of Measurement”,
*The British Journal for the Philosophy of Science*, 1(1): 5–26. - Duhem, P., 1906,
*The Aim and Structure of Physical Theory*, P.P. Wiener (trans.), New York: Atheneum, 1962. - Ellis, B., 1966,
*Basic Concepts of Measurement*, Cambridge: Cambridge University Press. - Euclid,
*Elements*, in *The Thirteen Books of Euclid’s Elements*, T.L. Heath (trans.), Cambridge: Cambridge University Press, 1908. - Fechner, G., 1860,
*Elements of Psychophysics*, H.E. Adler (trans.), New York: Holt, Rinehart & Winston, 1966. - Feest, U., 2005, “Operationism in Psychology: What the
Debate Is About, What the Debate Should Be About”,
*Journal of the History of the Behavioral Sciences*, 41(2): 131–149. - –––, 2020, “Construct Validity in
Psychological Tests–the Case of Implicit Social
Cognition”,
*European Journal for Philosophy of Science*, 10(1): 4. - Ferguson, A., C.S. Myers, R.J. Bartlett, H. Banister, F.C.
Bartlett, W. Brown, N.R. Campbell, K.J.W. Craik, J. Drever, J. Guild,
R.A. Houstoun, J.O. Irwin, G.W.C. Kaye, S.J.F. Philpott, L.F.
Richardson, J.H. Shaxby, T. Smith, R.H. Thouless, and W.S. Tucker,
1940, “Quantitative estimates of sensory events”,
*Advancement of Science*, 2: 331–349. (The final report of a committee appointed by the British Association for the Advancement of Science in 1932 to consider the possibility of measuring intensities of sensation. See Michell 1999, Ch 6. for a detailed discussion.) - Finkelstein, L., 1975, “Representation by symbol systems as
an extension of the concept of measurement”,
*Kybernetes*, 4(4): 215–223. - –––, 1977, “Introductory article”,
(instrument science),
*Journal of Physics E: Scientific Instruments*, 10(6): 566–572. - Frank, P.G. (ed.), 1956,
*The Validation of Scientific Theories*. Boston: Beacon Press. (Chapter 2, “The Present State of Operationalism” contains papers by H. Margenau, G. Bergmann, C.G. Hempel, R.B. Lindsay, P.W. Bridgman, R.J. Seeger, and A. Grünbaum) - Franklin, A., 1986,
*The Neglect of Experiment*, Cambridge: Cambridge University Press. - –––, 1997, “Calibration”,
*Perspectives on Science*, 5(1): 31–80. - Franklin, A., M. Anderson, D. Brock, S. Coleman, J. Downing, A.
Gruvander, J. Lilly, J. Neal, D. Peterson, M. Price, R. Rice, L.
Smith, S. Speirer, and D. Toering, 1989, “Can a Theory-Laden
Observation Test the Theory?”,
*The British Journal for the Philosophy of Science*, 40(2): 229–231. - Frigerio, A., A. Giordani, and L. Mari, 2010, “Outline of a
general model of measurement”,
*Synthese*, 175(2): 123–149. - Galison, P., 2003,
*Einstein’s Clocks, Poincaré’s Maps: Empires of Time*, New York and London: W.W. Norton. - Gillies, D.A., 1972, “Operationalism”,
*Synthese*, 25(1): 1–24. - Giordani, A., and L. Mari, 2012, “Measurement, models, and
uncertainty”,
*IEEE Transactions on Instrumentation and Measurement*, 61(8): 2144–2152. - Gooday, G., 2004,
*The Morals of Measurement: Accuracy, Irony and Trust in Late Victorian Electrical Practice*, Cambridge: Cambridge University Press. - Grant, E., 1996,
*The foundations of modern science in the middle ages*, Cambridge: Cambridge University Press. - Grattan-Guinness, I., 1996, “Numbers, magnitudes, ratios,
and proportions in Euclid’s Elements: How did he handle
them?”,
*Historia Mathematica*, 23: 355–375. - Grégis, F., 2015, “Can We Dispense with the Notion of ‘True Value’ in Metrology?”, in Schlaudt and Huber 2015, 81–93.
- Guala, F., 2008, “Paradigmatic Experiments: The Ultimatum
Game from Testing to Measurement Device”,
*Philosophy of Science*, 75: 658–669. - Hacking, I., 1983,
*Representing and Intervening*, Cambridge: Cambridge University Press. - Harré, R., 1981,
*Great Scientific Experiments: Twenty Experiments that Changed our View of the World*, Oxford: Phaidon Press. - Hartley, R.V., 1928, “Transmission of information”,
*Bell System technical journal*, 7(3): 535–563. - Heidelberger, M., 1993a,
*Nature from Within: Gustav Theodor Fechner and His Psychophysical Worldview*, C. Klohr (trans.), Pittsburgh: University of Pittsburgh Press, 2004. - –––, 1993b, “Fechner’s impact for
measurement theory”, commentary on D.J. Murray, “A
perspective for viewing the history of psychophysics”,
*Behavioral and Brain Sciences*, 16(1): 146–148. - von Helmholtz, H., 1887,
*Counting and measuring*, C.L. Bryan (trans.), New Jersey: D. Van Nostrand, 1930. - Hempel, C.G., 1952,
*Fundamentals of concept formation in empirical science*, International Encyclopedia of Unified Science, Vol. II. No. 7, Chicago and London: University of Chicago Press. - –––, 1956, “A logical appraisal of operationalism”, in Frank 1956: 52–67.
- –––, 1966,
*Philosophy of Natural Science*, Englewood Cliffs, N.J.: Prentice-Hall. - Hölder, O., 1901, “Die Axiome der Quantität und
die Lehre vom Mass”,
*Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematische-Physische Klasse*, 53: 1–64. (for an excerpt translated into English see Michell and Ernst 1996) - Hood, S.B., 2009, “Validity in Psychological Testing and
Scientific Realism”,
*Theory & Psychology*, 19(4): 451–473. - –––, 2013, “Psychological Measurement and
Methodological Realism”,
*Erkenntnis*, 78(4): 739–761. - Hoover, K. and M. Dowell, 2001, “Measuring Causes: Episodes
in the Quantitative Assessment of the Value of Money”, in
*The Age of Economic Measurement* (Supplement to *History of Political Economy*: Volume 33), J.L. Klein and M. Morgan (eds.), pp. 137–161. - Isaac, A.M.C., 2017, “Hubris to Humility: Tonal Volume and
the Fundamentality of Psychophysical Quantities”,
*Studies in History and Philosophy of Science* (Part A), 65–66: 99–111. - Israel-Jost, V., 2011, “The Epistemological Foundations of
Scientific Observation”,
*South African Journal of Philosophy*, 30(1): 29–40. - JCGM (Joint Committee for Guides in Metrology), 2012,
*International Vocabulary of Metrology—Basic and general concepts and associated terms* (VIM), 3rd edition with minor corrections, Sèvres: JCGM. [JCGM 2012 available online] - Jorgensen, L.M., 2009, “The Principle of Continuity and
Leibniz’s Theory of Consciousness”,
*Journal of the History of Philosophy*, 47(2): 223–248. - Jung, E., 2011, “Intension and Remission of Forms”, in
*Encyclopedia of Medieval Philosophy*, H. Lagerlund (ed.), Netherlands: Springer, pp. 551–555. - Kant, I., 1787,
*Critique of Pure Reason*, P. Guyer and A.W. Wood (trans.), Cambridge: Cambridge University Press, 1998. - Kirpatovskii, S.I., 1974, “Principles of the information
theory of measurements”,
*Izmeritel’naya Tekhnika*, 5: 11–13, English translation in *Measurement Techniques*, 17(5): 655–659. - Krantz, D.H., R.D. Luce, P. Suppes, and A. Tversky, 1971,
*Foundations of Measurement Vol 1: Additive and Polynomial Representations*, San Diego and London: Academic Press. (For references to the two other volumes see Suppes et al. 1989 and Luce et al. 1990.) - von Kries, J., 1882, “Über die Messung intensiver
Grössen und über das sogenannte psychophysische
Gesetz”,
*Vierteljahrsschrift für wissenschaftliche Philosophie* (Leipzig), 6: 257–294. - Kuhn, T.S., 1961, “The Function of Measurement in Modern
Physical Sciences”,
*Isis*, 52(2): 161–193. - Kyburg, H.E. Jr., 1984,
*Theory and Measurement*, Cambridge: Cambridge University Press. - Latour, B., 1987,
*Science in Action*, Cambridge: Harvard University Press. - Leplège, A., 2003, “Epistemology of Measurement in
the Social Sciences: Historical and Contemporary Perspectives”,
*Social Science Information*, 42: 451–462. - Luce, R.D., D.H. Krantz, P. Suppes, and A. Tversky, 1990,
*Foundations of Measurement* (Volume 3: Representation, Axiomatization, and Invariance), San Diego and London: Academic Press. (For references to the two other volumes see Krantz et al. 1971 and Suppes et al. 1989.) - Luce, R.D., and J.W. Tukey, 1964, “Simultaneous conjoint
measurement: A new type of fundamental measurement”,
*Journal of mathematical psychology*, 1(1): 1–27. - Luce, R.D. and P. Suppes, 2004, “Representational
Measurement Theory”, in
*Stevens’ Handbook of Experimental Psychology* (Volume 4: Methodology in Experimental Psychology), J. Wixted and H. Pashler (eds.), New York: Wiley, 3rd edition, pp. 1–41. - Lusk, G., 2016, “Computer simulation and the features of
novel empirical data”,
*Studies in History and Philosophy of Science Part A*, 56: 145–152. - Mach, E., 1896,
*Principles of the Theory of Heat*, T.J. McCormack (trans.), Dordrecht: D. Reidel, 1986. - Mari, L., 1999, “Notes towards a qualitative analysis of
information in measurement results”,
*Measurement*, 25(3): 183–192. - –––, 2000, “Beyond the representational
viewpoint: a new formalization of measurement”,
*Measurement*, 27: 71–84. - –––, 2003, “Epistemology of
Measurement”,
*Measurement*, 34: 17–30. - –––, 2005a, “The problem of foundations of
measurement”,
*Measurement*, 38: 259–266. - –––, 2005b, “Models of the Measurement
Process”, in
*Handbook of Measuring Systems Design*, vol. 2, P. Sydenham and R. Thorn (eds.), Wiley, Ch. 104. - Mari, L., and M. Wilson, 2014, “An introduction to the Rasch
measurement approach for metrologists”,
*Measurement*, 51: 315–327. - Mari, L. and A. Giordani, 2013, “Modeling measurement: error
and uncertainty”, in
*Error and Uncertainty in Scientific Practice*, M. Boumans, G. Hon, and A. Petersen (eds.), Ch. 4. - Maxwell, J.C., 1873,
*A Treatise on Electricity and Magnetism*, Oxford: Clarendon Press. - McClimans, L., 2010, “A theoretical framework for
patient-reported outcome measures”,
*Theoretical Medicine and Bioethics*, 31: 225–240. - –––, 2017, “Psychological Measures, Risk,
and Values”, in
*Measurement in Medicine: Philosophical Essays on Assessment and Evaluation*, L. McClimans (ed.), London and New York: Rowman & Littlefield, 89–106. - McClimans, L. and J. Browne, 2012, “Quality of life is a
process not an outcome”,
*Theoretical Medicine and Bioethics*, 33: 279–292. - McClimans, L., J. Browne, and S. Cano, 2017, “Clinical
Outcome Measurement: Models, Theory, Psychometrics and
Practice”,
*Studies in History and Philosophy of Science* (Part A), 65: 67–73. - Michel, M., 2019, “The Mismeasure of Consciousness: A
Problem of Coordination for the Perceptual Awareness Scale”,
*Philosophy of Science*, 86(5): 1239–49. - Michell, J., 1993, “The origins of the representational
theory of measurement: Helmholtz, Hölder, and Russell”,
*Studies in History and Philosophy of Science* (Part A), 24(2): 185–206. - –––, 1994, “Numbers as Quantitative
Relations and the Traditional Theory of Measurement”,
*British Journal for the Philosophy of Science*, 45: 389–406. - –––, 1999,
*Measurement in Psychology: Critical History of a Methodological Concept*, Cambridge: Cambridge University Press. - –––, 2000, “Normal science, pathological
science and psychometrics”,
*Theory & Psychology*, 10: 639–667. - –––, 2003, “Epistemology of Measurement:
the Relevance of its History for Quantification in the Social
Sciences”,
*Social Science Information*, 42(4): 515–534. - –––, 2004a, “History and philosophy of
measurement: A realist view”, in
*Proceedings of the 10th IMEKO TC7 International symposium on advances of measurement science*, [Michell 2004 available online] - –––, 2004b, “Item response models,
pathological science and the shape of error: Reply to Borsboom and
Mellenbergh”,
*Theory & Psychology*, 14: 121–129. - –––, 2005, “The logic of measurement: A
realist overview”,
*Measurement*, 38(4): 285–294. - Michell, J. and C. Ernst, 1996, “The Axioms of Quantity and
the Theory of Measurement”,
*Journal of Mathematical Psychology*, 40: 235–252. (This article contains a translation into English of a long excerpt from Hölder 1901.) - Mitchell, D.J., E. Tal, and H. Chang, 2017, “The Making of
Measurement: Editors’ Introduction”,
*Studies in History and Philosophy of Science* (Part A), 65–66: 1–7. - Miyake, T., 2017, “Uncertainty and Modeling in Seismology”, in Mößner & Nordmann (eds.) 2017, 232–244.
- Morgan, M., 2001, “Making measuring instruments”, in
*The Age of Economic Measurement* (Supplement to *History of Political Economy*: Volume 33), J.L. Klein and M. Morgan (eds.), pp. 235–251. - Morgan, M. and M. Morrison (eds.), 1999,
*Models as Mediators: Perspectives on Natural and Social Science*, Cambridge: Cambridge University Press. - Morrison, M., 1999, “Models as Autonomous Agents”, in Morgan and Morrison 1999: 38–65.
- –––, 2009, “Models, measurement and
computer simulation: the changing face of experimentation”,
*Philosophical Studies*, 143: 33–57. - Morrison, M. and M. Morgan, 1999, “Models as Mediating Instruments”, in Morgan and Morrison 1999: 10–37.
- Mößner, N. and A. Nordmann (eds.), 2017,
*Reasoning in Measurement*, London and New York: Routledge. - Mundy, B., 1987, “The metaphysics of quantity”,
*Philosophical Studies*, 51(1): 29–54. - Nagel, E., 1931, “Measurement”,
*Erkenntnis*, 2(1): 313–333. - Narens, L., 1981, “On the scales of measurement”,
*Journal of Mathematical Psychology*, 24: 249–275. - –––, 1985,
*Abstract Measurement Theory*, Cambridge, MA: MIT Press. - Nunnally, J.C., and I.H. Bernstein, 1994,
*Psychometric Theory*, New York: McGraw-Hill, 3rd edition. - Padovani, F., 2015, “Measurement, Coordination, and the
Relativized a Priori”,
*Studies in History and Philosophy of Science* (Part B: Studies in History and Philosophy of Modern Physics), 52: 123–28. - –––, 2017, “Coordination and Measurement:
What We Get Wrong About What Reichenbach Got Right”, in M.
Massimi, J.W. Romeijn, and G. Schurz (eds.),
*EPSA15 Selected Papers* (European Studies in Philosophy of Science), Cham: Springer International Publishing, 49–60. - Parker, W., 2017, “Computer Simulation, Measurement, and
Data Assimilation”,
*British Journal for the Philosophy of Science*, 68(1): 273–304. - Poincaré, H., 1898, “The Measure of Time”, in
*The Value of Science*, New York: Dover, 1958, pp. 26–36. - –––, 1902,
*Science and Hypothesis*, W.J. Greenstreet (trans.), New York: Cosimo, 2007. - Porter, T.M., 1995,
*Trust in Numbers: The Pursuit of Objectivity in Science and Public Life*, New Jersey: Princeton University Press. - –––, 2007, “Precision”, in Boumans 2007b: 343–356.
- Rasch, G., 1960,
*Probabilistic Models for Some Intelligence and Achievement Tests*, Copenhagen: Danish Institute for Educational Research. - Reiss, J., 2001, “Natural Economic Quantities and Their
Measurement”,
*Journal of Economic Methodology*, 8(2): 287–311. - Riordan, S., 2015, “The Objectivity of Scientific
Measures”,
*Studies in History and Philosophy of Science* (Part A), 50: 38–47. - Reichenbach, H., 1927,
*The Philosophy of Space and Time*, New York: Dover Publications, 1958. - Rothbart, D. and S.W. Slayden, 1994, “The Epistemology of a
Spectrometer”,
*Philosophy of Science*, 61: 25–38. - Russell, B., 1903,
*The Principles of Mathematics*, New York: W.W. Norton. - Savage, C.W. and P. Ehrlich, 1992, “A brief introduction to
measurement theory and to the essays”, in
*Philosophical and Foundational Issues in Measurement Theory*, C.W. Savage and P. Ehrlich (eds.), New Jersey: Lawrence Erlbaum, pp. 1–14. - Schaffer, S., 1992, “Late Victorian metrology and its
instrumentation: a manufactory of Ohms”, in
*Invisible Connections: Instruments, Institutions, and Science*, R. Bud and S.E. Cozzens (eds.), Cardiff: SPIE Optical Engineering, pp. 23–56. - Schlaudt, O. and Huber, L. (eds.), 2015,
*Standardization in Measurement: Philosophical, Historical and Sociological Issues*, London and New York: Routledge. - Scott, D. and P. Suppes, 1958, “Foundational aspects of
theories of measurement”,
*Journal of Symbolic Logic*, 23(2): 113–128. - Shannon, C.E., 1948, “A Mathematical Theory of
Communication”,
*The Bell System Technical Journal*, 27: 379–423 and 623–656. - Shannon, C.E. and W. Weaver, 1949,
*A Mathematical Theory of Communication*, Urbana: The University of Illinois Press. - Shapere, D., 1982, “The Concept of Observation in Science
and Philosophy”,
*Philosophy of Science*, 49(4): 485–525. - Skinner, B.F., 1945, “The operational analysis of psychological terms”, in Boring et al. 1945: 270–277.
- Soler, L., F. Wieber, C. Allamel-Raffin, J.L. Gangloff, C. Dufour,
and E. Trizio, 2013, “Calibration: A Conceptual Framework
Applied to Scientific Practices Which Investigate Natural Phenomena by
Means of Standardized Instruments”,
*Journal for General Philosophy of Science*, 44(2): 263–317. - Staley, K.W., 2020, “Securing the empirical value of
measurement results”,
*The British Journal for the Philosophy of Science*, 71(1): 87–113. - Stegenga, J., 2018,
*Medical Nihilism*, Oxford: Oxford University Press. - Stevens, S.S., 1935, “The operational definition of
psychological concepts”,
*Psychological Review*, 42(6): 517–527. - –––, 1946, “On the theory of scales of
measurement”,
*Science*, 103: 677–680. - –––, 1951, “Mathematics, Measurement,
Psychophysics”, in
*Handbook of Experimental Psychology*, S.S. Stevens (ed.), New York: Wiley & Sons, pp. 1–49. - –––, 1959, “Measurement, psychophysics and
utility”, in
*Measurement: Definitions and Theories*, C.W. Churchman and P. Ratoosh (eds.), New York: Wiley & Sons, pp. 18–63. - –––, 1975,
*Psychophysics: Introduction to Its Perceptual, Neural and Social Prospects*, New York: Wiley & Sons. - Suppes, P., 1951, “A set of independent axioms for extensive
quantities”,
*Portugaliae Mathematica*, 10(4): 163–172. - –––, 1960, “A Comparison of the Meaning
and Uses of Models in Mathematics and the Empirical Sciences”,
*Synthese*, 12(2): 287–301. - –––, 1962, “Models of Data”, in
*Logic, methodology and philosophy of science: proceedings of the 1960 International Congress*, E. Nagel (ed.), Stanford: Stanford University Press, pp. 252–261. - –––, 1967, “What is a Scientific
Theory?”, in
*Philosophy of Science Today*, S. Morgenbesser (ed.), New York: Basic Books, pp. 55–67. - Suppes, P., D.H. Krantz, R.D. Luce, and A. Tversky, 1989,
*Foundations of Measurement Vol 2: Geometrical, Threshold and Probabilistic Representations*, San Diego and London: Academic Press. (For references to the two other volumes see Krantz et al. 1971 and Luce et al. 1990.) - Swoyer, C., 1987, “The Metaphysics of Measurement”, in
*Measurement, Realism and Objectivity*, J. Forge (ed.), Reidel, pp. 235–290. - Sylla, E., 1971, “Medieval quantifications of qualities: The
‘Merton School’”,
*Archive for history of exact sciences*, 8(1): 9–39. - Tabor, D., 1970, “The hardness of solids”,
*Review of Physics in Technology*, 1(3): 145–179. - Tal, E., 2011, “How Accurate Is the Standard Second?”,
*Philosophy of Science*, 78(5): 1082–96. - –––, 2013, “Old and New Problems in
Philosophy of Measurement”,
*Philosophy Compass*, 8(12): 1159–1173. - –––, 2016a, “Making Time: A Study in the
Epistemology of Measurement”,
*British Journal for the Philosophy of Science*, 67(1): 297–335. - –––, 2016b, “How Does Measuring Generate
Evidence? The Problem of Observational Grounding”,
*Journal of Physics: Conference Series*, 772: 012001. - –––, 2017a, “A Model-Based Epistemology of Measurement”, in Mößner & Nordmann (eds.) 2017, 233–253.
- –––, 2017b, “Calibration: Modelling the
Measurement Process”,
*Studies in History and Philosophy of Science* (Part A), 65: 33–45. - –––, 2018, “Naturalness and Convention in
the International System of Units”,
*Measurement*, 116: 631–643. - Teller, P., 2013, “The concept of
measurement-precision”,
*Synthese*, 190: 189–202. - –––, 2018, “Measurement Accuracy
Realism”, in I. Peschard and B.C. van Fraassen (eds.),
*The Experimental Side of Modeling*, Minneapolis: University of Minnesota Press, 273–98. - Thomson, W., 1889, “Electrical Units of Measurement”,
in
*Popular Lectures and Addresses* (Volume 1), London: MacMillan, pp. 73–136. - Trout, J.D., 1998,
*Measuring the intentional world: Realism, naturalism, and quantitative methods in the behavioral sciences*, Oxford: Oxford University Press. - –––, 2000, “Measurement”, in
*A Companion to the Philosophy of Science*, W.H. Newton-Smith (ed.), Malden, MA: Blackwell, pp. 265–276. - van Fraassen, B.C., 1980,
*The Scientific Image*, Oxford: Clarendon Press. - –––, 2008,
*Scientific Representation: Paradoxes of Perspective*, Oxford: Oxford University Press. - –––, 2009, “The perils of Perrin, in the
hands of philosophers”,
*Philosophical Studies*, 143: 5–24. - –––, 2012, “Modeling and Measurement: The
Criterion of Empirical Grounding”,
*Philosophy of Science*, 79(5): 773–784. - Vessonen, E., 2019, “Operationalism and Realism in
Psychometrics”,
*Philosophy Compass*, 14(10): e12624. - –––, 2020, “The Complementarity of
Psychometrics and the Representational Theory of Measurement”,
*The British Journal for the Philosophy of Science*, 71(2): 415–442. - Wilson, M., 2013, “Using the concept of a measurement system
to characterize measurement models used in psychometrics”,
*Measurement*, 46(9): 3766–3774. - Wise, M.N. (ed.), 1995,
*The Values of Precision*, NJ: Princeton University Press. - Wise, M.N. and C. Smith, 1986, “Measurement, Work and
Industry in Lord Kelvin’s Britain”,
*Historical Studies in the Physical and Biological Sciences*, 17(1): 147–173. - Wolff, J.E., 2020a,
*The Metaphysics of Quantities*, Oxford: Oxford University Press. - –––, 2020b, “Heaps of Moles? –
Mediating Macroscopic and Microscopic Measurement of Chemical
Substances”,
*Studies in History and Philosophy of Science* (Part A), 80: 19–27.

## Other Internet Resources

- Bradburn, N.M., Cartwright, N.L., and Fuller, J., 2016,
“A Theory of Measurement”,
CHESS Working Paper No. 2016-07 (Centre for Humanities Engaging
Science and Society), Durham University. (A summary of this paper
appears in R.M. Li (ed.),
*The Importance of Common Metrics for Advancing Social Science Theory and Research: A Workshop Summary*, Washington, DC: National Academies Press, 2011, pp. 53–70.)
- Openly accessible guides to metrological terms and methods by the International Bureau of Weights and Measures (BIPM)
- Bibliography on measurement in science at PhilPapers.

### Acknowledgments

The author would like to thank Stephan Hartmann, Wendy Parker, Paul
Teller, Alessandra Basso, Sally Riordan, Jo Wolff, Conrad Heilmann and
participants of the History and Philosophy of Physics reading group at
the Department of History and Philosophy of Science at the University
of Cambridge for helpful feedback on drafts of this entry. The author
is also indebted to Joel Michell and Oliver Schliemann for useful
bibliographical advice, and to John Wiley and Sons Publishers for
permission to reproduce an excerpt from Tal (2013). Work on this entry
was supported by an Alexander von Humboldt Postdoctoral Research
Fellowship and a Marie Curie Intra-European Fellowship within the
7th European Community Framework Programme. Work on the
2020 revision of this entry was supported by an FRQSC New Academic
grant, a Healthy Brains for Healthy Lives Knowledge Mobilization
grant, and funding from the Canada Research Chairs program.