Auditory Perception

First published Thu May 14, 2009; substantive revision Tue Apr 7, 2020

Auditory perception raises a variety of challenging philosophical questions. What do we hear? What are the objects of auditory awareness? What is the content of audition? Is hearing spatial? How does audition differ from vision and other sense modalities? How does the perception of sounds differ from that of colors and ordinary objects? This entry presents the main debates in this developing area and discusses promising avenues for future inquiry. It discusses the motivation for exploring non-visual modalities, how audition bears on theorizing about perception, and questions concerning the objects, contents, phenomenology, varieties, and bounds of auditory perception.

1. Other Modalities and the Philosophy of Perception

The philosophy of sounds and auditory perception is one area of the philosophy of perception that reaches beyond vision for insights about the nature, objects, contents, and varieties of perception. This entry characterizes central issues in the philosophy of auditory perception, many of which bear upon theorizing about perception more generally, and it mentions outstanding questions and promising future areas for inquiry in this developing literature. Before beginning the substantive discussion of audition itself, it is worthwhile to discuss the motivation and rationale for this kind of work.

Philosophical thinking about perception has focused predominantly on vision. The philosophical puzzle of perception and its proposed solutions have been shaped by a concern for visual experience and visual illusions. Questions and proposals about the nature of perceptual content have been framed and evaluated in visual terms, and detailed accounts of what we perceive frequently address just the visual case. Vision informs our understanding of perception’s epistemological role and of its role in guiding action. It is not a great exaggeration to say that much of the philosophy of perception translates roughly as philosophy of visual perception.

Recently, however, other perceptual modalities have attracted attention (see, e.g., Stokes et al. 2015, Matthen 2015). In addition to auditory perception and the experience of sound, touch and tactile awareness have generated philosophical interest concerning, for instance, the tactile and proprioceptive experience of space, the objects of touch, whether contact is required for touch, and whether distinct modalities detect pressure, heat, and pain (see, e.g., O’Shaughnessy 1989, Martin 1993, Scott 2001, Fulkerson 2013, 2016). The unique phenomenology of olfaction and smells has been used to argue that vision is atypical in supporting the transparency of perceptual experience (Lycan 2000, 282; cf. Batty 2010) and that perceptual objectivity does not require spatiality (Smith 2002, ch 5). Lycan (2000) even suggests that the philosophy of perception would have taken a different course had it focused upon olfaction instead of vision (see also Batty 2011). Some authors have appealed to taste and flavor to challenge traditional ways of dividing and counting senses (Smith 2015; cf. Richardson 2013).

This kind of work is philosophically interesting in its own right. But it is also worthwhile because theorizing about perception commonly aims to address general questions about perception, rather than concerns specific to vision. Hope for a comprehensive and general understanding of perception rests upon extending and testing claims, arguments, and theories beyond vision. One might view work on non-visual modalities as filling out the particulars required for a thoroughly detailed account of perceiving that applies not just to vision but across the modalities. At least three approaches might be adopted, with potential for increasingly revisionist outcomes.

First, one might take work on non-visual modalities as translating what we have learned from the visual case into terms that apply to other modalities. This approach is relatively conservative. It assumes that vision is representative or paradigmatic and that we have a good understanding of perception that is derived from the case of vision. One example of this kind of approach would be to develop an account of the representational content of auditory experience.

Second, considering other modalities might extend our vision-based understanding of perception. Non-visual cases might draw attention to new kinds of phenomena that are missing from or not salient in vision. If so, a vision-based account of perception is satisfactory as far as it goes, but it leaves out critical pieces. For example, speech perception, multimodal perception, and flavor perception might involve novel kinds of perceptual phenomena absent from the visual case.

Third, considering other modalities might challenge vision-based claims about perception. If falsifying evidence is discovered in non-visual cases, then theorizing beyond vision may force revision of general claims about perception that are supported by vision. For example, if olfactory experience is not diaphanous, but olfactory experience is perceptual, the transparency thesis for perceptual experience fails.

Finally, we might attempt to determine whether any unified account exists that applies generally to all of the perceptual modalities. We can ask this question at the level of quite specific claims, such as those concerning the objects of perception or the nature and structure of content. We can ask it about the relationships among perceiving, believing, and acting. Or we can ask it about the general theory of necessary and sufficient conditions for perceiving. Some philosophers, impressed by findings concerning non-visual modalities, express skepticism about whether a unified theory exists (e.g., Martin 1992).

Whatever the approach, extending our knowledge about perception beyond the visual requires systematic attention to individual modalities as well as careful accounting in order to determine how the results bear on general questions about perception. Whatever the outcome, audition is a rich subject matter in its own right, and investigating this subject matter is crucial to our overall understanding of perception.

2. The Objects of Auditory Perception

What do we hear? One way to address this question concerns the objects of auditory perception.

2.1 Sounds

In the first instance, typical human perceivers hear sounds. It is plausible that sounds are objects of auditory perception.

What are sounds? Sounds traditionally have been counted with colors, smells, and tastes as secondary, sensible, or sensory qualities (see, e.g., Locke 1689/1975, Pasnau 1999, 2000, Leddington 2019). However, recently it has been proposed that sounds are individuals to which sensible features are attributed. In particular, several philosophers have proposed that sounds are public, distally-located, event-like individuals (Casati and Dokic 1994, 2005, O’Callaghan 2007, Matthen 2010).

Four questions about audition’s objects define the debate and constrain theories of sound (see also the entry on sounds for extensive discussion).

2.1.1 Private or Public?

Are sounds private or public? Maclachlan (1989) argues that the sounds we hear are sensations (rather than, for instance, the pressure waves that cause auditory experiences). Such sensations are internal and private, and we experience them directly, or without apparent mediation. On Maclachlan’s account, we hear the ordinary things and happenings that are the sources of sounds only indirectly, by means of inference from auditory data.

Maclachlan’s story is noteworthy partly because he uses hearing and sounds to motivate a general claim about perception. He claims that what seems perfectly intuitive and obvious in the case of sounds and hearing (that things other than material objects are the direct objects of hearing; that the direct objects of audition are internal; and that we indirectly hear things in the world by hearing their sounds) helps us to discover what is true of all perception. According to Maclachlan, for instance, seeing involves direct awareness of sensations of patterns of light, while surfaces and ordinary objects figure among the intentional objects of sight only indirectly, thanks to inference. The case of sounds and audition is important because it reveals that perceiving involves awareness of sensations in the first instance, and of the external world only indirectly.

Maclachlan’s description of sounds and auditory experience has some attractions. First, sounds are among the things we hear. And sounds are among the direct or immediate objects of audition in the relatively innocuous sense that hearing a sound does not seem to require hearing as of something else. Hearing a collision, on the other hand, may seem to require awareness as of a sound. Furthermore, sounds are unlike the ordinary material objects (e.g., bottles and staplers) we see. You cannot reach out and grab a sound, or determine its temperature. Instead, sounds may strike us as byproducts or effects of such ordinary things and their transactions. Sounds result from activities or interactions of material bodies and thus are experienced as distinct or independent from them (cf. Nudds 2001). Nevertheless, audition does afford some variety of awareness of the sources of sounds, or at least provides information about them.

However, the claim that sounds are sensations is unattractive. Good reasons suggest that sounds are public rather than private, even if sounds are not identical with ordinary objects and events such as clothespins and collisions. Suppose I am near the stage in a hall listening to some music, and that I have a headache. It is a confusion to think you could feel my headache, but I assume you hear the sounds I hear. Suppose I move to the back of the hall, and the headache then gets better. My experience of the sounds of the music differs once I am at the back of the room, and my experience of the headache differs. The sound of the music itself need not differ (the musicians could make the same sounds), but the headache itself changes. The sounds can continue once I leave the room, but if I stop experiencing the headache, it is gone. Moreover, the notion of an unfelt headache is puzzling, but it makes good sense to say that a tree makes a sound when it falls in the woods without being heard. Finally, tinnitus, or ringing of the ears, is an illusory or hallucinatory experience as of a sound, but received wisdom maintains that there are no illusory headaches.

This suggests that audition does not provide special reasons to believe that the objects of perception are private sensations. Sounds, construed as objects of auditory perception, plausibly inhabit the public world. (See section 3.1 Spatial Hearing for further discussion.)

2.1.2 Proximal or Distal?

Are sounds proximal or distal? The customary science-based view holds that sounds are pressure waves that travel through a medium (see also Sorensen 2008). On this account, sounds are caused by objects and events such as collisions, and sounds cause auditory experiences. However, sounds are not auditorily experienced to travel through the surrounding medium as waves do. Thus, if sounds are waves, then the sounds we hear may be proximal, located at the ear of the hearer.

Alternatively, some have argued that audition presents sounds as being located in some direction at a distance (Pasnau 1999, O’Callaghan 2007, ch. 3, 2010). On such an account, sounds commonly appear auditorily to be in the neighborhood of their sources and thereby furnish useful information about the locations of those sources. The sound of the drumming across the street seems to come from across the street but does not seem audibly to travel. When sounds do appear to fill a room, sound seems located all around. Sounds that seem to “bounce” around a room appear intermittently at different locations rather than as traveling continuously from place to place. Experiencing a missile-like sound speeding towards your ears illustrates the contrast with ordinary hearing (O’Callaghan 2007, 35). Sounds, according to this conception, ordinarily appear to have distal locations and to remain stationary relative to their sources.

If sounds are not usually experienced to travel, then unless auditory experience is illusory with respect to the apparent locations of sounds, sounds themselves do not travel. Sounds thus are not identical with and do not supervene locally upon the waves, since waves travel (Pasnau 1999). Several philosophers have argued on these and related grounds that sounds are located distally, near their sources (Pasnau 1999, Casati and Dokic 2005, O’Callaghan 2007). On this view, pressure waves bear information about sounds and are the proximal causes of auditory experiences, but are not identical with sounds.

One might object by resisting the phenomenological claim that we experience sounds as distally located, for instance by suggesting that audition is aspatial, or that audition is spatial but sound sources rather than sounds are auditorily localized (see section 3.1 Spatial Hearing for further discussion). Or, one might accept some measure of illusion. Another possibility is that we experience only a small subset of the locations sounds occupy during their lifetimes (for instance, while at their sources), and simply fail to experience where they are at other times. This avoids ascribing illusion. Finally, Fowler (2013) argues indirectly on the basis of echoes against distal theories of sound.

2.1.3 Properties or Individuals?

Are sounds properties or individuals? Among both proximal and distal theories, disagreement exists concerning the ontological category to which sounds belong. Philosophers traditionally have understood sounds as properties, either as sensible or secondary qualities or as the categorical or physical properties that ground powers to affect subjects. Commonly, sounds are attributed to the medium that intervenes between sources and perceivers. More recently, however, some distal theorists have argued that sounds are properties of what we ordinarily understand as sound sources: bells and whistles have or possess rather than make or produce sounds. Pasnau (1999), for instance, claims that sounds are transient properties that are identical with or supervene upon vibrations of objects. Kulvicki (2008) argues against transience in an attempt to subsume sounds to the model of colors, and claims that sounds are persistent, stable dispositional properties of objects to vibrate in response to being “thwacked”. He distinguishes “having” a stable sound from “making” a sound on some occasion (manifesting the stable disposition). This account implies that objects sometimes make sounds they do not have, and that they have sounds when silent. One also might ask whether events such as collisions and strummings, rather than objects, bear sounds. Leddington (2019) recently has defended such an account.

A revisionist challenge comes from those who argue that sounds are individuals rather than properties. Several arguments support this understanding. First, empirical work on auditory scene analysis suggests that one primary task of audition is to carve up the acoustic scene into distinct sounds, each of which may possess its own pitch, timbre, and loudness (Bregman 1990). Multiple distinct sounds with different audible attributes can be heard simultaneously. An analog of Jackson’s (1977, see also Clark 2000) many properties problem thus arises for audition since feature awareness alone cannot explain the bundling or grouping of audible attributes into distinct sounds. Such bundling or grouping of audible features suggests that sounds are perceptible individuals to which these features are attributed.

Furthermore, the temporal characteristics of experienced sounds suggest that sounds are not simple qualities. Sounds audibly seem to persist through time and to survive change. A particular sound, such as that of an emergency siren, might begin high-pitched and loud and end low-pitched and soft. This suggests that sounds are individuals that bear different features at different times, rather than sensible qualities.

Several responses to these arguments are available (see Cohen 2009 for the most developed reply). One might argue that sounds are complex properties, such as pitch-timbre-loudness complexes, instantiated at a time. To account for feature binding, one might hold that such complex properties are ascribed to ordinary objects such as bells and whistles. Or, one might hold that they are particularized properties, such as tropes. To accommodate sounds that survive change through time, a property account could hold that sounds are yet more complex properties that have patterns of change built into their identity conditions. However, any such view differs a great deal from the familiar secondary or sensible quality view associated with Locke. Pitch, timbre, and loudness are better candidates for simple sensible features in audition (see section 3.2 Audible Qualities).

2.1.4 Objects or Events?

If sounds are individuals, are they object-like or event-like individuals? Intuitively, the material objects we see are capable of existing wholly at any given moment, and all that is required to perceptually recognize such individuals is present at a moment. On the other hand, event-like individuals occupy time and need not exist wholly at any given moment. Their individuation and recognition frequently appeal to patterns of features over time. Event-like individuals intuitively comprise temporal parts, while object-like individuals intuitively do not. The issue here is not the truth of endurantism or perdurantism as an account of the persistence of objects or events. Instead, the issue concerns a difference in how we perceptually individuate, experience, and recognize individuals.

No contemporary philosopher has yet claimed that sounds are objects in the ordinary sense. Those who argue that sounds are individuals commonly point out that sounds not only persist and survive change (as do ordinary material objects), but also require time to occur or unfold. It is difficult to imagine an instantaneous sound, or one that lacks duration. Sounds are not commonly treated as existing wholly at a given moment during their duration. Indeed, the identities of many common sounds are tied to patterns of change in qualities through time. The sound of an ambulance siren differs from that of a police siren precisely because the two differ in patterns of qualitative change through time. The sound of the spoken word ‘team’ differs from that of ‘meat’ because each instantiates a common set of audible qualities in a different temporal pattern. These considerations support the view that sounds are event-like individuals (see Casati and Dokic 1994, 2005, Scruton 1997, O’Callaghan 2007, Matthen 2010).

This may bear on debates about persistence in the following way. Differences in the intuitive plausibility of endurantism and perdurantism may be grounded in facts about perception. In particular, vision may treat objects as persisting by enduring or being wholly present at each time at which they exist, while audition may treat its objects as persisting by perduring or having temporal parts. This may stem from differences in perceptual organization. For instance, exhibiting a visible property profile at a time may suffice for being a visual object of a given sort, while being an audible object of a given sort may require exhibiting an audible property profile over time.

2.2 Auditory Objects

Though most philosophers construe sounds either as properties or as event-like individuals (see section 2.1 Sounds), psychologists commonly have discussed auditory objects (see, e.g., Kubovy and Van Valkenburg 2001, Griffiths and Warren 2004, Heald et al. 2017). The target of such discussion is not simply audition’s intentional objects or proper (specific to audition) objects. The intended analogy is with visual objects. Talk of auditory objects gestures at the visual processes involved in perceptually discriminating, attentively tracking, recognizing and categorizing ordinary material objects. What justifies talk of object perception in audition?

2.2.1 Object Perception in Audition

First of all, humans typically do not auditorily perceive three-dimensional, bounded material objects as such, though it is plausible to think we visually perceive them. Hearing does not resolve the edges, boundaries, and filled volumes in space that I see, and I do not hear audible items to complete spatially behind occluders as do visible surfaces of objects. If perceiving a three-dimensional object requires awareness of its edges, boundaries, and extension, perhaps in order to discriminate it from its surroundings, humans typically do not auditorily perceive such objects.

Nevertheless, striking and illuminating parallels do exist between the perceptual processes and experiences that take place in vision and audition. Such parallels may warrant talk of object perception in a more general sense that is common to both vision and audition (O’Callaghan 2008a).

Perceiving objects requires parsing a perceptual scene into distinct units that one can attend to and distinguish from each other and from a background. In vision, bounded, cohesive collections of surfaces that are extended in space and that persist through time play this role (see, e.g., Spelke 1990, Nakayama et al. 1995, Leslie et al. 1998, Scholl 2001, Matthen 2005). In audition, as in vision, multiple distinct perceptible individuals might exist simultaneously, and each might persist and survive change (see the discussion of auditory scene analysis in section 2.1 Sounds). A critical difference, however, is that while vision’s objects are extended in space, and are individuated and recognized primarily in virtue of spatial characteristics, audible individuals are extended in time, and are perceptually individuated and recognized primarily in virtue of pitch and temporal characteristics (see, e.g., Bregman 1990, Kubovy and Van Valkenburg 2001). For instance, audible individuals have temporal edges and boundaries, and boundary elements can belong only to a single audible individual. They also are susceptible to figure-ground effects over time. One can, for instance, shift attention among continuous audible individuals that differ in pitch. Furthermore, they are susceptible to completion effects over time in much the same way that visible objects are perceptually completed in space. Seeing a single visible region to continue behind a barrier is analogous to hearing a sound stream to continue through masking noise, which may take place even when there is no corresponding signal (Bregman 1990, 28). Finally, multiple distinct, discrete audible individuals, such as the temporally bounded notes in a tune, can form audible streams that comprise a single perceptible unit. Such streams are subject to figure-ground shifts, and, like collections of surfaces, they can be attentively tracked through changes to their features and to one’s perspective. Though such complex audible individuals include sounds, they comprise temporally unified collections of sounds and silence that are analogous to spatially complex visible objects, such as tractors.

Such audible individuals are temporally extended and bounded, serve as the locus for auditory attention, prompt completion effects, and are subject to figure-ground distinctions in pitch space. For these reasons, the auditory processes involved in their perception parallel those involved in the visual perception of ordinary three-dimensional objects. The parallels suggest a shared sense in which vision and audition involve a more general form of object perception (see, e.g., Kubovy and Van Valkenburg 2001, Scholl 2001, Griffiths and Warren 2004, O’Callaghan 2008a, Matthen 2010).

2.2.2 What is an Auditory Object?

What is the shared sense in which both visible and audible individuals deserve to be called ‘objects’? Kubovy and Van Valkenburg (2001, 2003) define objecthood in terms of figure-ground segregation, which requires perceptual grouping. They propose the theory of indispensable attributes as an account of the necessary conditions on perceptual grouping (see also Kubovy 1981). Indispensable attributes for a modality are those without which perceptual numerosity is impossible. Kubovy and Van Valkenburg claim that while space and time are indispensable attributes for vision (and color is not), pitch and time are indispensable attributes for auditory objects. Though they are more skeptical about whether audition parallels vision, Griffiths and Warren (2004) sympathize with a figure-ground characterization but suggest a working notion of an auditory object defined in terms of “an acoustic experience that produces a two-dimensional image with frequency and time dimensions” (Griffiths and Warren 2004, 891).

O’Callaghan (2008a) proposes that both visible and audible objects are mereologically complex individuals, though their mereology differs in noteworthy respects. While vision’s objects possess a spatial mereology and are individuated and tracked in terms of spatial features, audition’s objects have a temporal mereology and are individuated and tracked in terms of both pitch and temporal characteristics. Discussion of auditory objects thus draws attention to two roles that space plays in vision. First, there is the role of space in determining the structure internal to visible objects, which facilitates identifying and recognizing visible objects. Second, space serves as the external structure among visible objects, and is critical in distinguishing objects from each other. In audition, time plays a role similar to space in vision in determining the structure internal to auditory objects. Pitch, on the other hand, serves as an external structural framework, along with space, that helps to distinguish among audible individuals.

Why is it useful to perceive such individuals in audition? One promising account is that they provide useful information about the happenings that produce sounds. Carving the acoustic world into mereologically complex individuals informs us about what is going on in the extra-acoustic environment. It provides ecologically significant information about what the furniture is doing, rather than just how it is arranged. It is one thing to perceive a tree; it is another to hear that it is falling behind you.

Discussion of auditory objects and accounts of their nature and perception is relatively new among philosophers (see, e.g., O’Callaghan 2008a, and essays in Bullot and Egré 2010, including Matthen 2010, Nudds 2010). Such work has led to the development of general accounts of perceptual objects designed to avoid visuocentrism (see, e.g., O’Callaghan 2016, Green 2019). This area is ripe for philosophical contributions.

2.3 Sound Sources

Sounds are among the objects of audition. Plausibly, so are complex, temporally extended individuals composed of sounds. Do we hear anything else? Reflection suggests we hear things beyond sounds and sound complexes. In hearing sounds, one may seem to experience the backfiring of the car or the banging of the drum. One might hold that a primary part of audition’s function is to reveal sound sources, the things and happenings that make sounds.

2.3.1 Do Humans Hear Sound Sources?

If sounds were internal sensations or sense-data, then, as Maclachlan (1989) observes, we would hear sound sources only indirectly, in an epistemological sense, perhaps thanks to something akin to inference. Acquiring beliefs about the environment would require mediation by propositions connecting experienced internal sounds with environmental causes.

If, however, sounds are properties attributed either to ordinary objects, as Pasnau (1999) and Kulvicki (2008) hold, or to events, as Leddington (2019) holds, then hearing a tuba or the playing of a tuba might only require hearing its sounds. Perceptually ascribing such audible attributes to their sources might ground epistemically unmediated awareness of tubas or their playings.

However, the individuals to which audible attributes are perceptually attributed need not be identical with ordinary objects or events. Instead, audible attributes may belong in the first instance to sounds. Sounds plausibly are distinct from ordinary or extra-acoustic individuals (O’Callaghan 2007, 2011). Suppose, then, that one could not hear an ordinary object or event unless there were an audible sound, and that sounds can mislead about their sources (something might sound like drumming but be hammering).

Given this, forming beliefs about ordinary things and happenings connected with sounds might seem to require inference, association, or some otherwise cognitive process, and so awareness of a sound source might appear to always involve more than perceptual awareness. According to such an account, awareness of environmental things and happenings thanks to audition is epistemically mediated by awareness as of sounds and auditory objects, but does not itself constitute auditory perceptual awareness as of those things and happenings. You are inclined to think you hear the source because your representing or being aware of it co-occurs with, but is no more than a downstream consequence triggered by, your auditory experience.

Such an account is not wholly satisfactory. First, the phenomenology of audition suggests something stronger than indirect, epistemically mediated awareness of things such as collisions or guitar strummings or lions roaring. Reflection suggests auditory awareness as of collisions, strummings, and lions. Second, the capacity to refer demonstratively to such things and events on auditory grounds also suggests genuine perceptual awareness of them. Third, we commonly perceptually individuate sounds in terms of their apparent sources, and our taxonomy reflects this. “What did you hear?” “I heard paper ripping,” or, “The sound of a dripping faucet.” We distinguish two quite similar rattles once we hear one as of a muffler clamp and the other as of a loose fender. Furthermore, characterizing certain audible features and explaining perceptual constancy effects involving such features require appeal to sound sources. Handel says of timbre: “At this point, no known acoustic invariants can be said to underlie timbre... The cues that determine timbre quality are interdependent because all are determined by the method of sound production and the physical construction of the instrument” (Handel 1995, 441). Explaining loudness constancy (why moving to the back of the room does not change how loudly the lecturer seems to speak) appeals to facts about the sources of sounds (Zahorik and Wightman 2001). Auditory processing proceeds in accordance with natural constraints concerning characteristics of sound sources, and information concerning sources shapes how auditory experiences are organized. This is to say that processes responsible for auditory experience proceed as if acoustic information is information about sound sources. Finally, audition-guided action supports the claim that we hear such things and events. Turning to look toward the source of a sound or ducking out of the way of something we hear to be approaching, both behaviors guided by auditory experience, would make little sense if we heard only sounds. Taken together, these reasons ground a case for thinking that auditory perceptual experience does not strictly end with sounds and auditory objects. In particular, awareness as of a source, even if dependent upon awareness as of a sound, may be constitutive of one’s auditory perceptual experience.

The main barrier to an alternative is that the relation between sounds and ordinary things or happenings is commonly understood as causal (see, e.g., Nudds 2001). Awareness as of an effect does not itself typically furnish epistemically unmediated awareness of its cause. Seeing smoke is not seeing fire. The right sort of dependence between characteristics of the experience and the cause is not apparent, and awareness as of an effect does not by itself ground perceptual demonstratives that concern the cause. The metaphysical indirectness of the causal relation appears to block epistemic directness (see O’Callaghan 2011a for further discussion).

2.3.2 The Mereology of Sounds and Sources

Is there another explanatory route? Suppose that instead of a causal relation, we understand the relationship between sounds and sources mereologically, or as one of part to whole (see O’Callaghan 2011a). Parthood frequently does ground perceptual awareness. For instance, seeing distinct parts of a surface interrupted by an occluder leads to perceptual experience as of a single surface (imagine seeing a dog behind a picket fence). Seeing the facing surfaces of a cube affords awareness as of a cube, and we can attentively track that same cube as it rotates and reveals different surfaces. Suppose, then, that a sound is an event-like individual (recall, property accounts escape the worry). This event is part of a more encompassing event, such as a collision or the playing of a trumpet, that occurs in the environment and that includes the sound. So, the typical horse race includes the sounds, and you might auditorily perceive the racing in hearing some of its proper parts: the sounds. More specifically, you may hear the galloping thanks to hearing the sounds it includes. You may fail to hear certain parts of the racing event, such as the jockey’s glance back after crossing the wire, but you also fail to see parts of the race, such as the misstep of the horse in second place. If the sounds are akin to the audible “profile” of the event, analogous to the visible surfaces of objects and visible parts of events, you might then enjoy auditory awareness as of the galloping of the horses in virtue of your awareness as of the sounds of the hooves. The sound is not identical with the galloping, and it is not just a property or a causal byproduct of the galloping. It is a part of a particular event of galloping. The metaphysical relation of part to whole, in contrast to that between effect and cause, might ground the sort of epistemically unmediated awareness of interest (cf. Nakayama et al. 1995, Bermúdez 2000, Noë 2004). 
Auditory perceptual awareness as of the whole may occur thanks to experiencing the part.

One objection is that this mereological account of the relation between sounds and sources cannot account for hearing ordinary objects by hearing their sounds. You could not strictly hear a tuba by hearing its sound because a tuba is not an event of which a sound is a part. However, the sound is part of the event of playing the tuba, and the tuba is a participant in that playing. So, though you are not aware as of a tuba, you are aware as of an event that involves a tuba. That perhaps is enough to explain talk of hearing tubas and to assuage the worry (for further discussion, see Young 2018).

Another, more serious objection contends that the events we seem to hear are ones that do not constitutively involve sounds or that might have taken place without sounds. For instance, we hear the collision, but the collision is something that could have occurred in a vacuum and not made a sound. (Note that this differs from the claim discussed below according to which sounds are identical with source events and so inaudibly exist in vacuums.) If so, the collision and the sound differ and the collision does not strictly include a sound. The collision therefore must have made the sound as a causal byproduct. This suggests that, strictly speaking, you could not hear that very collision event (since it causes the sound). One response is to bite the bullet and accept that events that do occur or that could occur in vacuums cannot be heard since they include no sounds. This is not so bad, since you could hear a different, more encompassing event that includes a sound along with a collision. Alternatively, one might say the very same event that occurs in a vacuum also could occur in air, but that it would have involved a sound had it occurred in air. In that case, one can only hear such events when they occur in air and include a sound. The choice depends upon one’s metaphysics of events. In either case, it seems reasonable that token events that do not include sounds are inaudible.

Casati et al. (2013) sidestep some of these concerns by rejecting the distinction between sounds and events that are typically understood as sound sources. They propose to “Ockhamize” the “event sources” of sounds by identifying sounds with events such as collisions and vibrations. The sound just is the collision or the vibrating, whether or not it occurs in air. This account implies that sounds could exist in vacuums.

What hinges on the debate about hearing sources? The first upshot is epistemological and concerns the nature of the justification for empirical beliefs grounded in perceptual experience. The evidential status of beliefs about what one perceptually experiences differs from that of beliefs about what is causally responsible for what one perceptually experiences. So, whether or not we hear sound sources impacts the epistemology of audition. The second upshot concerns the relation between audition and certain actions. If we hear only sounds and auditory objects, what appears to be effortless, auditorily guided action to avoid or orient toward sound sources requires another explanation (because sounds are invisible and usually do no harm). Finally, it affects how we understand the adaptive significance of audition. Did audition evolve so as to furnish awareness of sounds alone, while leaving their environmental significance to extra-perceptual cognition, or did it evolve so as to furnish perceptual responsiveness to the sources of sounds?

3. The Contents of Auditory Perception

Another way to address the question, “What do we hear?” concerns the contents of auditory perception. Two topics are especially noteworthy in the context of related debates about vision and its contents. The first concerns whether audition has spatial content. The second concerns the perception of audible qualities. Parallel questions can be raised without relying on the perceptual content framework, though important complications arise.

3.1 Spatial Hearing

One topic where the contrast between vision and audition has been thought to be particularly philosophically significant concerns space. Vision is a robustly spatial perceptual modality. Vision furnishes awareness of space and spatial features. Some claim vision has an inherently spatial structure, or, further, that vision’s spatial structure is a necessary condition on being visually aware of things as independent from oneself.

Hearing also provides information about space—humans learn about space on the basis of hearing. If audition represents space or spatial features, there is a natural account of being so informed. We might form beliefs about spatial features of environments on the basis of auditory perceptual experiences simply by accepting or endorsing what is apparent in having those experiences.

But learning about spatial features on the basis of audition and audition’s bearing information about space both are consistent with entirely aspatial auditory phenomenology. For instance, volume might bear information about distance, and differences in volume at the two ears might bear information about direction. In that case, audition bears information about space, and learning about space on the basis of audition is possible, but it does not follow that auditory experience is spatial or that audition represents space.

3.1.1 Skepticism about Spatial Audition

Notably, a tradition of skepticism about audition’s spatiality exists in philosophy. Certainly, our capacity to glean information about space is less acute in audition than in vision. Vision reveals fine-grained spatial details that audition cannot convey, such as patterns and textures. But philosophers who are skeptical about spatial audition are not just concerned about a difference in spatial acuity between audition and vision. Malpas says of the expression, ‘the location of sound’:

I do not mean by ‘location’ ‘locality’, but ‘the act of locating’, and by ‘the act of locating’ I do not mean ‘the act of establishing in a place’, but ‘the act of discovering the place of’. Even so ‘location’ is misleading, because it implies that there is such a thing as discovering the place of sounds. Since sounds do not have places there is no such act. (Malpas 1965, 131)

O’Shaughnessy states, “…We absolutely never immediately perceive sounds to be at any place. (Inference from auditory data being another thing)” (O’Shaughnessy 2002, 446). The claim is that, in contrast to visible objects, audible sounds are not experienced as having locations. Rather, we determine the places of sounds and sources from acoustic features, such as loudness and interaural differences, that bear information about distance and direction. We do not auditorily experience spatial features.

This debate, and the purported contrast between vision and audition, has consequences for perceptual theorizing. One route to the conclusion that hearing sounds involves auditory awareness of sensations involves denying that audition satisfies spatial prerequisites on experiencing sounds as objective or public. For instance, Maclachlan (1989) claims that audition’s phenomenology—in particular, its aspatial phenomenology—provides reasons to think sounds are sensations. Comparing sounds with pains, which we readily recognize as sensations, he says, “[A]lthough the sounds we hear are just as much effects produced in us as are the pains produced by pins and mosquitoes, there is no variety in the location of these effects [the sounds]. Because of the lack of contrast, we are not even aware that the sounds we hear are bodily sensations” (Maclachlan 1989, 31, my emphasis). Maclachlan means that, in contrast even to the case of pains, which are felt at different bodily locations, sounds are not experienced to be at differing locations, and so we are not even inclined to recognize that they are bodily sensations. Maclachlan then suggests that we associate sounds with things and happenings outside the body rather than appreciate that they are effects in us. Given the lack of spatial variation among experienced sounds, we projectively associate sounds with distal sources. This explanation assumes that experienced sounds exhibit no audibly apparent spatial variation: sounds seem located at the ears or lack apparent location altogether. Denying that auditory experiences present sounds at varying locations beyond the ears invites difficulty in finding a place for sounds in the world. If audition is wholly aspatial, this may encourage a retreat to the view that sounds lack locations outside the mind.

This kind of strategy has companions and precursors. Lycan’s suggestion that olfactory experiences are apparent as modifications of one’s own consciousness depends heavily on the aspatial phenomenology of olfactory experience (Lycan 2000, 278–82). Each recalls the Kantian claim that objectivity requires space, or that grasping something as independent from oneself requires the experience of space, a version of which is deployed by Strawson (1959, ch 2) in his famous discussion of sounds.

Two lines of response are open. The first appeals to the thriving empirical research program in “spatial hearing” (see, e.g., Blauert 1997). Scientists aim to discover the cues and perceptual mechanisms that ground spatial audition, such as interaural time and level differences, secondary and reverberant signals, and head-related transfer functions. Audition clearly cannot match vision’s singular acuity—vision’s resolution limit is nearly two orders of magnitude better than audition’s (Blauert 1997, 38–9). Nevertheless, this research strongly supports the claim that human subjects auditorily perceive such spatial characteristics as direction and distance.
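To make one of these cues concrete, the following is a minimal sketch of how an interaural time difference could carry information about direction. It relies on Woodworth's spherical-head approximation, which is a textbook idealization and not a formula given in this entry; the head radius, the speed of sound, and the function name are all illustrative assumptions.

```python
import math

# Assumed constants for illustration only; neither value comes from this entry.
SPEED_OF_SOUND_M_S = 343.0   # dry air at roughly 20 degrees Celsius
HEAD_RADIUS_M = 0.0875       # a typical adult head modeled as a rigid sphere


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate interaural time difference (seconds) for a distant source.

    Uses Woodworth's spherical-head formula,
        ITD = (a / c) * (theta + sin(theta)),
    for azimuths from 0 (straight ahead) to 90 degrees (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        microseconds = interaural_time_difference(azimuth) * 1e6
        print(f"{azimuth:>2} degrees: {microseconds:.0f} microseconds")
```

On this idealization the arrival-time difference grows steadily as the source moves from straight ahead toward one side. That shows only that the cue bears information about direction; it leaves open the philosophical question of whether direction is auditorily experienced.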

Second, a number of philosophers have objected on phenomenological grounds. Audition, they argue, involves experiencing or perceptually representing such spatial characteristics as direction and distance (Pasnau 1999, Casati and Dokic 2005, Matthen 2005, O’Callaghan 2007, 2010). Introspection and performance support the claim that sounds or sound sources are in many ordinary cases perceptually experienced as located in the environment at a distance in some direction. We hear the sound of the knocking over near the door; we hear footsteps approaching from behind and to the left; hearing sound to “fill” a room is itself a form of spatial hearing. Though hearing is more error prone than vision, we frequently do not need to work out the locations of sounds or sources—we simply hear them.

3.1.2 Strawson and the Purely Auditory Experience

A subtler form of skepticism about spatial audition aims just to block the requirements on objectivity. Strawson (1959) famously argues in Chapter 2 of Individuals that because auditory experience is not intrinsically spatial—spatial concepts have no intrinsic auditory significance—a purely auditory experience would be non-spatial. Thus, it would not satisfy the requirements on non-solipsistic consciousness. Others have endorsed versions of Strawson’s claim. “[T]he truth of a proposition to the effect that there is a sound at such-and-such a position must consist in this: if someone was to go to that position, he would have certain auditory experiences,” states Evans (1980, 274).

The claim that audition is not intrinsically spatial admits at least two readings. First, since Strawson suggests that audition might inherit spatial content from other sense modalities, such as vision or touch, it could mean that audition depends for its spatial content upon that of other modalities. If, unlike vision and touch, audition’s spatial capacities are parasitic upon those of other modalities, audition is spatial only thanks to its relations to other intrinsically spatial modalities. Second, it might be understood as a claim about the objects of audition. Strawson indicates that sounds themselves are not intrinsically spatial. He says that although sounds have pitch, timbre, and loudness, they lack “intrinsic spatial characteristics” (1959, 65). Since these interpretations are not clearly distinguished by Strawson, it is helpful to consider his master argument.

Strawson claims that a purely auditory experience would be non-spatial. By “purely auditory experience” Strawson means an exclusively auditory experience, or an auditory experience in absence of experience associated with any other modality. However, if any modality in isolation ever could ground spatial experience, audition could. On one hand, given the mechanisms of spatial hearing, it is empirically implausible that a normal acoustic environment with rich spatial cues would fail to produce even a minimally spatial purely auditory experience. Even listening only to stereo headphones could produce a directional auditory experience. On the other hand, it does seem possible that there could be a non-spatial but impoverished exclusively auditory experience if no binaural or other spatial cues were present. But similarly impoverished, non-spatial experiences seem possible for other modalities. Consider visually experiencing a uniform gray ganzfeld, or floating weightlessly in a uniformly warm bath. Neither provides the materials for spatial concepts, so neither differs from audition in this respect. One might contend that we therefore lack a good reason to think that, in contrast to a purely visual or tactile experience, a purely auditory experience would be an entirely non-spatial experience (see O’Callaghan 2010).

3.1.3 Does Audition Have Spatial Structure?

Nudds (2001) suggests another way to understand the claim, and interprets Strawson as making an observation about the internal structure of audition:

When we see (or seem to see) something, we see it as occupying or as located within a region of space; when we hear (or appear to hear) a sound we simply hear the sound, and we don’t experience it as standing in any relation to the space it may in fact occupy. (Nudds 2001, 213–14)

Audition, unlike vision, lacks a spatial structure or field, claims Nudds. A purely auditory experience thus would not comprise a spatial field into which individuals independent from oneself might figure. Following an example from Martin (1992), Nudds argues that while vision involves awareness of unoccupied locations, audition does not involve awareness of regions of space as empty or unoccupied. Martin’s example is seeing the space in the center of a ring as empty. In audition, Nudds claims, one never experiences a space as empty or unoccupied.

In response, one might simply deny a difference between vision and audition in this respect. If one can attend to a location near the center of the visible ring as empty, one can attend to the location between the sounding alarm clock and the slamming door as a place where there is no audible sound—as acoustically empty space. Of course, auditory space generally is less replete than visual space, but this is contingent. Consider seeing just a few stars flickering on and off against a dark sky. Since such an experience may have spatial structure, and since it is analogous to audition, one might on these grounds defend the claim that audition has spatial structure (see also Young 2017).

3.1.4 How Spatial Audition Differs from Spatial Vision

What about the second way mentioned above to understand Strawson’s claim? Though audition’s status as intrinsically spatial may not differ from that of vision or touch, perhaps sounds are not intrinsically spatial. But without further argument, or a commitment to a theory of sounds, it is difficult to state confidently the intrinsic features of sounds and thus whether they include spatial features. If, for instance, wavelength is among a sound’s intrinsic features, sounds are intrinsically spatial.

Nonetheless, the claim might be that, as they are perceptually experienced to be, sounds lack apparent intrinsic or non-relational spatial features. Roughly, independent from spatial relations to other sounds, experienced sounds seem to lack internal spatial structure. That is why you cannot auditorily experience the empty space at the center of a sound or hear its edges. Interpreted as such—that sounds are not experienced or perceptually represented to have inherent spatial features—the claim is plausible (though consider diffuse or spread out sounds in contrast to focused or pinpoint sounds). It certainly marks an important difference from vision, whose objects frequently not only seem to have rich internal spatial structure, but also are individuated in terms of inherent spatial features.

This difference, however, does not ground an argument that any purely auditory experience is non-spatial or that sounds fail to satisfy the requirement on objectivity, since sounds’ being experienced to have internal, intrinsic, or inherent spatial characteristics is necessary neither for spatial auditory experience nor to experience sounds as objective. Since sounds phenomenologically seem to be located in space and to bear extrinsic spatial relations to each other, auditory experience satisfies the requirements for objectivity, which need only secure the materials for a conception of a place for sounds to exist when not experienced.

So, vision and audition differ with respect to space in two ways. First, vision’s spatial acuity surpasses that of audition. Second, vision’s objects are perceptually experienced to have rich internal spatial structure, and audition’s are not. However, given the spatial characteristics evident in audition, such as direction and distance, the spatial status of audition presents no barrier to understanding its objects as perceiver-independent. The spatial aspects of auditory phenomenology thus may fail to ground an argument to the conclusion that sounds are modifications of one’s consciousness. If that is the case, then audition provides no special intuitive support for accounts on which private entities are the direct objects of perception.

3.2 Audible Qualities

3.2.1 Sounds and Colors

According to theories in which sounds are individuals, sounds are not secondary or sensible qualities. But, humans hear audible qualities, such as pitch, loudness, and timbre, that are analogous to colors, tastes, and scents. Thus, familiar accounts of colors and other sensible attributes or secondary qualities might apply to the audible qualities. For instance, pitches might be either dispositions to cause certain kinds of experiences in suitable subjects, the physical or categorical bases of such dispositions, sensations or projected features of auditory experiences, or simple primitive properties of (actual or edenic) sounds.

Tradition suggests that the form of a philosophical account of visible qualities, such as color, and their perception applies to other sensible qualities, such as pitch, flavor, and smell, and their perception. Thus, according to tradition, if dispositionalism, physicalism, projectivism, or primitivism about sensible qualities is true for features associated with one modality, it is true for features associated with others. Despite tradition, we should be wary of accepting that a theory of sensible qualities translates plausibly across the senses.

Debates about sensible qualities and their perception begin with concerns about whether sensible features can be identified with or reduced to any objective physical features. What follows has two aims. The first is to give a sense of how such debates might go in the case of audible qualities. The focus is on pitch, since pitch is often compared to color, and the case of color is well known (for discussion of similar questions concerning timbre, see Isaac 2017). The second is to point out the most salient differences and similarities between the cases of color and pitch that impact the plausibility of arguments translated from one case to the other.

First, I consider two noteworthy arguments that are founded on aspects of color perception. Each aims to establish that the colors we perceive cannot be identified with objective physical features. Neither argument transposes neatly to the case of pitch. Thus, we should not assume arguments that are effective in the case of color have equal force when applied to other sensible qualities. Color perhaps is a uniquely difficult case.

Second, however, I discuss two respects in which pitch experience is similar to color experience. It is instructive that these aspects of pitch experience do raise difficulties for an objective physical account of pitch that are familiar from the case of color.

3.2.2 Pitch, Timbre, and Loudness

What are pitch, timbre, and loudness? Pitch is a dimension along which tones can be ordered according to apparent “height”. The pitch of fingernails scratching a blackboard generally is higher than that of thumping a washtub. Loudness can be glossed as the volume, intensity, or quantity of sound. A jet plane makes louder sounds than a model plane. Timbre is more difficult to describe. Timbre is a quality in which sounds that share pitch and loudness might differ. So, a violin, a cello, and a piano all playing the same note differ in timbre. Sometimes timbre is called “tone color”.

Physics and psychoacoustics show that properties including frequency, amplitude, and wave shape determine the audible qualities sounds (auditorily) appear to have. To simplify, take the case of pitch, since pitch often is compared to color. Not all sounds appear to have pitch. Some sounds appear to have pitch thanks to a simple, sinusoidal pattern of vibration at some frequency in an object or in the air. Some sounds appear pitched thanks to a complex pattern of vibration that can be decomposed into sinusoidal constituents at multiple frequencies, since any pattern of vibration can be analyzed as some combination of simple sinusoids. Sounds appear pitched, however, just when they have sinusoidal constituents, or partials, that all are integer multiples of a common fundamental frequency. Sounds with pitch thus correspond to regular or periodic patterns of vibration that differ in fundamental frequency and complexity. Simple sinusoids and complex waveforms match in pitch (though they typically differ in timbre) when they share fundamental frequency. This is true even when the complex tone lacks a sinusoidal constituent at the fundamental frequency, which is referred to as the phenomenon of the missing fundamental.
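The relation between partials and fundamental frequency described above can be illustrated with a small sketch. It assumes an idealization not made in this entry: partial frequencies are given as whole numbers of hertz, so that the common fundamental is simply their greatest common divisor. The function name is hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental(partials_hz: list[int]) -> int:
    """Return the largest frequency of which every partial is an integer multiple.

    Idealization (an assumption of this sketch): partials are exact integers
    in hertz, so the common fundamental is their greatest common divisor.
    """
    if not partials_hz or any(p <= 0 for p in partials_hz):
        raise ValueError("need at least one positive partial frequency")
    return reduce(gcd, partials_hz)


if __name__ == "__main__":
    simple_tone = [200]                  # a single sinusoid
    complex_tone = [200, 400, 600, 800]  # fundamental present
    missing_fundamental = [400, 600, 800]  # no constituent at 200 Hz
    for partials in (simple_tone, complex_tone, missing_fundamental):
        print(partials, "->", implied_fundamental(partials), "Hz")
```

All three example tones share the fundamental of 200 hertz, including the one with no constituent at that frequency, which is the pattern the missing fundamental phenomenon involves. The sketch does not decide whether a sound appears pitched: any set of whole numbers has some common divisor, and real partials are only approximately harmonic.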

3.2.3 Is Pitch Physical?

A straightforward account identifies pitch with periodicity (perhaps within some range). Having pitch is being periodic (see O’Callaghan 2007, ch. 6). Periodicity can be expressed in terms of fundamental frequency, so individual pitches are fundamental frequencies. This has advantages as an account of pitch. It captures the linear ordering of pitches. It also explains musical intervals, such as the octave, fifth, and fourth, which are pitch relations that hold among periodic tones. Musical intervals correspond to whole-number ratios between fundamental frequencies. Sounds that differ by an octave have fundamental frequencies that stand in a 1:2 ratio. Fifths involve a 2:3 relationship, fourths are 3:4, and so on. This also allows us to revise the linear pitch ordering to accommodate the auditory sense in which tones that differ by an octave nonetheless are the same pitch. If the pitch ordering is represented as a helix, upon which successive octave-related tones fall at a common angular position, each full rotation represents doubling frequency.
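The interval ratios and the helical ordering can be sketched as follows. This is an illustration under stated assumptions, not a model drawn from the literature cited here: the reference frequency of 440 hertz and both function names are hypothetical choices, and the helix is parameterized simply by the base-2 logarithm of the frequency ratio.

```python
import math
from fractions import Fraction


def interval_ratio(lower_hz: int, upper_hz: int) -> Fraction:
    """Return the ratio lower:upper between two fundamentals in lowest terms."""
    return Fraction(lower_hz, upper_hz)


def helix_position(freq_hz: float, reference_hz: float = 440.0) -> tuple[float, float]:
    """Place a fundamental frequency on a pitch helix.

    Returns (angle, height): height counts octaves above the assumed
    reference, and angle (in radians) is the position within the octave,
    so each doubling of frequency is exactly one full rotation.
    """
    octaves = math.log2(freq_hz / reference_hz)
    angle = 2.0 * math.pi * (octaves % 1.0)
    return angle, octaves


if __name__ == "__main__":
    print(interval_ratio(440, 880))  # octave: 1/2
    print(interval_ratio(440, 660))  # fifth: 2/3
    print(interval_ratio(330, 440))  # fourth: 3/4
    for freq in (220.0, 440.0, 880.0):
        print(freq, helix_position(freq))
```

Tones at 220, 440, and 880 hertz land at the same angular position but at heights one octave apart, which is how the helix represents both the linear ordering and the sameness of octave-related tones.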

Is the periodicity theory of pitch plausible as an account of the audible features we perceive when hearing sounds? If so, then objective physicalism about at least some sensible qualities might succeed.

3.2.4 Disanalogies with Color

The periodicity theory of pitch fares better on two counts than theories that identify colors with objective physical properties.

First, consider the phenomenological distinction between unique and binary hues. Some colors appear to incorporate other colors, and some do not. Purple, for instance, appears both reddish and bluish; red just looks red. Some philosophers contend that the leading physical theories of color cannot explain the unique-binary distinction without essentially invoking the color experiences of subjects. How, for instance, do reflectance classes identified with unique hues differ from those associated with binary hues?

Consider a related issue concerning pitch. Some tones with pitch sound simple, while other pitched tones, such as sounds of musical instruments, auditorily appear to be complex and to have discernible components. However, the difference between audibly simple and audibly complex pitched tones is captured by the simplicity or complexity of a sound’s partials. Simple tones are sinusoids, and complex tones have multiple overtones. So, one response is to hold that the unique-binary color distinction and the simple-complex pitch distinction are disanalogous. Unlike the case of color, one might contend, no pitch that is essentially a mixture of other pitches solely occupies a distinctive place in pitch space.

Second, consider metamerism. Some surfaces with very different reflectance characteristics match in color. Metameric pairs share no obvious objective physical property. Some philosophers argue that unless color experience fails to distinguish distinct colors, metamers preclude identifying colors with natural physical properties of surfaces (see the entry on color).

Now consider the case of pitch. Are there pitch metamers? Some sounds with very different spectral frequency profiles match in pitch. A simple sinusoidal tone at a given frequency matches the pitch of each complex tone with that fundamental frequency (even those that lack a constituent at the fundamental). But, again, the case of pitch differs from the case of color. For each matching pitch, a single natural property does unify the class. The tones all share a fundamental frequency.

3.2.5 Analogies with Color

Two kinds of argument familiar from the case of color are equally pressing when applied to the case of pitch.

First, arguments from intersubjective variation transpose. Actual variations in frequency sensitivity exist among perceivers; for instance, subjects differ in which frequency they identify as middle C. If there is no principled way to legislate whose experience is veridical, pitch might be subjective or perceiver-relative. One response is that, in contrast to the case of unique red, there is an objective standard for middle C: fundamental frequency. But, whose pitch experience has the normative significance to settle the frequency of middle C?

Some might wonder whether there is a pitch analog of the trouble posed by the kind of variation associated with spectrum inversion in the case of color (see the entry on inverted qualia). Spectral shift in pitch, sometimes dramatic, commonly occurs after cochlear implant surgery. This is not spectral inversion for pitch; but, a dramatic shift makes most of the same trouble as inversion. Not quite all the trouble, since cochlear implants preserve the pitch ordering and its direction. But, there could be a cochlear implant that switched the placement of electrodes sensitive to 100 hertz and 1000 hertz, respectively; and there could be one that reversed the entire electrode ordering. This goes some distance to grounding the conceivability of a pitch inversion that reverses the height ordering of tones.

Second, consider an argument that frequencies cannot capture the relational structure among the pitches. This is loosely analogous to the argument that physicalism about color fails to capture the relational structure of the hues—for instance, that red is more similar to orange than either is to green. In the case of pitch, psychoacoustics experiments show that perceived pitch does not map straightforwardly onto frequency. Though each unique pitch corresponds to a unique frequency (or small frequency range), the relations among apparent pitches do not match those among frequencies. In particular, equivalent pitch intervals do not correspond to equal frequency intervals. For example, the effect upon perceived pitch of a 100 hertz change in frequency varies dramatically across the frequency range. It is dramatic at low frequency and barely detectable at high frequency. Similarly, doubling frequency does not make for equivalent pitch intervals. A 1000 hertz tone must be tripled in frequency to produce the same increase in pitch as that produced by quadrupling the frequency of a 2000 hertz tone. Apparent pitch is a complex function of frequency; it is neither linear nor logarithmic (see, e.g., Hartmann 1997, ch 12, Gelfand 2004, ch 12, Zwicker and Fastl 2006, ch 5). Pitch scales that capture the psychoacoustic data assign equal magnitudes, commonly measured in units called mels, to equal pitch intervals. The mel scale of pitch thus is an extensive or numerical pitch scale, in contrast to the intensive frequency scale for pitch. The former, but not the latter, preserves ratios among pitches.
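The mismatch between frequency intervals and pitch intervals can be illustrated numerically. The sketch below uses a closed-form approximation to the mel scale that is common in audio engineering, mel = 2595 · log10(1 + f/700). This formula is an assumption of the sketch and is not the empirically fitted scale reported in the psychoacoustic works cited above, so its values should not be expected to match those results exactly. The function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert frequency in hertz to mels using a common approximation.

    Assumption: mel = 2595 * log10(1 + f / 700). This is an engineering
    convenience, roughly linear at low frequencies and roughly logarithmic
    at high ones, not the measured scale from the cited experiments.
    """
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_step(freq_hz: float, delta_hz: float = 100.0) -> float:
    """Return the change in mels produced by raising freq_hz by delta_hz."""
    return hz_to_mel(freq_hz + delta_hz) - hz_to_mel(freq_hz)


if __name__ == "__main__":
    for base in (200.0, 1000.0, 8000.0):
        print(f"+100 Hz at {base:>6.0f} Hz: {mel_step(base):6.1f} mels")
    # Doubling frequency does not yield equal steps in mels on this formula.
    print(hz_to_mel(2000.0) - hz_to_mel(1000.0))
    print(hz_to_mel(4000.0) - hz_to_mel(2000.0))
```

On this approximation, the same 100 hertz change yields a much larger step in mels at low frequencies than at high ones, and successive doublings of frequency yield unequal steps. That is the qualitative pattern at issue, whatever the precise shape of the measured function.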

S. S. Stevens famously argued on the basis of results drawn from psychoacoustic experiments that pitch is not frequency (see, e.g., Stevens et al. 1937, Stevens and Volkmann 1940). In light of similar results, contemporary psychoacoustics researchers commonly reject the identification of pitch with frequency or periodicity. The received scientific view thus holds that pitch is a subjective or psychological quality that is no more than correlated with objective frequency (see, e.g., Gelfand 2004, Houtsma 1995). Pitch, on this understanding, belongs only to experiences. The received view of pitch therefore implies an error theory according to which pitch experience involves a widespread projective illusion.

What is the argument against the periodicity theory of pitch? Compare an argument against reflectance physicalism about color. Reflectance physicalism identifies each hue with a class of reflectances. Periodicity physicalism identifies each pitch with a fundamental frequency. In both cases, each determinate sensible feature is identified with a determinate physical property. In the color case, it is objected that reflectance classes do not bear the relations to each other that the colors bear. In the pitch case, the frequencies do not bear the relations to each other that the pitches bear. Thus, if the relational features among a class of sensible qualities are essential to them, an account that does not accurately capture those relations fails. Frequencies, according to this line of argument, do not stand in the relations essential to pitch.

This, of course, is a quite general phenomenon among sensible qualities. Brightness and loudness vary logarithmically with simple physical quantities. Even if we identified candidate molecules for smells, nothing suggests physical similarities would mirror their olfactory similarities.

In the case of pitch and other sensible features that can be put in a linear ordering, one might respond that the relational order is essential while the magnitudes are not. In that case, if pitch is frequency, pitch experience has the right structure, but distorts magnitudes of difference in pitch. This retains the periodicity theory and explains away the results in terms of pitch experiences.

Nonetheless, Pautz (2014, 3.5) has replied that this partial error account cannot be reconciled with certain types of possible intersubjective difference. So, suppose instead we accept that the mel scale is well-founded and that it accurately captures essential relationships among pitches. This does not by itself imply a projective or subjective theory of pitch. Pitches might be dispositions to produce certain kinds of experiences, or they might be simple or primitive properties. It also is open to seek a more adequate physical candidate for pitch. For instance, pitches might be far more complex physical properties than frequencies. Such physical properties may be of no interest in developing the simplest, most complete natural physical theory, but they may be anthropocentrically interesting.

It is an important question whether a physical theory of sensible features should just provide a physical candidate for each determinate sensible feature, or whether the physical relationships among those physical candidates should capture the structural relations among sensible qualities (and, if so, which structural relations it should capture). This is an example of how considering in detail the nature and the experience of sensible qualities other than color promises insights into traditional debates concerning the sensible qualities. Pautz (2014) offers an empirically-grounded argument concerning a variety of sensible qualities, including audible qualities, that advances such discussion.

4. Varieties of Auditory Perception

4.1 Musical Listening

Musical listening is a topic that bears on questions about the relationship between hearing sounds and hearing sources. While the philosophy of music has its own vast literature (see the entry on the philosophy of music), musical experience has not been used as extensively to explore general philosophical questions about auditory perception. This section discusses links that are relevant to advancing philosophical work on auditory perception.

4.1.1 Acousmatic Experience

An account of listening to pure or non-vocal music should capture the aesthetic significance of musical listening. Appreciating music is appreciating sounds and sequences, arrangements, or structures of sounds. Thus, the temporal aspects of auditory experiences are critical to appreciatively listening to music.

One might go further and hold that sounds are all that matters in music. In particular, some have argued that appreciatively listening to music demands listening in a way that abstracts from the environmental significance, and thus from the specific sources, of the sounds it includes (Scruton 1997, 2–3). Such acousmatic listening involves experiencing sounds in a way that is “detached from the circumstances of their production,” rather than “as having a certain worldly cause” (Hamilton 2007, 58; see also Hamilton 2009). Listening to music and being receptive to its aesthetically relevant features requires not listening to violins, horns, or brushes on snare drums. It requires hearing sounds and grasping them in a way removed from their common sources. Hearing a high fidelity recording thus furnishes an aesthetically identical musical experience despite having a speaker cone rather than a violin as source. “The acousmatic experience of sound is precisely what is exploited by the art of music” (Scruton 1997, 3).

This suggests an intuitive difference between music and visual arts such as painting and sculpture. As Kivy (1991) explains, it is difficult even with the most abstract paintings and sculptures to see them in a way that takes them to be entirely formal or abstract. That is, it is difficult to avoid seeing pictures and sculptures as representational. In contrast, it seems easier to listen attentively to the formal acoustical features of musical sounds, without being compelled to think of what makes them.

Musical listening thus may be thought to provide a prima facie argument against the claim that in hearing sounds one typically hears sound sources such as the strumming of guitars and bowing of violins. If such “interested” audition were the rule, musical listening would be far more challenging.

4.1.2 Acousmatic Listening as Attention to Sounds

Acousmatic experience, however, may be a matter of attention. Nothing prevents focusing one’s attention on the sounds and audible qualities without attending to the instruments, acts, and events that are their sources, even if each is auditorily available. That musical listening requires effort and training supports the idea that one can direct attention differently in auditory experience, depending on one’s interests. Caring for an infant and safely crossing the street require attending to sound sources, while listening with aesthetic appreciation to a symphony may require abstracting from the circumstances of its production, such as the finger movements of the oboist. This response holds that musical listening is a matter of auditorily attending in a certain way. It is attending to features of sounds themselves, but does not imply failing to hear sound sources.

The acousmatic thesis is a limited view about which aspects of the things one can auditorily experience are aesthetically significant. These include audible aspects of sounds themselves, but exclude, for example, other contents of auditory experience. However, room exists for debate over the aesthetically significant aspects of what you hear (see Hamilton 2007, 2009). For example, one might argue that live performances have aesthetic advantages over recordings because one hears the performance of the sounds and songs, rather than their reproduction by loudspeakers (cf. Mag Uidhir 2007). Circumstances of sound production, such as that skillful gestures generate a certain passage, or that a particularly rare wood accounts for a violin’s sounds, might be aesthetically relevant in a way that outstrips the sounds, and some such features may be audible in addition to sounds. For instance, hearing the spatial characteristics of a performance may hold aesthetic significance beyond the tones and structures admitted by traditional accounts of musical listening. Composers may even intend “spatial gestures” among aspects essential for the appreciation of a piece (see, e.g., Solomon 2007). To imagine auditorily experiencing the spatial characteristics of music in a way entirely divorced from the environmental significance of the sounds is difficult. Appreciating the relationship between experiences of sounds and of sources makes room for a view of the aesthetic value of musical listening that is more liberal than acousmatic experience allows.

4.2 Speech Perception

4.2.1 Is Speech Special?

Speech perception presents uniquely difficult twists, and few philosophers have confronted it directly (Appelbaum 1999, Trout 2001a, Matthen 2005, ch 9, and Remez and Trout 2009 are recent exceptions). Something striking and qualitatively distinctive—perhaps uniquely human—seems to set the perception of speech apart from ordinary hearing. The main philosophical issues about speech perception concern versions of the question, Is speech special? (See O’Callaghan 2015 for a comprehensive review and discussion.)

How does perceiving speech differ from perceiving ordinary non-linguistic sounds? Listening to music and listening to speech each differ from listening to other environmental sounds in the following respect. In each case, one’s interest in listening is to some degree distanced from the specific environmental happenings involved in the production of sounds.

But this is true of listening to music and of listening to speech for different reasons. In music, it is plausible that one’s interest is in the sounds themselves, rather than in the sources of their production. However, speech is a vehicle for conventional linguistic meaning. In listening to speech, one’s main interest is in the meanings, rather than in the sources of sound. Ultimately, the information conveyed is what matters.

Nevertheless, according to the most common philosophical understanding, perceiving spoken utterances is just a matter of hearing sounds. The sounds of speech are complex audible sound structures. Listening to speech in a language you know typically involves grasping meanings, but grasping meanings requires first hearing the sounds of speech. According to this account, grasping meanings itself is a matter of extra-perceptual cognition.

The commonplace view—that perceiving speech is a variety of ordinary auditory perception that just involves hearing the sounds of speech—has been challenged in a number of ways. The challenges differ in respect of how speech perception is held to differ from non-linguistic audition.

4.2.2 The Objects of Speech Perception

First, consider the objects of speech perception. What are the objects of speech perception, and do they differ from those of ordinary or non-linguistic auditory perception? According to the commonplace understanding, hearing speech involves hearing sounds. Thus, hearing spoken language shares perceptual objects with ordinary audition. Alternatively, one might hold that the objects of speech perception are not ordinary sounds at all. Perhaps they are language-specific entities, such as phonemes or words. Perhaps, as some have argued, perceiving speech involves perceiving articulatory gestures or movements of the mouth and vocal organs (see the supplement on Speech Perception: Empirical and Theoretical Considerations). Note that if audition’s objects typically include distal events, speech in this respect is not special, since its objects do not belong to an entirely different kind from ordinary sounds.

4.2.3 The Contents of Speech Perception

Second, consider the contents of speech perception. Does the content of speech perception differ from that of ordinary audition? If it does, how does the experience of perceiving speech differ from that of hearing ordinary sounds? Perceiving speech might involve hearing ordinary sounds but auditorily ascribing distinctive features to them. These features might simply be, or comprise, finer grained qualitative and temporal acoustical details than non-linguistic sounds audibly possess. But perceiving speech also might involve perceiving sounds as belonging to language-specific types, such as phonemes, words, or other syntactic categories.

Furthermore, speech perception’s contents might differ in a more dramatic way from those of non-linguistic audition. Listening with understanding to speech involves grasping meanings. The commonplace view is conservative. It holds that grasping meanings is an act of the understanding rather than of audition. Thus, the difference between the experience of listening to speech in a language you know and the experience of listening to speech in a language you do not know is entirely cognitive.

But one might think that there also is a perceptual difference. A liberal account of this perceptual difference holds that perceiving speech in a language you know may involve hearing sounds as meaningful or auditorily representing them as having semantic properties (see, e.g., Siegel 2006, Bayne 2009, Azzouni 2013, Brogaard 2018; cf. O’Callaghan 2011b, Reiland 2015). Alternatively, a moderately liberal account holds that the perceptual experience of speech in a language you know involves perceptually experiencing language-specific but nevertheless non-semantic features. For instance, O’Callaghan (2011b) argues that listening to speech in a familiar language typically involves perceiving its phonological features.

4.2.4 Is Speech Perception Auditory?

Third, consider the processes responsible for speech perception. To what extent does perceiving speech implicate processes that are continuous with those of ordinary or general audition, and to what extent does perceiving speech involve separate, distinctive, or modular processes? While some defend general auditory accounts of speech perception (see, e.g., Holt and Lotto 2008), some argue that perceiving speech involves dedicated perceptual resources, or even an encapsulated perceptual system distinct from ordinary non-linguistic audition (see, e.g., Fodor 1983, Pinker 1994, Liberman 1996, Trout 2001b). These arguments typically are grounded in several types of phenomena: the multimodality of speech perception, in which visual cues about the movements of the mouth and tongue impact the experience of speech, as demonstrated by the McGurk effect (see section 4.3, Crossmodal Influences); duplex perception, in which a particular stimulus sometimes contributes simultaneously both to the experience of an ordinary sound and to that of a speech sound (Rand 1974); and the top-down influence of linguistic knowledge upon the experience of speech. A reasonable challenge is that each of these characteristics (multimodality, duplex perception, and top-down influence) also is displayed in general audition.

See the supplement on Speech Perception: Empirical and Theoretical Considerations.

4.3 Crossmodal Influences

4.3.1 Crossmodal Illusions

Auditory perception of speech is influenced by cues from vision and touch (see Gick et al. 2008). The McGurk effect in speech perception leads to an illusory auditory experience caused by a visual stimulus (McGurk and MacDonald 1976). Do such multimodal effects occur in ordinary audition? Visual and tactile cues commonly do shape auditory experience. The ventriloquist illusion is an illusory auditory experience of location that is produced by an apparent visible sound source (see, e.g., Bertelson 1999). Audition even impacts experience in other modalities. The sound-induced flash effect involves a visual illusion as of seeing two consecutive flashes that is produced when a single flash is accompanied by two consecutive beeps (Shams et al. 2000, 2002). Such crossmodal illusions demonstrate that auditory experience is impacted by other modalities and that audition influences other modalities. In general, experiences associated with one perceptual modality are influenced by stimulation to other sensory systems.

4.3.2 Causal or Constitutive?

An important question is whether the impact is merely causal, or whether perception in one modality is somehow constitutively tied to other modalities. If, for instance, vision merely causally impacts your auditory experience of a given sound, then processes associated with audition might be proprietary and characterizable in terms that do not appeal to other modalities. Relying on information from vision or touch could simply improve the existing capacity to perceive space, time, or spoken language auditorily. On the other hand, coordination between audition and other senses could enable a new perceptual capacity. In that case, audition might rely constitutively on another sense.

A first step in resolving this question is recognizing that crossmodal illusions are not mere accidents. Instead, they are intelligible as the results of adaptive perceptual strategies. In ordinary circumstances, crossmodal processes serve to reduce or resolve apparent conflicts in information drawn from several senses. In doing so, they tend to make perception more reliable overall. Thus, crossmodal illusions differ from synaesthesia. Synaesthesia is just a kind of accident. It results from mere quirks of processing, and it always involves illusion (or else is accidentally veridical). Crossmodal recalibrations, in contrast, are best understood as attempts “to maintain a perceptual experience consonant with a unitary event” (Welch and Warren 1980, 638).

In the first place, the principled reconciliation of information drawn from different sensory sources suggests that audition is governed by extra-auditory perceptual constraints. Moreover, since conflict requires a common subject matter, such constraints must concern common sources of stimulation to multiple senses. If so, audition and vision share a perceptual concern for a common subject matter. And that concern is reflected in the organization of auditory experience. But this by itself does not establish constitutive dependence of audition on another sense.

However, the perceptual concern for a common subject matter could be reflected as such in certain forms of auditory experience. For instance, the commonality may be experientially evident in jointly perceiving shared spatio-temporal features, or in the perceptual experience of audio-visual intermodal feature binding. If so, some forms of auditory perceptual experience may share with vision a common multimodal or amodal content or character (see O’Callaghan 2008b, Clark 2011). More to the point, if coordination with another sense enables a new auditory capacity, then vision or touch could have a constitutive rather than merely causal impact upon corresponding auditory experiences.

4.3.3 Multimodality in Perception

What hangs on this? First, it bears on questions about audition’s content. If we cannot exhaustively characterize auditory experience in terms that are modality-specific or distinctive to audition, then we might hear as of things we can see or experience with other senses. This is related to one puzzling question about hearing sound sources: How could you hear as of something you could see? Rather than just a claim about audition’s content that requires further explanation, we now have a story about why things like sound sources figure in the content of auditory experience. Second, all of this may bear on how to delineate what counts as auditory perception, as opposed to visual or even amodal perception. If hearing is systematically impacted by visual processes, and if it shares content and phenomenology with other sense experiences, what are the boundaries of auditory perception? Multimodal perception may bear on the question of whether there are clear and significant distinctions among the sense modalities (cf. Nudds 2003). Finally, multimodal perceptual experiences, illusions, and explanatory strategies may illuminate the phenomenological unity of experiences in different modalities, or the sense in which, for instance, an auditory experience and a visual experience of some happening comprise a single encompassing experience (see the entry on the unity of consciousness).

We can ask questions about the relationships among modalities in different areas of explanatory concern. Worthwhile areas for attention include the objects, contents, and phenomenology of perception, as well as perceptual processes and their architecture. Crossmodal and multimodal considerations might cast doubt on whether vision-based theorizing alone can deliver a complete understanding of perception and its contents. This approach constitutes an important methodological advance in the philosophical study of perception (for further discussion, see O’Callaghan 2012, 2019, Matthen 2015, Stokes et al. 2015).

5. Conclusion and Future Directions

Considering modalities other than vision enhances our understanding of perception. It is necessary to developing and vetting an adequate comprehensive and general account of perception and its roles. Auditory perception is a rich territory for philosophical exploration in its own right, but it also provides a useful contrast case in which to evaluate claims about perception proposed in the context of vision. One of the most promising directions for future work concerns the nature of the relationships among perceptual modalities, how these relationships shape experience across modalities, and how they may prove essential to understanding perception itself. Philosophical work on auditory perception thus is part of the advance beyond considering modalities in isolation from each other.


  • Appelbaum, I., 1996, “The lack of invariance problem and the goal of speech perception,” ICSLP-1996, 3(435): 1541–1544.
  • –––, 1999, “The dogma of isomorphism: A case study from speech perception,” Philosophy of Science, 66 (Supplement. Proceedings of the 1998 Biennial Meetings of the Philosophy of Science Association. Part I: Contributed Papers): S250–S259.
  • Azzouni, J., 2013, Semantic Perception: How the Illusion of a Common Language Arises and Persists, Oxford: Oxford University Press.
  • Batty, C., 2010, “Scents and sensibilia,” American Philosophical Quarterly, 47: 103–118.
  • –––, 2011, “Smelling lessons,” Philosophical Studies, 153: 161–174.
  • Bayne, T., 2009, “Perception and the reach of phenomenal content,” Philosophical Quarterly, 59: 385–404.
  • Bermúdez, J. L., 2000, “Naturalized sense data,” Philosophy and Phenomenological Research, 61(2): 353–374.
  • Bertelson, P., 1999, “Ventriloquism: A case of cross-modal perceptual grouping,” in G. Aschersleben, T. Bachmann, and J. Müsseler (eds.), Cognitive Contributions to the Perception of Spatial and Temporal Events, Amsterdam: Elsevier, pp. 347–362.
  • Blauert, J., 1997, Spatial Hearing: The Psychophysics of Human Sound Localization, Cambridge, MA: MIT Press.
  • Bloomfield, L., 1933, Language, New York: Holt.
  • Blumstein, S. E. and K. N. Stevens, 1981, “Phonetic features and acoustic invariance in speech,” Cognition, 10: 25–32.
  • Bosch, L. and N. Sebastián-Gallés, 1997, “Native-language recognition abilities in 4-month-old infants from monolingual and bilingual environments,” Cognition, 65(1): 33–69.
  • Bregman, A. S., 1990, Auditory Scene Analysis: The Perceptual Organization of Sound, Cambridge, MA: MIT Press.
  • Brogaard, B., 2018, “In defense of hearing meanings,” Synthese, 195: 2967–2983.
  • Bullot, N. and P. Egré (eds.), 2010, Objects and Sound Perception, Review of Philosophy and Psychology, 1.
  • Casati, R. and J. Dokic, 1994, La Philosophie du Son, Nîmes: Chambon.
  • –––, 2005, “Sounds,” in The Stanford Encyclopedia of Philosophy (Spring 2009 Edition), Edward N. Zalta (ed.).
  • Casati, R., E. Di Bona, and J. Dokic, 2013, “The Ockhamization of the event sources of sound,” Analysis, 73(3): 462–466.
  • Clark, A., 2000, A Theory of Sentience, New York: Oxford University Press.
  • –––, 2011, “Cross-modal cuing and selective attention,” in F. Macpherson (ed.), The Senses, Oxford: Oxford University Press.
  • Cohen, J., 2009, “Sounds and temporality,” Oxford Studies in Metaphysics, 5: 303–320.
  • Cooper, F. S., P. C. Delattre, A. M. Liberman, J. M. Borst, and L. J. Gerstman, 1952, “Some experiments on the perception of synthetic speech sounds,” Journal of the Acoustical Society of America, 24: 597–606.
  • Diehl, R. L., A. J. Lotto, and L. L. Holt, 2004, “Speech perception,” Annual Review of Psychology, 55: 149–179.
  • Evans, G., 1980, “Things without the mind—a commentary upon Chapter Two of Strawson’s Individuals,” in Z. van Straaten (ed.), Philosophical Subjects: Essays Presented to P. F. Strawson, Oxford: Clarendon Press; reprinted in G. Evans, 1985, Collected Papers, Oxford: Clarendon Press.
  • Fodor, J. A., 1983, The Modularity of Mind, Cambridge, MA: MIT Press.
  • Fulkerson, M., 2013, The First Sense: A Philosophical Study of Human Touch, Cambridge, MA: MIT Press.
  • –––, 2016, “Touch,” in The Stanford Encyclopedia of Philosophy (Spring 2016 Edition), Edward N. Zalta (ed.).
  • Fowler, C. A., 1986, “An event approach to the study of speech perception from a direct-realist perspective,” Journal of Phonetics, 14: 3–28.
  • Fowler, G., 2013, “Against the primary sound account of echoes,” Analysis, 73: 466–473.
  • Gelfand, S. A., 2004, Hearing: An Introduction to Psychological and Physiological Acoustics, 4th edition, New York: Marcel Dekker.
  • Gick, B., K. M. Jóhannsdóttir, D. Gibraiel, and J. Mühlbauer, 2008, “Tactile enhancement of auditory and visual speech perception in untrained perceivers,” Journal of the Acoustical Society of America, 123(4): EL72–76.
  • Green, E. J., 2019, “A theory of perceptual objects,” Philosophy and Phenomenological Research, 99(3): 663–693.
  • Griffiths, T. D. and J. D. Warren, 2004, “What is an auditory object?” Nature Reviews Neuroscience, 5: 887–892.
  • Hamilton, A., 2007, Aesthetics and Music, London: Continuum.
  • –––, 2009, “The sound of music,” in Nudds and O’Callaghan 2009, pp. 146–182.
  • Handel, S., 1995, “Timbre perception and auditory object identification,” in B. C. Moore (ed.), Hearing, San Diego, CA: Academic Press, pp. 425–461.
  • Hartmann, W. M., 1997, Signals, Sound, and Sensation, New York: Springer.
  • Heald, S. L. M., S. C. Van Hedger, and H. C. Nusbaum, 2017, “Perceptual plasticity for auditory object recognition,” Frontiers in Psychology, 8: 781.
  • Holt, L. L. and A. J. Lotto, 2008, “Speech perception within an auditory cognitive science framework,” Current Directions in Psychological Science, 17(1): 42–46.
  • Houtsma, A., 1995, “Pitch perception,” in B. C. J. Moore (ed.), Hearing, New York: Academic Press, pp. 267–291.
  • Isaac, A. M. C., 2018, “Prospects for timbre physicalism,” Philosophical Studies, 175(2): 503–529.
  • Jackson, F., 1977, Perception: A Representative Theory, Cambridge: Cambridge University Press.
  • Kivy, P., 1991, Music Alone, Ithaca, NY: Cornell University Press.
  • Kubovy, M., 1981, “Concurrent pitch-segregation and the theory of indispensable attributes,” in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Hillsdale, NJ: Erlbaum, pp. 55–98.
  • Kubovy, M. and D. Van Valkenburg, 2001, “Auditory and visual objects,” Cognition, 80: 97–126.
  • Kuhl, P. K., 2000, “A new view of language acquisition,” Proceedings of the National Academy of Sciences, 97(22): 11850–11857.
  • Kulvicki, J., 2008, “The nature of noise,” Philosophers’ Imprint, 8(11): 1–16.
  • Leddington, J. P., 2019, “Sounds fully simplified,” Analysis, 79(4): 621–629.
  • Leslie, A. M., F. Xu, P. D. Tremoulet, and B. J. Scholl, 1998, “Indexing and the object concept: developing ‘what’ and ‘where’ systems,” Trends in Cognitive Sciences, 2(1): 10–18.
  • Liberman, A. M., 1970, “The grammars of speech and language,” Cognitive Psychology, 1(4): 301–323.
  • –––, 1996, Speech: A Special Code, Cambridge, MA: MIT Press.
  • Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy, 1967, “Perception of the speech code,” Psychological Review, 74(6): 431–461.
  • Liberman, A. M. and I. G. Mattingly, 1985, “The motor theory of speech perception revised,” Cognition, 21: 1–36.
  • –––, 1989, “A specialization for speech perception,” Science, 243(4890): 489–494.
  • Locke, J., 1689/1975, An Essay Concerning Human Understanding, Oxford: Clarendon Press.
  • Lotto, A. J., K. R. Kluender, and L. L. Holt, 1997, “Animal models of speech perception phenomena,” in K. Singer, R. Eggert, and G. Anderson (eds.), Chicago Linguistic Society, 33, Chicago: Chicago Linguistic Society, pp. 357–367.
  • Lycan, W., 2000, “The slighting of smell,” in N. Bhushan and S. Rosenfeld (eds.), Of Minds and Molecules: New Philosophical Perspectives on Chemistry, Oxford: Oxford University Press, pp. 273–289.
  • Maclachlan, D. L. C., 1989, Philosophy of Perception, Englewood Cliffs, NJ: Prentice Hall.
  • Mag Uidhir, C., 2007, “Recordings as performances,” British Journal of Aesthetics, 47(3): 298–314.
  • Malpas, R. M. P., 1965, “The location of sound,” in R. J. Butler (ed.), Analytical Philosophy, Second Series, Oxford: Basil Blackwell, pp. 131–144.
  • Martin, M. G. F., 1992, “Sight and touch,” in T. Crane (ed.), The Contents of Experience, Cambridge: Cambridge University Press.
  • –––, 1993, “Sense modalities and spatial properties,” in N. Eilan, R. McCarthy, and B. Brewer (eds.), Spatial Representation: Problems in Philosophy and Psychology, Oxford: Blackwell.
  • Matthen, M., 2005, Seeing, Doing, and Knowing: A Philosophical Theory of Sense Perception, Oxford: Oxford University Press.
  • –––, 2010, “On the diversity of auditory objects,” Review of Philosophy and Psychology, 1: 63–89.
  • ––– (ed.), 2015, Oxford Handbook of Philosophy of Perception, Oxford: Oxford University Press.
  • McGurk, H. and J. MacDonald, 1976, “Hearing lips and seeing voices,” Nature, 264: 746–748.
  • Mehler, J., P. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini, and C. Amiel-Tison, 1988, “A precursor of language acquisition in young infants,” Cognition, 29: 143–178.
  • Mole, C., 2009, “The Motor Theory of speech perception,” in M. Nudds and C. O’Callaghan (eds.), Sounds and Perception: New Philosophical Essays, Oxford: Oxford University Press.
  • Nakayama, K., Z. J. He, and S. Shimojo, 1995, “Visual surface representation,” in S. M. Kosslyn and D. N. Osherson (eds.), Visual Cognition, Volume 2 of An Invitation to Cognitive Science, second edition, Cambridge, MA: MIT Press, pp. 1–70.
  • Noë, A., 2004, Action in Perception, Cambridge, MA: MIT Press.
  • Nudds, M., 2001, “Experiencing the production of sounds,” European Journal of Philosophy, 9: 210–229.
  • –––, 2003, “The significance of the senses,” Proceedings of the Aristotelian Society, 104(1): 31–51.
  • –––, 2010, “What are auditory objects?” Review of Philosophy and Psychology, 1: 105–122.
  • Nudds, M. and C. O’Callaghan, 2009, Sounds and Perception: New Philosophical Essays, Oxford: Oxford University Press.
  • O’Callaghan, C., 2007, Sounds: A Philosophical Theory, Oxford: Oxford University Press.
  • –––, 2008a, “Object perception: Vision and audition,” Philosophy Compass, 3: 803–829.
  • –––, 2008b, “Seeing what you hear: Cross-modal illusions and perception,” Philosophical Issues, 18: 316–338.
  • –––, 2010, “Perceiving the locations of sounds,” Review of Philosophy and Psychology, 1: 123–140.
  • –––, 2011a, “Hearing properties, effects or parts?” Proceedings of the Aristotelian Society, 111: 375–405.
  • –––, 2011b, “Against hearing meanings,” Philosophical Quarterly, 61: 783–807.
  • –––, 2012, “Perception and multimodality,” in E. Margolis, R. Samuels, and S. Stich (eds.), Oxford Handbook of Philosophy of Cognitive Science, Oxford: Oxford University Press, pp. 92–117.
  • –––, 2015, “Speech perception,” in M. Matthen (ed.), Oxford Handbook of Philosophy of Perception, Oxford: Oxford University Press, pp. 475–494.
  • –––, 2016, “Objects for multisensory perception,” Philosophical Studies, 173(5): 1269–1289.
  • –––, 2019, A Multisensory Philosophy of Perception, Oxford: Oxford University Press.
  • O’Shaughnessy, B., 1989, “The sense of touch,” Australasian Journal of Philosophy, 69: 37–58.
  • –––, 2002, Consciousness and the World, Oxford: Oxford University Press.
  • Pasnau, R., 1999, “What is sound?” Philosophical Quarterly, 49: 309–324.
  • –––, 2000, “Sensible qualities: The case of sound,” Journal of the History of Philosophy, 38: 27–40.
  • Pautz, A., 2014, “The real trouble for phenomenal externalists,” in R. Brown (ed.), Consciousness Inside and Out: Phenomenology, Neuroscience, and the Nature of Experience, New York: Springer, pp. 237–298.
  • –––, 2017, “Experiences are representations: An empirical argument,” in B. Nanay (ed.), Current Controversies in Philosophy of Perception, New York: Routledge, pp. 23–43.
  • Pinker, S., 1994, The Language Instinct, New York: William Morrow.
  • Rand, T. C., 1974, “Dichotic release from masking for speech,” Journal of the Acoustical Society of America, 55: 678–680.
  • Remez, R. E. and J. D. Trout, 2009, “Philosophical messages in the medium of spoken language,” in M. Nudds and C. O’Callaghan (eds.), Sounds and Perception: New Philosophical Essays, Oxford: Oxford University Press, pp. 234–263.
  • Rey, G., 2012, “Externalism and inexistence in early content,” in R. Schantz (ed.), Prospects for Meaning, New York: de Gruyter, pp. 503–530.
  • Richardson, L., 2013, “Flavour, taste and smell,” Mind and Language, 28(3): 322–341.
  • Rosenblum, L. D., 2004, “Perceiving articulatory events: Lessons for an ecological psychoacoustics,” in J. G. Neuhoff (ed.), Ecological Psychoacoustics, Chapter 8, San Diego, CA: Elsevier, pp. 220–248.
  • Scholl, B. J., 2001, “Objects and attention: the state of the art,” Cognition, 80: 1–46.
  • Scott, M., 2001, “Tactual perception,” Australasian Journal of Philosophy, 79(2): 149–160.
  • Scruton, R., 1997, The Aesthetics of Music, Oxford: Oxford University Press.
  • Shams, L., Y. Kamitani, and S. Shimojo, 2000, “What you see is what you hear,” Nature, 408: 788.
  • –––, 2002, “Visual illusion induced by sound,” Cognitive Brain Research, 14: 147–152.
  • Siegel, S., 2006, “Which properties are represented in perception?” in T. Gendler and J. Hawthorne (eds.), Perceptual Experience, New York: Oxford University Press, pp. 481–503.
  • Smith, A. D., 2002, The Problem of Perception, Cambridge, MA: Harvard University Press.
  • Smith, B. C., 2015, “The chemical senses,” in M. Matthen (ed.), Oxford Handbook of Philosophy of Perception, Oxford: Oxford University Press, pp. 314–352.
  • Solomon, J., 2007, Spatialization in Music: The Analysis and Interpretation of Spatial Gestures, Ph.D. thesis, Department of Music, University of Georgia, Athens, GA.
  • Sorensen, R., 2008, Seeing Dark Things, New York: Oxford University Press.
  • Soto-Faraco, S., J. Navarra, W. M. Weikum, A. Vouloumanos, N. Sebastián-Gallés, and J. F. Werker, 2007, “Discriminating languages by speech-reading,” Perception and Psychophysics, 69(2): 218.
  • Spelke, E. S., 1990, “Principles of object perception,” Cognitive Science, 14: 29–56.
  • Stevens, S. and J. Volkmann, 1940, “The relation of pitch to frequency: A revised scale,” American Journal of Psychology, 53: 329–353.
  • Stevens, S., J. Volkmann, and E. Newman, 1937, “A scale for the measurement of the psychological magnitude pitch,” Journal of the Acoustical Society of America, 8(3): 185–190.
  • Stokes, D., M. Matthen, and S. Biggs (eds.), 2015, Perception and Its Modalities, New York: Oxford University Press.
  • Strawson, P. F., 1959, Individuals, New York: Routledge.
  • Trout, J. D., 2001a, “Metaphysics, method, and the mouth: Philosophical lessons of speech perception,” Philosophical Psychology, 14(3): 261–291.
  • –––, 2001b, “The biological basis of speech: What to infer from talking to the animals,” Psychological Review, 108(3): 523–549.
  • Van Valkenburg, D. and M. Kubovy, 2003, “In defense of the theory of indispensable attributes,” Cognition, 87: 225–233.
  • Vouloumanos, A. and J. F. Werker, 2007, “Listening to language at birth: evidence for a bias for speech in neonates,” Developmental Science, 10(2): 159–164.
  • Weikum, W. M., A. Vouloumanos, J. Navarra, S. Soto-Faraco, N. Sebastián-Gallés, and J. F. Werker, 2007, “Visual language discrimination in infancy,” Science, 316(5828): 1159.
  • Welch, R. B. and D. H. Warren, 1980, “Immediate perceptual response to intersensory discrepancy,” Psychological Bulletin, 88(3): 638–667.
  • Werker, J., 1995, “Exploring developmental changes in cross-language speech perception,” in L. Gleitman and M. Liberman (eds.), Language: An Invitation to Cognitive Science, Volume 1, 2nd edition, Cambridge, MA: MIT Press, pp. 87–106.
  • Young, N., 2017, “Hearing spaces,” Australasian Journal of Philosophy, 95(2): 242–255.
  • –––, 2018, “Hearing objects and events,” Philosophical Studies, 175(11): 2931–2950.
  • Zahorik, P. and F. Wightman, 2001, “Loudness constancy with varying sound source distance,” Nature Neuroscience, 4: 78–83.
  • Zwicker, E. and H. Fastl, 2006, Psychoacoustics: Facts and Models, 3rd edition, New York: Springer.


I am very grateful to David Chalmers, Maddy Kilbride, and Shaun Nichols for extensive and helpful comments on previous versions of this entry.

Copyright © 2020 by
Casey O’Callaghan
