Auditory Perception

First published Thu May 14, 2009

Auditory perception raises a host of challenging philosophical questions. What do we hear? What are the objects of hearing? What are the content and phenomenology of audition? Is hearing spatial? How does audition differ from vision and other sense modalities? How does the perception of sounds differ from that of colors and ordinary objects? This entry presents the main debates in this developing area and discusses promising avenues for future inquiry. It discusses the motivation for exploring non-visual modalities, how audition bears on theorizing about perception, and questions concerning the objects, contents, varieties, and bounds of auditory perception.

1. Other Modalities and the Philosophy of Perception

The philosophy of sounds and auditory perception is one emerging area of the philosophy of perception that reaches beyond vision for insights about the nature, objects, contents, and varieties of perception. This entry characterizes critical issues in the philosophy of auditory perception, which bear upon theorizing about perception more generally, and mentions outstanding questions and promising future areas for inquiry in this developing literature. Before beginning the substantive discussion of audition itself, it is worthwhile to discuss the motivation and rationale for this kind of work.

Philosophical thinking about perception has focused on vision. The philosophical puzzle of perception and proposed solutions have been shaped by concern for visual experience and illusions. Questions about the nature of perceptual content have been framed and evaluated in visual terms, and detailed accounts of what we perceive frequently address just the visual case. Vision informs our understanding of perception's epistemological role and its role in guiding action. It is not a great exaggeration to say that much of the philosophy of perception translates roughly as philosophy of visual perception.

Recently, however, other perceptual modalities have attracted attention. In addition to auditory perception and the experience of sound, touch and tactile awareness have generated philosophical interest concerning, for instance, the tactile and proprioceptive experience of space, the objects of touch, whether contact is required for touch, and whether distinct modalities detect pressure, heat, and pain (see, e.g., O'Shaughnessy 1989, Martin 1993, Scott 2001). The unique phenomenology of olfaction and smells has been used to argue that vision is atypical in supporting the transparency of perceptual experience (Lycan 2000, 282; cf. Batty 2007) and that perceptual objectivity does not require spatiality (Smith 2002, ch 5). Lycan (2000) even suggests that the philosophy of perception would have taken a different course had it focused upon olfaction instead of vision.

This kind of work is philosophically interesting in its own right. But it is also worthwhile because theorizing about perception commonly aims to address general questions about perception, rather than concerns specific to vision. Hope for a comprehensive and general understanding of perception rests upon extending and testing claims, arguments, and theories beyond vision. At least three approaches might be adopted, with potential for increasingly revisionist outcomes.

First, one might view work on non-visual modalities as filling out the particulars required for a thoroughly detailed account of perceiving that applies not just to vision but across the modalities. The idea is to translate what we have learned from the visual case into terms that apply to other modalities. This approach proceeds by assuming that we have a good working understanding of perception derived from the visual case and that vision is a representative or paradigmatic case of perceiving. One example of this kind of approach would be to develop a representational account of the phenomenology of auditory experience.

Second, however, we might ask whether considering other modalities either extends or challenges our vision-based understanding of perception. Non-visual cases might force us either to accommodate new kinds of phenomena absent from the visual case or to revise general conclusions supported by vision. The former suggests that while the vision-based understanding of perception is satisfactory as far as it goes, it leaves out critical pieces—for example, speech perception and multimodal perception might involve novel kinds of perceptual phenomena absent from the visual case. The latter involves discovering falsifying evidence in non-visual cases—for example, if olfactory experience is not diaphanous, the transparency thesis for perceptual experience fails.

Finally, we might attempt to determine whether any unified account exists that applies generally to all of the perceptual modalities. We can ask this question at the level of quite specific claims, such as those concerning the objects of perception or the nature and structure of content. We can ask it about the relationships among perceiving, believing, and acting. Or we can ask it about the general theory of necessary and sufficient conditions for perceiving. Some philosophers, impressed by their findings concerning non-visual modalities, express skepticism about whether a unified theory exists (e.g., Martin 1992).

Whatever the approach, extending our knowledge about perception beyond the visual requires systematic attention to individual modalities and careful accounting to determine how the results bear on general questions about perception. Whatever the outcome, not only is audition a rich new subject matter in its own right, but developing this subject matter is crucial to our overall understanding of perception.

2. The Objects of Auditory Perception

What do we hear? One way to address this question concerns the intentional objects of auditory perception.

2.1 Sounds

Maclachlan (1989) claims that, in the first instance, sounds are what we hear. Sounds are the direct objects of auditory perception. But sounds are not ordinary objects like the staplers and bottles we see. Sounds strike us as byproducts or effects of such ordinary things and their transactions.

What are sounds? Among the objects of audition, sounds traditionally have been counted with colors, smells, and tastes as secondary, sensible, or sensory qualities (see, e.g., Locke 1689/1975, Pasnau 1999, 2000). Recently, however, a trend has emerged towards understanding sounds as individuals to which sensible features are attributed. In particular, several philosophers have proposed to understand sounds as event-like individuals (Casati and Dokic 1994, 2005, Scruton 1997, O'Callaghan 2007, Matthen forthcoming).

Four questions about audition's objects define the debate and constrain theories of sound that aim for phenomenological adequacy (see also the entry on sounds, for extensive discussion).

2.1.1 Private or Public?

Are sounds private or public? Maclachlan argues that the best candidates for the sounds we hear are sensations (rather than, for instance, the pressure waves that cause auditory experiences). Such sensations are internal and private, and we experience them directly, or without apparent mediation. On Maclachlan's account, we hear the ordinary things and happenings that are the sources of sounds only indirectly, by means of inference from auditory data.

Maclachlan's story is noteworthy partly because he uses hearing and sounds to motivate a general claim about perception. He claims that what seems perfectly intuitive and obvious in the case of sounds and hearing—that something other than material objects are the direct objects of hearing; that the direct objects of audition are internal; and that we indirectly hear things in the world by hearing their sounds—helps us to discover what is true of all perception. Seeing, for instance, involves direct awareness of sensations of patterns of light, while surfaces and ordinary objects figure only indirectly and thanks to inference among the intentional objects of perception. The model of sounds and audition reveals that perceiving involves awareness of sensations in the first instance, and of the external world only indirectly.

There is, in fact, much to like about Maclachlan's description of sounds and auditory experience. First, sounds do figure among the things we hear. Moreover, sounds are among the direct or immediate objects of audition in the relatively innocuous sense that hearing a sound does not seem to require hearing as of something else. Hearing a collision, on the other hand, does seem to require awareness as of a sound. Furthermore, sounds are unlike the ordinary material objects we see. You cannot reach out and grab a sound, or determine its temperature. Sounds result from activities or interactions of material bodies and thus are experienced as distinct or independent from them (cf. Nudds 2001). Nevertheless, audition does afford some variety of awareness of the sources of sounds, or at least provides information about them.

The claim that sounds are sensations will be unattractive to many. Good reasons suggest that even if sounds are not identical with ordinary objects and events, such as clothespins and collisions, they nonetheless are public rather than private. Suppose I am near the stage in a hall listening to some music, and that I have a headache. While I am confused if I ask whether you feel my headache, I just assume that you hear the sounds. Suppose I decide to move to the back of the hall, and the headache then gets better. My experiences of the headache and of the sounds of the music differ once I am at the back of the room. The headache changes, but I need not think the musicians make different sounds. If I stop experiencing the headache, it is gone, but the sounds can continue once I leave the room. Moreover, while the notion of an unfelt headache is puzzling, it makes sense to say that a tree makes a sound when it falls in the woods without being heard. Finally, while there are no illusory headaches, tinnitus, or ringing of the ears, is an illusory or hallucinatory experience as of a sound.

This suggests that audition does not provide special reasons to believe that the objects of perception are private sensations. Sounds, construed as objects of auditory perception, plausibly are held to inhabit the public world. (See section 3.1 Spatial Hearing for further discussion.)

2.1.2 Proximal or Distal?

Are sounds proximal or distal? The customary science-based view holds that sounds are pressure waves that travel through a medium. On this account, sounds are caused by objects and events such as collisions, and sounds cause auditory experiences. However, sounds are not auditorily experienced to travel through the surrounding medium, as do waves. Some argue, for instance, that, phenomenologically, audition presents sounds as being located in some direction at a distance. On such an account, sounds commonly appear auditorily to be in the neighborhood of their sources and thereby furnish useful information about the locations of those sources. The sound of the drumming across the street seems to come from across the street but does not seem audibly to travel. When sounds do appear to fill a room, sound seems located all around. Sounds that seem to “bounce” around a room appear intermittently at different locations rather than as traveling continuously from place to place. Experiencing a missile-like sound speeding towards your ears illustrates the contrast with ordinary hearing (O'Callaghan 2007, 35). Sounds, it is argued, ordinarily appear to have distal locations and to remain stationary relative to their sources.

If sounds are not usually experienced to travel, then unless auditory experience is illusory with respect to the apparent locations of sounds, sounds themselves do not travel. Sounds thus are not identical with and do not supervene upon the waves, since waves travel (Pasnau 1999). A number of philosophers have argued on these and related grounds that sounds are located distally, near their sources (Pasnau 1999, Casati and Dokic 2005, O'Callaghan 2007, Matthen forthcoming). On this view, pressure waves bear information about sounds and are the proximal causes of auditory experiences, but are not identical with sounds. One might object by resisting the phenomenological claim that we experience sounds as distally located, for instance by suggesting that audition is aspatial, or that audition is spatial but sound sources rather than sounds are auditorily localized (see section 3.1 Spatial Hearing for further discussion). Or, one might accept some measure of illusion. Another possibility is that we experience only a small subset of the locations sounds occupy during their lifetimes (for instance, while at their sources), and simply fail to experience where they are at other times. This avoids ascribing illusion.

2.1.3 Properties or Individuals?

Are sounds properties or individuals? Among both proximal and distal theories, disagreement exists concerning the ontological category to which sounds belong. Philosophers traditionally have understood sounds as properties: either as sensible or secondary qualities, or as the categorical or physical properties that ground powers to affect subjects. Commonly, sounds are attributed to the medium that intervenes between sources and perceivers. More recently, however, some distal theorists have argued that sounds are properties of what we ordinarily understand as sound sources: bells and whistles have or possess rather than make or produce sounds. Pasnau (1999), for instance, claims that sounds are transient properties that are identical with or supervene upon vibrations of objects. Kulvicki (2008) argues against transience in an attempt to subsume sounds to the model of colors, and claims that sounds are persistent, stable dispositional properties of objects to vibrate in response to being “thwacked”. He distinguishes “having” a stable sound from “making” a sound on some occasion (manifesting the stable disposition). This account implies that objects sometimes make sounds they do not have, and that they have sounds when silent.

One might ask property theorists whether events such as collisions and strummings, rather than objects, bear sounds. A more serious challenge comes from those who argue that sounds are individuals rather than properties. Several arguments support this understanding. First, empirical work on auditory scene analysis suggests that one primary task of audition is to carve up the acoustic scene into distinct sounds, each of which may possess its own pitch, timbre, and loudness (Bregman 1990). Multiple distinct sounds with different audible attributes can be heard simultaneously. An analog of Jackson's (1977, see also Clark 2000) many properties problem thus arises for audition since feature awareness alone cannot explain the bundling or grouping of audible attributes into distinct sounds. Such bundling or grouping of audible features suggests that sounds are perceptible individuals to which these features are attributed.

Furthermore, the temporal characteristics of experienced sounds suggest that sounds are not simple qualities. Sounds audibly seem to persist through time and to survive change. A particular sound, such as that of an emergency siren, might begin high-pitched and loud and end low-pitched and soft. This suggests that sounds are individuals that bear different features at different times, rather than sensible qualities (cf. Cohen 2009).

Several responses to these arguments are available. One might argue that sounds are complex properties, such as pitch-timbre-loudness complexes, instantiated at a time. To account for feature binding, one might hold that such complex properties are ascribed to ordinary objects such as bells and whistles. Or, one might hold that they are particularized properties, such as tropes. To accommodate sounds that survive change through time, a property account could hold that sounds are yet more complex properties that have patterns of change built into their identity conditions. Any such view differs a great deal from the familiar secondary or sensible quality view pioneered by Locke. Pitch, timbre, and loudness are better candidates for simple sensible features (see section 3.2 Audible Qualities).

2.1.4 Objects or Events?

If sounds are individuals, are they object-like or event-like individuals? Intuitively, the material objects we see are capable of existing wholly at any given moment, and all that is required to perceptually recognize such individuals is present at a moment. On the other hand, event-like individuals occupy time and need not exist wholly at any given moment. Their individuation and recognition frequently appeal to patterns of features over time. Event-like individuals intuitively comprise temporal parts, while object-like individuals intuitively do not. The issue here is not the truth of endurantism and perdurantism as accounts of the persistence of objects and events. The issue instead concerns a difference in how we perceptually individuate, experience, and recognize individuals.

No contemporary philosopher has yet claimed that sounds are objects in the ordinary sense. Those who argue that sounds are individuals commonly point out that sounds not only persist and survive change (as do ordinary material objects), but also require time to occur or unfold. It is difficult to imagine an instantaneous sound, or one that lacks duration. Sounds are not commonly treated as existing wholly at a given moment during their duration. Indeed, the identities of many common sounds are tied to patterns of change in qualities through time. The sound of an ambulance siren differs from that of a police siren precisely because the two differ in patterns of qualitative change through time. The sound of the spoken word ‘team’ differs from that of ‘meat’ because each instantiates a common set of audible qualities in a different temporal pattern. Proponents level these considerations in support of the view that sounds, among the intentional objects of audition, are event-like individuals.

2.2 Auditory Objects

Though most philosophers construe sounds either as properties or as event-like individuals (see section 2.1 Sounds), much recent discussion among psychologists has concerned auditory objects (see, e.g., Kubovy and Van Valkenburg 2001, Griffiths and Warren 2004). The target of such discussion is not simply audition's intentional objects or proper (specific to audition) objects. The intended analogy is with visual objects. Talk of auditory objects gestures at the visual processes involved in perceiving, attentively tracking, and recognizing ordinary material objects. What justifies talk of object perception in audition?

2.2.1 Object Perception in Audition

First of all, we do not auditorily perceive three-dimensional, bounded material objects as such, though it is plausible to think we visually perceive them. Hearing does not resolve the edges, boundaries, and filled volumes in space that I see, and I do not hear audible items to complete spatially behind occluders as do visible surfaces of objects. If perceiving a three-dimensional object requires awareness of its edges, boundaries, and extension, I do not auditorily perceive such objects.

Nevertheless, striking and illuminating parallels do exist between the perceptual processes and experiences that take place in vision and audition. Such parallels may warrant talk of object perception in a more general sense that is common to both vision and audition (and perhaps touch, though I don't pursue that here).

Perceiving objects requires parsing a perceptual scene into distinct units that one can attend to and distinguish from each other and from a background. In vision, bounded, cohesive collections of surfaces that are extended in space and that persist through time play this role (see, e.g., Spelke 1990, Nakayama et al. 1995, Leslie et al. 1998, Scholl 2001, Matthen 2005). In audition, as in vision, multiple distinct perceptible individuals might exist simultaneously, and each might persist and survive change (see the discussion of auditory scene analysis in section 2.1 Sounds). A critical difference, however, is that while vision's objects are extended in space, and are individuated and recognized primarily in virtue of spatial characteristics, audible individuals are extended in time, and are perceptually individuated and recognized primarily in virtue of pitch and temporal characteristics (see, e.g., Bregman 1990, Kubovy and Van Valkenburg 2001). For instance, audible individuals have temporal edges and boundaries, and boundary elements can belong only to a single audible individual. They also are susceptible to figure-ground effects over time. One can, for instance, shift attention among continuous audible individuals that differ in pitch. Furthermore, they are susceptible to completion effects over time in much the same way that visible objects are perceptually completed in space. Seeing a single visible region to continue behind a barrier is analogous to hearing a sound stream to continue through masking noise, even when its signal is absent (Bregman 1990, 28). Finally, multiple distinct, discrete audible individuals, such as the temporally bounded notes in a tune, can form audible streams that comprise a single perceptible unit. Such streams are subject to figure-ground shifts, and, like collections of surfaces, they can be attentively tracked through changes to their features and to one's perspective. Though such complex audible individuals include sounds, they comprise temporally unified collections of sounds and silence that are analogous to spatially complex visible objects, such as tractors.

Such audible individuals are temporally extended and bounded, serve as the locus for auditory attention, prompt completion effects, and are subject to figure-ground distinctions in pitch space. For these reasons, the auditory processes involved in their perception parallel those involved in the visual perception of ordinary three-dimensional objects. The parallels suggest a shared sense in which vision and audition involve a more general form of object perception (see, e.g., Kubovy and Van Valkenburg 2001, Scholl 2001, Griffiths and Warren 2004, O'Callaghan 2008a, Matthen forthcoming).

2.2.2 What is an Auditory Object?

What is the shared sense in which both visible and audible individuals deserve to be called ‘objects’? Kubovy and Van Valkenburg (2001, 2003) define objecthood in terms of figure-ground segregation, which requires perceptual grouping. They propose the theory of indispensable attributes as an account of the necessary conditions on perceptual grouping (see also Kubovy 1981). Indispensable attributes for a modality are those without which perceptual numerosity is impossible. They claim that while space and time are indispensable attributes for vision (and color is not), pitch and time are indispensable attributes for auditory objects. Though they are more skeptical about whether audition parallels vision, Griffiths and Warren (2004) sympathize with a figure-ground characterization but suggest a working notion of an auditory object as “an acoustic experience that produces a two-dimensional image with frequency and time dimensions” (Griffiths and Warren 2004, 891). O'Callaghan (2008a) proposes that both visible and audible objects are mereologically complex individuals in the world, though their mereology differs in noteworthy respects. While vision's objects possess a spatial mereology and are individuated and tracked in terms of spatial features, audition's objects have a temporal mereology and are individuated and tracked in terms of both pitch and temporal characteristics.

Discussion of auditory objects in fact usefully draws attention to two roles that space plays in vision. First, there is the role of space in determining the structure internal to visible objects, which facilitates identifying and recognizing visible objects. Second, space serves as the external structure among visible objects, and is critical in distinguishing objects from each other. In audition, time plays a role similar to space in vision in determining the structure internal to auditory objects. Pitch, on the other hand, serves as an external structural framework, along with space, that helps to distinguish among audible individuals.

Why is it useful to perceive such individuals in audition? One promising account is that they provide useful information about the happenings that produce sounds. Carving the acoustic world into mereologically complex individuals informs us about what is going on in the extra-acoustic environment. It provides ecologically significant information about what the furniture is doing, rather than just how it is arranged. It is one thing to perceive a tree; it is another to hear that it is falling behind you.

Discussion of auditory objects and accounts of their nature and perception is new among philosophers (see, e.g., O'Callaghan 2008a, and essays in Bullot and Egré forthcoming, including Matthen forthcoming, Nudds forthcoming). This area is ripe for philosophical contributions.

2.3 Sound Sources

Sounds are among the intentional objects of audition. Plausibly, so are complex, temporally extended individuals composed of sounds. Do we hear anything else? Reflection suggests we hear things beyond sounds and complexes of sounds. In hearing sounds, one may seem to experience the backfiring of the car or the banging of the drum. One might hold that a primary part of audition's function is to reveal things and happenings that make sounds.

2.3.1 Do Humans Hear Sound Sources?

If sounds were internal sensations or sense-data, then, as Maclachlan (1989) observes, we would hear sound sources only indirectly, in an epistemological sense, perhaps thanks to something akin to inference. Acquiring beliefs about the environment would require mediation by propositions connecting experienced internal sounds with environmental causes.

If, however, sounds are properties attributed either to ordinary objects, as Pasnau (1999) and Kulvicki (2008) hold, or to events, then hearing a tuba or the playing of a tuba might just require hearing its sounds. Perceptually ascribing such audible attributes to their sources might ground epistemically unmediated awareness of tubas or their playings.

One might, however, hold that the individuals to which audible attributes are, in the first instance, perceptually attributed are not identical with ordinary objects or events, since sounds are experienced as distinct from ordinary or extra-acoustic individuals (Casati and Dokic 2005, O'Callaghan 2007, Matthen forthcoming). But one cannot hear an ordinary object without hearing a sound, and sounds can mislead about their sources. It might sound like drumming but be hammering. Given this, forming beliefs about ordinary things and happenings connected with sounds might seem to require inference, association, or some otherwise cognitive process, and so their representation might appear always to involve more than perceptual awareness. On this account, representing environmental things and happenings thanks to audition is epistemically mediated by awareness as of sounds and auditory objects, but does not itself constitute auditory perceptual awareness as of those things and happenings. You are inclined to think you hear the source because your representing it co-occurs with, but is no more than a downstream consequence triggered by, your auditory experience.

This account is not entirely satisfactory. First, the phenomenology of audition suggests something stronger than indirect or epistemically mediated awareness of things like collisions or guitar strummings or lions roaring. Reflection suggests auditory awareness as of collisions, strummings, and lions. The capacity to refer demonstratively to such things and events also suggests genuine perceptual awareness. Second, we commonly perceptually individuate sounds in terms of their apparent sources (and our taxonomy reflects this). “What did you hear?” “I heard paper ripping,” or, “The sound of a dripping faucet.” We distinguish two quite similar rattles once we hear one as of a muffler clamp and the other as of a loose fender. Furthermore, characterizing certain audible features and explaining perceptual constancy effects for them require appeal to sound sources. Handel says of timbre: “At this point, no known acoustic invariants can be said to underlie timbre... The cues that determine timbre quality are interdependent because all are determined by the method of sound production and the physical construction of the instrument” (Handel 1995, 441). Explaining loudness constancy (why moving to the back of the room does not change how loudly the lecturer seems to speak) appeals to facts about the sources of sounds (Zahorik and Wightman 2001). Auditory processing proceeds under natural constraints concerning characteristics of sound sources, and information concerning sources shapes how auditory experiences are organized. This is to say that processes responsible for auditory experience proceed as if acoustic information is information about sound sources. Finally, audition-guided action supports the claim that we hear such things and events. Turning to look toward the source of a sound or ducking out of the way of something we hear to be approaching, both behaviors guided by auditory experience, would make little sense if we heard only sounds. These reasons make a prima facie case for seeking an alternative to the standard account on which auditory perceptual experience strictly ends with sounds and auditory objects. Some might contend that awareness as of a source, though dependent upon awareness as of a sound, is constitutive of one's auditory perceptual experience.

The main barrier to an alternative is that the relation between sounds and ordinary things or happenings is commonly understood as causal. Awareness as of an effect does not itself furnish epistemically unmediated awareness of its cause. Seeing smoke isn't seeing fire. The right sort of dependence between characteristics of the experience and the cause is not apparent, and awareness as of an effect does not by itself ground perceptual demonstratives concerning the cause. The metaphysical indirectness of the causal relation appears to block epistemic directness.

2.3.2 The Mereology of Sounds and Sources

Is there another explanatory route? Suppose that instead of a causal relation, we understand the relationship between sounds and sources mereologically, or as one of part to whole. Parthood frequently does ground perceptual awareness. For instance, seeing distinct parts of a surface interrupted by an occluder leads to perceptual experience as of a single surface (imagine seeing a dog behind a picket fence). Seeing the facing surfaces of a cube affords awareness as of a cube, and we can attentively track that same cube as it rotates and reveals different surfaces. Suppose, then, that a sound is an event-like individual (recall, property accounts escape the worry). This event is part of a more encompassing event, such as a collision or the playing of a trumpet, that occurs in the environment and that includes the sound. So, the horse race includes the sounds, and you might auditorily perceive the racing by hearing some of its proper parts: the sounds. More probably, you hear the galloping by hearing the sounds it includes. You fail to hear certain parts of the racing event, such as the jockey's glance back after crossing the wire, but you also fail to see parts of the race, such as the misstep of the horse in second place. If the sounds are akin to the audible “profile” of the event, analogous to the visible surfaces of objects and visible parts of events, you might then enjoy auditory awareness as of the galloping of the horses in virtue of your awareness as of the sounds of the hooves. The sound is not identical with the galloping, and it is not just a property or a causal byproduct of the galloping. It is a part of a particular event of galloping. The metaphysical relation of part to whole, in contrast to that between effect and cause, might ground the sort of epistemically unmediated awareness of interest (cf. Nakayama et al. 1995, Bermúdez 2000, Noë 2004). Auditory perceptual awareness as of the whole occurs in virtue of experiencing the part.

One objection is that this cannot account for hearing ordinary objects by hearing their sounds. You cannot strictly hear a tuba by hearing its sound because a tuba is not an event of which a sound is a part. However, the sound is part of the event of playing the tuba, and the tuba is a participant in that playing. So, though you are not aware as of a tuba, you are aware as of an event that involves a tuba. That perhaps is enough to explain talk of hearing tubas and to assuage the worry.

Another, more serious objection contends that the events we seem to hear are ones that do not constitutively involve sounds or that might have taken place without sounds. For instance, we hear the collision, but the collision is something that could have occurred in a vacuum and not made a sound. If so, the collision and the sound differ and the collision does not strictly include a sound. The collision therefore must have made the sound as a causal byproduct. The mereological view suggests that, strictly speaking, you could not hear that very collision event (since it causes the sound). The best response is to bite the bullet and accept that events that do occur or that could occur in vacuums cannot be heard since they include no sounds. This is not so bad, since you could hear a different, more encompassing event that includes a sound (along with a collision). Alternatively, one might say the very same event that occurs in a vacuum also could occur in air, but that it would have involved a sound had it occurred in air. In that case, one can only hear such events when they occur in air and include a sound. The choice depends upon one's metaphysics of events. In either case it seems reasonable that token events that do not include sounds are inaudible.

What hinges on the debate about hearing sources? The first upshot is epistemological and concerns the nature of the justification for empirical beliefs grounded in perceptual experience. The evidential status of beliefs about what one perceptually experiences differs from that of beliefs about what is causally responsible for what one perceptually experiences. The second upshot concerns the relation between audition and certain actions. If we hear only sounds and auditory objects, what appears to be effortless, auditorily guided action to avoid or orient toward sound sources requires another explanation because sounds are invisible and usually do no harm. Finally, it impacts how we understand the adaptive significance of audition. Did audition evolve so as to furnish awareness of sounds alone, while leaving their environmental significance to extra-perceptual cognition, or did it evolve so as to furnish perceptual responsiveness to the sources of sounds?

3. The Contents of Auditory Perception

Another way to address the question, “What do we hear?” concerns the contents of auditory perception. Two topics are noteworthy in the context of related debates about vision and its contents. The first concerns whether audition has spatial content. The second concerns the perception of audible qualities.

3.1 Spatial Hearing

One topic where the contrast between vision and audition has been thought to be particularly philosophically significant concerns space. Vision is a robustly spatial perceptual modality. Vision furnishes awareness of space and spatial features. Some claim vision has an inherently spatial structure, or, further, that vision's spatial structure is a necessary condition on being visually aware of things as independent from oneself.

Hearing also provides information about space—humans learn about space on the basis of hearing. If audition represents space or spatial features, a natural account of such learning follows. We might form beliefs about spatial features of environments on the basis of auditory perceptual experiences simply by accepting or endorsing what seems evident in those experiences. But learning about spatial features on the basis of audition and audition's bearing information about space both are consistent with entirely aspatial auditory phenomenology. For instance, volume might bear information about distance, and differences in volume at the two ears might bear information about direction. In that case, audition bears information about space, and learning about space on the basis of audition is possible.

3.1.1 Skepticism about Spatial Audition

Notably, a tradition of skepticism about audition's spatiality exists in philosophy. Certainly, our capacity to glean information about space is less acute in audition than in vision. Vision reveals fine-grained spatial details, such as patterns and textures, that audition cannot convey. But philosophers who are skeptical about spatial audition are not just concerned about a difference in spatial acuity between audition and vision. Malpas says of the expression, ‘the location of sound’:

I do not mean by ‘location’ ‘locality’, but ‘the act of locating’, and by ‘the act of locating’ I do not mean ‘the act of establishing in a place’, but ‘the act of discovering the place of’. Even so ‘location’ is misleading, because it implies that there is such a thing as discovering the place of sounds. Since sounds do not have places there is no such act. (Malpas 1965, 131)

O'Shaughnessy states, “...We absolutely never immediately perceive sounds to be at any place. (Inference from auditory data being another thing)” (O'Shaughnessy 2002, 446). The claim is that, in contrast to the case of vision, the objects of audition are not experienced as having locations. Rather, we determine the places of sounds and sources from acoustic features, such as loudness and interaural differences, that bear information about distance and direction. We do not auditorily experience spatial features.

This debate, and the purported contrast between vision and audition, has consequences for perceptual theorizing. One route to the conclusion that hearing sounds is auditory awareness of sensations involves denying that audition satisfies spatial prerequisites on experiencing sounds as objective or public. For instance, Maclachlan (1989) claims that audition's phenomenology—in particular, its aspatial phenomenology—provides reasons to think sounds are sensations. Comparing sounds with pains, which we readily recognize as sensations, he says, “[A]lthough the sounds we hear are just as much effects produced in us as are the pains produced by pins and mosquitoes, there is no variety in the location of these effects [the sounds]. Because of the lack of contrast, we are not even aware that the sounds we hear are bodily sensations” (Maclachlan 1989, 31, my emphasis). Maclachlan means that, in contrast even to the case of pains, which are felt at different bodily locations, sounds are not experienced to be at differing locations, and so we are not even inclined to recognize that they are bodily sensations. Maclachlan then suggests that we associate sounds with things and happenings outside the body rather than appreciate that they are effects in us. Given the lack of spatial variation among experienced sounds, we projectively associate sounds with distal sources. This explanation assumes that experienced sounds exhibit no spatial variation: sounds seem located at the ears or lack apparent locations. Denying that auditory experiences present sounds at varying locations beyond the ears invites difficulty finding a place for sounds in the world. The aspatial aspect of audition encourages retreat to the view that sounds lack locations outside the mind.

This kind of strategy has companions and precursors. Lycan's suggestion that olfactory experiences are apparent as modifications of one's own consciousness depends heavily on the aspatial phenomenology of olfactory experience (Lycan 2000, 278-82). Each recalls the Kantian claim that objectivity requires space, or that grasping something as independent from oneself requires the experience of space, a version of which is deployed by Strawson (1959, ch 2) in his famous discussion of sounds.

Two lines of response are open. The first appeals to the thriving empirical research program in “spatial hearing” (see, e.g., Blauert 1997). Scientists aim to discover the cues and perceptual mechanisms that ground spatial audition, such as interaural time and level differences, secondary and reverberant signals, and head-related transfer functions. Audition clearly cannot match vision's singular acuity—vision's resolution limit is nearly two orders of magnitude better than audition's (Blauert 1997, 38-9). Nevertheless, this research strongly supports the claim that human subjects auditorily perceive such spatial characteristics as direction and distance.
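To make one of these cues concrete, the following is a minimal sketch of how an interaural time difference might vary with direction. It uses Woodworth's spherical-head approximation, a textbook simplification that is not drawn from the works cited here. The function name, the head radius, and the speed of sound are illustrative assumptions.

```python
import math

# Assumed constants, chosen only for illustration.
SPEED_OF_SOUND_M_S = 343.0  # roughly the speed of sound in dry air at 20 °C
HEAD_RADIUS_M = 0.0875      # a commonly used nominal adult head radius


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate interaural time difference, in seconds, for a distant source.

    Uses Woodworth's spherical-head formula: ITD = (a / c) * (theta + sin(theta)),
    where theta is the azimuth in radians (0 = straight ahead, 90 = directly
    to one side), a is the head radius, and c is the speed of sound.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must lie between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_microseconds = interaural_time_difference(azimuth) * 1e6
        print(f"azimuth {azimuth:2d} deg -> ITD of about {itd_microseconds:.0f} microseconds")
```

On this idealized model the time difference is zero for a source straight ahead and grows steadily as the source moves to the side. The sketch shows only that the acoustic signal carries directional information. It does not settle whether that information figures in auditory phenomenology, which is the question the skeptic presses.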

Second, a number of philosophers have objected on phenomenological grounds. Audition, they argue, involves experiencing or perceptually representing such spatial characteristics as direction and distance (Pasnau 1999, Casati and Dokic 2005, Matthen 2005, O'Callaghan 2007, forthcoming). Introspection and performance support the claim that sounds or sound sources are in many ordinary cases perceptually experienced as located in the environment at a distance in some direction. We hear the sound of the knocking over near the door; we hear footsteps approaching from behind and to the left; hearing sound to “fill” a room is itself a form of spatial hearing. Though hearing is more error prone than vision, we frequently do not need to work out the locations of sounds or sources—we simply hear them.

3.1.2 Strawson and the Purely Auditory Experience

A subtler form of skepticism about spatial audition aims just to block the requirements on objectivity. Strawson (1959) famously argues in Chapter 2 of Individuals that because auditory experience is not intrinsically spatial—spatial concepts have no intrinsic auditory significance—a purely auditory experience would be non-spatial. It thus would not satisfy the requirements on non-solipsistic consciousness. Others have endorsed versions of Strawson's claim. “[T]he truth of a proposition to the effect that there is a sound at such-and-such a position must consist in this: if someone was to go to that position, he would have certain auditory experiences,” states Evans (1980, 274).

The claim that audition is not intrinsically spatial admits at least two readings. First, since Strawson suggests that audition might inherit spatial content from other sense modalities such as vision or touch, it could mean that audition depends for its spatial content upon that of other modalities. If, unlike vision and touch, audition's spatial capacities are parasitic upon those of other modalities, audition is spatial only thanks to its relations to other intrinsically spatial modalities. Second, it might be understood as a claim about the objects of audition. Strawson indicates that sounds themselves are not intrinsically spatial. He says that although sounds have pitch, timbre, and loudness, they lack “intrinsic spatial characteristics” (1959, 65). Since these interpretations are not clearly distinguished by Strawson, it is helpful to consider his master argument.

Strawson claims that a purely auditory experience would be non-spatial. By “purely auditory experience” Strawson means an exclusively auditory experience, or an auditory experience in the absence of experience associated with any other modality. Given the mechanisms of spatial hearing, however, it is empirically implausible that a normal acoustic environment with rich spatial cues would fail to produce even a minimally spatial purely auditory experience. Even listening only to stereo headphones could produce a directional auditory experience. If any modality in isolation ever could ground spatial experience, audition could. On the other hand, it does seem possible that there could be a non-spatial but impoverished exclusively auditory experience if no binaural or other spatial cues were present. But similarly impoverished, non-spatial experiences seem possible for other modalities. Consider visually experiencing a uniform gray ganzfeld, or floating weightlessly in a uniformly warm bath. Neither provides the materials for spatial concepts, so neither differs from audition in this respect. One might contend that we therefore lack a good reason to think that, in contrast to a purely visual or tactile experience, a purely auditory experience would be an entirely non-spatial experience.

3.1.3 Does Audition Have Spatial Structure?

Nudds (2001) suggests another way to understand the claim, and interprets Strawson as making an observation about the internal structure of audition:

When we see (or seem to see) something, we see it as occupying or as located within a region of space; when we hear (or appear to hear) a sound we simply hear the sound, and we don't experience it as standing in any relation to the space it may in fact occupy. (Nudds 2001, 213-14)

Audition, unlike vision, lacks a spatial structure or field, claims Nudds. A purely auditory experience thus would not comprise a spatial field into which individuals independent from oneself might figure. Following an example from Martin (1992), Nudds argues that while vision involves awareness of unoccupied locations, audition does not involve awareness of regions of space as empty or unoccupied. Martin's example is seeing the space in the center of a ring as empty. In audition, Nudds claims, one never experiences a space as empty or unoccupied.

In response, one might simply deny a difference between vision and audition on this count. If one can attend to a location near the center of the visible ring as empty, one can attend to the location between the sounding alarm clock and the slamming door as a place where there is no audible sound—as acoustically empty space. Of course, auditory space generally is less replete than visual space, but this is contingent. Consider seeing just a few stars flickering on and off against a dark sky. Such an experience lacks spatial structure if audition does.

3.1.4 How Spatial Audition Differs from Spatial Vision

What about the second way mentioned above to understand Strawson's claim? Though audition's status as intrinsically spatial may not differ from that of vision or touch, perhaps audition's objects—sounds—are not intrinsically spatial. Without further argument, or a commitment to a theory of sounds, it is difficult to state confidently the intrinsic features of sounds and thus whether they include spatial features. (If, for instance, wavelength is among a sound's intrinsic features, sounds are intrinsically spatial.) Nonetheless, the claim might be that sounds, as they are perceptually experienced, lack intrinsic or non-relational spatial features. Roughly, independent from spatial relations to other sounds, experienced sounds seem to lack internal spatial structure. That is why you cannot auditorily experience the empty space at the center of a sound or hear its edges. Interpreted as such—that sounds are not experienced or perceptually represented to have intrinsic or inherent spatial features—the claim is plausible (though consider diffuse or spread out sounds in contrast to focused or pinpoint sounds). It certainly marks an important difference from vision, whose objects frequently not only seem to have rich internal spatial structure, but also are individuated in terms of inherent spatial features. This difference, however, does not ground an argument that any purely auditory experience is non-spatial or that sounds fail to satisfy the requirement on objectivity, since sounds' being experienced to have internal or intrinsic spatial characteristics is necessary neither for spatial auditory experience nor to experience sounds as objective. Since sounds phenomenologically seem to be located in space and to bear extrinsic spatial relations to each other, auditory experience satisfies the requirements for objectivity.

So, vision and audition differ with respect to space in two ways. First, vision's spatial acuity surpasses that of audition. Second, vision's objects are perceptually experienced to have rich internal spatial structure, and audition's are not. However, given the spatial characteristics evident in audition, such as direction and distance, the spatial status of audition presents no barrier to understanding its objects as perceiver-independent. The spatial aspects of auditory phenomenology thus may fail to ground an argument to the conclusion that sounds are modifications of one's consciousness. If that is the case, then audition provides no special intuitive support for accounts on which sensations are the direct objects of perception.

3.2 Audible Qualities

3.2.1 Sounds and Colors

According to theories on which sounds are individuals, sounds are not secondary or sensible qualities. But, humans hear audible qualities, such as pitch, loudness, and timbre, that are analogous to colors, tastes, and smells. Thus, familiar accounts of colors and other sensible attributes or secondary qualities might apply to the audible qualities. For instance, pitches might be either dispositions to cause certain kinds of experiences in suitable subjects, the physical or categorical bases of such dispositions, sensations or projected features of auditory experiences, or simple primitive properties of sounds.

Tradition suggests that the form of a philosophical account of visible qualities, such as color, and their perception applies to other sensible qualities, such as pitch, flavor, and smell, and their perception. Thus, according to tradition, if dispositionalism, physicalism, projectivism, or primitivism about sensible qualities is true for features associated with one modality, it is true for features associated with others. Despite tradition, we should be wary of accepting that a theory of sensible qualities translates plausibly across the senses.

Debates about sensible qualities and their perception begin with concerns about whether sensible features can be identified with or reduced to any objective physical features. What follows has two aims. The first is to give a sense of how such debates might go in the case of audible qualities. The focus is on pitch, since pitch is often compared to color, and the case of color is well known. The second is to point out the most salient differences and similarities between the cases of color and pitch that impact the plausibility of arguments translated from one case to the other. In particular, I consider two noteworthy arguments that are founded on aspects of color perception. Each aims to establish that the colors we perceive cannot be identified with objective physical features. Neither transposes neatly to the case of pitch. We should not assume that arguments effective in the case of color have equal force applied to other sensible qualities. Color perhaps is a uniquely difficult case. However, two aspects of pitch experience that are familiar from the case of color experience do raise similar difficulties for an objective physical account of pitch.

3.2.2 Pitch, Timbre, and Loudness

What are pitch, timbre, and loudness? Pitch is a dimension along which tones can be ordered according to apparent height. The pitch of fingernails scratching a blackboard generally is higher than that of thumping a washtub. Loudness can be glossed as the volume, intensity, or quantity of sound. A jet plane makes louder sounds than a model plane. Timbre is more difficult to describe. Timbre is a quality in which sounds that share pitch and loudness might differ. So, a violin, a cello, and a piano all playing the same note differ in timbre. Sometimes timbre is called “tone color”.

Physics and psychoacoustics show that properties including frequency, amplitude, and wave shape determine the audible qualities sounds (auditorily) appear to have. To simplify, take the case of pitch, since pitch often is compared to color. Not all sounds appear to have pitch. Some sounds appear to have pitch thanks to a simple, sinusoidal pattern of vibration at some frequency in an object or in the air. Some sounds appear pitched thanks to a complex pattern of vibration that can be decomposed into sinusoidal constituents at multiple frequencies, since any pattern of vibration can be analyzed as some combination of simple sinusoids. Sounds appear pitched, however, just when they have sinusoidal constituents, or partials, that all are integer multiples of a common fundamental frequency. Sounds with pitch thus correspond to regular or periodic patterns of vibration that differ in fundamental frequency and complexity. Simple sinusoids and complex waveforms yield a match in pitch, though a difference in timbre, when they share fundamental frequency, even when the complex tone lacks a sinusoidal constituent at the fundamental frequency (the phenomenon of the missing fundamental).
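The relation between partials and fundamental frequency can be illustrated with a small sketch. It assumes an idealized case in which partials are exact whole-number frequencies in hertz, and it treats the fundamental as the greatest common divisor of the partials. The function name and the example tones are hypothetical. Real pitch perception tolerates slight mistuning and is limited to a range of frequencies, neither of which the sketch models.

```python
from functools import reduce
from math import gcd


def fundamental_frequency_hz(partials_hz: list[int]) -> int:
    """Return the largest frequency of which every partial is an integer multiple.

    Idealization: partials are exact positive integers in hertz, so the
    common fundamental is simply their greatest common divisor.
    """
    if not partials_hz or any(p <= 0 for p in partials_hz):
        raise ValueError("partials_hz must be a non-empty list of positive integers")
    return reduce(gcd, partials_hz)


# Hypothetical example tones, each given as a list of partial frequencies in Hz.
simple_sinusoid = [200]
complex_tone = [200, 400, 600, 800]
missing_fundamental = [400, 600, 800]  # no constituent at 200 Hz

if __name__ == "__main__":
    for name, partials in [
        ("simple sinusoid", simple_sinusoid),
        ("complex tone", complex_tone),
        ("missing fundamental", missing_fundamental),
    ]:
        print(name, "->", fundamental_frequency_hz(partials), "Hz")
```

All three example tones share a 200 hertz fundamental, although their partials differ and the third has no constituent at 200 hertz. This is the sense in which tones can match in pitch while differing in timbre.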

3.2.3 Is Pitch Physical?

A straightforward account identifies pitch with periodicity (perhaps within some range). Having pitch is being periodic. Periodicity can be expressed in terms of fundamental frequency, so individual pitches are fundamental frequencies. This has advantages as an account of pitch. It captures the linear ordering of pitches. It also explains the musical intervals, such as the octave, fifth, and fourth, which are pitch relations that hold among periodic tones. Musical intervals correspond to whole-number ratios between fundamental frequencies. Sounds that differ by an octave have fundamental frequencies that stand in a 1:2 ratio. Fifths involve a 2:3 relationship, fourths are 3:4, and so on. This also allows us to revise the linear pitch ordering to accommodate the auditory sense in which tones that differ by an octave nonetheless are the same pitch. If the pitch ordering is represented as a helix, upon which successive octave-related tones fall at a common angular position, each full rotation represents a doubling of frequency.
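The interval ratios and the helical ordering can be sketched as follows. The names are hypothetical, and the 440 hertz reference frequency is an arbitrary assumption used only to fix where the helix starts. Ratios are written as the higher frequency over the lower, so the 2:3 fifth appears as 3/2.

```python
import math
from fractions import Fraction

# Interval ratios from the text, expressed as higher frequency over lower.
INTERVAL_RATIOS = {
    "octave": Fraction(2, 1),
    "fifth": Fraction(3, 2),
    "fourth": Fraction(4, 3),
}


def helix_position(frequency_hz: float, reference_hz: float = 440.0) -> tuple[float, float]:
    """Place a fundamental frequency on a pitch helix.

    Returns (height, angle): height is the number of octaves above the
    reference (negative if below), and angle is the position within the
    current rotation in radians. Octave-related tones share an angle.
    The default reference of 440 Hz is an arbitrary illustrative choice.
    """
    if frequency_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be positive")
    height = math.log2(frequency_hz / reference_hz)
    angle = 2.0 * math.pi * (height % 1.0)
    return height, angle


if __name__ == "__main__":
    for name, ratio in INTERVAL_RATIOS.items():
        height, angle = helix_position(440.0 * float(ratio))
        print(f"{name}: ratio {ratio}, height {height:.3f} octaves, angle {angle:.3f} rad")
```

On this representation, stacking a fifth and a fourth multiplies the ratios 3/2 and 4/3 to give 2/1, an octave, which returns to the same angular position one full rotation higher.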

Is the periodicity theory of pitch plausible as an account of the audible features we perceive when hearing sounds? If so, then objective physicalism about at least some sensible qualities might succeed.

3.2.4 Disanalogies with Color

The periodicity theory of pitch fares better on two counts than theories that identify colors with objective physical properties.

First, consider the phenomenological distinction between unique and binary hues. Some colors appear to incorporate other colors, and some do not. Purple, for instance, appears both reddish and bluish; red just looks red. Some philosophers contend that the leading physical theories of color cannot explain the unique-binary distinction without essentially invoking the color experiences of subjects. How, for instance, do reflectance classes identified with unique hues differ from those associated with binary hues? Consider an analogous issue for pitch. Some tones with pitch sound simple, while other pitched tones, such as sounds of musical instruments, auditorily appear to be complex and to have discernible components. However, the difference between audibly simple and audibly complex pitched tones is captured by the simplicity or complexity of a sound's partials. Simple tones are sinusoids, and complex tones have multiple overtones. One response is to hold that the unique-binary color distinction and the simple-complex pitch distinction are disanalogous. Unlike the case of color, one might contend, no pitch that is essentially a mixture of other pitches occupies a unique place in pitch space.

Second, consider metamerism. Some surfaces with very different reflectance characteristics match in color. Metameric pairs share no obvious objective physical property. Some philosophers argue that unless color experience fails to distinguish distinct colors, metamers preclude identifying colors with natural physical properties of surfaces (see the entry on color). Now consider the case of pitch. Are there pitch metamers? Some sounds with very different spectral frequency profiles match in pitch. A simple sinusoidal tone at a given frequency matches the pitch of each complex tone with that fundamental frequency (even those that lack a constituent at the fundamental). But, again, the case of pitch differs from the case of color. For each matching pitch, a single natural property does unify the class. The tones all share a fundamental frequency.

3.2.5 Analogies with Color

Two kinds of argument are equally pressing in the case of pitch.

First, arguments from intersubjective variation transpose. Variations in frequency sensitivity exist among perceivers; for instance, subjects differ in which frequency they identify as middle C. If there is no principled way to legislate whose experience is veridical, pitch might be subjective or perceiver-relative. One response is that, in contrast to the case of unique red, there is an objective standard for middle C: fundamental frequency. But, whose pitch experience has the normative significance to settle the frequency of middle C?

Some might wonder whether there is a pitch analog of the trouble posed by the kind of variation associated with spectrum inversion in the case of color (see the entry on inverted qualia). Spectral shift in pitch, sometimes dramatic, commonly occurs after cochlear implant surgery. This is not spectral inversion for pitch, but a dramatic shift makes most of the same trouble as inversion. It does not make quite all the trouble, since cochlear implants preserve the pitch ordering and its direction. But there could be a cochlear implant that switched the placement of electrodes sensitive to 100 hertz and 1000 hertz, respectively. Arguably, there could be one that reversed the entire electrode ordering. This goes some distance to grounding the conceivability of a pitch inversion that reverses the height ordering of tones.

Second, consider an argument that frequencies cannot capture the relational structure among the pitches. This is loosely analogous to the argument that physicalism about color fails to capture the relational structure of the hues (for instance, that red is more similar to orange than either is to green). In the case of pitch, psychoacoustic experiments show that pitch does not map straightforwardly onto frequency. Though each unique pitch corresponds to a unique frequency (or small frequency range), the relations among pitches do not match those among frequencies. In particular, equivalent pitch intervals do not correspond to equal frequency intervals. For example, the effect upon perceived pitch of a 100 hertz change in frequency varies dramatically across the frequency range. It is dramatic at low frequency and barely detectable at high frequency. Similarly, doubling frequency does not make for equivalent pitch intervals. A 1000 hertz tone must be tripled in frequency to produce the same increase in pitch as that produced by quadrupling the frequency of a 2000 hertz tone. Apparent pitch is a complex function of frequency; it is neither linear nor logarithmic (see, e.g., Hartmann 1997, ch 12, Gelfand 2004, ch 12, Zwicker and Fastl 2006, ch 5). Pitch scales that capture the psychoacoustic data assign equal magnitudes, commonly measured in units called mels, to equal pitch intervals. The mel scale of pitch thus is an extensive or numerical pitch scale, in contrast to the intensive frequency scale for pitch. The former, but not the latter, preserves ratios among pitches.
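A rough sketch can illustrate the point that equal frequency steps need not be equal pitch steps. It assumes one widely used engineering approximation of the mel scale, mel = 2595 log10(1 + f/700). This formula is not the curve measured in the psychoacoustic experiments discussed here and it is not taken from the works cited. It becomes nearly logarithmic at high frequencies, so it should not be expected to reproduce the specific figures in the text. The function names are hypothetical.

```python
import math


def hz_to_mel(frequency_hz: float) -> float:
    """Convert frequency in hertz to mels using a common approximation.

    Assumption: mel = 2595 * log10(1 + f / 700). This is an engineering
    convenience, not the empirically measured pitch scale itself.
    """
    if frequency_hz < 0:
        raise ValueError("frequency_hz must be non-negative")
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_step(frequency_hz: float, delta_hz: float = 100.0) -> float:
    """Change in mels produced by raising frequency_hz by delta_hz."""
    return hz_to_mel(frequency_hz + delta_hz) - hz_to_mel(frequency_hz)


if __name__ == "__main__":
    for start_hz in (200.0, 1000.0, 4000.0, 8000.0):
        print(f"+100 Hz from {start_hz:.0f} Hz changes pitch by {mel_step(start_hz):.1f} mels")
```

Under this approximation, the same 100 hertz increase yields a much larger change in mels at low frequencies than at high ones, which is the qualitative pattern described above.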

S. S. Stevens famously argued on the basis of results drawn from psychoacoustic experiments that pitch is not frequency (see, e.g., Stevens et al. 1937, Stevens and Volkmann 1940). In light of similar results, contemporary psychoacoustics researchers commonly reject the identification of pitch with frequency or periodicity. The received scientific view thus holds that pitch is a subjective or psychological quality that is no more than correlated with objective frequency (see, e.g., Gelfand 2004, Houtsma 1995). Pitch, on this understanding, belongs only to experiences. The received view of pitch therefore implies an error theory according to which pitch experience involves a widespread projective illusion.

What is the argument against the periodicity theory of pitch? Compare an argument against reflectance physicalism about color. Reflectance physicalism identifies each hue with a class of reflectances. Periodicity physicalism identifies each pitch with a fundamental frequency. In both cases, each determinate sensible feature is identified with a determinate physical property. In the color case, it is objected that reflectance classes do not bear the relations to each other that the colors bear. In the pitch case, the frequencies do not bear the relations to each other that the pitches bear. Thus, if the relational features among a class of sensible qualities are essential to them, an account that does not accurately capture those relations fails. Frequencies, according to this line of argument, do not stand in the relations essential to pitch.

This, of course, is a quite general phenomenon among sensible qualities. Brightness and loudness vary logarithmically with simple physical quantities. Even if we identified candidate molecules for smells, nothing suggests physical similarities would mirror their olfactory similarities.

One might respond, in the case of pitch and other sensible features that can be put in a linear ordering, that the relational order is essential, while the magnitudes are not. In that case, if pitch is frequency, pitch experience has the right structure, but distorts magnitudes of difference in pitch. This retains the periodicity theory and explains away the results in terms of pitch experiences.

Suppose, however, we accept that the mel scale is well-founded and that it accurately captures essential relationships among pitches. This does not by itself imply a projective or subjective theory of pitch. Pitches might be dispositions to produce certain kinds of experiences, or they might be simple or primitive properties. It also is open to seek a more adequate physical candidate for pitch. For instance, pitches might be far more complex physical properties than frequencies. Such physical properties may be of no interest in developing the simplest, most complete natural physical theory, but they may be anthropocentrically interesting.

It is an important question whether a physical theory of sensible features should just provide a physical candidate for each determinate sensible feature, or whether the physical relationships among those physical candidates should capture the structural relations among sensible qualities (and, if so, which structural relations it should capture). This is an example of how considering in detail the nature and the experience of sensible qualities other than color promises insights into traditional debates concerning the sensible qualities.

4. Varieties of Auditory Perception

4.1 Musical Listening

Musical listening is a topic that bears on questions about the relationship between hearing sounds and hearing sources. While the philosophy of music has its own vast literature (see the entry on the philosophy of music), musical experience has not been explored extensively in connection with general philosophical questions about auditory perception. This section discusses links that should advance philosophical work on auditory perception.

4.1.1 Acousmatic Experience

An account of listening to pure or non-vocal music should capture the aesthetic significance of musical listening. Appreciating music is appreciating sounds and sequences, arrangements, or structures of sounds. Thus, the temporal aspects of auditory experiences are critical to appreciatively listening to music. One might think, moreover, that sounds are all that matters in music. In particular, some have argued that appreciatively listening to music demands listening in a way that abstracts from the environmental significance, and thus from the specific sources, of sounds that comprise it (Scruton 1997, 2-3). Such acousmatic listening involves experiencing sounds in a way that is “detached from the circumstances of their production,” rather than “as having a certain worldly cause” (Hamilton 2007, 58). Listening to music and being receptive to its aesthetically relevant features requires not listening to violins, horns, or brushes on snare drums. It requires hearing sounds and grasping them in a way removed from their common sources. (Hearing a high fidelity recording thus furnishes an aesthetically identical musical experience despite having a speaker cone rather than a violin as source.) “The acousmatic experience of sound is precisely what is exploited by the art of music” (Scruton 1997, 3).[1]

Musical listening thus provides a prima facie argument against the claim that in hearing sounds one hears sound sources such as the strumming of guitars and bowing of violins. If such “interested” audition were the rule, musical listening would be impossible.

4.1.2 Acousmatic Listening as Attention

Acousmatic experience, however, may be a matter of attention. Nothing prevents focusing one's attention on the sounds and audible qualities without attending to the instruments, acts, and events that are their sources, even if each is auditorily available. That musical listening requires effort and training supports the idea that one can direct attention differently in auditory experience, depending on one's interests. Not getting eaten and safely crossing the street require attending to sound sources, while listening with aesthetic appreciation to a symphony may require abstracting from the circumstances of its production, such as the finger movements of the oboist. This response holds that musical listening is a matter of auditorily attending in a certain way. It is attending to features of sounds themselves, but does not imply failing to hear sound sources.

This assumes a limited view about which aspects of one's auditory experience are aesthetically significant. These include aspects involved in hearing sounds proper, but exclude, for example, other contents of auditory experience. However, room exists for debate over the aesthetically significant aspects of what you hear (see Hamilton 2007). For example, one might argue that live performances have aesthetic advantages over recordings because one hears the performance of the sounds and songs, rather than their reproduction by speakers (cf. Mag Uidhir 2007). Circumstances of sound production, such as that skillful gestures generate a certain passage, or that a particularly rare wood accounts for a violin's sounds, might be aesthetically relevant in a way that outstrips the sounds. Furthermore, hearing the spatial characteristics of a performance may hold aesthetic significance beyond the tones and structures admitted by traditional accounts of musical listening. Composers may even intend “spatial gestures” among aspects essential for the appreciation of a piece (see, e.g., Solomon 2007). To imagine auditorily experiencing the spatial characteristics of music in a way entirely divorced from the environmental significance of the sounds is difficult. Appreciating the relationship between experiences of sounds and of sources makes room for a view of the aesthetic value of musical listening that is more liberal than acousmatic experience allows.

4.2 Speech Perception

4.2.1 Is Speech Special?

Speech perception presents uniquely difficult twists, and few philosophers have confronted it directly (Appelbaum 1999, Trout 2001a, Matthen 2005, ch 9, and Remez and Trout 2009 are noteworthy recent exceptions). Something striking and qualitatively distinctive—perhaps uniquely human—seems to set the perception of speech apart from ordinary hearing. The main philosophical issues about speech perception concern versions of the question, Is speech special?

It is natural to think that listening to speech and listening to music are similar. In each case, one's interest in sounds seems divorced from the specific environmental happenings involved in their production. But hearing speech differs from hearing music. Notably, speech is a vehicle for meaning. Ultimately, the information conveyed is what matters. In music, the interest is in sounds as such. In speech, the interest is in meanings.

In one sense, this also makes perceiving speech different from hearing ordinary non-linguistic sounds. Environmental sounds do not usually have conventional linguistic meanings. But, according to the most common philosophical understanding, there is another sense in which perceiving speech is a lot like hearing non-linguistic sounds. Listening to speech in a language you know may involve grasping meanings, but grasping meanings requires first hearing the sounds of speech. What you perceive in perceiving speech is individuated in part in terms of morphological characteristics evident in audition. While grasping the meanings of speech sounds depends upon perceiving complex sound structures, according to the commonplace understanding, perceiving speech involves hearing sounds of a common ontological kind with the ones you hear when you are not hearing speech.

The commonplace view, that perceiving speech is a variety of ordinary auditory perception, has been challenged in a number of ways. One way to see how the challenges differ is to consider the ways in which they suggest that speech perception differs from hearing non-linguistic sounds. In what sense, then, is perceiving speech distinctive? The question admits at least three readings.

4.2.2 The Objects of Speech Perception

First, we might ask about the objects of speech perception. What are the objects of speech perception, and do they differ from those of ordinary auditory perception? According to the commonplace understanding, hearing speech involves hearing sounds. Thus, hearing spoken language shares perceptual objects with ordinary audition. Alternatively, one might hold that the objects of speech perception are not ordinary sounds at all. Perhaps they are language-specific types of sounds, such as phonemes or words. Perhaps, instead, they belong to an entirely different kind from ordinary sounds. For example, some have argued that perceiving speech involves perceiving articulatory gestures or movements of the mouth and vocal organs (see the supplement on Speech Perception: Empirical and Theoretical Considerations).

4.2.3 The Contents of Speech Perception

Second, we might ask about the contents of speech perception. Does the content of speech perception differ from that of ordinary audition? If it does, how does the experience of perceiving speech differ from that of hearing ordinary sounds? Perceiving speech might involve hearing ordinary sounds, but auditorily ascribing distinctive features to them. These features might simply be, or comprise, finer grained qualitative and temporal acoustical details than non-linguistic sounds audibly possess. But perceiving speech also might involve perceiving sounds as belonging to language-specific types, such as phonemes, words, or other syntactic categories. Furthermore, speech perception's contents might differ in a more dramatic way from those of non-linguistic audition. Listening with understanding to speech involves grasping meanings. The commonplace view holds that grasping meanings is an act of the understanding rather than of audition. Thus, the difference between the experience of listening to speech in a language you know and the experience of listening to speech in a language you do not know is entirely cognitive. But one might think that there also is a perceptual difference. So, more contentiously, perceiving speech in a language you know might involve hearing sounds as meaningful, or auditorily representing them as having semantic properties.

4.2.4 Is Speech Perception Auditory?

Third, we might ask either about the processes or about the perceptual modality responsible for speech perception. To what extent does perceiving speech implicate processes that are continuous with those of ordinary or general audition, and to what extent does perceiving speech involve separate, distinctive, or modular processes? While some defend general auditory accounts of speech perception (see, e.g., Holt and Lotto 2008), some argue that perceiving speech involves dedicated perceptual resources, or even an encapsulated perceptual system distinct from ordinary non-linguistic audition (see, e.g., Fodor 1983, Pinker 1994, Liberman 1996, Trout 2001b). They cite, for example: the multimodality of speech perception, in which visual cues about the movements of the mouth and tongue impact the experience of speech, as demonstrated by the McGurk effect (see section 4.3 Crossmodal Influences); duplex perception, in which a particular stimulus sometimes contributes simultaneously both to the experience of an ordinary sound and to that of a speech sound (Rand 1974); and the top-down influence of linguistic knowledge upon the experience of speech.[2]

See the supplement on Speech Perception: Empirical and Theoretical Considerations.

4.3 Crossmodal Influences

4.3.1 Crossmodal Illusions

Auditory perception of speech is influenced by cues from vision and touch (see Gick et al. 2008). The McGurk effect in speech perception is an illusory auditory experience produced by a visual stimulus (McGurk and MacDonald 1976). Do such multimodal effects occur in ordinary audition? Visual and tactile cues commonly do shape auditory experience. The ventriloquist illusion is an illusory auditory experience of location that is produced by an apparent visible sound source (see, e.g., Bertelson 1999). Audition even impacts experience in other modalities. The recently discovered sound-induced flash illusion involves a visual illusion as of seeing two consecutive flashes that is produced when a single visible flash is accompanied by two consecutive audible beeps (Shams et al. 2000, 2002). Such crossmodal illusions demonstrate that auditory experience is impacted by other modalities and that audition influences other modalities. In general, experiences associated with one perceptual modality are influenced by stimulation associated with other modalities.

4.3.2 Causal or Constitutive?

An important question is whether the impact is merely causal, or whether perception in one modality is somehow constitutively tied to other modalities. If, for instance, vision merely causally impacts your auditory experience of a given sound, then processes associated with audition might be proprietary and characterizable in terms that do not appeal to other modalities.

Suppose, though, that such illusions are intelligible as the results of adaptive perceptual strategies. In ordinary circumstances, crossmodal processes serve to reduce or resolve apparent conflicts in information drawn from separate senses, and thereby make perception more reliable overall. Thus, crossmodal illusions differ from synaesthesia. Synaesthesia is just a kind of accident. It results from mere quirks of processing, and it always involves illusion (or else is accidentally veridical). Crossmodal recalibrations, in contrast, are best understood as attempts “to maintain a perceptual experience consonant with a unitary event” (Welch and Warren 1980, 638). If so, the principled reconciliation of information drawn from different sensory sources suggests, first, that audition is governed by extra-auditory perceptual constraints. Second, since such constraints concern the common sources of stimulation to multiple senses (since they govern conflict resolution), it also suggests that audition and vision share a common subject matter at some level of description. One might, perhaps on the basis of intermodal feature binding or on the basis of space, argue that the commonality is experientially evident. This could ground a case that auditory and visual experiences share common multimodal or amodal content (see O'Callaghan 2008b, Clark forthcoming). One modality could have a constitutive rather than a merely causal impact upon processes and experiences associated with another.

4.3.3 Multimodality in Perception

What hangs on this? First, it bears on questions about audition's content. If we cannot exhaustively characterize auditory experience in terms that are modality-specific or distinctive to audition, and doing so requires amodal or multimodal contents, then we might hear as of things we can see or experience with other senses. This is related to one puzzling question about hearing sound sources: How could you hear as of something you could see? Rather than just a claim about audition's content that requires further explanation, we now have a story about why things like sound sources figure in the content of auditory experience. Second, all of this may bear on how to delineate what counts as auditory perception, as opposed to visual or even amodal perception. If hearing is systematically impacted by visual processes, and if it shares content and phenomenology with other sense experiences, what are the boundaries of auditory perception? Multimodal perception may bear on the question of whether there are clear and significant distinctions among the sense modalities (cf. Nudds 2003). Finally, multimodal perceptual experiences, illusions, and explanatory strategies may illuminate the phenomenological unity of experiences in different modalities, or the sense in which, for instance, an auditory experience and a visual experience of some happening comprise a single encompassing experience (see the entry on the unity of consciousness).

We can ask questions about the relationships among modalities in different areas of explanatory concern. Worthwhile areas for attention include the objects, contents, and phenomenology of perception, as well as perceptual processes and their architecture. Crossmodal and multimodal considerations might cast doubt on whether vision-based theorizing alone can deliver a complete understanding of perception and its contents. This approach constitutes an important methodological advance in the philosophical study of perception.

5. Conclusion and Future Directions

Considering modalities other than vision enhances our understanding of perception. It is necessary for developing and vetting an adequate, comprehensive, and general account of perception and its roles. Auditory perception reveals a rich new territory for philosophical exploration in its own right, but it also provides a useful contrast case to evaluate claims about perception proposed in the visual context. One of the most promising directions for future work concerns the nature of the relationships among perceptual modalities and how these relationships might prove essential to understanding perception itself. Recent philosophical work on auditory perception thus encourages an advance beyond considering modalities in isolation from each other.


  • Appelbaum, I., 1996, “The lack of invariance problem and the goal of speech perception,” ICSLP-1996, 3(435): 1541-1544.
  • Appelbaum, I., 1999, “The dogma of isomorphism: A case study from speech perception,” Philosophy of Science, 66 (Supplement. Proceedings of the 1998 Biennial Meetings of the Philosophy of Science Association. Part I: Contributed Papers): S250-S259.
  • Batty, C., 2007, Lessons In Smelling: Essays on Olfactory Perception, Ph.D. thesis, Department of Linguistics and Philosophy, MIT.
  • Bermúdez, J. L., 2000, “Naturalized sense data,” Philosophy and Phenomenological Research, 61(2): 353-374.
  • Bertelson, P., 1999, “Ventriloquism: A case of cross-modal perceptual grouping,” in G. Aschersleben, T. Bachmann, and J. Müsseler (eds.), Cognitive Contributions to the Perception of Spatial and Temporal Events, Amsterdam: Elsevier, pp. 347-317.
  • Blauert, J., 1997, Spatial Hearing: The Psychophysics of Human Sound Localization, Cambridge, MA: MIT Press.
  • Bloomfield, L., 1933, Language, New York: Holt.
  • Blumstein, S. E. and K. N. Stevens, 1981, “Phonetic features and acoustic invariance in speech,” Cognition, 10: 25-32.
  • Bosch, L. and N. Sebastián-Gallés, 1997, “Native-language recognition abilities in 4-month-old infants from monolingual and bilingual environments,” Cognition, 65(1): 33-69.
  • Bregman, A. S., 1990, Auditory Scene Analysis: The Perceptual Organization of Sound, Cambridge, MA: MIT Press.
  • Bullot, N. and P. Egré (eds.), forthcoming, Objects and Sound Perception, European Review of Philosophy, 7.
  • Casati, R. and J. Dokic, 1994, La Philosophie du Son, Nîmes: Chambon.
  • Casati, R. and J. Dokic, 2005, “Sounds,” in The Stanford Encyclopedia of Philosophy (Spring 2009 Edition), Edward N. Zalta (ed.).
  • Clark, A., 2000, A Theory of Sentience, New York: Oxford University Press.
  • Clark, A., forthcoming, “Cross-modal cuing and selective attention,” in F. MacPherson (ed.), The Senses. Oxford: Oxford University Press.
  • Cohen, J., 2009, “Sounds and temporality,” Oxford Studies in Metaphysics, 5: forthcoming.
  • Cooper, F. S., P. C. Delattre, A. M. Liberman, J. M. Borst, and L. J. Gerstman, 1952, “Some experiments on the perception of synthetic speech sounds,” Journal of the Acoustical Society of America, 24: 597-606.
  • Diehl, R. L., A. J. Lotto, and L. L. Holt, 2004, “Speech perception,” Annual Review of Psychology, 55: 149-179.
  • Evans, G., 1980, “Things without the mind—a commentary upon Chapter Two of Strawson's Individuals,” in Z. van Straaten (ed.), Philosophical Subjects: Essays Presented to P. F. Strawson, Oxford: Clarendon Press; reprinted in G. Evans, 1985, Collected Papers, Oxford: Clarendon Press.
  • Fodor, J. A., 1983, The Modularity of Mind, Cambridge, MA: MIT Press.
  • Fowler, C. A., 1986, “An event approach to the study of speech perception from a direct-realist perspective,” Journal of Phonetics, 14: 3-28.
  • Gelfand, S. A., 2004, Hearing: An Introduction to Psychological and Physiological Acoustics, 4th edition, New York: Marcel Dekker.
  • Gick, B., K. M. Jóhannsdóttir, D. Gibraiel, and J. Mühlbauer, 2008, “Tactile enhancement of auditory and visual speech perception in untrained perceivers,” Journal of the Acoustical Society of America, 123(4): EL72-76.
  • Griffiths, T. D. and J. D. Warren, 2004, “What is an auditory object?” Nature Reviews Neuroscience, 5: 887-892.
  • Hamilton, A., 2007, Aesthetics and Music. London: Continuum.
  • Handel, S., 1995, “Timbre perception and auditory object identification,” in B. C. Moore (ed.), Hearing, San Diego, CA: Academic Press, pp. 425-461.
  • Hartmann, W. M., 1997, Signals, Sound, and Sensation, New York: Springer.
  • Holt, L. L. and A. J. Lotto, 2008, “Speech perception within an auditory cognitive science framework,” Current Directions in Psychological Science, 17(1): 42-46.
  • Houtsma, A., 1995, “Pitch perception,” in B. C. J. Moore (ed.), Hearing, New York: Academic Press, pp. 267-291.
  • Jackson, F., 1977, Perception: A Representative Theory, Cambridge: Cambridge University Press.
  • Kivy, P., 1991, Music Alone, Ithaca, NY: Cornell University Press.
  • Kubovy, M., 1981, “Concurrent pitch-segregation and the theory of indispensible attributes,” in M. Kubovy and J. R. Pomerantz (eds.), Perceptual Organization, Hillsdale, NJ: Erlbaum, pp. 55-98.
  • Kubovy, M. and D. Van Valkenburg, 2001, “Auditory and visual objects,” Cognition, 80: 97-126.
  • Kuhl, P. K., 2000, “A new view of language acquisition,” Proceedings of the National Academy of Sciences, 97(22): 11850-11857.
  • Kulvicki, J., 2008, “The nature of noise,” Philosophers' Imprint, 8(11): 1-16.
  • Leslie, A. M., F. Xu, P. D. Tremoulet, and B. J. Scholl, 1998, “Indexing and the object concept: developing ‘what’ and ‘where’ systems,” Trends in Cognitive Sciences, 2(1): 10-18.
  • Liberman, A. M., 1970, “The grammars of speech and language,” Cognitive Psychology, 1(4): 301-323.
  • Liberman, A. M., 1996, Speech: A Special Code, Cambridge, MA: MIT Press.
  • Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy, 1967, “Perception of the speech code,” Psychological Review, 74(6): 431-461.
  • Liberman, A. M. and I. G. Mattingly, 1985, “The motor theory of speech perception revised,” Cognition, 21: 1-36.
  • Liberman, A. M. and I. G. Mattingly, 1989, “A specialization for speech perception,” Science, 243(4890): 489-494.
  • Locke, J., 1689/1975, An Essay Concerning Human Understanding, Oxford: Clarendon Press.
  • Lotto, A. J., K. R. Kluender, and L. L. Holt, 1997, “Animal models of speech perception phenomena,” in K. Singer, R. Eggert, and G. Anderson (eds.), Chicago Linguistic Society, 33, Chicago: Chicago Linguistic Society, pp. 357-367.
  • Lycan, W., 2000, “The slighting of smell,” in N. Bhushan and S. Rosenfeld (eds.), Of Minds and Molecules: New Philosophical Perspectives on Chemistry, Oxford: Oxford University Press, pp. 273-289.
  • Maclachlan, D. L. C., 1989, Philosophy of Perception, Englewood Cliffs, NJ: Prentice Hall.
  • Mag Uidhir, C., 2007, “Recordings as performances,” British Journal of Aesthetics, 47(3): 298-314.
  • Malpas, R. M. P., 1965, “The location of sound,” in R. J. Butler (ed.), Analytical Philosophy, Second Series, Oxford: Basil Blackwell, pp. 131-144.
  • Martin, M. G. F., 1992, “Sight and touch,” in T. Crane (ed.), The Contents of Experience, Cambridge: Cambridge University Press.
  • Martin, M. G. F., 1993, “Sense modalities and spatial properties,” in N. Eilan, R. McCarthy, and B. Brewer (eds.), Spatial Representation: Problems in Philosophy and Psychology, Oxford: Blackwell.
  • Matthen, M., 2005, Seeing, Doing, and Knowing: A Philosophical Theory of Sense Perception, Oxford: Oxford University Press.
  • Matthen, M., forthcoming, “Auditory objects,” European Review of Philosophy, 7.
  • McGurk, H. and J. MacDonald, 1976, “Hearing lips and seeing voices,” Nature, 264: 746-748.
  • Mehler, J., P. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini, and C. Amiel-Tison, 1988, “A precursor of language acquisition in young infants,” Cognition, 29: 143-178.
  • Mole, C., 2009, “The Motor Theory of speech perception,” in M. Nudds and C. O'Callaghan (eds.), Sounds and Perception: New Philosophical Essays, Oxford: Oxford University Press.
  • Nakayama, K., Z. J. He, and S. Shimojo, 1995, “Visual surface representation,” in S. M. Kosslyn and D. N. Osherson (eds.), Visual Cognition, Volume 2 of An Invitation to Cognitive Science, second edition, Cambridge, MA: MIT Press, pp. 1-70.
  • Noë, A., 2004, Action in Perception, Cambridge, MA: MIT Press.
  • Nudds, M., 2001, “Experiencing the production of sounds,” European Journal of Philosophy, 9: 210-229.
  • Nudds, M., 2003, “The significance of the senses,” Proceedings of the Aristotelian Society, 104(1): 31-51.
  • Nudds, M., forthcoming, “What are auditory objects?” European Review of Philosophy, 7.
  • Nudds, M. and C. O'Callaghan, 2009, Sounds and Perception: New Philosophical Essays, Oxford: Oxford University Press.
  • O'Callaghan, C., 2007, Sounds: A Philosophical Theory, Oxford: Oxford University Press.
  • O'Callaghan, C., 2008a, “Object perception: Vision and audition,” Philosophy Compass, 3: 803-829.
  • O'Callaghan, C., 2008b, “Seeing what you hear: Cross-modal illusions and perception,” Philosophical Issues, 18: 316-338.
  • O'Callaghan, C., forthcoming, “Perceiving the locations of sounds,” European Review of Philosophy, 7.
  • O'Shaughnessy, B., 1989, “The sense of touch,” Australasian Journal of Philosophy, 69: 37-58.
  • O'Shaughnessy, B., 2002, Consciousness and the World, Oxford: Oxford University Press.
  • Pasnau, R., 1999, “What is sound?” Philosophical Quarterly, 49: 309-324.
  • Pasnau, R., 2000, “Sensible qualities: The case of sound,” Journal of the History of Philosophy, 38: 27-40.
  • Pinker, S., 1994, The Language Instinct, New York: William Morrow.
  • Rand, T. C., 1974, “Dichotic release from masking for speech,” Journal of the Acoustical Society of America, 55: 678-680.
  • Remez, R. E. and J. D. Trout, 2009, “Philosophical messages in the medium of spoken language,” in M. Nudds and C. O'Callaghan (eds.), Sounds and Perception: New Philosophical Essays, Oxford: Oxford University Press.
  • Rosenblum, L. D., 2004, “Perceiving articulatory events: Lessons for an ecological psychoacoustics,” in J. G. Neuhoff (ed.), Ecological Psychoacoustics, Chapter 8, San Diego, CA: Elsevier, pp. 220-248.
  • Scholl, B. J., 2001, “Objects and attention: the state of the art,” Cognition, 80: 1-46.
  • Scott, M., 2001, “Tactual perception,” Australasian Journal of Philosophy, 79(2): 149-160.
  • Scruton, R., 1997, The Aesthetics of Music, Oxford: Oxford University Press.
  • Shams, L., Y. Kamitani, and S. Shimojo, 2000, “What you see is what you hear,” Nature, 408: 788.
  • Shams, L., Y. Kamitani, and S. Shimojo, 2002, “Visual illusion induced by sound,” Cognitive Brain Research, 14: 147-152.
  • Smith, A. D., 2002, The Problem of Perception, Cambridge, MA: Harvard University Press.
  • Solomon, J., 2007, Spatialization in Music: The Analysis and Interpretation of Spatial Gestures, PhD thesis, Department of Music, University of Georgia, Athens, GA.
  • Soto-Faraco, S., J. Navarra, W. M. Weikum, A. Vouloumanos, N. Sebastián-Gallés, and J. F. Werker, 2007, “Discriminating languages by speech-reading,” Perception and Psychophysics, 69(2): 218.
  • Spelke, E. S., 1990, “Principles of object perception,” Cognitive Science, 14: 29-56.
  • Stevens, S. and J. Volkmann, 1940, “The relation of pitch to frequency: A revised scale,” American Journal of Psychology, 53: 329-353.
  • Stevens, S., J. Volkmann, and E. Newman, 1937, “A scale for the measurement of the psychological magnitude pitch,” Journal of the Acoustical Society of America, 8(3): 185-190.
  • Strawson, P. F., 1959, Individuals, New York: Routledge.
  • Trout, J. D., 2001a, “Metaphysics, method, and the mouth: Philosophical lessons of speech perception,” Philosophical Psychology, 14(3): 261-291.
  • Trout, J. D., 2001b, “The biological basis of speech: What to infer from talking to the animals,” Psychological Review, 108(3): 523-549.
  • Van Valkenburg, D. and M. Kubovy, 2003, “In defense of the theory of indispensible attributes,” Cognition, 87: 225-233.
  • Vouloumanos, A. and J. F. Werker, 2007, “Listening to language at birth: evidence for a bias for speech in neonates,” Developmental Science, 10(2): 159-164.
  • Weikum, W. M., A. Vouloumanos, J. Navarra, S. Soto-Faraco, N. Sebastián-Gallés, and J. F. Werker, 2007, “Visual language discrimination in infancy,” Science, 316(5828): 1159.
  • Welch, R. B. and D. H. Warren, 1980, “Immediate perceptual response to intersensory discrepancy,” Psychological Bulletin, 88(3): 638-667.
  • Werker, J., 1995, “Exploring developmental changes in cross-language speech perception,” in L. Gleitman and M. Liberman (eds.), Language: An Invitation to Cognitive Science, Volume 1, 2nd edition, Cambridge, MA: MIT Press, pp. 87-106.
  • Zahorik, P. and F. Wightman, 2001, “Loudness constancy with varying sound source distance,” Nature Neuroscience, 4: 78-83.
  • Zwicker, E. and H. Fastl, 2006, Psychoacoustics: Facts and Models. 3rd edition, New York: Springer.

Related Entries

color | consciousness: unity of | music, philosophy of | perception: the contents of | perception: the problem of | qualia: inverted | sounds


I am very grateful to David Chalmers, Maddy Kilbride, and Shaun Nichols for extensive and helpful comments on drafts of this entry.

Copyright © 2009 by
Casey O'Callaghan
