Innateness and Language

Dupre, Gabe

First published Mon Jun 29, 2026

[Editor’s Note: The following new entry by Gabe Dupre replaces the former entry on this topic by the previous author.]

Questions about the role of experience in biological development, specifically cognitive development, are as old as philosophy (discussed by, e.g., Plato and Mencius). In the early modern period, these debates took center stage, focusing on the role of experience in the provision and justification of various forms of knowledge, with rationalists like Descartes and Leibniz arguing that knowledge of, for example, God or mathematics is in some sense independent of sensory experience, against empiricists like Locke and Hume who viewed all knowledge as a product of sensory experience (see Markie & Folescu 2021).

In the twentieth century, this debate was revived, and reinterpreted, as a properly empirical dispute within the cognitive sciences, due in large part to the work of Noam Chomsky. Chomsky argued that human language provided a particularly clear case for the structuring role of innate constraints on cognitive development. Adult humans display a mastery of a complex and subtle linguistic system which, Chomsky argued, cannot be explained simply with reference to the linguistic data they have been exposed to during their development. To account for these capacities in adults, we must appeal to innate language-specific faculties, constraints, and/or knowledge, which the child brings to the process of language acquisition.

Like any bold empirical conjecture, this work provoked a strong response by a wide range of researchers, who aimed to develop alternative, empiricist, accounts of language acquisition. One of the reasons this proposal was so exciting, but also so controversial, is that claims about the innateness of language are not the sole purview of any one field of inquiry. Such claims must cohere with what is known not just in linguistics (both theoretical and developmental), but also psychology more generally, evolutionary theory, neuroscience, anthropology, biology, and more. This entry will aim to survey some of the major results and debates within these fields as they pertain to the central topic of the innateness, or lack thereof, of human language.

1. Introduction
2. Chomsky’s Poverty of the Stimulus Argument
3. Fleshing Out the Nativist Picture
4. Innate Morpho-Phonology
5. Nativism in Semantics
6. Empiricist Responses
- 6.1 Enhancing the Primary Learning Data
- 6.2 Enhancing the Learning System
7. Empiricist Alternatives
Bibliography
Academic Tools
Other Internet Resources
Related Entries

1. Introduction

Before we begin, it is worth noting that the question of whether language is innate is really somewhat ill-posed. ‘Language’ refers to a varied set of phenomena, and to generate any empirically tractable research question, it will be crucial to narrow our focus onto specific linguistic phenomena. Traditionally, theoretical linguistics divides into the study of linguistic sounds (phonetics and phonology), the combinatory operations which generate meaningful complex structures out of simpler meaningful atoms (morphology and syntax), and linguistic meaning (semantics and pragmatics). In each of these areas, it can be asked: what aspects of developed linguistic competence are plausibly acquired on the basis of linguistic experience? And in each area, there has been robust debate about the answer, generating a number of partially parallel “innateness of language” debates. In part due to this diversity of targets in questions about language acquisition, the empirical literature on these subjects is enormous, encompassing decades of work across a wide range of fields. An entry like this cannot hope to cover more than a small fraction of it. The aim is instead to give a flavor of the kinds of arguments being put forward, and to point towards some of the most promising avenues. I will centrally focus on the foundational texts in these various areas, which should give the reader the ability to follow up on more recent developments.

While logically independent, it is not without reason that advocates of nativism/empiricism about innateness in one domain frequently also endorse similar proposals in other domains. Those convinced of the necessity of positing innate structures to explain one feature of the mind will likely be moved also by structurally similar arguments applied to other areas. And the cost, if it is a cost, of positing innate structures or constraints in one domain may seem to reduce the costs of positing them in other domains. For this reason, an overall assessment of the debate about the innateness of language might be furthered by comparing analogous arguments in other areas of cognitive science, such as the case for innateness in, for example, our mathematical (Clarke 2025), spatial/navigatory (Spelke 2022; Gallistel 1998), and social/agential capacities (Spelke, Bernier, & Skerry 2013).

We are now far beyond the days of viewing empiricist alternatives to nativism as endorsing an (incoherent) “blank slate” picture of cognitive development. The debate instead centers on the existence of dedicated, domain-specific systems, as opposed to domain-general, flexible systems utilized across a wide range of learning tasks. In the linguistic case, as we shall see, empiricist approaches appeal to complex, innately structured minds and learning systems. What they deny is that humans have innate systems dedicated to the acquisition of language. For example, Michael Tomasello’s work presents one of the most carefully and thoroughly developed non-nativist approaches to language, which appeals to a suite of innate systems for pattern recognition, mind-reading, attention, co-ordination, and collaboration.

This raises the question: given that all parties agree that there must be something special about human beings in virtue of which they, but no other species, are capable of learning natural language, what makes an account of linguistic development nativist? There is a range of proposals in the philosophical literature about what “innate” and its cognates signify within the developmental sciences. For the purposes of this entry, I will rely on two. Firstly, innate traits are understood in contrast to traits which are structured by interactions with the perceived environment. So, a trait is innate to the extent that its structure or features are not explained with reference to the creature’s interaction with, and identification/extraction of, features of the environment in which the trait developed (Dupre 2021a; R. Samuels 2002, 2004; see also canalization theories of innateness, as in Collins 2005 and Ariew 1996, 1999). On such an account, a trait can be innate even if not present at birth, so long as its development is not a matter of attunement to, or reflection of, the learning environment. Such a proposal can differentiate innate from learned traits, but would not differentiate between linguistic nativists and those, like Tomasello, whose empiricist theories posit substantial innate constraints. To draw this boundary, we need to add the further condition that the relevant contributions of the learning system, which account for the disparity between the developed trait and the environment, be language-specific, i.e., these systems are uniquely dedicated to language and language acquisition (Margolis & Laurence 2013). Thus, we can conclude that language, or some linguistic capacity, is innate if and only if: developed linguistic traits cannot be explained by appeal to the developing organism’s sensitivity to its environment, but must instead be explained with reference to the influence of a language-specific learning system.

This entry will be focused on the original, and still most significant, argument for linguistic nativism: the poverty of the stimulus argument (PoS). In section 2, I will discuss Chomsky’s original PoS argument as well as several variations. Section 3 situates these discussions of innateness within the broader scope of linguistic, psychological, and biological theory. Section 4 expands the range of targets from syntax to phonology and morphology. Section 5 details PoS arguments within semantics. Sections 6 and 7 introduce the major forms of empiricist response to PoS arguments, and show how these result in alternative models of linguistic competence and acquisition. Two supplements accompany this entry:

A. Innateness and (Large) Language Models addresses the relevance of recent developments in Generative AI, specifically Language Models, to the nativism debate, and
B. Suggestive, But Inconclusive, Lines of Research into Nativism addresses the wide range of considerations other than PoS arguments which have been appealed to in discussions of nativism, such as the speed and ease of language acquisition, neural localization, dissociation, language universals, and critical periods.

2. Chomsky’s Poverty of the Stimulus Argument

The most widely discussed, and in my view the most powerful, argument for linguistic nativism is Chomsky’s Poverty of the Stimulus argument. Chomsky presents this argument in numerous places, but it receives its fullest discussion in Chomsky (1968 [2006], 1980, 1986; see also Crain & Pietroski 2001; Jerry Fodor 1981; Janet Fodor & Crowther 2002; Laurence & Margolis 2001). While the details can get highly complex, the argument can be stated quite simply: there are aspects of developed linguistic competence which cannot be explained with respect to the evidence available to the language learning child. That is, the adult language user knows things about their language which the child cannot have learned on the basis of their experience, and which must therefore be instead a contribution of their innate endowment. Note that, as stated, this argument can, if successful, establish that language acquisition is innately constrained/guided, but says nothing about the specificity of the psychological system which constrains it. To argue for genuinely linguistic nativism, some further argument must be given to establish that the relevant information in the adult competence is not likely to be a product of non-language-specific innate systems.

We can thus specify a schema for POS arguments. These will involve three key ingredients: a target, some primary linguistic data, and a collection of potential learning mechanisms. Firstly, some data concerning the adult’s linguistic competence must be gathered. From these data, some facts about what the adult knows must be established, i.e., the theorist motivates a claim about the psychological traits of a competent language user. This forms the target of a POS argument: we ask, could a language learner learn this without innate guidance? This argument will then be successful if a negative answer to this question can be established. To establish this, we must know what relevant linguistic experience the child has, from which they might in principle learn. Call this the primary linguistic data (PLD). And, we must have a picture of what the relevant learning mechanisms are, which a learner can use in order to, in principle, learn these facts, where these mechanisms are assumed to be domain-general, including principles of statistical, e.g., Bayesian, inference. The first stage in a POS argument for linguistic nativism involves showing that the target cannot be acquired on the basis solely of these learning mechanisms applied to the PLD. This establishes that some innate contribution, beyond these general learning mechanisms, is required to explain acquisition of this linguistic knowledge. The next step then involves arguing that the relevant contribution is specific to language. We can schematize this argument like so:

1. Children acquire the linguistic target T.
2. Children are exposed to PLD, consisting of some set of utterances, perhaps including information about the extra-linguistic context.
3. The child is equipped with some set of domain-general learning mechanisms LM.
4. T is not acquirable on the basis of PLD and LM.
IC: Therefore, some factors F relevant to the acquisition of T must be innate.
5. F is language specific.

Conclusion: Therefore, children possess language specific factors, in virtue of which they acquire their adult language.

Exactly what form the innate contribution F takes is itself up for debate. While traditional (e.g., Jerry Fodor 2001) accounts treated the child as equipped with innate knowledge, in the form of propositional attitudes specifying facts about which languages are possible, most recent discussion has instead assumed that F arises from computational constraints on the possible development of a language-specific cognitive system, which need not have propositional form/content, nor even be representational (Collins 2004 and Hornstein 2013). Other options, e.g., an innately given hypothesis space within which learning can occur, or biases towards some hypotheses over others, are appealed to as well.

A fully fleshed out POS argument would thus have to specify values for T, PLD, and LM, and show on this basis why each premise (1–5) is supported, i.e., why LM must be supplemented by F in order for T to be acquired. The values for T can be highly diverse. They may be linguistic items, such as expressions or their features, operations, such as rules, or even less obvious things such as preferences for one kind of structure over another. Any of these which can be identified in learners must be explained, and such explanations can appeal either to innate features of the learner or encountered features of the environment. As Lasnik (2000: 3) points out, many proposed learning theories work by presupposing the categories into which linguistic entities are placed:

The list of behaviors of which knowledge of language purportedly consists has to rely on notions like “utterance” and “word”. But what is a word? What is an utterance? These notions are already quite abstract. Even more abstract is the notion “sentence”. Chomsky has been and continues to be criticized for positing such abstract notions as transformations and structures, but the big leap is what everyone takes for granted. It’s widely assumed that the big step is going from sentence to transformation, but this in fact isn’t a significant leap. The big step is going from “noise” to “word”.

Responses to this argument can attack any of these premises: they can argue that T is not in fact acquired or is misdescribed, that the PLD is richer than assumed, that LM is more powerful than assumed, or that F is involved in non-linguistic tasks and development. As we will see, all these moves have been made in the literature.

Before presenting a specific POS argument in more detail, it is worth stressing the size of the task facing someone who wishes to reject these arguments. To reject linguistic nativism, there must be suitable responses to each instantiation of the above argument, i.e., for all T, T must be learnable on the basis of PLD and LM. Further, POS arguments need only establish that a child can learn the relevant features of their language without suitable guidance from their sensed environment. If any child manages this feat, even if other children are exposed to sufficient linguistic experience to learn the relevant facts, then the POS argument is successful. That is, we can run this argument for each child. Finally, it is not enough to show that the data are, in some sense, available to each child; it must also be shown that the child is capable of attending to and processing the data sufficiently powerfully and reliably to extract the information needed to acquire T.

Chomsky’s most famous versions of the POS argument involve syntax. Chomsky appeals to some data characteristic of adult language users’ judgements concerning a range of sentences which display some perhaps surprising effects of sentential structure on interpretation and/or acceptability. For example, Chomsky (1975: 30–33) discusses the relations between interrogative and indicative sentences, easily enough recognized by adult speakers, indicated in the following:

1.

a. The man is tall.
b. Is the man tall?

2.

a. The man who is tall is in the room.
b. Is the man who is tall in the room?
c. *Is the man who tall is in the room?

These data suggest that there is some grammatical rule which relates the a sentences to the b sentences. Specifically, 1b corresponds to 1a, except that the auxiliary verb in 1a has been moved to the beginning of the sentence in 1b. The class of sentences of this sort exemplifies some adult knowledge relating the form of indicative sentences to the form of interrogatives. The question is: what form does this knowledge take? The 2 sentences provide crucial information here. Because 1a includes only one auxiliary verb, many distinct rules would be collapsed in their application to this sentence, generating the correct 1b. For example,

(i) raise any auxiliary sentence-initially,
(ii) raise the first auxiliary sentence-initially,
(iii) raise the main auxiliary sentence-initially.

But in 2, there are multiple auxiliaries, and thus we can pull the predictions of positing these distinct rules apart. Rule (ii) would predict that 2c would be the interrogative corresponding to 2a. Rule (i) would predict that 2b and 2c would both be legitimate. Both of these are contrary to fact. The right rule, at least with respect to these cases, is rule (iii). Chomsky infers from this a general principle of grammar, Structure Dependence (SD). According to SD, linguistic rules apply to linguistic structures on the basis of their structural properties, not on the basis of linear order. Being the main auxiliary of the sentence involves, roughly, being the least embedded auxiliary verb within the verbal/inflectional construct with which the sentential subject combines. Because subjects can be arbitrarily long and can contain both auxiliaries and subject Determiner Phrases (“The man who is tall”, “The man who is taller than the man who is short”, and so on), there is no way to define the target of this rule in terms of the linear ordering of words in the sentence. It can only be defined in structural terms. If this is correct, it provides the target of a POS argument: adults know that rules apply on the basis of structures, not linearly ordered strings.
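The way rules (i) to (iii) coincide on 1a but diverge on 2a can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Clause` class, the toy auxiliary list, and the hand-supplied parse (which word is the main auxiliary) are all hypothetical devices introduced here, not part of Chomsky's presentation, and the sketch says nothing about how a learner would arrive at such a parse.

```python
from dataclasses import dataclass

AUXILIARIES = {"is", "will", "can"}  # toy auxiliary inventory (assumption)


@dataclass
class Clause:
    """A hand-supplied toy parse: subject phrase, main auxiliary, predicate."""
    subject: list    # may itself contain auxiliaries, e.g. inside a relative clause
    main_aux: str
    predicate: list

    def words(self):
        # The flat, linearly ordered string that rules (i) and (ii) operate on.
        return self.subject + [self.main_aux] + self.predicate


def front(words, i):
    """Move the word at position i to the front of the string."""
    return [words[i]] + words[:i] + words[i + 1:]


def rule_i(clause):
    """Raise any auxiliary: one candidate question per auxiliary in the string."""
    w = clause.words()
    return [front(w, i) for i, tok in enumerate(w) if tok in AUXILIARIES]


def rule_ii(clause):
    """Raise the linearly first auxiliary."""
    w = clause.words()
    first = next(i for i, tok in enumerate(w) if tok in AUXILIARIES)
    return [front(w, first)]


def rule_iii(clause):
    """Raise the main auxiliary, located by structure rather than by position."""
    return [[clause.main_aux] + clause.subject + clause.predicate]


def show(candidates):
    """Render candidate questions as sorted strings for easy comparison."""
    return sorted(" ".join(c) for c in candidates)


s1 = Clause(["the", "man"], "is", ["tall"])                              # 1a
s2 = Clause(["the", "man", "who", "is", "tall"], "is", ["in", "the", "room"])  # 2a

for name, rule in [("(i)", rule_i), ("(ii)", rule_ii), ("(iii)", rule_iii)]:
    print(name, show(rule(s1)), show(rule(s2)))
```

On the single-auxiliary clause all three rules return the same string, corresponding to 1b. On the two-auxiliary clause, `rule_ii` returns only the string corresponding to 2c, `rule_iii` returns only the string corresponding to 2b, and `rule_i` returns both. Note that only `rule_iii` needs the structured fields of `Clause`; the other two work on the flat word list alone.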

Assuming that SD is a principle of adult English grammar, the question then is: could a child learn this principle on the basis of their PLD? Chomsky’s argument is less fleshed-out on this point. He asserts that “A person may go through a considerable portion of his life without facing relevant evidence”, i.e., without encountering sentences like 2b, which indicate to the child that it is rule (iii) which governs interrogative formation in English, not rule (i) or rule (ii). As we will see, this claim has been disputed, but let us assume it in spelling out the argument. That is, assume that the PLD consists of sentences like 1a and 1b, but not 2a or 2b.

The next question is: would a child be able to identify rule (iii), the structure-dependent rule, on the basis of these data? Again, Chomsky is less than explicit here on the nature of the assumed non-nativist alternatives, and the LMs they posit. Chomsky claims that rule (ii) is “simpler and more elementary”. The idea here is that it requires complex and language-specific machinery to state rule (iii), specifically reference to linguistic structure and perhaps notions like auxiliary and subject, whereas rule (ii) requires only that the language user can count the serial ordering of words in an utterance, a capacity involved in pattern recognition and other cognitive tasks. If Chomsky is right that such inferential mechanisms are characteristic of non-nativist learning strategies, then we would expect children to infer from their experience of sentences like 1a and 1b that (ii) is the relevant rule, leading to mistakes like 2c, which are never observed. This argument can thus double as an argument for language-specificity. If the child must assume that main auxiliaries are the ones which are raised, and main is specified in linguistic structural terms, then the child’s innate assumptions are language-specific. Short of an argument that these representational categories can be acquired from the PLD, we can then conclude that the child’s linguistic development can only be explained with reference to innate and language-specific assumptions.

With this general structure of POS arguments on the table, we can identify some characteristic extensions, distinctive forms, and particularly significant instances of these arguments in the literature.

It is, or at least used to be, customary in the literature to identify a special case of POS, dubbed the “Logical Problem of Language Acquisition” (Hornstein & Lightfoot 1981). Despite the name, it is important to note that this is not, in fact, an a priori argument for linguistic nativism. It differs from the above version of the argument solely in the assumptions it makes about the specific character of the PLD. In the above, classical, POS argument, it is assumed that children do not, in the general case, encounter the data required to explain, in concert with LM, the grammatical knowledge they end up with (e.g., sentences like 2a and 2b). However, it is allowed that such data could be available, and perhaps are available to some children. The Logical Problem instead focuses on grammatical knowledge which seems in principle unavailable to children, specifically knowledge of what is excluded from the language. The assumption driving this argument is that children are exposed to only positive instances of their language in sufficient quantity to learn from (Marcus 1993). Knowledge that some constructions are impossible, especially when such constructions are prima facie extensions of the PLD, has been alleged to pose particularly difficult problems for non-nativist approaches.

One famous example of this kind of argument comes from the constraints on wh-movement identified in Ross (1967). We gestured above at the ways in which yes-no questions in English are formed, namely by raising auxiliary verbs above sentential subjects. However, these are not the only kinds of questions made available in adult English. Wh-questions, formed with wh-expressions like what, who, why, etc., enable us to ask not merely whether some claim is true, but in a more fine-grained fashion which claim is true. Questions of this sort, however, still exemplify the above-discussed relations to corresponding indicative sentences. To relate indicatives to wh-questions, we can imagine beginning with an indicative, generating from it an interrogative through subject-auxiliary inversion, replacing one of its constituent expressions with the relevant wh-expression, and then raising this expression to the beginning of the sentence. For example:

3.

a. Selim will sell his car.
b. Will Selim sell his car?
c. Will Selim sell what?
d. What will Selim sell?

What Ross observed is that there are a wide range of structurally specifiable contexts in which this process results in unacceptability. For example, when declarative sentences contain conjunctions, we are unable to form well-formed questions in which we ask about just one of the conjuncts:

4.

a. Selim will sell his car and his bike.
b. *What will Selim sell and his bike?
c. *What will Selim sell his car and?

On the assumption that children are exposed only to acceptable sentences, they would never encounter the negative data assumed to be needed to indicate the impossibility of 4b and 4c. Despite this, adult English speakers universally know that 4b and 4c are unacceptable. So again the question arises: what, if not innate knowledge, can explain this fact about adult linguistic competence? These kinds of arguments strengthen the case for linguistic nativism in two ways. Firstly, they make weaker, and therefore easier to assume and/or establish, assumptions about the shape of the PLD. All that is needed for this argument is the general claim that children do not (have to) learn from negative data, in the form of adult corrections and the like. Secondly, the target of these arguments seems more clearly language-specific. While general linguistic principles, such as SD, could potentially be analogized to non-linguistic information, either innate or acquired, it is harder to see any analogues of these constraints elsewhere in the human mind.

These kinds of constraints on possible grammatical constructions provide examples, in the linguistic case, of what Laurence and Margolis (2024) call “Cognitive Quirks”, i.e., “surprising or unexpected facts about people’s minds” (2024: 393). These traits are surprising or unexpected, of course, not in some objective a priori sense (whatever that would mean), but on the basis of the experiences children can be expected to have, and the kinds of statistical inferential processes they can be assumed to apply to these. Where we find these quirks, as exemplified by Ross’ wh-islands, they provide particularly powerful arguments for specific, innate biases in children’s acquisition. Every wh-question the child has heard will have a corresponding declarative form, exemplified by the relations between 3a and 3d above. But, for some reason, children refuse to perform the reverse mapping, relating declaratives of familiar forms like 4a to forms like 4b or 4c. As such constraints are found in all examined world languages, they present a particularly important case study for non-nativist accounts of human language to explain, without resort to the kinds of innate knowledge and biases nativists propose.

Another important class of PoS arguments centers on the resilience of human language acquisition, despite various kinds of impoverishment of learners’ PLDs. Barring drastic and brutal developmental conditions (of the kind discussed in Curtiss et al. 1974), all children achieve fluency in their native language. This, on its own, is a remarkable fact, providing much of the impetus for nativists. Language seems more like the visual or vestibular system, developing in roughly the same way, barring pathology, across the population, than like mathematical or musical abilities, which are highly unevenly distributed, and require specific, constructed learning environments. This becomes even more remarkable when the full range of developmental environments is considered. Cross-culturally, children’s linguistic experience varies significantly. In many cultures, parents often speak to children in ways that seem particularly well-suited for language acquisition (cf. Newport, Gleitman, & Gleitman 1977; Gleitman, Newport, & Gleitman 1984). However, this is far from a cultural universal. Ochs and Schieffelin (1982) discuss the cases of Kaluli speakers (Papua New Guinea), who, due to “their belief that infants ‘have no understanding,’ never treat their infants as partners (speaker/addressee)” (1982: 289), and of Samoan, in which “typically caregivers do not engage in ‘conversations’ with infants over several exchanges” (1982: 296). Despite these substantially different courses of linguistic experience, there is no evidence that Samoan- or Kaluli-learning children display any delays or deficiencies in their ability to learn a language.

But the resilience of language learners extends far beyond these cases of cultural variation. Investigation of how children acquire language when they lack normal perceptual access, either to the evidence relevant to learning language as a whole or to specific aspects of language, has provided some of the most important case studies in the relative importance of experience and innate structures in language acquisition. Goldin-Meadow and Feldman (1977) discussed the structural features found within ‘home-signs’, gestural communicative systems developed by deaf children in non-signing households, arguing that such features must be explained by language’s innate components, as these children have no linguistic experience to learn from. Significantly, such studies have shown that these home-sign languages display many of the structural properties of spoken languages, thus furthering the case that this structure is part of the innate linguistic endowment, as it is evidenced with or without instruction (see Goldin-Meadow 2003a for an overview).

Another example of acquisition in perceptually deprived conditions, in this case in the domain of lexical semantics, comes from blind children’s acquisition of intuitively visual meanings, such as the verbs ‘look’ and ‘see’ (Landau & Gleitman 1985), and, perhaps even more remarkably, color terminology (Kim et al. 2021). In these cases, children have linguistic experience, of course, but the specific items they acquire competency with do not seem learnable on the basis of their experience.

Other cases of language acquisition in environments without a suitable model to learn from arise when speakers come together without a shared language, so that children in this environment must learn from the often ad hoc and simplified strategies used to facilitate communication. Political and economic forces often bring together people from a number of different linguistic populations, without the required resources to learn a common language, and so they are forced to adopt a pidgin, a grammatically impoverished communicative code, often drawing from the lexicons of multiple source languages. When children are raised in these environments, as often happened on plantations, they may have only these pidgins as their data sources. Remarkably, however, children raised in such environments do not end up speaking the simplified, irregular, and restricted pidgins of their parents’ generation. Instead, in the space of a single generation, they develop a creole, a full-blown language with complex characteristics found in other sorts of language (recursive embedding, inflection for Tense, Modality, and Aspect, and so on), but absent in the pidgins from which they learned (see Bickerton 1984, 2014).

One final example, which ties together the last few kinds of cases, comes from communal languages which develop out of home-sign, such as Nicaraguan Sign Language (Senghas & Coppola 2001) and Al-Sayyid Bedouin Sign Language (Sandler, Aronoff, Meir, & Padden 2011). We see in these cases the characteristic features of natural languages, even though initial learners have not been exposed to these features in their development.

What all these have in common is that we appear to find features in the developed competence of language users which are not present in their linguistic experience. This provides a compelling case for some innate knowledge: what could explain these features, if not constraints on the developing minds of the language learners? As noted, however, further argument is needed to establish that these innate components are specific to language. Several theorists (Bickerton 2014; Goldin-Meadow 2003b) have further argued that many of these cases provide particularly clear windows into the nature of the innate component, as they exemplify the development of linguistic cognition without the distorting effects of a specific target language. This argument is again bolstered to the extent that we can find regularities among these entirely isolated languages, both between one another and with languages acquired in more typical circumstances. Finally, the existence of these languages poses explanatory worries for proposals which aim to explain language structure and acquisition not in terms of innate features of the human mind, which bias children towards learning some languages over others, but instead in terms of languages themselves being shaped over time by learnability and usability considerations (e.g., Christiansen & Chater 2008, 2016; Deacon 1997; see §7.2). Such proposals may be possible for languages with long lineages, such as English or Mandarin, but are not applicable to languages of such recent origins.

A related, and potent, form of PoS argument appeals to the “continuity assumption” (Crain 1991; Crain, Koring, & Thornton 2017). This is the idea that children are not merely restricted by UG in where they end up, but also in the kinds of mistakes they are liable to make along the way. Specifically, when children adopt some rule or structure which is impermissible in their target language, what they adopt will be permissible in some other language. That is, acquisition errors will not simply be random, but will reflect the general structure and biases of their innate endowment. Such examples provide a particularly powerful case for the nativist. Acquisition errors are, by their nature, not reflective of the environment. And so any patterns observable in these errors must be explained by features of the learners themselves. If this explanation involves appeal to something language-specific, that is the best case for a linguistic nativist: the child’s linguistic hypotheses are constrained by the possibilities of languages they have never encountered, and so must have some pre-experiential knowledge of the possible shape of a human language.

See Crain, Koring, and Thornton (2017) for detailed discussion of a number of cases of just this sort. I will just describe one to give the flavor of this particular style of PoS argument, drawing on the proposed language learning bias, the Semantic Subset Principle (SSP). According to SSP, roughly, children who are unsure about the interpretation of a particular expression, or expression type, should favor a narrower interpretation, i.e., one which is true in fewer circumstances. This is because narrow semantic hypotheses can be overturned in the face of falsifying evidence, whereas the child cannot decisively overturn an over-general hypothesis. SSP makes concrete predictions about children learning languages in which the more general interpretation of a given structure is correct, namely that they will hypothesize incorrect interpretations with narrower truth-conditions.

Here is an example. Mandarin and English differ with respect to the default interpretation of sentences containing both a negation and a disjunction. In English, surface word order dictates the most natural scope ordering: “Ted did not order pasta or sushi” is naturally heard as saying that Ted ordered neither pasta nor sushi, i.e., with negation scoping over the disjunction. The linearly equivalent sentence in Mandarin, however, is naturally interpreted with scopes reversed: “Tàidé méiyǒu diǎn yìdàlìmiànshí huòzhě shòusī”, literally Ted not order pasta or sushi, means that either Ted did not order pasta or Ted did not order sushi. Note that the English interpretation entails the Mandarin interpretation, but not vice versa. And so, a child who hypothesized the Mandarin reading, when the English interpretation was correct, would never be given (direct) disconfirming evidence: every time they encountered a sentence of the form “…not…or…”, their interpretation that at least one of the disjuncts was false would be confirmed, as this sentence would only be used when both were. Thus, they would be trapped with their incorrect interpretation. SSP thus predicts that children learning Mandarin should hypothesize that sentences like this mean what they do in languages like English, but not what they do in Mandarin, at least until they get the decisive evidence for the latter interpretation (i.e., encounter an adult utterance of this sentence type in which only one disjunct is false), at which point they can revise and adopt the less restrictive interpretation. 
Crain, Koring, and Thornton (2017) cite work, starting with Goro and Akiba (2004), which confirms this prediction, not just for Mandarin, but also for Japanese, Russian, and Turkish, in which adults interpret such expressions permissively, while children interpret them in the restrictive, English-type, way (although see Han, Lidz, & Musolino 2007 for experimental evidence that children learning Korean, where evidence for scope ordering of negation and quantification is sparse, do not opt for the subset option, but instead seem to select one option at random). This presents a powerful PoS argument because such interpretations cannot be learned from the child’s parents or caregivers, as they would not use these expressions in these ways. But these interpretations are not random: they are what would be predicted for a learning system designed to avoid traps and dead-ends. And they provide evidence for a cross-linguistic acquisitional universal, which at least initially seems language-specific.
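The entailment asymmetry behind this prediction can be checked mechanically. The following is a minimal sketch, not part of Crain and colleagues’ own presentation: the function names and the encoding of a situation as a pair of truth values (did Ted order pasta, did Ted order sushi) are assumptions introduced here for illustration. It enumerates the four possible situations and confirms that the restrictive reading is true in a proper subset of the situations in which the permissive reading is true.

```python
from itertools import product

def restrictive_reading(pasta: bool, sushi: bool) -> bool:
    """English-type reading: negation scopes over disjunction, not (P or S)."""
    return not (pasta or sushi)

def permissive_reading(pasta: bool, sushi: bool) -> bool:
    """Mandarin-type reading: disjunction scopes over negation, (not P) or (not S)."""
    return (not pasta) or (not sushi)

# Every combination of "Ted ordered pasta" and "Ted ordered sushi".
situations = list(product([True, False], repeat=2))

restrictive_true = {s for s in situations if restrictive_reading(*s)}
permissive_true = {s for s in situations if permissive_reading(*s)}

# Situations in which an adult could truthfully use the sentence on the
# permissive reading but not on the restrictive one. Only these can show a
# learner who starts with the restrictive reading that it is too narrow.
decisive_evidence = permissive_true - restrictive_true

print(restrictive_true < permissive_true)  # proper-subset check
print(sorted(decisive_evidence))
```

A learner who begins with the permissive reading in an English-type language has no counterpart to `decisive_evidence`: every adult use is made in the one situation where both disjuncts are false, which the permissive reading also counts as true.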

3. Fleshing Out the Nativist Picture

A central desideratum in developing a nativist theory of language is balancing the burden such a theory places on phylogeny and ontogeny (Dupre 2022). The more of developed competence that is innate, the less there is for the child to learn, relieving pressure on a theory of linguistic development. However, this relief is mirrored by the greater work that a theory of human cognitive evolution must play, in order to explain how all this innate structure came about in the first place. In the last few decades, several works (e.g., Berwick & Chomsky 2016; Bickerton 2014; Boeckx 2021) have directly addressed this question.

3.1 Innateness and Evolution

Innate systems do not just appear by magic, and so some story about how they got there is needed. Nativist theories solve developmental problems by incurring debts to evolutionary theories. If it turns out that we can’t explain how a lineage would evolve with the right sort of innate structure, such theories are undermined. Note that even if, like Berwick & Chomsky (2016), you deny that the existence and features of the essential character of human language (i.e., what humans have, but non-humans lack, in virtue of which the former but not the latter can learn and use language) are explained by appeal to natural selection, this does not get you off the hook for an evolutionary account. Even on this unorthodox view (see, e.g., Pinker & Bloom 1990; Pinker & Jackendoff 2005 for selectionist alternatives), we need to explain how it was that a “language-ready” mind came about, such that a minor addition could create a fully-fledged language user.

Exciting work in this area has also come from asking comparative psychological questions, a relative novelty in theorizing about language, concerning systems found in non-human/non-linguistic animals, but which may serve as scaffolds, precursors, or preconditions for human language. For example, work by Bridget Samuels (2015; B. Samuels, Hauser, & Boeckx 2016) finds evidence for some of the basic components of human phonological systems, such as vocal learning and abstract computation, in avian cognition. Taking this a step further, Miyagawa’s (2017; Miyagawa et al. 2014) Integration Hypothesis views human language as built on top of a foundation of the novel combination of combinatory operations found in birdsong, and the ‘word-like’ properties of primate call signals (discrete symbols, displaced reference, and so on).

Nativist approaches are best served, then, by work in this vein being fairly, but not completely, successful. Human linguistic behavior is highly complex, relying on a wide range of distinct cognitive and physiological systems, most of which are not dedicated to language. This suggests an area ripe for comparative studies, minimizing those language-specific factors which nativists view as crucial, and which must have arrived in the human lineage relatively recently. Up to a point, the more that can be ‘off-loaded’ to these programs, the less a linguistic nativist will need to posit, and so the less likely it is that evolutionary theory will be unable to underwrite its claims. However, if these programs are too successful, they may show how all aspects of language can be explained in these ways, as a matter of the right arrangement of language non-specific components, perhaps with analogs or homologs in other species. If taken to this extreme, the resulting theory looks much more like the non-nativist theories of, for example, Michael Tomasello. After much neglect in the twentieth century, evolutionary accounts of natural language are becoming more popular. This is good news for the nativist debate, as nativist theories of language seem to require quite a specific story of evolution.

The constraints on nativist proposals coming from evolutionary theory have played a significant role in shaping linguistic theory. As noted above, it is only when we have a clear picture of what the acquired language is like that we can seriously ask whether this system is significantly shaped by innate, and language-specific, factors, or whether it is plausibly extracted from the environment. Early generative theory (Chomsky 1957, 1965) described languages in terms of a large number of highly (language and construction) specific rules. These rules divided into phrase structure rules, which detailed how simple expressions could be combined to generate complex structures, and transformations, which manipulated these complex structures, by re-ordering, combining, or removing elements. While not fully made explicit, the innate “knowledge of language” posited by this class of theories was quite substantial. The child comes to the task of language acquisition with knowledge of which rules are possible and which are not (e.g., knowledge that all such rules must be structure-dependent), a representational schema for analyzing input sentences, a method for comparing such representations with each hypothesized grammar, and an evaluation metric for determining which possible grammar best comports with their linguistic experience (Chomsky 1965: 30).

Fairly quickly (cf. Ross 1967; Jackendoff 1977; Chomsky 1981) it was realized that this grammatical formalism was too unconstrained to plausibly account for acquisition, and to explain why we find the languages we find. This resulted in the first major shift in generative linguistics, from Transformational Grammar to the Principles and Parameters (P&P) theory.

3.2 Principles and Parameters

The core proposal of P&P was that probabilistic search through a vast hypothesis space, in order to identify rules capable of generating the primary linguistic data, was inefficient, computationally costly, and explanatorily unsatisfying. This theory of learning was thus replaced with a bipartite theory of language. Principles are universal constraints on what kinds of languages are possible. As they are universal, no learning is necessary. Parameters, on the other hand, can be understood as Principles containing a variable. For example, the ‘head directionality parameter’ specifies that languages place the head of an expression either before or after its complements, but it is up to the child to determine which. One core idea was that ‘setting’ a parameter could be done without substantial probabilistic reasoning. An English-learning child could hear just a small number of transitive sentences, identify that the Verb comes before its Object, and thus set the parameter value to ‘head-initial’, while a Japanese-learning child could make the opposite determination with just a few sentences in which the Object precedes the Verb (although see Sakas & Fodor 2012 for criticisms). In this way, the set of parameters defined the range of possible languages, with one language for each possible combination of settings.
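The two ideas in this paragraph, that combinations of settings define the space of possible languages and that a setting can be fixed by a simple trigger, can be illustrated with a toy model. This is a deliberately simplified sketch under stated assumptions: parameters are treated as binary, clauses are encoded as tuples of category labels, `grammar_space` and `set_head_initial` are hypothetical names, and the parameters other than head directionality are placeholders. It ignores ambiguous input, which is the kind of complication raised by critics such as Sakas and Fodor.

```python
from itertools import product

def grammar_space(parameter_names):
    """One candidate grammar per combination of binary parameter settings."""
    return [dict(zip(parameter_names, values))
            for values in product([True, False], repeat=len(parameter_names))]

def set_head_initial(clauses):
    """Toy trigger: inspect the first clause containing both a verb 'V' and an
    object 'O'. Return True for head-initial (V before O), False for
    head-final (O before V), or None if no transitive clause was heard."""
    for clause in clauses:
        if "V" in clause and "O" in clause:
            return clause.index("V") < clause.index("O")
    return None

# Hypothetical input samples, written as sequences of category labels.
english_like_input = [("S", "V"), ("S", "V", "O")]
japanese_like_input = [("S", "O", "V")]

# Only 'head_initial' comes from the text; the other two names are placeholders.
space = grammar_space(["head_initial", "placeholder_b", "placeholder_c"])
print(len(space))
print(set_head_initial(english_like_input), set_head_initial(japanese_like_input))
```

With n binary parameters the space contains 2^n candidate grammars, which is why fixing each parameter independently from a handful of sentences is so much cheaper than searching the space of grammars directly.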

However, this elegant theory of language acquisition increased the burden on theories of language evolution. Parametric theories are over-specified; the child knows prior to experience the full range of different options for languages, and thus the learning procedure is best viewed as one of selection, or even of forgetting (Yang 2006). But if children innately know all the possible languages (i.e., their innate parameter space determines them), this knowledge must come from somewhere, and it has seemed the only option is evolution/selection. However, language is a relatively recent trait, biologically speaking, especially according to prominent generativists (Berwick & Chomsky 2016), and the parameters themselves don’t seem obviously functional (while language as a whole may be functional, what could be the story for selection of a human population with the head-directionality parameter?). So it was widely worried that these theories relied on an implausible story about the human lineage. In combination with empirical and explanatory concerns about the role of parameters in linguistic theory (cf. Newmeyer 2004, 2017; Boeckx 2010), this led to the current implementation of generative linguistics, Minimalism.

3.3 The Minimalist Program

The Minimalist Program (Chomsky 1995; for an introduction, see Hornstein, Nunes, & Grohmann 2005) aims to strip the linguistic system down to its minimal elements, with the hope that through appeal only to these, we will still be able to derive most or all of the descriptive generalizations identified in previous versions of generative theory. The core computational function of a grammar is the generation of complex structures out of simple structures. To this end, the only strictly necessary element of a grammar is an operation, called ‘Merge’, which takes two expressions and combines them to form a larger expression. To make use of such an operation, there must also be a store of basic expressions, the Lexicon, as well as systems capable of making use of the structures generated by this system, specifically a semantic system (alternately called the LF-interface, or the Conceptual-Intentional system) which can assign interpretations, and a perceptual-motor system (the PF-interface) which can assign such structures whatever properties are needed to produce and perceive public linguistic signals. One core motivation for this proposal comes from the aforementioned evolutionary constraints. While language is an evolutionarily recent invention, conceptual representations, and perceptual and motor systems are much older. If all that is language-specific is Merge, and language is made possible by adding, and integrating, a computational system defined by Merge into these older systems, then we need not posit much innate linguistic structure or knowledge, and thus it is easier to see how language could have evolved (Hauser, Chomsky, & Fitch 2002). Of course, what remains to be shown is that we can indeed reproduce the explanations of prior linguistic theories with this minimal material (and, further, to the extent that we still need to posit language-specific innate components in the lexicon and interface systems, the worries about evolution and evolvability remain).
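To see how little machinery Merge requires, it can help to write it down. The sketch below rests on an assumption not stated in the text above: it models Merge as unordered set formation, following common presentations of Minimalism. Lexical items are plain strings, and `merge` and `depth` are illustrative names only.

```python
def merge(a, b):
    """Combine two expressions into a larger, unordered expression.
    Modelling the output as a frozenset is an illustrative assumption."""
    return frozenset([a, b])

def depth(expression):
    """Number of Merge applications along the deepest path of an expression."""
    if isinstance(expression, str):  # a lexical item
        return 0
    return 1 + max(depth(part) for part in expression)

# Because the output of Merge is itself an expression, it can be merged again.
verb_phrase = merge("order", "pasta")
clause = merge("Ted", verb_phrase)

print(depth(clause))
print(verb_phrase in clause)
```

Because the output of `merge` can feed back in as an input, a single operation suffices for unbounded hierarchical structure. Linear order and interpretation are left, on this picture, to the interface systems.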

While Minimalism has probably earned the right to be called “mainstream generative linguistics” in the twenty-first century, it is by no means the only such approach. The Principles and Parameters approach is still represented in a more-or-less traditional way, by, among others, Baker (2002) and the Structural Triggers working group (Sakas 2016). Charles Yang’s (2002) variational model of grammar acquisition involves a roughly P&P model of a grammar, but combined with a return to the statistical, evaluation-metric, learning model of early generativism. And for an alternative understanding of parameters as under-specified (i.e., as parameters representing just those elements of language which are not determined by principles, rather than as positive constraints in their own right), as opposed to the traditional over-specified interpretation, see Biberauer, Holmberg, Roberts, & Sheehan 2014.

3.4 The Parallel Architecture

The Parallel Architecture (PA), developed by Ray Jackendoff (e.g., Jackendoff 2002, 2025; Culicover & Jackendoff 2005; Jackendoff & Audring 2020), provides an alternative approach, combining ideas from both the generative and constructionist traditions. Jackendoff argues against the ‘syntactocentrism’ of traditional generative approaches, which view the combination of words/morphemes into phrases and sentences as the core engine driving all linguistic properties. Jackendoff instead proposes that linguistic competence involves multiple distinct generative systems: syntax, semantics, morphology, and phonology. Acquiring a lexical item then requires identifying those properties which are relevant to each of these systems. Generating a complex expression involves ensuring that there is some degree of alignment or correspondence between the properties and structures generated by each such system. The added degrees of freedom provided by the parallel systems enable PA to more naturally handle mismatches between the various linguistic domains: semantics and syntax (e.g., expletive pronouns), semantics and phonology (unarticulated variables), and so on.

Even with such a brief overview of some of the major divides in theoretical linguistics, I hope the key point is clear: we cannot assess claims about linguistic nativism without determining what the adult competence is, i.e., which linguistic theory is correct. These disparate theories describe developed competence in incommensurable ways, and the differences matter hugely to the plausibility of linguistic nativism. This is true both for the language system as a whole, and for specific pieces of knowledge within it. In some cases (e.g., classical Principles and Parameters theory, the Parallel Architecture) the structure of the system itself is quite ornate, and not plausibly learned, thus suggesting robust innate linguistic knowledge (it is for this reason that Jackendoff insists that there must be a long evolutionary runway for language phylogeny [Pinker & Jackendoff 2005 and Pinker & Bloom 1990]). Minimalism on the other hand aims to posit the bare minimum of language-specific structure, but still accepts that this minimal innate component is required to explain why some aspects of developed competence take the form and structure they do. Questions about linguistic nativism are thus inextricably entwined with first-order questions about the best analysis of the specific constructions knowledge of which is appealed to in a PoS argument, and also the more abstract questions about the general form a linguistic theory should take and how it could have evolved.

4. Innate Morpho-Phonology

While many philosophers have discussed PoS arguments for linguistic nativism, these have mostly focused on cases of the acquisition of syntax. PoS arguments can be produced, and evaluated, in a much wider range of areas. Phonology, the study of the sound systems used to produce, or externalize, linguistic expressions, provides a particularly interesting case.

As Bromberger and Halle note, “Though many philosophers of language have views on empirical linguistics, few, if any, have given serious attention to phonology” (2000: 19). This lack of attention itself, I think, makes phonology a good target for philosophical focus, as an expansion of the range of cases under discussion can only help to clarify the issues surrounding nativist and empiricist approaches to language acquisition and cognitive development more generally. Beyond this, nativism in phonology is of interest because it exemplifies (i) that a theory of acquisition can appeal to innate resources without being linguistically nativist, (ii) that nativist theories can differ substantially from one another, in empirically testable ways, and (iii) the different aspects of the developed linguistic system, each of which requires a developmental account.

4.1 Categorical Phoneme Perception in Humans and Other Animals

Any account of how children learn their language must begin with a story of how children perceive their language. And in general (excepting the important case of signed languages) this will be an account of linguistic auditory perception. Some of the most notable early discussions of the ways that humans are innately disposed to learn a language focused on just this.

The first step in learning a language is recognizing the difference between linguistic stimuli and non-linguistic stimuli, between speech and mere noise. Speech, unlike noise, is heard as consisting of a number of distinct, and more-or-less discrete, components. We hear sentences as composed of words, and words themselves as composed of smaller, repeatable components, ‘phonemes’. This in itself is a remarkable fact, as it requires that we impose substantial structure on the physical stimulus. Starting with the pioneering work of Alvin Liberman (see, e.g., Liberman, Cooper, Shankweiler, & Studdert-Kennedy 1967), it has been established that speech sounds, considered as physical/acoustic entities, lack many of the properties of speech as represented by a human hearer. Most centrally, the acoustic signal is a more-or-less continuously varying wave of displacement in the air between a speaker and a hearer, contrasting starkly with the discretized way speech is perceived. Thus, a cognitive psychological theory is needed to capture the representational system in which the output of the speech transduction process is encoded.

Liberman and colleagues (1967) discovered that human adults perceive speech sounds categorically. That is, even where the physical signals corresponding to distinct speech sounds vary continuously, hearers impose categorical boundaries, such that sounds within a category sound more like one another than either does to sounds outside of the boundary, even when the distance between the two, considered objectively, is equal. For example, the difference between producing a /ta-/ sound and a /da-/ sound consists in how long after the start of the onset consonant the vocal cords begin to vibrate (“Voice-onset time”, or “VOT”). For /da-/, voicing is immediate or almost so, whereas for /ta-/, there is a brief delay between the initial production of the consonant and the vocal cord vibration associated with the vowel (roughly 30ms). As a duration, VOT can be manipulated to create a series of incrementally varying stimuli. The question is: how do human hearers treat such a series? What Liberman and colleagues found is that, despite the continuous nature of the stimuli, speakers did not hear this series as continuously varying, but instead as shifting starkly from a series of highly perceptually similar /da-/ sounds to a distinct series of highly similar /ta-/ sounds. That is, /da-/ sounds close to the boundary between /da-/ and /ta-/ sounded more like the physically dissimilar /da-/ sounds with VOTs close to 0 than like the physically similar /ta-/ sounds on the other side of the category boundary.
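The contrast between physical and perceived similarity can be made concrete with an idealized model. The sketch below is an illustration only: it treats categorization as a sharp step function, whereas real identification curves are graded, and the 25 ms boundary is a hypothetical value chosen for the example rather than a figure reported in the studies cited. The function names are likewise assumptions.

```python
def perceived_category(vot_ms, boundary_ms=25.0):
    """Idealized categorical labelling of a stimulus by its voice-onset time.
    The default boundary is a hypothetical value, not a reported result."""
    return "da" if vot_ms < boundary_ms else "ta"

def heard_as_different(vot_a, vot_b, boundary_ms=25.0):
    """Two stimuli count as perceptually distinct only if they fall on
    opposite sides of the category boundary."""
    return perceived_category(vot_a, boundary_ms) != perceived_category(vot_b, boundary_ms)

# A continuum of stimuli in equal 10 ms steps of VOT.
continuum = list(range(0, 61, 10))
labels = [perceived_category(v) for v in continuum]
print(list(zip(continuum, labels)))

# A 10 ms difference across the boundary is heard as a change of category,
# while a 20 ms difference within one category is not.
print(heard_as_different(20, 30), heard_as_different(0, 20))
```

The model reproduces the qualitative pattern described above: the stimulus at 20 ms is grouped with the one at 0 ms rather than with its physically closer neighbor at 30 ms.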

This phenomenon thus gives rise to a nativism vs. empiricism question: do human language learners learn where the phonemic boundaries are, or is this innate? Work by Eimas, Siqueland, Jusczyk, and Vigorito (1971), using a habituation paradigm, found that 1- to 4-month-old infants who became habituated to synthetic speech sounds with VOTs within the normal range for a /b-/ sound would dishabituate when presented with a stimulus with a VOT in the normal range for a /p-/, but not in response to similarly different /b-/ sounds. This result is widely taken to show that infants, with minimal experience of the phonetic properties of language, nonetheless draw phonemic boundaries in much the same place that adults do.

This seems like a parade case for linguistic nativism. But, the story continues. In elegant work, Kuhl and Miller (1975) showed that similar results can be found in chinchillas. They played caged chinchillas naturalistic and synthetic speech sounds (composed of Consonant-Vowel sequences, differentiated by the VOT of the consonant) and trained them to associate a specific sound with an electric shock. The question was, how would they generalize from this specific sound to other, physically similar sounds? And what they found was that chinchillas generalized in much the same way humans do, categorically in line with VOT. That is, chinchillas view sounds with VOTs within a given range as all equivalent /da-/’s, but as soon as this range is exited, the sound becomes a /ta-/. Subsequent research has found similar results in birds, rhesus macaques, and other animals.

This work emphasizes the distinction between nativism in general and linguistic nativism in particular. While human infants, with minimal experience of language, display particular capacities relevant to language, such as categorical perception of phonemes, the linguistic nativist thesis doesn’t follow from this. That we see the same cognitive capacities in non-linguistic animals like chinchillas and macaques shows us that these capacities are not language-specific. Human language may build on an innate ability to categorically distinguish speech sounds, but these capacities are not dedicated to language use.

4.2 Nativism About Phonological Features

As Pearl (2022) notes, there have been relatively few explicit, empirical arguments for nativism in phonology of the form of the PoS outlined above, in which a specific piece of phonological knowledge is identified in an adult, and for which there is little or no evidence available to the language learning child (although she notes Idsardi’s (2005) case involving final-obstruent devoicing as one instance). This perhaps explains the widespread denial within phonology circles that there is a poverty of stimulus for phonological acquisition (e.g., Blevins 2004: 235: “within the domain of sounds, there is no poverty of the stimulus”, quoted in Volenec & Reiss 2020). Hale and Reiss, in a series of works together, alone, and with other collaborators, marshal a wide range of empirical evidence for their particular theory of developed phonological systems, but in their discussions of nativism tend to argue on a more abstract level that the categories of phonology must be innately given.

Hale and Reiss (2003) provide a classic argument, drawing on similar arguments by Jackendoff (1990) and Jerry Fodor (1975), for what they call the Innateness of Primitives Principle (IoPP). They argue that learning which sounds are possible in one’s native language presupposes the ability to identify the relevant sounds in one’s ambient linguistic experience. For example, for English-learning children to learn that their language (unlike French) contains dental fricatives, they must recognize their peers and parents as making use of dental fricatives in their speech. Unless they do this, they cannot take their linguistic experience to bear on the question of whether dental fricatives are allowed. For learning, what matters is not what is in fact in the environment, but how it is represented by the learner. They thus endorse the claim that “representations at an earlier stage of acquisition must be more highly specified than those at a later stage”, a version of the claim that children don’t so much learn their language as un-learn all the other possible languages that they are not exposed to (cf. Yang 2002, 2006).

Hale and Reiss’ argument thus makes a similar point to the Lasnik quote from earlier, that innate knowledge is needed simply to identify the stimulus as having linguistic, rather than merely physical or acoustic, properties. A similar claim is endorsed by Elan Dresher (2025: 5–6), although in the service of a quite different nativist theory, when he says:

This type of poverty of stimulus does not depend on showing that learners have acquired patterns or generalizations for which they did not have sufficient input; rather, we are dealing with a basic incommensurability between an acoustic signal and whatever representation learners assign it.

4.3 Nativism About Phonotactic Preferences

Iris Berent (2013) develops an experimentally grounded argument that speakers have knowledge about linguistic phenomena even when they are not evidenced in their native language. Building on joint work (Berent, Steriade, Lennertz, & Vaknin 2007), Berent investigated whether speakers displayed perceptual preferences between phonotactic structures not available in their native language. Phonotactics is the study of the available combinations of phonemes in a given language. Different languages allow for the occurrence of different complex sounds, sometimes relative to different positions within a word. One major area of phonotactic variation is in the ‘onset’ of a syllable, i.e., the consonants which come before the vowel. Some languages, like Russian, allow for a wide range of onsets, while others, like Japanese, are highly restrictive. English sits somewhere in between these two extremes. For example, in English, words can begin /gl-/ (‘globe’, ‘gluten’), but not /lg-/, whereas in Russian, both options are found: the verb to lie is ‘лгать’ (roughly: ‘lgat’), and the verb to swallow is ‘глотать’ (roughly: ‘glotat’).

Interestingly, these patterns of phonotactic possibility and impossibility are not randomly distributed in the Earth’s languages. There are asymmetric entailment relations between them. Languages which allow /lg-/ almost always allow /gl-/, but not vice versa. These dependencies are reasonably well-captured by the sonority profiles of these consonant clusters. Sonority is a scalar property of the production of a phoneme: roughly, the louder the sound, the more sonorous it is. And syllables tend to rise in sonority in the onset, and then fall in the coda (the consonant(s) following the vowel). The least sonorous consonants, such as plosives like /p/, /b/, and /g/ which involve complete stops and releases of airflow in the vocal tract, are thus more frequently found before more sonorous consonants, such as liquids like /l/ and /r/ for which airflow continues throughout production, in onsets (the reverse is true in codas). We can define consonant clusters in terms of changes in sonority. Some consonant clusters rise in sonority (e.g., /gl-/, /pr-/), some plateau, with each consonant at the same sonority level (e.g., /bd-/), and some fall (e.g., /lg-/, /rp-/). We can further define the above asymmetric entailment relations in these terms: languages which allow falling sonority in onsets also allow plateauing and rising sonority, and languages which allow plateauing sonority also allow rising sonority. In Berent’s terms, languages which allow the marked varieties (i.e., the taxonomically rarer sounds) also allow the less marked (i.e., the more common) varieties.
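The classification of onsets by sonority profile, and the implicational pattern stated over those profiles, can be sketched as follows. The numeric sonority levels are arbitrary placeholders: only their relative order (plosives below liquids) is taken from the discussion above, and all names in the code are illustrative assumptions rather than part of Berent’s analysis.

```python
# Placeholder sonority levels; only the ordering plosive < liquid matters.
SONORITY = {"p": 1, "b": 1, "d": 1, "g": 1,  # plosives
            "l": 3, "r": 3}                  # liquids

def onset_profile(cluster):
    """Classify a two-consonant onset as rising, plateau, or falling in sonority."""
    first, second = (SONORITY[consonant] for consonant in cluster)
    if second > first:
        return "rising"
    if second == first:
        return "plateau"
    return "falling"

# Markedness ranking of profiles in onsets, from least to most marked.
MARKEDNESS = {"rising": 0, "plateau": 1, "falling": 2}

def licensed_profiles(most_marked_allowed):
    """Implicational generalization: a language that allows a given profile
    also allows every less marked profile."""
    limit = MARKEDNESS[most_marked_allowed]
    return {profile for profile, rank in MARKEDNESS.items() if rank <= limit}

for cluster in ["gl", "pr", "bd", "lg", "rp"]:
    print(cluster, onset_profile(cluster))

print(sorted(licensed_profiles("plateau")))
```

On this encoding, a language whose most marked permitted onset is falling permits all three profiles, while one restricted to rising onsets permits neither plateaus nor falls.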

Berent’s question, then, was: do we find a preference for less marked over more marked structures, even in users of a language where both are prohibited? In English, both plateaus and falling sonority clusters are prohibited in the onset. And so English-learning children will rarely, if ever, have been exposed to words with onsets like the artificial stimuli ‘bdif’ (plateau) and ‘lbif’ (falling). So, if they treat these sounds differently, that is evidence for some kind of innate knowledge about this hierarchy. She tested for this in a series of elegant experiments.

Her hypothesis was that English speakers would tend to misperceive stimuli composed of marked consonant clusters in ways that would make them permissible in their native language, specifically, that they would perceive a non-existent (‘epenthetic’) vowel between the illicitly arranged consonants. That is, they would hear ‘bdif’ as ‘bedif’. Further, she predicted that this effect would be more extreme for more marked stimuli. In other words, it would occur more reliably for the falling ‘lbif’ than for the plateauing ‘bdif’. The most straightforward test of this hypothesis involved simply asking subjects whether the stimuli they heard contained one or two syllables. The results were as Berent’s hypothesis predicted. English subjects correctly identified monosyllabic stimuli with plateauing onset clusters more reliably than they did monosyllabic stimuli with falling sonority onsets. Further, this pattern was reversed for the genuinely disyllabic stimuli. English speakers more reliably identified disyllabic stimuli with falling sonority monosyllabic counterparts (e.g., ‘lebif’) than those with plateauing sonority monosyllabic counterparts (e.g., ‘bedif’). Again, the hypothesis under investigation explains this: ‘bdif’ is less marked than ‘lbif’, and thus more plausible to the English speaking subject, and so they are more likely to mishear ‘bedif’ as ‘bdif’ than ‘lebif’ as ‘lbif’.

These preferences were exhibited despite the fact that all the monosyllabic stimuli are prohibited in English. To paraphrase the title of Berent et al. (2007), English speakers display knowledge about sounds they have never heard. This provides a particularly compelling PoS argument: English speakers have knowledge about which onsets are more marked than others, despite having no experience with any of the relevant stimuli. Berent and colleagues (2007) argue that this shows that knowledge of the sonority hierarchy must therefore be part of the package of information children bring to the task of language acquisition. Further support for these claims comes from Gómez and colleagues (2014) who, using Near Infrared Spectroscopy, found that neonate brains responded differently to marked vs. unmarked linguistic forms.

4.4 Comparing and Contrasting Nativist Positions about Phonology

While Hale and Reiss’s and Berent’s arguments align in many ways, most notably on the discrete and symbolic nature of human phonological competence, their work also strongly diverges with respect to both what is innate and how to argue for this. As we saw, Hale and Reiss are centrally concerned to argue that the categories appealed to in a phonological theory must be available to the child in order for them to identify their environmental stimuli as displaying these properties. This argument thus has a somewhat a priori flavor to it: it involves asking what must be the case for a child to learn a sound system at all. Berent, on the other hand, does not view knowledge of the sonority hierarchy as a requirement for learning a language. Russian and English speakers do have, in some sense, environmental evidence for the possible consonant clusters in their respective languages. She argues instead that innate knowledge is required to explain what speakers know about sounds which are not licit in their native languages.

Further, the appeal to markedness, the idea that some sounds are more natural in a language than others even when the latter are possible, is an idea Hale and Reiss have rejected in a number of places. As Reiss (2018) stresses, such constraints seem unhelpful to the learner: either their language does contain the marked expressions/constructions, in which case whatever aversion markedness creates in the learner will at best hinder them, or it does not, in which case these preferences are inert, untriggered by the phenomena they are supposed to downgrade. Nevertheless, Berent’s experimental work seems to show that they are present, and thus must be explained somehow.

These contrasting positions thus provide a counter-example to some empiricists’ claims to the effect that nativism is empirically or explanatorily empty (see, e.g., Churchland [2012: 15]: “since one has no idea how to explain the origin of our concepts, one simply pronounces them innate, and credits either a prior life, almighty God, or fifty million years of biological evolution for the actual lexicon of concepts we find in ourselves”). Hale and Reiss and Berent each argue, through appeal to PoS arguments, that children come equipped to the task of learning the sound systems of their native language with substantial phonological knowledge and assumptions. But their proposals about the nature of this knowledge, such as whether they incorporate a sonority hierarchy, differ, in empirically investigable ways. Nativist approaches to phonology are thus empirically driven and empirically testable, with competing nativist theories providing explanations of different kinds of phenomena.

4.5 Innateness in Morphology

Morphology presents an interesting test case for debates about language acquisition. Some of the parade cases of poverty of stimulus arguments (e.g., Chomsky 1957 on affix-hopping) and debates between nativists and empiricists (e.g., Pinker & Prince 1988 vs Rumelhart & McClelland 1986 on the English past tense) have centered on the acquisition of morphology. However, these debates, while highly relevant to the nativist/empiricist debate, have often focused on other issues, such as the format of linguistic knowledge.

Pinker and Prince, for example, provided a detailed and influential critique of connectionist approaches to language learning, specifically arguing that these systems lack the discrete and highly articulated structures needed to account for the complex interactions between various linguistic systems (syntax, phonology, morphology) needed to capture human knowledge of inflectional morphology. This seemed at the time like a powerful blow against empiricism, as these connectionist approaches were the foremost alternative to the nativist, generativist framework. However, the debate was not strictly about innateness, but rather the format of the rule systems of developed competence (see Seidenberg & Plaut 2014 for discussion). Even if these early connectionist systems cannot acquire the productive rules of morphology, it doesn’t follow that such rules must be innate.

More generally, the nature of morphological systems, including the massive variety observed cross-linguistically in how central morphological processes are to language, seems to require extraction from the environment. And indeed there is much evidence for the ‘data-driven’ nature of morphology acquisition (Ambridge & Lieven 2015), including the tight correlation between input frequency and age of acquisition.

This makes morphology a less common place to find explicit PoS arguments of the type we have been describing for other areas. The purported innate contributions are much more general in this area: the functional architecture (e.g., rules vs connections), and the inferential systems (e.g., Yang 2016’s Tolerance Principle) have been suggested to be innate, but there seems to be less reason to view specific, substantive features about the kinds of words and word-constructing operations made use of in the world’s languages as innately determined.

5. Nativism in Semantics

Semantic theory is an area in which there has been significantly less discussion of the division between innate contributions and environmental information in the development of linguistic competence. This is surprising because, intuitively at least, meaning is the aspect of language least detectable in the stimulus. Putting aside the worries from the earlier sections, it is at least clear how a child might learn from their environment what sounds are available in their language, or in what orders words can or cannot occur. But it is not even clear what property of the stimulus the semantics-learning child should be attending to.

As with other branches of linguistics, semantics can be divided up into a number of different domains. Lexical semantics studies the meanings of individual words (or, perhaps better, morphemes), whereas compositional semantics studies the ways that the meanings of complex expressions are generated out of their simpler components. For each, we can ask the question of how the child comes to acquire their semantic knowledge.

5.1 Lexical Semantics: Biases in Lexical Acquisition

For the case of lexical semantics, there are two further sub-questions: how does the child acquire the meaning of an expression, and how does the child come to associate this meaning with a particular form? Given the widespread assumption, within linguistics at least, that lexical meanings are, or are very closely related to, concepts, issues about lexical acquisition and concept acquisition can blur together, especially if neo-Whorfianism, according to which acquisition of novel words enables the acquisition of novel concepts, is true (cf. Carey 2009 on the role of numerical language in the acquisition of numerical concepts). I won’t focus on these issues here, although see Laurence and Margolis (2005) for a detailed discussion. I will assume that we have some story, nativist or empiricist, about how children acquire the concepts that their words express, and will instead discuss work concerning how children learn to associate these with expressible forms.

Lexical acquisition is simultaneously one of the areas in which nativism seems most and least plausible. On the one hand, children raised in Leeds are likely to use the word ‘cat’ when they want to talk about cats, whereas children raised in Ljubljana are more likely to use the word ‘mačka’. This can only be a product of their experience. On the other, acquiring a lexicon of tens of thousands of words, in a complex environment of highly ambiguous cues, with only the cognitive resources of a developing mind, can seem to call for substantial innate guidance. These two broad facts, that which specific words we acquire must be sensitive to our learning environments, and that children reliably achieve a remarkable cognitive feat under seemingly adverse conditions, have shaped the nativist approaches to lexical acquisition, which aim to identify the processes with which specific words are acquired and the constraints under which these processes operate. Once these have been identified, we can ask whether these processes and constraints are themselves innate or learned, and, if innate, whether they are language specific.

The first thing the child needs to do, in order to learn a new word, is identify those elements of the speech stream which correspond to words. This is a far from trivial task. For children to learn that cats are called ‘cats’, on the basis of evidence like their parent saying “Look at the cats!”, they must distinguish the words in this utterance from sub-, and super-, lexical items, such as /c-/, or ‘the cats’. As noted above, these boundaries are not generally available in the stimulus, considered physically/acoustically, and so must be imposed by the child. Pioneering work on statistical learning (e.g., Jusczyk 1999; Saffran, Aslin, & Newport 1996) has identified several sources of information a child could use in this process, which I will discuss in Section 6.

Having identified the relevant sounds and words, the child now needs to determine what meaning to associate with them. As I’ve stressed repeatedly, identifying how children learn something requires first determining what it is that they have learned. In the case of lexical semantics, this is made vastly more difficult by the fact that there is no real consensus on what the meaning of a word is (cf. Gasparri & Marconi 2015 [2024]). Drawing on the mainstream tradition in analytic philosophy of language, much work on the acquisition of a lexicon has assumed that at least one component of lexical meaning is extension: the set of things an expression applies to. One helpful feature of such an extensionalist approach is that at least for some lexical items, it does seem like a crucial step in learning their meaning is coming to associate them with something (an individual, kind, property, or event) in the learning environment. This is perhaps the paradigm case of lexical acquisition: we hear a sound while our attention is drawn towards an object, and we come to know that this sound is used to talk about objects like this. Even such a simple case (one which abstracts away from many difficulties such as the acquisition of words which lack referents (‘genie’, ‘Atlantis’…), or which refer to things not present (‘thirty’, ‘apatosaurus’), or things it is hard to attend to (‘air’, ‘economy’…), etc.) poses deep worries, which nativist developmental linguists have argued must be overcome with innate biases.

As Quine (1960) points out, hearing a particular, novel word while in the presence of a particular object falls far short of determining what this word means. When the child first hears ‘cat’, while looking at a particular cat, Obi, how does the child learn that this word applies to cats, as opposed just to Obi (an individual), or to some property of Obi’s (say, blackness or being smaller than a breadbox), or to the activity Obi is currently engaging in (say, balancing on a stool), or to some other set of which Obi is a member (cats older than 4 years old, or mammals, or whatever)? Obviously, the list of alternative interpretations can be extended indefinitely. Despite this radical underdetermination, children very reliably do correctly assign meanings to their expressions. To the nativist linguist, this suggests strong innate constraints, biasing against arbitrarily many empirically possible meanings.

Many theorists have thought that a crucial question for deciding between nativist and empiricist learning methods is: how fast is lexical acquisition? The thought is that empiricist learning strategies, at least in the general case, are rational, and involve the child looking to support their linguistic hypotheses/guesses with the best available environmental evidence (although see Trueswell, Medina, Hafri, & Gleitman 2013 for evidence that children seem to disregard relevant environmental evidence). Evidence acquisition, and hypothesis confirmation, are typically gradual processes. Consider a language learning child, confronted with a particular sound, with the aim of associating this sound with its correct referent, which in paradigmatic cases will be present in the environment. As Quine and others noted, the mere correlation between this sound and this referent is at best very marginal confirmation of any specific hypothesis, given the wide range of such hypotheses compatible with it. Given this, if children are very quick to acquire lexical items, e.g., learning the meaning of a word on just a few encounters (or even just one), that could favor the nativist. This is exactly what Carey and Bartlett (1978) found, in foundational work on ‘fast mapping’. This is the process in which children acquire a novel lexical item in response to a single encounter (see also Halberda 2003 for experiments concerning just when this ability becomes available). For the nativist, that children can learn the meanings of novel words in response to such low-quality evidence is strong evidence to think that they are relying on more than the environment provides, i.e., that they have innate biases for positing some kinds of lexical item over others.

Work by Ellen Markman (summarized in Markman 1989) aimed to uncover these biases. Markman’s core strategy is to introduce young children to a novel word in the presence of a stimulus which is ambiguous in just the way noted above: it has several distinctive properties, each of which could serve to determine the extension of the novel term. The child is then confronted with disambiguating stimuli, which would be referents of the novel term on exactly one of the possible meaning assignments. The child’s reaction to these novel stimuli will indicate which meaning they associated with the novel word. Patterns of generalization here can cue us in to the assumptions children bring to the task of lexical acquisition.

Two of the most basic such biases Markman and colleagues proposed are the whole-object bias and the taxonomic bias. When a child is introduced to a novel label, they will, as a default, associate this label with the salient entity or object in the environment, not with a property, part, or activity of this object. While this is highly intuitive, it is important to keep in mind that this is a contingent fact about human learners. All learners require some constraints in order to home in on the meanings of their words, but they need not be biased towards objects. We could imagine a learner who was disposed to view novel words as applying to salient properties in their environment. But humans appear not to work that way, which may be an important component of the innate toolkit they bring to the task of language acquisition. The taxonomic bias combines with the whole-object bias, leading children to prefer novel expressions to apply to categories of object, rather than to individuals or to collections of objects standing in other, non-categorical relations. Drawing on work with Jean Hutchinson (1984), Markman points out that this preference actually conflicts with other biases commonly observed in young children. When children group together some items, independent of language, they often group them thematically, e.g., grouping a cow with, say, milk, rather than categorically, with other cows or animals. However, children do not interpret novel words in this way, with meanings like associated-with-cows (I will use small caps to denote concepts/word meanings). Together, these two biases mean that children’s early words tend to have meanings typical of common nouns, classifying entities by their kinds (‘cow’, ‘cat’, ‘car’, ‘cabbage’), rather than words for referring to individuals (‘Carl’, ‘Catherine’), properties (‘clean’, ‘cold’), events (‘cry’, ‘carry’) and so on.

Obviously, the defaults provided by these biases can’t be too strong, or children would never learn the many expressions which don’t work in this way. For this reason, Markman proposed a third bias in lexical acquisition: the mutual exclusivity assumption (ME). This bias operates as a constraint on the previous two biases, and motivates children to assume that each object is categorized by just one label/expression. The general idea is that the earliest stages of lexical acquisition involve classifying objects into kinds; however, having begun this task, the child is likely to encounter novel terms applied to already classified objects. Mutual Exclusivity biases the child against either (i) viewing novel labels as synonyms for previously acquired terms, or (ii) viewing novel labels as identifying non-equivalent categories (e.g., as cross-cutting categories, or as hyponyms and hypernyms of already learned expressions). Eve Clark’s (1987) Principle of Contrast is similar to, but weaker than, ME, as it requires only the first of these.

The combination of ME with whole-object and taxonomic biases then accounts for many of the cases of lexical acquisition originally ruled out, but does so in a diachronic way, avoiding some of the worries raised by Quine’s induction problem. The first time a novel word is encountered as applied to a novel object, the child ignores all the potential meanings which are not object categories, including terms referring to parts, properties, individuals, etc. (Of course, this does not rule out all possible alternative meanings. Some story is needed to explain why cat is a plausible meaning, whereas physical object or cat-over-5-years-old is not, but this is presumably provided by the cognitive psychology of concepts, not developmental linguistics.) However, when the same object is encountered with a different label, these previous constraints are lifted, so as to avoid conflict with ME. At this point, then, the child can begin to hypothesize that this term applies to something other than the kind of entity it is attending to, such as a salient property, part, activity, or whatever.

The most compelling evidence for something like ME comes from disjunctive-syllogism-like reasoning in language learning children (and indeed adults) (Markman 1990; Au & Glusman 1990). If children are motivated by ME, then they should reject a novel word as applying to an object they already have a category name for. In a series of experiments, this effect was demonstrated. A classical example works like this: children are shown several stimuli, some of which are familiar and for which they have demonstrated mastery of the common category name (e.g., ‘dog’, ‘cat’), others of which are unfamiliar and their labels unknown (e.g., ‘lemur’). The experimenter then uses the unfamiliar label to request an action (e.g., “Can you bring me the lemur?”). Using ME, the children can infer that this label does not apply to the familiar object (the dog can’t be called a ‘lemur’, because it is called a ‘dog’, and can’t have multiple category labels), and thus that it does apply to the novel object. Without ME, it is argued, the child could not rule out that ‘lemur’ applies to dogs, and thus would have no way to determine the referent of the novel expression.
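
The disjunctive-syllogism-like inference in this experiment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name, the `use_me` flag, and the representation of kinds as upper-case strings are all hypothetical, and nothing here is meant as a model of the child's actual processing.

```python
def candidate_referents(novel_label, kinds_in_view, lexicon, use_me=True):
    """Return the kinds in view that a label could apply to.

    `lexicon` maps already-learned labels to the kinds they name.
    """
    if novel_label in lexicon:
        # A familiar label simply picks out its known kind.
        return [lexicon[novel_label]]
    if not use_me:
        # Without Mutual Exclusivity, no kind in view can be ruled out.
        return list(kinds_in_view)
    # With Mutual Exclusivity, kinds that already have a label are excluded.
    already_named = set(lexicon.values())
    return [kind for kind in kinds_in_view if kind not in already_named]


lexicon = {"dog": "DOG", "cat": "CAT"}
scene = ["DOG", "CAT", "LEMUR"]

print(candidate_referents("lemur", scene, lexicon))
print(candidate_referents("lemur", scene, lexicon, use_me=False))
```

With the constraint switched on, only the unnamed kind survives as a candidate; with it switched off, every kind in view remains a live hypothesis, which is the situation the argument above takes to be unworkable for the learner.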

Before turning to more advanced heuristics, which leverage morphosyntactic knowledge to determine whether the non-category novel expression applies to a property, event, or part, it is worth reminding ourselves that two claims must be true for the above constraints on word learning to guarantee linguistic nativism. These constraints must be both innate and language-specific. There are strong reasons to think the former, given the early age at which they (or, perhaps, similar but distinct constraints which can play the same role of guiding learners to the right lexical meanings) are active. However, there are reasons to be skeptical of the latter, as indeed Markman (1992) is. Preferences for categorization in terms of object kinds, rather than properties, may reflect deeper features of human cognition than language, and thus the taxonomic bias may not be language-specific. Likewise, defaults for categorization in terms of whole-objects, rather than parts, may reflect the workings of the sensory systems. Finally, the disjunction-elimination capacities, taken to be distinctive of ME biases, have been demonstrated in language-trained non-human animals, such as the border collie Rico (Kaminski, Call, & Fischer 2004). This strongly suggests that ME, while useful in language acquisition, is not a dedicated bias for lexical acquisition, but rather reflects more general principles of learning.

5.2 Syntactic Bootstrapping

One of the most sophisticated approaches to lexical acquisition is the Syntactic Bootstrapping proposal, developed by Lila Gleitman and numerous collaborators. This program explicitly takes as its framing problem Quine’s worries about the underdetermination of the referent/meaning of an expression by the learning environment. The core idea (see, e.g., Gleitman 1990 for an overview) is that children come equipped with knowledge of how meaning and form relate.

Consider, for example, a child confronted with a scene in which two people are passing a ball to one another. A novel verb used, say by a caretaker, to describe this scene could mean many different things, even without recherché cases like Quine’s ‘Gavagai’ (from Quine 1960, which points out that the evidence we have that a word means rabbit is, in general, compatible with bizarre alternative interpretations such as undetached rabbit parts). It could for example refer to the mental state of one of the individuals (“the baby wants to have the ball”), to the motion of the ball itself (“the ball rolled”), or to the transfer of possession (“the mother gave the ball to the baby”). Note, however, that each of these events is described using a different syntactic structure: mental verbs naturally take clausal complements, verbs of motion naturally take subject arguments without objects, and verbs of change of possession require donor, recipient, and theme arguments. Gleitman’s insight is that the child can leverage these correlations between meaning and structure to isolate the correct interpretation of novel words, despite the inherent ambiguity in the word-world correlations.

This proposal is of relevance to the present discussion because this knowledge about the interaction of semantics and syntax is hypothesized to be innate, a constraint on the lexical acquisition process. Lidz et al. (2004) explicitly appeal to the Projection Principle and Theta Criterion of Government and Binding Theory (Chomsky 1981), which say that (i) syntactic structure is projected on the basis of information stored in the lexicon, and (ii) that there is a one-to-one mapping between semantic thematic roles (roughly: participants in an event) and syntactic argument positions such as Subject or Object. The child can use these principles to infer the lexical entry for novel expressions by reasoning thusly: this verb has been used with three syntactic arguments; therefore, it must assign three theta-roles, plausibly agent, patient, and theme; this di-transitive semantics must be projected from its lexical entry. Thus, the child can determine that the verb might mean ‘throw’ or ‘pass’, but not ‘want’ or ‘travel’.
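
The way syntactic frames narrow down candidate verb meanings in the ball-passing scene can be illustrated with a toy Python sketch. The table of verb classes, the two frame features, and the function name are hypothetical simplifications introduced here for illustration; they are not the formalism of Gleitman, Lidz, or their collaborators.

```python
# Hypothetical verb classes, each paired with the kind of frame the
# discussion above associates with it.
VERB_CLASSES = {
    "mental state": {"np_arguments": 1, "clausal_complement": True},
    "motion": {"np_arguments": 1, "clausal_complement": False},
    "transfer of possession": {"np_arguments": 3, "clausal_complement": False},
}


def compatible_classes(np_arguments, clausal_complement):
    """Return the verb classes whose frame matches the observed frame."""
    observed = {
        "np_arguments": np_arguments,
        "clausal_complement": clausal_complement,
    }
    return [name for name, frame in VERB_CLASSES.items() if frame == observed]


# A novel verb heard with three noun-phrase arguments and no clausal complement.
print(compatible_classes(3, False))
# A novel verb heard with a subject and a clausal complement.
print(compatible_classes(1, True))
```

The scene itself is compatible with all three classes; it is the observed frame that does the filtering, which is the core of the bootstrapping idea.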

As always, positing a learning mechanism, on its own, is not sufficient for establishing linguistic nativism. What is needed is to show that this mechanism (a) is not itself learned, and (b) is language-specific. On the former, there are significant lines of converging evidence. For one thing, these correlations are strongly cross-linguistically robust (e.g., Lee & Naigles 2005 document correspondences between verb meaning and syntactic distribution in Mandarin). For another, very young children (22–24 months) have been shown to use syntactic framing of this sort to disambiguate ambiguous scenes. Naigles (1990) used a looking-preference paradigm, in which children saw a scene in which two agents (a philosophically appropriate duck and a rabbit) simultaneously performed two distinctive actions: the rabbit pushed down on the duck, forcing it to squat, while both the duck and the rabbit wheeled their arms around. As they viewed this scene, they heard a novel verb in either a causative construction (e.g., “The rabbit is gorping the duck”), where ‘gorp’ is being used to indicate that one agent is bringing about a change or effect in another, or an intransitive construction (“The rabbit and the duck are gorping”). When then shown disambiguated scenes, in which only one of these two actions was occurring, and asked to “find gorping”, the children’s responses were predicted by the syntactic frame in which the verb had been introduced. In other words, children who encountered ‘gorp’ in causative constructions looked at the scene in which the rabbit was forcing the duck to squat, whereas those who heard intransitive uses looked at the scene in which the rabbit and the duck wheeled their arms, without either manipulating the other’s behavior.

Further evidence for innate syntax-semantics correspondences comes from children’s use of syntactic frames as constraints on meaning despite a lack of evidence, or even conflicting evidence, from their environment. Lidz et al. (2003) experimentally tested 3-year-old children learning Kannada, a Dravidian language spoken in Southwestern India, on their interpretation of causativity. They wanted to test whether children interpret transitive sentences as causative (i.e., as in English “Dorothy melted Elpheba” meaning Dorothy caused Elpheba to melt) on the basis of innate syntax-semantics mappings, or instead due to correlations present in the linguistic environment. Kannada provides an ideal test case, as, like any language, it allows simple transitives to have causal readings, but it also has a morphological particle which is used to indicate causativity. Importantly, while the correlation between transitivity and causativity is informationally useful, it is imperfect: some transitives (e.g., Kannada equivalents to “Dorothy saw Elpheba”) cannot be construed as causative. The causative particle, on the other hand, is an invariable guide to causativity: whenever it is present, the causative is the correct meaning. So, they reasoned, if there is evidence that children rely on transitivity in interpreting causativity, and disregard the informationally more probative morphosyntactic evidence, this is suggestive of an innate bias for this specific syntactic-semantic correspondence, and against the view that the child is simply reproducing environmental regularities in their language acquisition. And, in an act-out experiment, this is exactly what they found. When asked to act out a described scene, using toys, children’s demonstrations were predicted entirely on the basis of the number of arguments in the prompt, and apparently independently of the presence or absence of the causative morpheme.

Further support for this thesis comes from Feldman, Goldin-Meadow, and Gleitman (1978), who argued that patterns of inclusion and omission in the homesign utterances of deaf children are best explained by appealing to syntax-semantics correspondence rules mirroring those acquired by hearing children, despite the differences in their learning environment.

One final source of evidence comes from apparent differences between expressions which are ungrammatical in a given language. In addition to whatever innate constraints there may be on syntax-semantics relations, it is universally accepted that learning a language requires identifying particular, and often sui generis, selectional constraints for specific lexical items (e.g., that ‘devour’, but not ‘eat’, requires a direct object, or that ‘want’, but not ‘hope’ requires an infinitive complement). As these vary between languages, they must be learned. However, not all excluded constructions are alike. In several circumstances, some unavailable combinations of expressions seem to be better than others, and the hypothesized universal syntax-semantics mappings seem well suited for predicting these degrees of unacceptability. Language learning children often make mistakes. But these mistakes are not random. For example, children are much more likely to use expressions which are not causative in a given language to express a causative structure (many such examples are documented in Bowerman 1982, such as the excellent “you can push her mouth open and drink her”, said of a doll, meaning you can cause her to drink!). As the transitive-causative structure is an option available in the language, this error is highly natural on this view. Other cases, which would violate the innately preferred structures (e.g., “you can know her happy” meaning your knowledge causes her to be happy) are much less frequent, if not unattested. The same patterns can be observed in adults, both in spontaneous generation of ‘coerced’ structures and in solicited judgements of the relative acceptability of different kinds of construction (Lidz et al. 2004).

Overall, the hypothesis that children come equipped for language learning with substantial knowledge of which grammatical structures are suitable for expressing which kinds of meanings suggests a solution to a deep puzzle about the very possibility of lexical acquisition, and accounts for a wide range of data about child and adult language use. This makes it a parade case in the discussion of linguistic nativism.

5.3 Compositional Semantics

Beyond lexical semantics, children must somehow learn how meaningful linguistic units are combined in order to generate complex meanings. And as with all developed linguistic competence, we can ask: did the child learn the rules for semantically composing linguistic expressions from their environment, or are they innate? Del Pinal (2015) provides an important argument that compositionality, the thesis that the meanings of complex expressions are determined entirely by the meanings of their simple constituents and the structures in which these constituents are arranged, is an innately determined constraint on the human semantic system.

Compositionality provides a particularly strong constraint on the interpretation of complex expressions: there can be nothing in the meaning of the complex which was not in the meaning of the simple constituents. Del Pinal argues that this fact generates predictions about possible developmental trajectories we might observe in language learning children. Specifically, children operating without this constraint might ‘try out’ semantic hypotheses for the interpretation of complex expressions which are not consistent with compositionality. For example, some linguists and philosophers of language have argued that claims about the weather (e.g., “It is raining”) typically incorporate reference to a specific location, typically the location of the utterance, as a component of their literal truth-conditions, despite there being no locative expression present in the linguistic form of the utterance, in violation of the principle of compositionality. Del Pinal points out that if this form of interpretation is a licit one, we might expect children to hypothesize other constructions to work in this way. Why, he asks, don’t we find children who interpret “he is happy” to mean “he is happy here”, at least until they are corrected by their parents (witness the similarity of this style of argument to discussion of the “subset principle” in developmental linguistics; cf. §2)? His answer is: they can’t! Compositional interpretation is part of the innate architecture of the human linguistic system, and thus children do not need to learn that expressions can’t be interpreted in these compositionality-violating ways, which explains why they never make these kinds of mistakes.

6. Empiricist Responses

As I’ve set up the argument, there are four ways an empiricist could respond. The first is to deny that children in fact learn the purported target linguistic knowledge (denoted ‘T’ in the schematic statement of the PoS argument in §2). This will involve first-order linguistic theorizing, providing alternative accounts of the linguistic capacities underlying the relevant linguistic data. PoS arguments presuppose first-order analyses, which involve identifying some relevant linguistic data (e.g., some pattern of linguistic judgements) and arguing that the best way to explain such data is by appeal to T. If it can be shown that there are alternative, even better, hypotheses about the minds of language users which can explain these data as well, then PoS arguments appealing to T will fail. I will turn to these kinds of response in the next section.

The second and third kinds of response involve, in one way or another, the inference from premises 2 and 3 to 4 in the above argument. That is, assuming that T is indeed a feature of developed competence, an empiricist must show that knowledge of T can be explained without appeal to innate knowledge. This can be done either by showing that (i) the Primary Linguistic Data (‘PLD’) is more informative, or that (ii) domain-general learning mechanisms are better able to extract the relevant linguistic information from the PLD than nativists have assumed.

The final kind of response accepts the argument up to this point, accepting that language acquisition is guided by specific innate knowledge or learning mechanisms, but rejects premise 5, thus adopting a non-linguistic form of nativism. As we shall see, in practice responses to nativism often take many, if not all, of these options simultaneously. But it is helpful to conceptually distinguish between them, in order to make as explicit as possible where PoS arguments are alleged to fail.

6.1 Enhancing the Primary Linguistic Data

Pullum and Scholz (2002) discuss option number two. Their article has two aims, firstly to motivate nativists to take seriously the empirical burden of establishing that the stimulus is indeed as impoverished as they claim, and secondly to show that in several cases this impoverishment has been overstated. They look at four instances of PoS arguments in the literature, and for each claim to find evidence in various corpora (sometimes child-directed, sometimes not) that could provide children with the evidence needed to genuinely learn the relevant linguistic facts. For example, in discussing the case of subject-auxiliary inversion, they searched a child-directed speech database (produced by Patrick Suppes, and reported in MacWhinney 1995) for wh-questions with main and embedded auxiliaries, and found utterances like “Where’s the other dolly that was in here?”, in which the main auxiliary ‘is’ is raised over the embedded auxiliary ‘was’. They claim evidence of this sort could be used by the child to learn that question-formation is structure-, rather than linear order-, sensitive, thus undermining claims about the necessity of innate knowledge to this effect. Of course, they cover only a handful of proposed PoS arguments, and so defenders of nativism can always retreat to the myriad other such cases that have been made. But if their arguments are correct, this would go some way to building an inductive case against PoS arguments more generally. But are they?

Lasnik and Lidz (2016) argue that such evidence is “largely beside the point” (2016: 238). Their worry, building on Lasnik and Uriagereka (2002) and Freidin (1991), is that positive evidence that a main auxiliary can be raised in a wh-question does not suffice for the child to acquire the knowledge they in fact do acquire. The reason is familiar: there are (indefinitely) many possible rules compatible with this evidence, but which the child does not entertain. For example, these data are compatible with the rule: move some/any auxiliary. How, then, does the child know that not only can they raise the main auxiliary, but that they must do so? The impossibility of raising an embedded auxiliary cannot be extracted from positive evidence alone, and thus again should be viewed as innately guided. Note, however, that this amounts to endorsing a different PoS argument than the one originally proposed. There is evidence available, at least to the child spoken to in this corpus, that “move the first auxiliary” is not a rule governing their local dialect. And one could well view some general learning process like pre-emption (see §6.2 and §7.1) guiding the child to use utterances like this to preclude the more general rules (“move any auxiliary”) compatible with this richer PLD.

Legate and Yang (2002) are more sympathetic to Pullum and Scholz’ first aim, and agree that developmental linguists would do well to use computational tools to evaluate just how rich the stimulus really is. However, they take issue with the second aim. All parties agree that children may, on occasion, be exposed to data which could serve to cue them into the relevant linguistic knowledge. The question is whether all children who learn their language are likely to be so exposed. Legate and Yang argue that the very data Pullum and Scholz appeal to undermine this claim. In a fuller search of the database, they claim that the relevant sentences comprise somewhere in the region of 0.05% of the corpus. This, they claim, makes it quite possible that a child could indeed go through the relevant portion of their linguistic development without encountering such sentences (see also Collins 2003). Further, given the rarity of these sentences, it is likely that a learning system sensitive enough to data to generalize from these examples would be too sensitive, and would generalize also from the various forms of noise and error in their environment, precluding successful learning. For these reasons, they conclude that the empirical premise of PoS arguments remains justified.

6.2 Enhancing the Learning System

The third option involves identifying empiricist-friendly learning mechanisms which are capable of extracting the relevant categories and generalizations from the data. The pioneering work by Saffran, Aslin, and Newport (1996) and Jusczyk (1999) provides a paradigm here, which birthed a substantial literature on the role of statistical inference in language acquisition. In contrast to the quote from Lasnik above (§2), this research suggested that the category ‘word’, and the specific instances of words in the ambient language, could be generated without prior, innate, language-specific expectations, by identifying dependencies and correlations in the stream of sounds available to the child. This work aims to identify reliable signals in the primary linguistic data which can be used by the child to identify when one word ends and another begins in the more-or-less continuous auditory stream. Jusczyk focused on the role of prosody/stress in this task. The speech stream is not merely a string of syllables, but is organized by variable levels of intensity. It is highly reliable that when there are two high-intensity peaks in the sound signal, this corresponds to there being two distinct words, and each polysyllabic content word will feature a single peak (although monosyllabic words and functional words may be unstressed, problematizing the inference in the other direction). Infants have shown sensitivity to these patterns, and thus could use them to extract word boundaries from their linguistic experience. That this is so even though languages vary in how they assign stress to polysyllabic words (e.g., Czech stress is always on the first syllable, Polish always places stress on the penultimate syllable, Turkish on the final syllable, with English exhibiting a more complex and variable assignment) suggests that the child must be extracting these patterns from their environment.

In addition to stress patterns, a number of distinct cues have been identified which could enable a sophisticated learner to identify word boundaries. Jusczyk appeals also to allophonic cues, involving the predictable variation in sound forms within words (e.g., that /t/ is aspirated, i.e., spoken while expelling a blast of air from the lungs, when word-initial in English, but not when word-internal) and phonotactic cues, concerning where in a word/syllable various sounds can or cannot be found (e.g., that /sb/ cannot in English occur word initially, thus ensuring that there is a word boundary in the string ‘those boys’). Saffran and colleagues identified the possibility of children using conditional probabilities between syllables as powerful cues for word segmentation. They trained 8-month-old infants on a speech signal consisting of a series of trisyllabic nonsense words, but containing no prosodic/acoustic information about where words ended or began. The crucial feature of this sample was that transition probabilities between syllables were very high within a word, but much lower between words. This corresponds to facts about natural language, wherein there are many fewer possible syllabic continuations within a word than there are between words. As the “words” in the stimulus were entirely novel, these probabilities were all the children had to go on. Despite this, when presented both with words which had been encountered in this training data and with formally similar words which had not, but which were composed of the same set of syllables, the infant subjects displayed a strong preference for the novel words, indicating that they had habituated to the words in the sample on the basis of this probabilistic information.
Taken together, it seems that there is a lot of information that a sophisticated learner could identify in the process of word segmentation (although see Yang 2004 for worries about whether this information is robust enough in naturalistic learning data to suffice, without innate biases). How far this learning strategy can be extended to ‘more abstract’ levels of language acquisition, such as syntax, remains an open area of investigation.
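The statistical cue exploited in the experiment by Saffran and colleagues can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not a reconstruction of the published stimulus or analysis: the four nonsense words, the stream length, the random seed, and the boundary threshold of 0.5 are all choices made for illustration. It estimates the transition probability between adjacent syllables in an unbroken stream and posits a word boundary wherever that probability dips.

```python
import random
from collections import Counter

# Illustrative trisyllabic nonsense words (an assumption made for this sketch).
# No syllable is shared between words, so within-word transitions are fully predictable.
LEXICON = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(n_words, seed=0):
    """Concatenate randomly chosen words into one unbroken list of syllables."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_words):
        stream.extend(rng.choice(LEXICON))
    return stream


def transition_probs(syllables):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, probs, threshold=0.5):
    """Posit a word boundary wherever the transition probability dips below threshold."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if probs[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


stream = make_stream(300)
probs = transition_probs(stream)
words = segment(stream, probs)
print(sorted(set(words)))
```

In this idealized stream the within-word transitions have probability 1 and the between-word transitions hover around one in four, so a fixed threshold suffices. Naturalistic input is far noisier, which is the source of the worry raised by Yang (2004).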

Another broad class of models of language acquisition which has attracted sustained interest in recent years uses Bayesian statistical inference (e.g., Perfors 2008, 2012; Chater, Clark, Goldsmith, & Perfors 2015; Pearl & Lidz 2013). While these models rely on substantial innate machinery (specifically, large and complex hypothesis spaces and inferential mechanisms), it is unclear whether this machinery is language-specific, rather than a general feature of the human mind (cf. A. Clark 2013, 2016). Such models assume that language acquisition is the process of identifying the hypothesis about the local language which optimizes the trade-off between simplicity (simpler hypotheses, often defined in terms of minimum description length, are assigned higher priors) and coverage-of-data (defined in terms of how likely the encountered data would be, if a given hypothesis were true).

One feature of these models, particularly significant for our concerns, is that they allow, in principle at least, a solution to some forms of problems of negative evidence. Bayesian models are well-suited for inferring from absence of evidence of some construction to hypotheses which exclude this construction from the language. A trained Bayesian system, in its search for the (simplest) hypothesis which would predict the data, must account for two things: how likely is the encountered (i.e., positive) evidence, given the hypothesis under investigation, and how unlikely is the unencountered evidence? That is, if a given grammar predicts that a specific expression would be found, but it is not, this counts against the overall evaluation of this grammar. This fact has been leveraged (e.g., by Perfors, Tenenbaum, & Wonnacott 2010) to show how a system without innate biases can learn the set of constructions in which specific verbs can, and crucially cannot, occur. They look at dative alternations, roughly synonymous pairs of sentences which differ just regarding how the two non-subject arguments of the verbs are arranged, either with a preposition (“Manish passed the suitcase to Alonzo”), or without in “double-object” constructions (“Manish passed Alonzo the suitcase”). While this alternation is possible for some verbs, others are well-formed only with one or the other form (e.g., “Manish donated the money to the museum” vs. *“Manish donated the museum the money”). These examples pose worries for the learner, as the former kind of case seems to provide evidence for the generality of the alternation, yet children seem to know when not to generalize in this way, leading some (e.g., Pinker 1989) to argue for nativist solutions. 
Perfors and colleagues show that it is possible to leverage the fact that if this alternation were possible for a given verb, a child might expect to encounter the unavailable forms, and thus infer from the absence of these constructions in their experience that such forms are indeed impossible.
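The logic of inferring impossibility from absence can be illustrated with a deliberately simplified two-hypothesis comparison. This is a sketch under stated assumptions and not the hierarchical model of Perfors and colleagues: the function name, the prior of 0.2 on the restricted hypothesis, and the assumption that an alternating verb uses each frame half the time are all hypothetical values chosen for illustration.

```python
def posterior_pd_only(n_pd, n_do, prior_pd_only=0.2, p_do_if_alternating=0.5):
    """Posterior probability that a verb allows only the prepositional frame.

    n_pd: observed uses in the prepositional frame ("donated the money to the museum")
    n_do: observed uses in the double-object frame
    Both default parameter values are arbitrary assumptions.
    """
    # A prepositional-only verb never generates a double-object use.
    like_pd_only = 1.0 if n_do == 0 else 0.0
    # An alternating verb produces each frame with a fixed probability.
    like_alternating = (1 - p_do_if_alternating) ** n_pd * p_do_if_alternating ** n_do
    numerator = prior_pd_only * like_pd_only
    denominator = numerator + (1 - prior_pd_only) * like_alternating
    return numerator / denominator


# Belief in the restriction grows as prepositional uses accumulate unaccompanied
# by any double-object use.
for n in (0, 2, 5, 10):
    print(n, round(posterior_pd_only(n_pd=n, n_do=0), 3))
```

Each additional prepositional use without a double-object counterpart is a coincidence that the alternating hypothesis must explain away, so the restricted hypothesis gains ground even though the learner began by favoring alternation and never received explicit correction.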

As noted above, the strongest case against PoS arguments for linguistic nativism will involve all of these strategies. This is seen in extensively developed empiricist approaches to language learning, such as Chater et al. (2015), which throughout appeals to the ways that Bayesian models can leverage simplicity and coverage of data, alongside frequent skepticism of the specific claims nativists have made about the form of developed linguistic competence, in the development of computational models which suggest promising capacities for learning subtle facts about language without strong innate, language-specific, constraints.

7. Empiricist Alternatives

Beyond replying to PoS arguments, the most compelling way to reject linguistic nativism is to develop detailed theories of language acquisition which rely only on innate structures which are used in non-linguistic tasks. This section will point to some of the most promising such approaches. As in the previous section, these alternatives need not be viewed as competing, and much contemporary work involves combining them.

7.1 Bayesian Language Acquisition

As noted above, Bayesian approaches have become common within developmental linguistics, in line with a more general trend across the cognitive sciences. Bayesian methods, in themselves, are not specifically empiricist, as they are compatible with innate constraints on the hypothesis space, priors, and learning biases. However, the power of Bayesian methods in extracting information from data, and in generalizing in powerful and natural ways, has promoted in some researchers skepticism about the need for language-specific innate constraints.

One influential study (Xu & Tenenbaum 2007) applied Bayesian inference to the problem of lexical acquisition described earlier: how does the child learn which worldly entities a given word applies to? Xu and Tenenbaum develop a Bayesian model which assigns a probability to each of a range of hypotheses about the extension of a novel term, on the basis of small amounts of labelled data. A crucial feature of this model is that it assumes that the labelled data are drawn randomly from the full possible set of object-label pairs. This assumption can be leveraged to discriminate between hypotheses which are equally compatible with the data; one version of the problem posed by Quine. For example, if a child encounters a novel term, e.g., ‘fep’, attached to an image of a dalmatian, the hypothesis that ‘fep’ means dog and the hypothesis that it means dalmatian are each consistent with the evidence. This consistency, of course, is retained over multiple such term-referent pairings. However, if ‘fep’ meant dog, it would be quite unlikely that a randomly selected subset of the feps were all dalmatians. Much more likely, on these Bayesian assumptions, is that ‘fep’ means dalmatian, as this would explain this otherwise puzzling coincidence. Probabilistic reasoning, with Bayesianism as an exemplar, can thus leverage information in the stimulus far more successfully than mere deductive consistency. Xu and Tenenbaum use phenomena of this sort to argue that some constraints on acquisition, centrally the basic category assumption (children’s preference to view novel terms as naming so-called “basic categories” such as dog or car, rather than subordinate (labrador, buick), or superordinate (mammal, artifact) categories) are plausibly learned, rather than innate. This proposal is strengthened by their results that adult lexical learners seem more influenced by the basic category assumption than children. (See also Sim, Yuan, & Xu 2011 for similar arguments concerning the shape bias.) 
Xu and Tenenbaum further note that strong constraints on the hypothesis space (e.g., that it is hierarchically structured, i.e., the child assumes that lexical items stand in strong containment relations: subordinate categories like dalmatian entail basic level categories like dog which entail superordinate categories like animal) are needed to capture the ways in which language learners generalize. But they are non-committal on whether these are language-specific or stem from more general facts about human cognition (although see, e.g., Pearl & Lidz 2009, who argue that language-specific constraints must be applied to the hypothesis space over which Bayesian learning occurs).
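The “puzzling coincidence” reasoning in the ‘fep’ example can be worked through numerically. The following is a minimal sketch, not Xu and Tenenbaum’s model: the three nested hypotheses, their extension sizes, and their priors are invented for illustration, and the prior favoring the basic-level reading is an arbitrary choice made only so that a single example leaves the question open. The one substantive assumption carried over from the text is that examples are sampled at random from the word’s extension, so that each example has probability one over the size of that extension.

```python
# Hypothetical nested hypotheses about what 'fep' means. Each lists the kinds it
# covers, a made-up extension size, and a made-up prior probability.
HYPOTHESES = {
    "dalmatian": {"kinds": {"dalmatian"}, "size": 10, "prior": 0.1},
    "dog": {"kinds": {"dalmatian", "labrador"}, "size": 50, "prior": 0.8},
    "animal": {"kinds": {"dalmatian", "labrador", "cat"}, "size": 500, "prior": 0.1},
}


def posterior(examples):
    """Return P(hypothesis | labelled examples) under random sampling from the extension."""
    scores = {}
    for name, h in HYPOTHESES.items():
        likelihood = 1.0
        for kind in examples:
            # Each example is one of `size` equally likely members, or impossible.
            likelihood *= 1.0 / h["size"] if kind in h["kinds"] else 0.0
        scores[name] = h["prior"] * likelihood
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the examples")
    return {name: score / total for name, score in scores.items()}


one = posterior(["dalmatian"])
three = posterior(["dalmatian"] * 3)
print({k: round(v, 3) for k, v in one.items()})
print({k: round(v, 3) for k, v in three.items()})
```

With these invented numbers, a single labelled dalmatian leaves the basic-level reading ahead, while three labelled dalmatians shift most of the probability to the subordinate reading, even though every example remains deductively consistent with ‘fep’ meaning dog.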

7.2 Language as Shaped by the Human Mind

One line of research which, while also not formally inconsistent with nativist approaches, has tended to be promoted with broadly empiricist assumptions, centers on the idea that natural languages themselves have been shaped by the pressures of human communication and (non-language-specific) cognition. This idea is common to a wide range of work, including Christiansen and Chater (2016), Deacon (1997), the “Edinburgh school” (e.g., Kirby, Tamariz, Cornish, & Smith 2015), and the CLMBR research group (e.g., Steinert-Threlkeld 2020). This shifts the focus from the traditional generativist idea that language-specific components of the human mind ensure that only some languages are learnable to the development of the language itself, and the ways that languages dynamically respond to the developmental and communicative needs of populations and generations of language-users. As languages are passed on from one speaker to the next, these pressures will generate interacting biases for some forms over others. Some of these biases may be very general: human memory constraints plausibly favor languages which require less “storage space” to represent, and the energetic costs of motor action favor languages which allow for shorter messages, whereas the complex situations humans find themselves in favor more expressive languages capable of conveying a wider range of contents. Kirby (2000) and Steinert-Threlkeld (2020) argue, using computational models of agential communication, that these competing pressures motivate the development of compositional languages, in which semantic contents are productively determined by the syntactic combination of simpler linguistic units (although see discussion in §2 about worries posed by very young languages). Compositionality provides a way of maximizing expressive power without a corresponding burden on the memories of language users. 
If compositional languages are stably achieved through the sustained interaction of language users without innate preferences for, or knowledge of, compositional semantic principles, this could undermine PoS arguments for such innate contributions.

Other work in this tradition focuses instead on more specific features of human language users, such as the specific ways auditory or perceptual processing systems work. This aligns closely with ‘functionalist’ approaches to language, which argue that the linguistic quirks identified in PoS arguments are generally explained not by appeal to language-specific innate knowledge but rather by the communicative and computational needs of human language users. For example, wh-island phenomena have been argued to indicate not constraints on the grammar or well-formedness of the relevant sentences, but instead barriers to parsing, processing, or interpretation (see, e.g., Hofmeister & Sag 2010; Hawkins 2004; and Phillips 2013a, 2013b). See also Adele Goldberg (2013) who argues, in line with the broader constraints on efficient communication, that island phenomena reflect constraints imposed by the structuring of information within a discourse. Again, if these phenomena can be explained by appeal to properties of cognitive systems such as perception or memory, rather than syntax, then the argument for language-specific innate knowledge is undermined.

7.3 Construction Grammar

Construction Grammar (‘CxG’), or Usage-Based Grammar (the terms are sometimes used with slightly different meanings, but the programs are closely linked), as developed most explicitly by Adele Goldberg (e.g., 1995, 2006, 2019) and Michael Tomasello (e.g., 2003) represents perhaps the dominant alternative to generativism in contemporary linguistics, in part due to its anti-linguistic-nativism outlook. This approach is powerful due to its explicit development of an alternative linguistic formalism, for capturing adult linguistic competence, and its integration of models of how language is mentally represented with broader work in cognitive and developmental psychology (see, e.g., Ibbotson 2020 for an overview). CxG replaces the traditional division between the lexicon, the store of basic linguistic atoms, and the grammar, the set of constructive rules for generating complex linguistic expressions, with a more complex lexicon, consisting of ‘constructions’. These are stored linguistic units, mappings between sounds (or, more generally, observable forms) and meanings, as in the traditional view, but they include, in addition to basic words and morphemes, more abstract units. Abstract units are underspecified, in terms of both sound and meaning, but determine which other expressions they can be combined with to generate complete, usable, linguistic expressions. The productivity of language is then accounted for by the ability to combine expressions of a variety of sizes and degrees of abstraction in novel ways. In this way, the complexity of language is relocated to the complex possibilities of sound and meaning allowed for by the specific constructions in the lexicon.

CxG is largely pursued from a non-nativist (i.e., non-linguistically-nativist) perspective, in that the particular constructions acquired are supposed to be extractable from the primary data on the basis solely of general learning mechanisms, such as pattern-recognition, and human-, but not language-, specific communicative abilities, such as intention-recognition. The rough idea is that the child is able to identify the communicative goals of the parents or peers from whom they are learning, and thereby map specific sounds onto specific meanings (e.g., the parent says ‘dog’ when there are dogs in our joint attentional space, and thus ‘dog’ likely means dog, the parent says ‘go to mommy’ when they want me to go to mommy, and thus ‘go to mommy’ likely means go to mommy, etc.). Basic units thus enter the lexicon as unstructured: one (perhaps complex) sound is related to one (perhaps complex) meaning. However, the ability to recognize patterns in these constructions allows the child to decompose these units, and abstract over similarities between them, generating underspecified complex constructions (e.g., if ‘go to mommy’ means go to mommy and ‘go to daddy’ means go to daddy, the child can infer that ‘go to X’ means go to X, a construction with a variable which must be filled in order to produce a complete sound-meaning pair). Iterated application of such an inferential strategy can lead to greater and greater degrees of abstraction in the lexicon, ultimately to entirely abstract constructions which take the place of traditional syntactic structures.
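The abstraction step from ‘go to mommy’ and ‘go to daddy’ to ‘go to X’ can be pictured as a simple alignment of stored forms. The following toy sketch is offered only as an illustration of the idea and is not drawn from the CxG literature: the function name and the restriction to whitespace-separated utterances of equal length that differ in exactly one position are simplifying assumptions.

```python
def abstract_schema(utterance_a, utterance_b, slot="X"):
    """Return (schema, fillers) if the two utterances differ in exactly one word.

    Returns None when the utterances cannot be aligned this simply.
    """
    tokens_a, tokens_b = utterance_a.split(), utterance_b.split()
    if len(tokens_a) != len(tokens_b):
        return None
    differing = [i for i, (a, b) in enumerate(zip(tokens_a, tokens_b)) if a != b]
    if len(differing) != 1:
        return None
    i = differing[0]
    # Replace the single differing word with a variable slot.
    schema = tokens_a[:i] + [slot] + tokens_a[i + 1:]
    return " ".join(schema), {tokens_a[i], tokens_b[i]}


print(abstract_schema("go to mommy", "go to daddy"))
print(abstract_schema("go to mommy", "come here"))
```

A realistic learner would have to align form with meaning simultaneously and tolerate much messier input, but the sketch shows why the resulting construction is ‘shallow’: the schema is read directly off the surface strings it generalizes over.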

The promise of CxG approaches, with respect to challenges to nativism, is in the ‘shallower’ representations of linguistic structure they allow for, relative to traditional generative approaches. The more abstract these structures are, i.e., the farther removed they are from the superficial patterns of spoken language, the harder it is for children to learn them, and thus the stronger the motivation there is for innate help. And so if we can capture all the patterns of language in terms of schemas abstracted from concrete utterances, and their combination, it becomes more plausible that all the child needs to learn a language are domain-general capacities such as pattern-recognition. However, the abstract structures posited by generativists were posited for a reason; namely, to capture the apparently quirky patterns of human linguistic judgements. The challenge for the CxG theorist then is to show that the linguistic data appealed to in each PoS argument can be both captured in a CxG-friendly way, and learned on the basis of the data available to each child who learns the relevant facts. This will require engagement with the specific cases appealed to in PoS arguments.

One area in which this has been pursued is in discussion of island-effects. Tomasello (2003), for example, argues that island-phenomena need not be accounted for by positing innate constraints on the movement of wh-expressions, but instead by a learned preference for keeping “referential units” intact, as these are constructions with a clear communicative function, and the strings generated by displacing, for example, an auxiliary verb from within them would result in constructions not found in the child’s experience. In our earlier examples, children would prefer “Is the man who is tall in the room?” to “Is the man who tall is in the room?” not because of a prohibition on extraction from a relative clause, but due to the absence of strings like “the man who tall” in the PLD, relative to frequent, and communicatively functional, strings like “the man who is tall”. An alternative CxG approach is developed in Ambridge and A. Goldberg (2008), who argue that islands are a product partially of pragmatic constraints on discourse information structure, such that extraction is difficult or impossible from presupposed or backgrounded content (e.g., relative clauses), but possible from the asserted elements of a sentence. Both such approaches aim to derive the knowledge needed for appropriate avoidance of wh-islands from more general communicative and learning capacities. However, see Crain, Koring, and Thornton (2017) for arguments that an empirically adequate account of wh-islands, and several other constructions, requires more abstract, and plausibly innate, representations of language than is made available by CxG and other surface-oriented, usage-based approaches. See also Adger (2013) and Lidz and Williams (2009) for more general skepticism about the explanatory potential of CxG.

As noted above, while most research into questions of linguistic nativism has focused on syntax, the same questions arise in other linguistic domains. Phonology in particular has seen a significant amount of discussion about the role of innate knowledge, with several strongly empiricist theories being widely adopted, such as Emergent Phonology (Archangeli & Pulleyblank 2022), Usage-Based Phonology (Bybee 1999, 2010), and Exemplar Theory (Pierrehumbert 2003).

Bibliography

Adger, David, 2013, “Constructions and Grammatical Explanation: Comments on Goldberg”, Mind & Language, 28(4): 466–478. doi:10.1111/mila.12027
Ambridge, Ben and Liam Blything, 2024, “Large Language Models Are Better than Theoretical Linguists at Theoretical Linguistics”, Theoretical Linguistics, 50(1–2): 33–48. doi:10.1515/tl-2024-2002
Ambridge, Ben and Adele E. Goldberg, 2008, “The Island Status of Clausal Complements: Evidence in Favor of an Information Structure Explanation”, Cognitive Linguistics, 19(3): 357–389. doi:10.1515/COGL.2008.014
Ambridge, Ben and Elena Lieven, 2015, “A Constructivist Account of Child Language Acquisition”, in The Handbook of Language Emergence, Brian MacWhinney and William O’Grady (eds), Hoboken: Wiley, 478–510 (ch. 22). doi:10.1002/9781118346136.ch22
Antony, Louise M., 2003, “Rabbit-Pots and Supernovas: On the Relevance of Psychological Data to Linguistic Theory”, in Epistemology of Language, Alex Barber (ed.), Oxford: Oxford University Press, 47–68 (ch. 2). doi:10.1093/oso/9780199250578.003.0002
Archangeli, Diana and Douglas Pulleyblank, 2022, Emergent Phonology (Conceptual Foundations of Language Science 7), Berlin: Language Science Press. doi:10.5281/zenodo.5721159
Ariew, André, 1996, “Innateness and Canalization”, Philosophy of Science, 63(S3): S19–S27. doi:10.1086/289932
–––, 1999, “Innateness Is Canalization: In Defense of a Developmental Account of Innateness”, in Where Biology Meets Psychology: Philosophical Essays, Valerie Gray Hardcastle (ed.), Cambridge, MA: The MIT Press, 117–138 (ch. 7). doi:10.7551/mitpress/7220.003.0009
Au, Terry Kit-fong and Mariana Glusman, 1990, “The Principle of Mutual Exclusivity in Word Learning: To Honor or Not to Honor?”, Child Development, 61(5): 1474–1490. doi:10.2307/1130757
Baker, Mark C., 2002, The Atoms of Language: The Mind’s Hidden Rules of Grammar, New York: Basic Books.
–––, 2009, “Language Universals: Abstract but Not Mythological”, Behavioral and Brain Sciences, 32(5): 448–449. doi:10.1017/S0140525X09990604
Bates, Elizabeth, 1997, “On Language Savants and the Structure of the Mind Review of: The Mind of a Savant: Language Learning and Modularity by Neil Smith and Ianthi-Maria Tsimpli, 1995”, International Journal of Bilingualism, 1(2): 163–179. doi:10.1177/136700699700100204
Bates, Elizabeth, Inge Bretherton, and Lynn S. Snyder, 1988, From First Words to Grammar: Individual Differences and Dissociable Mechanisms, Cambridge/New York: Cambridge University Press.
Bellugi, Ursula, Liz Lichtenberger, Wendy Jones, Zona Lai, and Marie St. George, 2000, “The Neurocognitive Profile of Williams Syndrome: A Complex Pattern of Strengths and Weaknesses”, Journal of Cognitive Neuroscience, 12(Supplement 1): 7–29. doi:10.1162/089892900561959
Berent, Iris, Donca Steriade, Tracy Lennertz, and Vered Vaknin, 2007, “What We Know about What We Have Never Heard: Evidence from Perceptual Illusions”, Cognition, 104(3): 591–630. doi:10.1016/j.cognition.2006.05.015
Berent, Iris, 2013, The Phonological Mind, Cambridge: Cambridge University Press.
Berwick, Robert C. and Noam Chomsky, 2016, Why Only Us: Language and Evolution, Cambridge, MA: The MIT Press. doi:10.7551/mitpress/9780262034241.001.0001
Biberauer, Theresa, Anders Holmberg, Ian Roberts, and Michelle Sheehan, 2014, “Complexity in Comparative Syntax: The View from Modern Parametric Theory”, in Measuring Grammatical Complexity, Frederick J. Newmeyer and Laurel B. Preston (eds), Oxford: Oxford University Press, 103–127 (ch. 6). doi:10.1093/acprof:oso/9780199685301.003.0006
Bickerton, Derek, 1984, “The Language Bioprogram Hypothesis”, Behavioral and Brain Sciences, 7(2): 173–188. doi:10.1017/S0140525X00044149
–––, 2014, More than Nature Needs: Language, Mind, and Evolution, Cambridge, MA: Harvard University Press. doi:10.4159/9780674728523
Blevins, Juliette, 2004, Evolutionary Phonology: The Emergence of Sound Patterns, Cambridge/New York: Cambridge University Press. doi:10.1017/CBO9780511486357
Boeckx, Cedric, 2014, “What Principles and Parameters Got Wrong”, in Linguistic Variation in the Minimalist Framework, M. Carme Picallo (ed.), Oxford: Oxford University Press, 154–178 (ch. 8). doi:10.1093/acprof:oso/9780198702894.003.0008
–––, 2021, Reflections on Language Evolution: From Minimalism to Pluralism (Conceptual Foundations of Language Science 6), Berlin: Language Science Press. doi:10.5281/zenodo.5524633
Bowerman, Melissa, 1982, “Starting to Talk Worse: Clues to Language Acquisition from Children’s Late Speech Errors”, in U-Shaped Behavioral Growth, Sidney Strauss (ed.), New York: Academic Press, 101–145. doi:10.1016/B978-0-12-673020-3.50012-4
Bromberger, Sylvain and Morris Halle, 2000, “The Ontology of Phonology (Revised)”, in Phonological Knowledge, Noel Burton-Roberts, Philip Carr, and Gerard Docherty (eds), New York: Oxford University Press, 19–38 (ch. 2). doi:10.1093/oso/9780198241270.003.0002
Bybee, Joan L., 1999, “Usage-Based Phonology”, in Functionalism and Formalism in Linguistics, Volume 1: General Papers (Studies in Language Companion Series 41), Michael Darnell, Edith A. Moravcsik, Michael Noonan, Frederick J. Newmeyer, and Kathleen Wheatley (eds), Amsterdam: John Benjamins Publishing Company, 211–242. doi:10.1075/slcs.41.12byb
–––, 2010, Language, Usage and Cognition, Cambridge/New York: Cambridge University Press. doi:10.1017/CBO9780511750526
Carey, Susan, 2009, The Origin of Concepts (Oxford Series in Cognitive Development), Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780195367638.001.0001
Carey, Susan and Elsa Bartlett, 1978, “Acquiring a Single New Word”, in Papers and Reports on Child Language Development, 15, Linguistics, Stanford University, 17–29.
Chater, Nick, Alexander Clark, John A. Goldsmith, and Amy Perfors, 2015, Empiricism and Language Learnability, Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780198734260.001.0001
Cherniak, Christopher, 2009, “Brain Wiring Optimization and Non-Genomic Nativism”, in Of Minds and Language, Massimo Piattelli-Palmarini, Juan Uriagereka, and Pello Salaburu (eds), Oxford: Oxford University Press, 108–120 (ch. 8). doi:10.1093/oso/9780199544660.003.0009
Chomsky, Noam, 1957, Syntactic Structures (Janua Linguarum, 4), The Hague: Mouton.
–––, 1965, Aspects of the Theory of Syntax (Research Laboratory of Electronics. Special Technical Report, 11), Cambridge: MIT Press.
–––, 1968 [2006], Language and Mind, New York: Harcourt, Brace & World. Enlarged edition, 1972, New York: Harcourt Brace Jovanovich. Third edition, 2006, Cambridge/New York: Cambridge University Press. doi:10.1017/CBO9780511791222
–––, 1975, Reflections on Language, New York: Pantheon Books.
–––, 1980, Rules and Representations (Woodbridge Lectures Delivered at Columbia University, no. 11, 1978), New York: Columbia University Press.
–––, 1981, Lectures on Government and Binding (Studies in Generative Grammar 9), Dordrecht/Cinnaminson: Foris. Based on lectures given at the GLOW conference and workshop, Pisa, 1979. Seventh edition, Berlin/New York: Mouton de Gruyter, 1993.
–––, 1986, Knowledge of Language: Its Nature, Origins, and Use (Convergence), Westport, CT/London: Praeger.
–––, 1995, The Minimalist Program (Current Studies in Linguistics 28), Cambridge, MA: The MIT Press.
–––, 2017, “Two Notions of Modularity”, in On Concepts, Modules, and Language: Cognitive Science at Its Core, Roberto G. de Almeida and Lila R. Gleitman (eds), New York: Oxford University Press, 25–40 (ch. 1).
Christiansen, Morten H. and Nick Chater, 2008, “Language as Shaped by the Brain”, Behavioral and Brain Sciences, 31(5): 489–509. doi:10.1017/S0140525X08004998
–––, 2016, Creating Language: Integrating Evolution, Acquisition, and Processing, Cambridge, MA: The MIT Press. doi:10.7551/mitpress/10406.001.0001
Churchland, Paul M., 2012, Plato’s Camera: How the Physical Brain Captures a Landscape of Abstract Universals, Cambridge, MA: MIT Press. doi:10.7551/mitpress/9116.001.0001
Clark, Andy, 2013, “Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science”, Behavioral and Brain Sciences, 36(3): 181–204. doi:10.1017/S0140525X12000477
–––, 2016, Surfing Uncertainty: Prediction, Action, and the Embodied Mind, New York: Oxford University Press. doi:10.1093/acprof:oso/9780190217013.001.0001
Clark, Eve V., 1987, “The Principle of Contrast: A Constraint on Language Acquisition”, in Mechanisms of Language Acquisition, Brian MacWhinney (ed.), Hillsdale, NJ: Lawrence Erlbaum, 1–33.
Clarke, Sam, 2025, “Number Nativism”, Philosophy and Phenomenological Research, 110(1): 226–252. doi:10.1111/phpr.13107
Collins, John, 2003, “Cowie on the Poverty of Stimulus”, Synthese, 136(2): 159–190. doi:10.1023/A:1024738522031
–––, 2004, “Faculty Disputes”, Mind & Language, 19(5): 503–533. doi:10.1111/j.0268-1064.2004.00270.x
–––, 2005, “Nativism: In Defense of a Biological Understanding”, Philosophical Psychology, 18(2): 157–177. doi:10.1080/09515080500169686
Crain, Stephen, 1991, “Language Acquisition in the Absence of Experience”, Behavioral and Brain Sciences, 14(4): 597–612. doi:10.1017/S0140525X00071491
Crain, Stephen, Loes Koring, and Rosalind Thornton, 2017, “Language Acquisition from a Biolinguistic Perspective”, Neuroscience & Biobehavioral Reviews, 81: 120–149. doi:10.1016/j.neubiorev.2016.09.004
Crain, Stephen and Paul Pietroski, 2001, “Nature, Nurture and Universal Grammar”, Linguistics and Philosophy, 24(2): 139–186. doi:10.1023/A:1005694100138
Culicover, Peter W. and Ray Jackendoff, 2005, Simpler Syntax (Oxford Linguistics), Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780199271092.001.0001
Currie, Adrian, 2018, Rock, Bone, and Ruin: An Optimist’s Guide to the Historical Sciences, Cambridge, MA: The MIT Press. doi:10.7551/mitpress/11421.001.0001
Curtiss, Susan, Victoria Fromkin, Stephen Krashen, David Rigler, and Marilyn Rigler, 1974, “The Linguistic Development of Genie”, Language, 50(3): 528–554. doi:10.2307/412222
Deacon, Terrence William, 1997, The Symbolic Species: The Co-Evolution of Language and the Brain, New York: Norton.
Del Pinal, Guillermo, 2015, “The Structure of Semantic Competence: Compositionality as an Innate Constraint of the Faculty of Language”, Mind & Language, 30(4): 375–413. doi:10.1111/mila.12084
Dresher, B. Elan, 2025, “On the Poverty of the Stimulus in Phonology”, Second Language Research, 41(3): 523–534. doi:10.1177/02676583251321795
Dupre, Gabe, 2021a, “Empiricism, Syntax, and Ontogeny”, Philosophical Psychology, 34(7): 1011–1046. doi:10.1080/09515089.2021.1937591
–––, 2021b, “(What) Can Deep Learning Contribute to Theoretical Linguistics?”, Minds and Machines, 31(4): 617–635. doi:10.1007/s11023-021-09571-w
–––, 2022, “Balancing Evolution and Acquisition in Theoretical Linguistics: Tensions and Prospects”, in Philosophical Approaches to Language and Communication: Volume 1 (Studies in Philosophy of Language and Linguistics 19), Piotr Stalmaszczyk and Martin Hinton (eds), Berlin: Peter Lang Verlag, 15–50.
–––, 2024, “Acquiring a Language vs. Inducing a Grammar”, Cognition, 247: article 105771. doi:10.1016/j.cognition.2024.105771
Eimas, Peter D., Einar R. Siqueland, Peter Jusczyk, and James Vigorito, 1971, “Speech Perception in Infants”, Science, 171(3968): 303–306. doi:10.1126/science.171.3968.303
Embick, David and David Poeppel, 2015, “Towards a Computational(ist) Neurobiology of Language: Correlational, Integrated and Explanatory Neurolinguistics”, Language, Cognition and Neuroscience, 30(4): 357–366. doi:10.1080/23273798.2014.980750
Evans, Nicholas and Stephen C. Levinson, 2009, “The Myth of Language Universals: Language Diversity and Its Importance for Cognitive Science”, Behavioral and Brain Sciences, 32(5): 429–448. doi:10.1017/S0140525X0999094X
Everett, Daniel L., 2005, “Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language”, Current Anthropology, 46(4): 621–646. doi:10.1086/431525
Fedorenko, Evelina and Idan A. Blank, 2020, “Broca’s Area Is Not a Natural Kind”, Trends in Cognitive Sciences, 24(4): 270–284. doi:10.1016/j.tics.2020.01.001
Fedorenko, Evelina, Anna A. Ivanova, and Tamar I. Regev, 2024, “The Language Network as a Natural Kind within the Broader Landscape of the Human Brain”, Nature Reviews Neuroscience, 25(5): 289–312. doi:10.1038/s41583-024-00802-4
Feldman, Heidi, Susan Goldin-Meadow, and Lila Gleitman, 1978, “Beyond Herodotus: The Creation of Language by Linguistically Deprived Deaf Children”, in Action, Symbol, and Gesture: The Emergence of Language, Andy Lock (ed.), London/New York: Academic Press, 351–414.
Fodor, Janet Dean and Carrie Crowther, 2002, “Understanding Stimulus Poverty Arguments”, The Linguistic Review, 19(1–2). doi:10.1515/tlir.19.1-2.105
Fodor, Jerry A., 1975, The Language of Thought (The Language & Thought Series), Cambridge, MA: Harvard University Press.
–––, 1981, “The Present Status of the Innateness Controversy”, in his Representations: Philosophical Essays on the Foundations of Cognitive Science, Cambridge, MA: MIT Press, 257–316 (ch. 10).
–––, 1983, The Modularity of Mind: An Essay on Faculty Psychology, Cambridge, MA: MIT Press. doi:10.7551/mitpress/4737.001.0001
–––, 1991, “The Dogma That Didn’t Bark (A Fragment of a Naturalized Epistemology)”, Mind, 100(398): 201–220. doi:10.1093/mind/C.398.201
–––, 2001, “Doing Without What’s Within: Fiona Cowie’s Critique of Nativism”, Mind, 110(437): 99–148. doi:10.1093/mind/110.437.99
Freidin, Robert, 1991, “Linguistic Theory and Language Acquisition: A Note on Structure-Dependence”, Behavioral and Brain Sciences, 14(4): 618–619. doi:10.1017/S0140525X00071569
Friederici, Angela D., 2018, “The Neural Basis for Human Syntax: Broca’s Area and Beyond”, Current Opinion in Behavioral Sciences, 21: 88–92. doi:10.1016/j.cobeha.2018.03.004
Futrell, Richard and Kyle Mahowald, 2025, “How Linguistics Learned to Stop Worrying and Love the Language Models”, Behavioral and Brain Sciences, first online: 24 July 2025. doi:10.1017/S0140525X2510112X
Gallistel, Charles R., 1998, “Symbolic Processes in the Brain”, in An Invitation to Cognitive Science, Volume 4: Methods, Models, and Conceptual Issues, Daniel N. Osherson, Saul Sternberg, and Don Scarborough (eds), Cambridge, MA: The MIT Press, 1–51 (ch. 1). doi:10.7551/mitpress/3967.003.0004
Gasparri, Luca and Diego Marconi, 2015 [2024], “Word Meaning”, in The Stanford Encyclopedia of Philosophy (Summer 2024 edition), Edward N. Zalta and Uri Nodelman (eds), URL = <https://plato.stanford.edu/archives/sum2024/entries/word-meaning/>.
Gilkerson, Jill, Jeffrey A. Richards, Steven F. Warren, Judith K. Montgomery, Charles R. Greenwood, D. Kimbrough Oller, John H. L. Hansen, and Terrance D. Paul, 2017, “Mapping the Early Language Environment Using All-Day Recordings and Automated Analysis”, American Journal of Speech-Language Pathology, 26(2): 248–265. doi:10.1044/2016_AJSLP-15-0169
Gleitman, Lila, 1990, “The Structural Sources of Verb Meanings”, Language Acquisition, 1(1): 3–55. doi:10.1207/s15327817la0101_2
Gleitman, Lila R., Elissa L. Newport, and Henry Gleitman, 1984, “The Current Status of the Motherese Hypothesis”, Journal of Child Language, 11(1): 43–79. doi:10.1017/S0305000900005584
Godfrey-Smith, Peter, 2000, “On the Theoretical Role of ‘Genetic Coding’”, Philosophy of Science, 67(1): 26–44.
Goldberg, Adele E., 1995, Constructions: A Construction Grammar Approach to Argument Structure (Cognitive Theory of Language and Culture), Chicago: University of Chicago Press.
–––, 2006, Constructions at Work: The Nature of Generalization in Language (Oxford Linguistics), Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780199268511.001.0001
–––, 2013, “Backgrounded Constituents Cannot Be ‘Extracted’”, in Experimental Syntax and Island Effects, Jon Sprouse and Norbert Hornstein (eds), Cambridge: Cambridge University Press, 221–238 (ch. 10). doi:10.1017/CBO9781139035309.012
–––, 2019, Explain Me This: Creativity, Competition, and the Partial Productivity of Constructions, Princeton, NJ: Princeton University Press.
Goldin-Meadow, Susan, 2003a, Hearing Gesture: How Our Hands Help Us Think, Cambridge, MA: Belknap Press of Harvard University Press.
–––, 2003b, The Resilience of Language: What Gesture Creation in Deaf Children Can Tell Us about How All Children Learn Language (Essays in Developmental Psychology), New York: Psychology Press.
Goldin-Meadow, Susan and Heidi Feldman, 1977, “The Development of Language-Like Communication Without a Language Model”, Science, 197(4301): 401–403. doi:10.1126/science.877567
Gómez, David Maximiliano, Iris Berent, Silvia Benavides-Varela, Ricardo A. H. Bion, Luigi Cattarossi, Marina Nespor, and Jacques Mehler, 2014, “Language Universals at Birth”, Proceedings of the National Academy of Sciences, 111(16): 5837–5841. doi:10.1073/pnas.1318261111
Goodluck, Helen, 2020, Language Acquisition by Children: A Linguistic Introduction (Edinburgh Advanced Textbooks in Linguistics), Edinburgh: Edinburgh University Press.
Gopnik, Myrna and Martha B. Crago, 1991, “Familial Aggregation of a Developmental Language Disorder”, Cognition, 39(1): 1–50. doi:10.1016/0010-0277(91)90058-C
Goro, Takuya and Sachie Akiba, 2004, “The Acquisition of Disjunction and Positive Polarity in Japanese”, in WCCFL 23: Proceedings of the 23rd West Coast Conference on Formal Linguistics, Vineeta Chand, Ann Kelleher, Angelo J. Rodríguez, and Benjamin Schmeiser (eds), Somerville, MA: Cascadilla Press, 251–264.
Greenberg, Joseph H., 1966, Language Universals: With Special Reference to Feature Hierarchies, The Hague: Mouton & Co.
Halberda, Justin, 2003, “The Development of a Word-Learning Strategy”, Cognition, 87(1): B23–B34. doi:10.1016/S0010-0277(02)00186-5
Hale, Mark and Charles Reiss, 2003, “The Subset Principle in Phonology: Why the Tabula Can’t Be Rasa”, Journal of Linguistics, 39(2): 219–244. doi:10.1017/S0022226703002019
Han, Chung-hye, Jeffrey Lidz, and Julien Musolino, 2007, “V-Raising and Grammar Competition in Korean: Evidence from Negation and Quantifier Scope”, Linguistic Inquiry, 38(1): 1–47. doi:10.1162/ling.2007.38.1.1
Hartshorne, Joshua K., 2022, “When Do Children Lose the Language Instinct? A Critical Review of the Critical Periods Literature”, Annual Review of Linguistics, 8: 143–151. doi:10.1146/annurev-linguistics-032521-053234
Hartshorne, Joshua K., Joshua B. Tenenbaum, and Steven Pinker, 2018, “A Critical Period for Second Language Acquisition: Evidence from 2/3 Million English Speakers”, Cognition, 177: 263–277. doi:10.1016/j.cognition.2018.04.007
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch, 2002, “The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?”, Science, 298(5598): 1569–1579. doi:10.1126/science.298.5598.1569
Hawkins, John A., 2004, Efficiency and Complexity in Grammars (Oxford Linguistics), Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780199252695.001.0001
Heyes, Cecilia M., 2018, Cognitive Gadgets: The Cultural Evolution of Thinking, Cambridge, MA: The Belknap Press of Harvard University Press.
Hofmeister, Philip and Ivan A. Sag, 2010, “Cognitive Constraints and Island Effects”, Language, 86(2): 366–415. doi:10.1353/lan.0.0223
Hornstein, Norbert, 2013, “Three Grades of Grammatical Involvement: Syntax from a Minimalist Perspective”, Mind & Language, 28(4): 392–420. doi:10.1111/mila.12023
Hornstein, Norbert and David Lightfoot (eds), 1981, Explanation in Linguistics: The Logical Problem of Language Acquisition (Longman Linguistics Library 25), London/New York: Longman.
Hornstein, Norbert, Jairo Nunes, and Kleanthes K. Grohmann, 2005, Understanding Minimalism (Cambridge Textbooks in Linguistics), Cambridge/New York: Cambridge University Press. doi:10.1017/CBO9780511840678
Ibbotson, Paul, 2020, What It Takes to Talk: Exploring Developmental Cognitive Linguistics (Cognitive Linguistics Research [CLR] 64), Berlin/Boston: Walter De Gruyter. doi:10.1515/9783110647914
Jackendoff, Ray, 1977, X-Bar Syntax: A Study of Phrase Structure (Linguistic Inquiry Monographs), Cambridge, MA: The MIT Press. [Jackendoff 1977 available online]
–––, 1990, Semantic Structures (Current Studies in Linguistics 18), Cambridge, MA: MIT Press.
–––, 1993, Patterns in the Mind: Language and Human Nature, New York/London: Harvester Wheatsheaf.
–––, 2002, Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford/New York: Oxford University Press. doi:10.1093/acprof:oso/9780198270126.001.0001
–––, 2025, “The Parallel Architecture in Language and Elsewhere”, Topics in Cognitive Science, 17(4): 822–831. doi:10.1111/tops.12698
Jackendoff, Ray and Jenny Audring, 2020, The Texture of the Lexicon: Relational Morphology and the Parallel Architecture, Oxford/New York: Oxford University Press. doi:10.1093/oso/9780198827900.001.0001
Jusczyk, Peter W., 1999, “How Infants Begin to Extract Words from Speech”, Trends in Cognitive Sciences, 3(9): 323–328. doi:10.1016/S1364-6613(99)01363-7
Kallini, Julie, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, and Christopher Potts, 2024, “Mission: Impossible Language Models”, in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 14691–14714.
Kaminski, Juliane, Josep Call, and Julia Fischer, 2004, “Word Learning in a Domestic Dog: Evidence for ‘Fast Mapping’”, Science, 304(5677): 1682–1683. doi:10.1126/science.1097859
Karmiloff-Smith, Annette, 1992, Beyond Modularity: A Developmental Perspective on Cognitive Science (Learning, Development, and Conceptual Change), Cambridge, MA: The MIT Press. doi:10.7551/mitpress/1579.001.0001
Katzir, Roni, 2023, “Why Large Language Models Are Poor Theories of Human Linguistic Cognition: A Reply to Piantadosi”, Biolinguistics, 17(December): e13153. doi:10.5964/bioling.13153
Kim, Judy Sein, Brianna Aheimer, Verónica Montané Manrara, and Marina Bedny, 2021, “Shared Understanding of Color among Sighted and Blind Adults”, Proceedings of the National Academy of Sciences, 118(33): e2020192118. doi:10.1073/pnas.2020192118
Kirby, Simon, 2000, “Syntax Without Natural Selection: How Compositionality Emerges from Vocabulary in a Population of Learners”, in The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Chris Knight, Michael Studdert-Kennedy, and James Hurford (eds), Cambridge: Cambridge University Press, 303–323 (ch. 18). doi:10.1017/CBO9780511606441.019
Kirby, Simon, Monica Tamariz, Hannah Cornish, and Kenny Smith, 2015, “Compression and Communication in the Cultural Evolution of Linguistic Structure”, Cognition, 141: 87–102. doi:10.1016/j.cognition.2015.03.016
Kuhl, Patricia K. and James D. Miller, 1975, “Speech Perception by the Chinchilla: Voiced-Voiceless Distinction in Alveolar Plosive Consonants”, Science, 190(4209): 69–72. doi:10.1126/science.1166301
Lai, Cecilia S. L., Simon E. Fisher, Jane A. Hurst, Faraneh Vargha-Khadem, and Anthony P. Monaco, 2001, “A Forkhead-Domain Gene Is Mutated in a Severe Speech and Language Disorder”, Nature, 413(6855): 519–523. doi:10.1038/35097076
Landau, Barbara and Lila R. Gleitman, 1985, Language and Experience: Evidence from the Blind Child (Cognitive Science Series 8), Cambridge, MA: Harvard University Press.
Lasnik, Howard, 2000, Syntactic Structures Revisited: Contemporary Lectures on Classic Transformational Theory (Current Studies in Linguistics 33), Cambridge, MA: The MIT Press. doi:10.7551/mitpress/6592.001.0001
Lasnik, Howard and Jeffrey L. Lidz, 2016, “The Argument from the Poverty of the Stimulus”, in The Oxford Handbook of Universal Grammar, Ian Roberts (ed.), Oxford/New York: Oxford University Press, 221–248. doi:10.1093/oxfordhb/9780199573776.013.10
Lasnik, Howard and Juan Uriagereka, 2002, “On the Poverty of the Challenge”, The Linguistic Review, 19(1–2): 147–150. doi:10.1515/tlir.19.1-2.147
Laurence, Stephen and Eric Margolis, 2001, “The Poverty of the Stimulus Argument”, The British Journal for the Philosophy of Science, 52(2): 217–276. doi:10.1093/bjps/52.2.217
–––, 2005, “Number and Natural Language”, in The Innate Mind, Volume 1: Structure and Contents, Peter Carruthers, Stephen Laurence, and Stephen Stich (eds), Oxford/New York: Oxford University Press, 216–236 (ch. 13). doi:10.1093/acprof:oso/9780195179675.003.0013
–––, 2024, The Building Blocks of Thought: A Rationalist Account of the Origins of Concepts, Oxford: Oxford University Press. doi:10.1093/9780191925375.001.0001
Lee, Joanne N. and Letitia R. Naigles, 2005, “The Input to Verb Learning in Mandarin Chinese: A Role for Syntactic Bootstrapping”, Developmental Psychology, 41(3): 529–540. doi:10.1037/0012-1649.41.3.529
Legate, Julie Anne and Charles D. Yang, 2002, “Empirical Re-Assessment of Stimulus Poverty Arguments”, The Linguistic Review, 19(1–2): 151–162. doi:10.1515/tlir.19.1-2.151
Lenneberg, Eric H., 1967, Biological Foundations of Language, New York: Wiley.
Liberman, Alvin M., Franklin S. Cooper, Donald P. Shankweiler, and Michael Studdert-Kennedy, 1967, “Perception of the Speech Code”, Psychological Review, 74(6): 431–461. doi:10.1037/h0020279
Lidz, Jeffrey, Henry Gleitman, and Lila Gleitman, 2003, “Understanding How Input Matters: Verb Learning and the Footprint of Universal Grammar”, Cognition, 87(3): 151–178. doi:10.1016/S0010-0277(02)00230-5
–––, 2004, “Kidz in the ’Hood: Syntactic Bootstrapping and the Mental Lexicon”, in Weaving a Lexicon, D. Geoffrey Hall and Sandra R. Waxman (eds), Cambridge, MA: The MIT Press, 603–636 (ch. 19). doi:10.7551/mitpress/7185.003.0023
Lidz, Jeffrey and Alexander Williams, 2009, “Constructions on Holiday”, Cognitive Linguistics, 20(1): 177–189. doi:10.1515/COGL.2009.011
Linzen, Tal, Emmanuel Dupoux, and Yoav Goldberg, 2016, “Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies”, Transactions of the Association for Computational Linguistics, 4: 521–535. doi:10.1162/tacl_a_00115
MacWhinney, Brian, 1995, The CHILDES Project: Tools for Analyzing Talk, second edition, Hillsdale, NJ: Lawrence Erlbaum. First edition, 1991.
Marcus, Gary F., 1993, “Negative Evidence in Language Acquisition”, Cognition, 46(1): 53–85. doi:10.1016/0010-0277(93)90022-N
Margolis, Eric and Stephen Laurence, 2013, “In Defense of Nativism”, Philosophical Studies, 165(2): 693–718. doi:10.1007/s11098-012-9972-x
Markie, Peter and M. Folescu, 2021 [2023], “Rationalism vs. Empiricism”, in The Stanford Encyclopedia of Philosophy (Spring 2023 edition), Edward N. Zalta and Uri Nodelman (eds), URL = <https://plato.stanford.edu/archives/spr2023/entries/rationalism-empiricism/>.
Markman, Ellen M., 1989, Categorization and Naming in Children: Problems of Induction (The MIT Press Series in Learning, Development, and Conceptual Change), Cambridge, MA: The MIT Press. doi:10.7551/mitpress/1750.001.0001
–––, 1990, “Constraints Children Place on Word Meanings”, Cognitive Science, 14(1): 57–77. doi:10.1207/s15516709cog1401_4
–––, 1992, “Constraints on Word Learning: Speculations about Their Nature, Origins, and Domain Specificity”, in Modularity and Constraints in Language and Cognition (The Minnesota Symposia on Child Psychology 25), Megan R. Gunnar and Michael P. Maratsos (eds), Hillsdale, NJ: Lawrence Erlbaum Associates, 59–102 (ch. 3).
Markman, Ellen M. and Jean E. Hutchinson, 1984, “Children’s Sensitivity to Constraints on Word Meaning: Taxonomic versus Thematic Relations”, Cognitive Psychology, 16(1): 1–27. doi:10.1016/0010-0285(84)90002-1
McCoy, R. Thomas, Robert Frank, and Tal Linzen, 2020, “Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks”, Transactions of the Association for Computational Linguistics, 8: 125–140. doi:10.1162/tacl_a_00304
Millière, Raphaël, 2026, “Language Models as Models of Language”, in The Oxford Handbook of Philosophy of Linguistics (Oxford Handbooks Series), Ryan M. Nefdt, Gabe Dupre, and Kate Hazel Stanton (eds), New York: Oxford University Press, ch. 25.
Misra, Kanishka and Kyle Mahowald, 2024, “Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs”, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA: Association for Computational Linguistics, 913–929. doi:10.18653/v1/2024.emnlp-main.53
Miyagawa, Shigeru, 2017, “Integration Hypothesis: A Parallel Model of Language Development in Evolution”, in Evolution of the Brain, Cognition, and Emotion in Vertebrates, Shigeru Watanabe, Michel A. Hofman, and Toru Shimizu (eds), New York: Springer, 225–247. doi:10.1007/978-4-431-56559-8_11
Miyagawa, Shigeru, Shiro Ojima, Robert C. Berwick, and Kazuo Okanoya, 2014, “The Integration Hypothesis of Human Language Evolution and the Nature of Contemporary Languages”, Frontiers in Psychology, 5: article 564. doi:10.3389/fpsyg.2014.00564
Moro, Andrea, Matteo Greco, and Stefano F. Cappa, 2023, “Large Languages, Impossible Languages and Human Brains”, Cortex, 167: 82–85. doi:10.1016/j.cortex.2023.07.003
Musso, Mariacristina, Andrea Moro, Volkmar Glauche, Michel Rijntjes, Jürgen Reichenbach, Christian Büchel, and Cornelius Weiller, 2003, “Broca’s Area and the Language Instinct”, Nature Neuroscience, 6(7): 774–781. doi:10.1038/nn1077
Naigles, Letitia, 1990, “Children Use Syntax to Learn Verb Meanings”, Journal of Child Language, 17(2): 357–374. doi:10.1017/S0305000900013817
Nevins, Andrew, David Pesetsky, and Cilene Rodrigues, 2009, “Pirahã Exceptionality: A Reassessment”, Language, 85(2): 355–404. doi:10.1353/lan.0.0107
Newmeyer, Frederick J., 2004, “Against a Parameter-Setting Approach to Typological Variation”, Linguistic Variation Yearbook, 4: 181–234. doi:10.1075/livy.4.06new
–––, 2017, “Where, If Anywhere, Are Parameters? A Critical Historical Overview of Parametric Theory”, in On Looking into Words (and Beyond): Structures, Relations, Analyses, Claire Bowern, Laurence Horn, and Raffaella Zanuttini (eds), Berlin: Language Science Press, 547–568 (ch. 25). [Newmeyer 2017 available online]
Newport, Elissa L., Henry Gleitman, and Lila R. Gleitman, 1977, “Mother, I’d Rather Do It Myself: Some Effects and Non-Effects of Maternal Speech Style”, in Talking to Children: Language Input and Acquisition, Catherine E. Snow and Charles A. Ferguson (eds), Cambridge/New York: Cambridge University Press, 109–149 (ch. 5).
Ochs, Elinor and Bambi B. Schieffelin, 1982, “Language Acquisition and Socialization: Three Developmental Stories and Their Implications”, Sociolinguistic Working Paper 105, Austin, TX: Southwest Educational Development Lab. [Ochs and Schieffelin 1982 available online]
Pearl, Lisa, 2022, “Poverty of the Stimulus Without Tears”, Language Learning and Development, 18(4): 415–454. doi:10.1080/15475441.2021.1981908
Pearl, Lisa and Jeffrey Lidz, 2009, “When Domain-General Learning Fails and When It Succeeds: Identifying the Contribution of Domain Specificity”, Language Learning and Development, 5(4): 235–265. doi:10.1080/15475440902979907
–––, 2013, “Parameters in Language Acquisition”, in The Cambridge Handbook of Biolinguistics, Cedric Boeckx and Kleanthes K. Grohmann (eds), Cambridge: Cambridge University Press, 129–159. doi:10.1017/CBO9780511980435.010
Perfors, Andrew, 2008, Learnability, Representation, and Language: A Bayesian Approach, PhD Thesis, Massachusetts Institute of Technology.
–––, 2012, “Bayesian Models of Cognition: What’s Built in After All?”, Philosophy Compass, 7(2): 127–138. doi:10.1111/j.1747-9991.2011.00467.x
Perfors, Andrew, Joshua B. Tenenbaum, and Elizabeth Wonnacott, 2010, “Variability, Negative Evidence, and the Acquisition of Verb Argument Constructions”, Journal of Child Language, 37(3): 607–642. doi:10.1017/S0305000910000012
Pesetsky, David, 2009, “Against Taking Linguistic Diversity at ‘Face Value’”, Behavioral and Brain Sciences, 32(5): 464–465. doi:10.1017/S0140525X09990562
Petitto, Laura-Ann, 2005, “How the Brain Begets Language”, in The Cambridge Companion to Chomsky, James McGilvray (ed.), 1st ed., Cambridge/New York: Cambridge University Press, 84–101 (ch. 4). doi:10.1017/CCOL0521780136.005
Phillips, Colin, 2013a, “On the Nature of Island Constraints I: Language Processing and Reductionist Accounts”, in Experimental Syntax and Island Effects, Jon Sprouse and Norbert Hornstein (eds), Cambridge: Cambridge University Press, 64–108 (ch. 4). doi:10.1017/CBO9781139035309.005
–––, 2013b, “On the Nature of Island Constraints II: Language Learning and Innateness”, in Experimental Syntax and Island Effects, Jon Sprouse and Norbert Hornstein (eds), Cambridge: Cambridge University Press, 132–158 (ch. 6). doi:10.1017/CBO9781139035309.007
Piantadosi, Steven T., 2024, “Modern Language Models Refute Chomsky’s Approach to Language”, in From Fieldwork to Linguistic Theory: A Tribute to Dan Everett, Edward Gibson and Moshe Poliak (eds), Berlin: Language Science Press, 353–414 (ch. 15). doi:10.5281/zenodo.12665932
Pierrehumbert, Janet B., 2003, “Phonetic Diversity, Statistical Learning, and Acquisition of Phonology”, Language and Speech, 46(2–3): 115–154. doi:10.1177/00238309030460020501
Pinker, Steven, 1989, Learnability and Cognition: The Acquisition of Argument Structure (Learning, Development, and Conceptual Change), Cambridge, MA: MIT Press.
Pinker, Steven and Paul Bloom, 1990, “Natural Language and Natural Selection”, Behavioral and Brain Sciences, 13(4): 707–727. doi:10.1017/S0140525X00081061
Pinker, Steven and Ray Jackendoff, 2005, “The Faculty of Language: What’s Special about It?”, Cognition, 95(2): 201–236. doi:10.1016/j.cognition.2004.08.004
Pinker, Steven and Alan Prince, 1988, “On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition”, Cognition, 28(1–2): 73–193. doi:10.1016/0010-0277(88)90032-7
Plunkett, Kim, Annette Karmiloff‐Smith, Elizabeth Bates, Jeffrey L. Elman, and Mark H. Johnson, 1997, “Connectionism and Developmental Psychology”, Journal of Child Psychology and Psychiatry, 38(1): 53–80. doi:10.1111/j.1469-7610.1997.tb01505.x
Pullum, Geoffrey K. and Barbara C. Scholz, 2002, “Empirical Assessment of Stimulus Poverty Arguments”, The Linguistic Review, 19(1–2): 9–50. doi:10.1515/tlir.19.1-2.9
Quine, W. V., 1960, Word and Object (Studies in Communication), Cambridge, MA: Technology Press of the Massachusetts Institute of Technology.
Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., 2021, “Learning Transferable Visual Models From Natural Language Supervision”, in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 8748–8763. [Radford et al. 2021 available online]
Reiss, Charles, 2018, “Substance Free Phonology”, in The Routledge Handbook of Phonological Theory, S. J. Hannahs and Anna R. K. Bosch (eds), London: Routledge, 425–452 (ch. 15).
Robbins, Philip and Zoe Drayson, 2009 [2025], “Modularity of Mind”, in The Stanford Encyclopedia of Philosophy (Fall 2025 edition), Edward N. Zalta and Uri Nodelman (eds), URL = <https://plato.stanford.edu/archives/fall2025/entries/modularity-mind/>.
Ross, John Robert, 1967, Constraints on Variables in Syntax, PhD Thesis, Massachusetts Institute of Technology.
Rumelhart, David E. and James L. McClelland, 1986, “On Learning the Past Tenses of English Verbs”, in Parallel Distributed Processing, Volume 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models, James L. McClelland, David E. Rumelhart, and The PDP Research Group (eds), Cambridge, MA: The MIT Press, 216–271 (ch. 18).
Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport, 1996, “Statistical Learning by 8-Month-Old Infants”, Science, 274(5294): 1926–1928. doi:10.1126/science.274.5294.1926
Sakas, William, 2016, “Computational Approaches to Parameter Setting in Generative Linguistics”, in The Oxford Handbook of Developmental Linguistics, Jeffrey Lidz, William Snyder, and Joe Pater (eds), Oxford: Oxford University Press, 696–724 (ch. 29).
Sakas, William Gregory and Janet Dean Fodor, 2012, “Disambiguating Syntactic Triggers”, Language Acquisition, 19(2): 83–143. doi:10.1080/10489223.2012.660553
Samuels, Bridget D., 2015, “Can a Bird Brain Do Phonology?”, Frontiers in Psychology, 6: article 1082. doi:10.3389/fpsyg.2015.01082
Samuels, Bridget D., Marc Hauser, and Cedric Boeckx, 2016, “Looking for UG in Animals: A Case Study in Phonology”, in The Oxford Handbook of Universal Grammar, Ian Roberts (ed.), Oxford/New York: Oxford University Press, 527–546. doi:10.1093/oxfordhb/9780199573776.013.22
Samuels, Richard, 2002, “Nativism in Cognitive Science”, Mind & Language, 17(3): 233–265. doi:10.1111/1468-0017.00197
–––, 2004, “Innateness in Cognitive Science”, Trends in Cognitive Sciences, 8(3): 136–141. doi:10.1016/j.tics.2004.01.010
Sandler, Wendy, Mark Aronoff, Irit Meir, and Carol Padden, 2011, “The Gradual Emergence of Phonological Form in a New Language”, Natural Language & Linguistic Theory, 29(2): 503–543. doi:10.1007/s11049-011-9128-2
Seidenberg, Mark S. and David C. Plaut, 2014, “Quasiregularity and Its Discontents: The Legacy of the Past Tense Debate”, Cognitive Science, 38(6): 1190–1228. doi:10.1111/cogs.12147
Senghas, Ann and Marie Coppola, 2001, “Children Creating Language: How Nicaraguan Sign Language Acquired a Spatial Grammar”, Psychological Science, 12(4): 323–328. doi:10.1111/1467-9280.00359
Shea, Nicholas, 2018, Representation in Cognitive Science, Oxford: Oxford University Press. doi:10.1093/oso/9780198812883.001.0001
Sim, Zi, Sylvia Yuan, and Fei Xu, 2011, “Acquiring Word Learning Biases”, Proceedings of the Annual Meeting of the Cognitive Science Society, 33: 2544–2549. [Sim, Yuan, and Xu 2011 available online]
Smith, N. V. and Ianthi-Maria Tsimpli, 1995, The Mind of a Savant: Language-Learning and Modularity, Oxford/Cambridge, MA: Blackwell Publishers.
Spelke, Elizabeth, 2022, What Babies Know: Core Knowledge and Composition. Volume 1 (Oxford Cognitive Development Series), New York: Oxford University Press. doi:10.1093/oso/9780190618247.001.0001
Spelke, Elizabeth S., Emily P. Bernier, and Amy E. Skerry, 2013, “Core Social Cognition”, in Navigating the Social World: What Infants, Children, and Other Species Can Teach Us (Oxford Series in Social Cognition and Social Neuroscience), Mahzarin R. Banaji and Susan A. Gelman (eds), Oxford: Oxford University Press, 11–16 (ch. 1.3). doi:10.1093/acprof:oso/9780199890712.003.0003
Steinert-Threlkeld, Shane, 2020, “Toward the Emergence of Nontrivial Compositionality”, Philosophy of Science, 87(5): 897–909. doi:10.1086/710628
Tallerman, Maggie, 2009, “If Language Is a Jungle, Why Are We All Cultivating the Same Plot?”, Behavioral and Brain Sciences, 32(5): 469–470. doi:10.1017/S0140525X09990598
Tomasello, Michael, 2003, Constructing a Language: A Usage-Based Theory of Language Acquisition, Cambridge, MA: Harvard University Press.
Trueswell, John C., Tamara Nicol Medina, Alon Hafri, and Lila R. Gleitman, 2013, “Propose but Verify: Fast Mapping Meets Cross-Situational Word Learning”, Cognitive Psychology, 66(1): 126–156. doi:10.1016/j.cogpsych.2012.10.001
Ullman, Michael T. and Elizabeth I. Pierpont, 2005, “Specific Language Impairment Is Not Specific to Language: The Procedural Deficit Hypothesis”, Cortex, 41(3): 399–433. doi:10.1016/S0010-9452(08)70276-4
Vargha-Khadem, F., K. Watkins, K. Alcock, P. Fletcher, and R. Passingham, 1995, “Praxic and Nonverbal Cognitive Deficits in a Large Family with a Genetically Transmitted Speech and Language Disorder”, Proceedings of the National Academy of Sciences, 92: 930–933.
Volenec, Veno and Charles Reiss, 2020, “Formal Generative Phonology”, Radical: A Journal of Phonology, 2: 1–148. [Volenec and Reiss 2020 available online]
–––, 2025, “Adopting Large Language Models as a Theory of Language Does Refute Chomsky (but Not Like You Think)”, SKASE Journal of Theoretical Linguistics, 22(1): 2–17. [Volenec and Reiss 2025 available online]
Vong, Wai Keen, Wentao Wang, A. Emin Orhan, and Brenden M. Lake, 2024, “Grounded Language Acquisition through the Eyes and Ears of a Single Child”, Science, 383(6682): 504–511. doi:10.1126/science.adi1374
Warstadt, Alex and Samuel R. Bowman, 2022, “What Artificial Neural Networks Can Tell Us about Human Language Acquisition”, in Algebraic Structures in Natural Language, Shalom Lappin and Jean-Philippe Bernardy (eds), Boca Raton: CRC Press, 17–60 (ch. 2).
Werker, Janet F. and Richard C. Tees, 1983, “Developmental Changes across Childhood in the Perception of Non-Native Speech Sounds”, Canadian Journal of Psychology / Revue Canadienne de Psychologie, 37(2): 278–286. doi:10.1037/h0080725
Wiesel, Torsten N. and David H. Hubel, 1963, “Effects of Visual Deprivation on Morphology and Physiology of Cells in the Cat’s Lateral Geniculate Body”, Journal of Neurophysiology, 26(6): 978–993. doi:10.1152/jn.1963.26.6.978
Xu, Fei and Joshua B. Tenenbaum, 2007, “Word Learning as Bayesian Inference”, Psychological Review, 114(2): 245–272. doi:10.1037/0033-295X.114.2.245
Yang, Charles D., 2002, Knowledge and Learning in Natural Language, Oxford/New York: Oxford University Press. doi:10.1093/oso/9780199254149.001.0001
–––, 2004, “Universal Grammar, Statistics or Both?”, Trends in Cognitive Sciences, 8(10): 451–456. doi:10.1016/j.tics.2004.08.006
–––, 2006, The Infinite Gift: How Children Learn and Unlearn the Languages of the World, New York: Scribner.
–––, 2016, The Price of Linguistic Productivity: How Children Learn to Break the Rules of Language, Cambridge, MA: The MIT Press. doi:10.7551/mitpress/9780262035323.001.0001

Academic Tools

How to cite this entry.

Preview the PDF version of this entry at the Friends of the SEP Society.

Look up topics and thinkers related to this entry at the Internet Philosophy Ontology Project (InPhO).

Enhanced bibliography for this entry at PhilPapers, with links to its database.

Other Internet Resources

Cowie, Fiona, “Innateness and Language”, Stanford Encyclopedia of Philosophy (Summer 2006 Edition), Edward N. Zalta & Uri Nodelman (eds.), URL = <https://plato.stanford.edu/archives/sum2006/entries/innateness-language/>. [This was the previous entry on this topic in the Stanford Encyclopedia of Philosophy – see the version history.]
Goldberg, Yoav, 2019, “Assessing BERT’s Syntactic Abilities”, arXiv:1901.05287. doi:10.48550/ARXIV.1901.05287
Idsardi, William J., 2005, “Poverty of the Stimulus Arguments in Phonology”
Papers on linguistic innateness, compiled at PhilPapers.org.
Papers on Large Language Models, compiled at PhilPapers.org.
Entries on Noam Chomsky and Knowledge of Language in the Internet Encyclopedia of Philosophy.

Acknowledgments

Thanks to Gabbrielle Johnson and Jared Meoni for looking at a draft of this entry, and offering helpful advice on how to make it clearer, more accessible, and shorter.
