Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

Population Genetics

First published Fri Sep 22, 2006

Population genetics is a field of biology that studies the genetic composition of biological populations, and the changes in genetic composition that result from the operation of various factors, including natural selection. Population geneticists pursue their goals by developing abstract mathematical models of gene frequency dynamics, trying to extract conclusions from those models about the likely patterns of genetic variation in actual populations, and testing the conclusions against empirical data. A number of the more robust generalizations to emerge from population-genetic analysis are discussed below.

Population genetics is intimately bound up with the study of evolution and natural selection, and is often regarded as the theoretical cornerstone of modern Darwinism. This is because natural selection is one of the most important factors that can affect a population's genetic composition. Natural selection occurs when some variants in a population out-reproduce other variants, as a result of being better adapted to the environment, or ‘fitter’. Presuming the fitness differences are at least partly due to genetic differences, this will cause the population's genetic makeup to be altered over time. By studying formal models of gene frequency change, population geneticists therefore hope to shed light on the evolutionary process, and to permit the consequences of different evolutionary hypotheses to be explored in a quantitatively precise way.

The field of population genetics came into being in the 1920s and 1930s, thanks to the work of R.A. Fisher, J.B.S. Haldane and Sewall Wright. Their achievement was to integrate the principles of Mendelian genetics, which had been rediscovered at the turn of the century, with Darwinian natural selection. Though the compatibility of Darwinism with Mendelian genetics is today taken for granted, in the early years of the twentieth century it was not. Many of the early Mendelians did not accept Darwin's ‘gradualist’ account of evolution, believing instead that novel adaptations must arise in a single mutational step; conversely, many of the early Darwinians did not believe in Mendelian inheritance, often because of the erroneous belief that it was incompatible with the process of evolutionary modification as described by Darwin. By working out mathematically the consequences of selection acting on a population obeying the Mendelian rules of inheritance, Fisher, Haldane and Wright showed that Darwinism and Mendelism were not just compatible but excellent bedfellows; this played a key part in the formation of the ‘neo-Darwinian synthesis’, and explains why population genetics came to occupy so pivotal a role in evolutionary theory.

The discussion below is structured as follows. Section 1 outlines the history of population genetics, focusing on major themes. Section 2 explains the Hardy-Weinberg principle, one of the most important concepts in population genetics. Section 3 introduces the reader to simple population-genetic models of the evolutionary process, and discusses their significance. Section 4 discusses some conceptual issues surrounding population genetics, and its status in modern evolutionary biology.

1. The Origins of Population Genetics

To understand how population genetics came into being, and to appreciate its intellectual significance, a brief excursion into the history of biology is necessary. Darwin's Origin of Species, published in 1859, propounded two main theses: firstly, that modern species were descended from common ancestors, and secondly that the process of natural selection was the major mechanism of evolutionary change. The first thesis quickly won acceptance in the scientific community, but the second did not. Many people found it difficult to accept that natural selection could play the explanatory role required of it by Darwin's theory. This situation—accepting that evolution had happened but doubting Darwin's account of what had caused it to happen—persisted well into the twentieth century.

Opposition to natural selection was understandable, for Darwin's theory, though compelling, contained a major lacuna: an account of the mechanism of inheritance. For evolution by natural selection to occur, it is necessary that offspring should tend to resemble their parents; otherwise, fitness-enhancing traits will have no tendency to spread through a population. (For example, if fast zebras leave more offspring than slow ones, this will only lead to evolutionary change if the offspring of fast zebras are themselves fast runners.) In the Origin, Darwin rested his argument on the observed fact that offspring do tend to resemble their parents (‘the strong principle of inheritance’), while admitting that he did not know why this was. Darwin did later attempt an explicit theory of inheritance, based on hypothetical entities called ‘gemmules’, but it turned out to have no basis in fact.

Darwin was deeply troubled by not having a proper understanding of the inheritance mechanism, for it left him unable to rebut one of the most powerful objections to his overall theory. For a population to evolve by natural selection, the members of the population must vary: if all organisms are identical, no selection can occur. So for selection to gradually modify a population over a long period of time, in the manner suggested by Darwin, a continual supply of variation is needed. This was the basis for Fleeming Jenkin's famous objection to Darwin, namely that the available variation would be used up too fast. Jenkin's reasoning assumed a ‘blending’ theory of inheritance, i.e. that an offspring's phenotypic traits are a ‘blend’ of those of its parents. (So for example, if a short and a tall organism mate, the height of the offspring will be intermediate between the two.) Jenkin argued that given blending inheritance, a sexually reproducing population will become phenotypically homogeneous in just a few generations, far fewer than the number of generations needed for natural selection to produce complex adaptations.

Fortunately for Darwin's theory, inheritance does not actually work the way Jenkin thought. The type of inheritance that we call ‘Mendelian’, after Gregor Mendel, is ‘particulate’ rather than ‘blending’: offspring inherit discrete hereditary particles (genes) from their parents, which means that sexual reproduction does not diminish the heritable variation present in the population. (See section 2, ‘The Hardy-Weinberg Principle’, below.) However, this realisation took a long time to come, for two reasons. Firstly, Mendel's work was overlooked by the scientific community for forty years. Secondly, even after the rediscovery of Mendel's work at the turn of the twentieth century, it was widely believed that Darwinian evolution and Mendelian inheritance were incompatible. The early Mendelians did not accept that natural selection played an important role in evolution, so were not well placed to see that Mendel had given Darwin's theory the lifeline it needed. The synthesis of Darwinism and Mendelism, which marked the birth of modern population genetics, was achieved by a long and tortuous route (cf. Provine 1971).

The key ideas behind Mendel's theory of inheritance are straightforward. In his experimental work on pea plants, Mendel observed an unusual phenomenon. He began with two ‘pure breeding’ lines, one producing plants with round seeds, the other wrinkled seeds. He then crossed these to produce the first daughter generation (the F1 generation). The F1 plants all had round seeds—the wrinkled trait had disappeared from the population. Mendel then crossed the F1 plants with each other to produce the F2 generation. Strikingly, approximately one quarter of the F2 plants had wrinkled seeds. So the wrinkled trait had made a comeback, skipping a generation.

These and similar observations were explained by Mendel as follows. He hypothesised that each plant contains a pair of ‘factors’ that together determine some aspect of its phenotype—in this case, seed shape. A plant inherits one factor from each of its parents. Suppose that there is one factor for round seeds (R), another for wrinkled seeds (W). There are then three possible types of plant: RR, RW and WW. An RR plant will have round seeds, a WW plant wrinkled seeds. What about an RW plant? Mendel suggested that it would have round seeds—the R factor is ‘dominant’ over the W factor. The observations could then be easily explained. The initial pure-breeding lines were RR and WW. The F1 plants were formed by RR × WW crosses, so were all of the RW type and thus had round seeds. The F2 plants were formed by RW × RW crosses, so contained a mixture of the RR, RW and WW types. If we assume that each RW parent transmits the R and W factors to its offspring with equal probability, then the F2 plants would contain RR, RW and WW in the ratio 1:2:1. (This assumption is known as Mendel's First Law or The Law of Segregation.) Since RR and RW both have round seeds, this explains why three quarters of the F2 plants had round seeds, one quarter wrinkled seeds.
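The segregation argument above can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of Mendel's or the article's own presentation; the function names `cross` and `round_seed_fraction` are hypothetical, and the only assumptions are the ones stated in the text (each parent transmits either of its two factors with equal probability, and R is dominant over W).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return offspring genotype probabilities for a cross of two parents.

    Each parent is a two-letter string such as "RW". Under the law of
    segregation, each of its two factors is transmitted with probability 1/2,
    so every (factor from parent1, factor from parent2) pair is equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def round_seed_fraction(genotype_probs):
    """Fraction of plants with round seeds, assuming R is dominant over W."""
    return sum(prob for genotype, prob in genotype_probs.items() if "R" in genotype)


f1 = cross("RR", "WW")   # first daughter generation
f2 = cross("RW", "RW")   # second daughter generation

print("F1:", f1)
print("F2:", f2)
print("Round-seeded share of F2:", round_seed_fraction(f2))
```

Exact fractions are used so that the 1:2:1 genotype ratio and the 3:1 phenotype ratio appear without rounding error.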

Obviously, our modern understanding of heredity is vastly more sophisticated than Mendel's, but the key elements of Mendel's theory—discrete hereditary particles that come in different types, dominance and recessiveness, and the law of segregation—have turned out to be essentially correct. Mendel's ‘factors’ are the genes of modern population genetics, and the alternative forms that a factor can take (e.g. R versus W in the pea plant example) are known as the alleles of a gene. The law of segregation is explained by the fact that during gametogenesis, each gamete (sex cell) receives only one of each chromosome pair from its parent organism. Other aspects of Mendel's theory have been modified in the light of later discoveries. Mendel thought that most phenotypic traits were controlled by a single pair of factors, like seed shape in his pea plants, but it is now known that most traits are affected by many pairs of genes, not just one. Mendel believed that the pairs of factors responsible for different traits (e.g. seed shape and flower colour) segregated independently of each other, but we now know that this need not be so (see section 3.6, ‘Two-Locus Models and Linkage’, below). Despite these points, Mendel's theory marks a turning point in our understanding of inheritance.

The rediscovery of Mendel's work in 1900 did not lead the scientific community to be converted to Mendelism overnight. The dominant approach to the study of heredity at the time was biometry, spearheaded by Karl Pearson in London, which involved statistical analysis of the phenotypic variation found in natural populations. Biometricians were mainly interested in continuously varying traits such as height, rather than the ‘discrete’ traits such as seed shape that Mendel studied, and were generally believers in Darwinian gradualism. Opposed to the biometricians were the Mendelians, spearheaded by William Bateson, who emphasized discontinuous variation, and believed that major adaptive change could be produced by single mutational steps, rather than by cumulative natural selection à la Darwin. A heated controversy between the biometricians and the Mendelians ensued. As a result, Mendelian inheritance came to be associated with an anti-Darwinian view of evolution.

Population genetics as we know it today arose from the need to reconcile Mendel with Darwin, a need which became increasingly urgent as the empirical evidence for Mendelian inheritance began to pile up. The first significant milestone was R.A. Fisher's 1918 paper, ‘The Correlation between Relatives on the Supposition of Mendelian Inheritance’, which showed how the biometrical and Mendelian research traditions could be unified. Fisher demonstrated that if a given continuous trait, e.g. height, was affected by a large number of Mendelian factors, each of which made a small difference to the trait, then the trait would show an approximately normal distribution in a population. Since the Darwinian process was widely believed to work best on continuously varying traits, showing that the distribution of such traits was compatible with Mendelism was an important step towards reconciling Darwin with Mendel.
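The idea that many small Mendelian factors yield an approximately normal trait distribution can be illustrated numerically. The sketch below is a deliberately simplified stand-in for Fisher's much more general analysis: it assumes (purely for illustration) that every locus has two alleles at frequency 1/2, that each ‘plus’ allele adds one unit to the trait, and that loci are independent. The function names are hypothetical.

```python
import math


def trait_distribution(n_loci):
    """Exact distribution of a trait that counts 'plus' alleles over n_loci loci.

    Illustrative assumptions: allele frequency 1/2 at every locus, equal and
    additive effects, independent loci. A diploid carries 2 * n_loci alleles,
    so the trait value is Binomial(2 * n_loci, 1/2).
    """
    n = 2 * n_loci
    return [math.comb(n, k) / 2 ** n for k in range(n + 1)]


def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def max_gap_to_normal(n_loci):
    """Largest pointwise gap between the trait distribution and a normal curve
    with the same mean (n/2) and standard deviation (sqrt(n)/2)."""
    n = 2 * n_loci
    mean, sd = n / 2, math.sqrt(n) / 2
    probs = trait_distribution(n_loci)
    return max(abs(p - normal_pdf(k, mean, sd)) for k, p in enumerate(probs))


for loci in (1, 5, 50):
    print(loci, "loci: max gap to normal =", max_gap_to_normal(loci))
```

Under these assumptions the gap shrinks as the number of loci grows, which is the qualitative point of the paragraph above.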

The full reconciliation was achieved in the 1920s and early 30s, thanks to the mathematical work of Fisher, Haldane and Wright. Each of these theorists developed formal models to explore how natural selection, and other evolutionary forces such as mutation, would modify the genetic composition of a Mendelian population over time. This work marked a major step forward in our understanding of evolution, for it enabled the consequences of various evolutionary hypotheses to be explored quantitatively rather than just qualitatively. Verbal arguments about what natural selection could or could not achieve, or about the patterns of genetic variation to which it could give rise, were replaced with explicit mathematical arguments. The strategy of devising formal models to shed light on the process of evolution is still the dominant research methodology in contemporary population genetics.

There were important intellectual differences between Fisher, Haldane and Wright, some of which have left legacies on the subsequent development of the subject. One difference concerned their respective attitudes towards natural selection. Fisher and Haldane were both strong Darwinians: they believed that natural selection was by far the most important factor affecting a population's genetic composition. Wright did not downplay the role of natural selection, but he believed that chance factors also played a crucial role in evolution, as did migration between the constituent populations of a species. (See sections 3.3, ‘Random Drift’, and 3.4, ‘Migration’.) A related difference is that Wright emphasized epistasis, or non-additive interactions between the genes within a single genome, to a much greater extent than Fisher or Haldane. Despite these differences, the work of all three was remarkably consonant in overall approach.

2. The Hardy-Weinberg Principle

The Hardy-Weinberg principle, discovered independently by G.H. Hardy and W. Weinberg in 1908, is one of the simplest and most important principles in population genetics. To illustrate the principle, consider a large population of sexually reproducing organisms. The organisms are assumed to be diploids, meaning that they contain two copies of each chromosome, one received from each parent. The gametes they produce are haploid, meaning that they contain only one of each chromosome pair. During sexual fusion, two haploid gametes fuse to form a diploid zygote, which then grows and develops into an adult organism. Most multi-celled animals, and many plants, have a lifecycle of this sort.

Suppose that at a given locus, or chromosomal ‘slot’, there are two possible alleles, A1 and A2; the locus is assumed to be on an autosome, not a sex chromosome. With respect to the locus in question, there are three possible genotypes in the population, A1A1, A1A2 and A2A2 (just as in Mendel's pea plant example above). Organisms with the A1A1 and A2A2 genotypes are called homozygotes; those with the A1A2 genotype are heterozygotes. The proportions, or relative frequencies, of the three genotypes in the overall population may be denoted f(A1A1), f(A1A2) and f(A2A2) respectively, where f(A1A1) + f(A1A2) + f(A2A2) = 1. It is assumed that these genotypic frequencies are the same for both males and females. The relative frequencies of the A1 and A2 alleles in the population may be denoted p and q respectively, where p + q = 1.

The Hardy-Weinberg principle is about the relation between the allelic and the genotypic frequencies. It states that if mating is random in the population, and if the evolutionary forces of natural selection, mutation, migration and drift are absent, then in the offspring generation the genotypic and allelic frequencies will be related by the following simple equations:

$$f(A_1A_1) = p^2, \qquad f(A_1A_2) = 2pq, \qquad f(A_2A_2) = q^2$$

Random mating means the absence of a genotypic correlation between mating partners, i.e. the probability that a given organism mates with an A1A1 partner, for example, does not depend on the organism's own genotype, and similarly for the probability of mating with a partner of one of the other two types.

That random mating will lead the genotypes to be in the above proportions (so-called Hardy-Weinberg proportions) is a consequence of Mendel's law of segregation. To see this, note that random mating is in effect equivalent to offspring being formed by randomly picking pairs of gametes from a large ‘gamete pool’ and fusing them into a zygote. The gamete pool contains all the successful gametes of the parent organisms. Since we are assuming the absence of selection, all parents contribute equal numbers of gametes to the pool. By the law of segregation, an A1A2 heterozygote produces gametes bearing the A1 and A2 alleles in equal proportion. Therefore, the relative frequencies of the A1 and A2 alleles in the gamete pool will be the same as in the parental population, namely p and q respectively. Given that the gamete pool is very large, when we pick pairs of gametes from the pool at random, we will get the ordered genotypic pairs {A1A1}, {A1A2}, {A2A1}, {A2A2} in the proportions $p^2 : pq : pq : q^2$. But order does not matter, so we can regard the {A1A2} and {A2A1} pairs as equivalent, giving the Hardy-Weinberg proportions for the unordered offspring genotypes.

This simple derivation of the Hardy-Weinberg principle deals with two alleles at a single locus, but can easily be extended to multiple alleles. (Extension to more than one locus is trickier; see section 3.6, ‘Two-Locus Models and Linkage’, below.) For the multi-allelic case, suppose there are n alleles at the locus, A1, …, An, with relative frequencies of p1, …, pn respectively, where p1 + p2 + … + pn = 1. Assuming again that the population is large, mating is random, evolutionary forces are absent, and Mendel's law of segregation holds, then in the offspring generation the frequency of the AiAi genotype will be $p_i^2$, and the frequency of the (unordered) AiAj genotype (i ≠ j) will be $2 p_i p_j$. It is easy to see that the two-allele case above is a special case of this generalized principle.
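The multi-allelic rule can be written out as a short computation. This is an illustrative Python sketch with a hypothetical function name, `hardy_weinberg`; alleles A1, …, An are indexed 0, …, n−1 in the code, and exact `Fraction` inputs are assumed so that the sum-to-one check is exact.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs):
    """Map each unordered genotype (i, j), with i <= j, to its expected frequency.

    allele_freqs[i] is the frequency of allele A(i+1). Homozygotes AiAi get
    p_i squared; unordered heterozygotes AiAj (i != j) get 2 * p_i * p_j.
    Exact Fractions are assumed so that the check below is not upset by rounding.
    """
    if sum(allele_freqs) != 1:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p_i, p_j = allele_freqs[i], allele_freqs[j]
        genotypes[(i, j)] = p_i * p_i if i == j else 2 * p_i * p_j
    return genotypes


# Three alleles with made-up frequencies 1/2, 3/10 and 1/5.
three = hardy_weinberg([Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)])
# The two-allele case is recovered with p = 7/10, q = 3/10.
two = hardy_weinberg([Fraction(7, 10), Fraction(3, 10)])

for genotype, freq in three.items():
    print(genotype, freq)
```

The genotype frequencies always sum to one, because they are the expansion of (p1 + … + pn) squared.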

Importantly, whatever the initial genotypic proportions, random mating will automatically produce offspring in Hardy-Weinberg proportions (for one-locus genotypes). So if generations are non-overlapping, i.e. parents die as soon as they have reproduced, just one round of random mating is needed to bring about Hardy-Weinberg proportions in the whole population; if generations overlap, more than one round of random mating is needed. Once Hardy-Weinberg proportions have been achieved, they will be maintained in subsequent generations so long as the population continues to mate at random and is unaffected by evolutionary forces such as selection, mutation etc. The population is then said to be in Hardy-Weinberg equilibrium—meaning that the genotypic proportions are constant from generation to generation.
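A small numerical check makes the one-generation claim concrete. The sketch below assumes non-overlapping generations, random mating and no evolutionary forces, as in the text; the function names and the starting frequencies are invented for illustration.

```python
def allele_freq(f11, f12, f22):
    """Frequency p of A1 given genotype frequencies of A1A1, A1A2, A2A2.

    Each A1A1 organism carries two A1 alleles and each heterozygote one,
    so p = f11 + f12 / 2.
    """
    return f11 + 0.5 * f12


def random_mating(f11, f12, f22):
    """Offspring genotype frequencies after one round of random mating."""
    p = allele_freq(f11, f12, f22)
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)


# A starting population with no heterozygotes at all: far from
# Hardy-Weinberg proportions.
start = (0.6, 0.0, 0.4)
gen1 = random_mating(*start)
gen2 = random_mating(*gen1)

print("start:", start, "p =", allele_freq(*start))
print("gen 1:", gen1, "p =", allele_freq(*gen1))
print("gen 2:", gen2, "p =", allele_freq(*gen2))
```

Under these assumptions the genotype frequencies change in the first generation and then stay put, while the allele frequency never changes.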

The importance of the Hardy-Weinberg principle lies in the fact that it contains the solution to the problem of blending that troubled Darwin. As we saw, Jenkin argued that with sexual reproduction, the variation in the population would be exhausted very rapidly. But the Hardy-Weinberg principle teaches us that this is not so. Sexual reproduction has no inherent tendency to destroy the genotypic variation present in the population, for the genotypic proportions remain constant over generations, given the assumptions noted above. It is true that natural selection often tends to destroy variation, and is thus a homogenizing force; but this is a quite different matter. The ‘blending’ objection was that sexual mixing itself would produce homogeneity, even in the absence of selection, and the Hardy-Weinberg principle shows that this is untrue.

Another benefit of the Hardy-Weinberg principle is that it greatly simplifies the task of modelling evolutionary change. When a population is in Hardy-Weinberg equilibrium, it is possible to track the genotypic composition of the population by directly tracking the allelic frequencies (or gametic frequencies). That this is so is clear—for if we know the relative frequencies of all the alleles (at a single locus), and know that the population is in Hardy-Weinberg equilibrium, the entire genotype frequency distribution can be easily computed. Were the population not in Hardy-Weinberg equilibrium, we would need to explicitly track the genotype frequencies themselves, which is more complicated.

Primarily for this reason, many population-genetic models assume that Hardy-Weinberg equilibrium obtains; as we have seen, this is tantamount to assuming that mating is random with respect to genotype. But is this assumption empirically plausible? The answer is sometimes but not always. In the human population, for example, mating is random with respect to ABO blood group, so the genotype that determines blood group is found in Hardy-Weinberg proportions in many populations (cf. Hartl 1980). But mating is not random with respect to height; on the contrary, people tend to choose mates similar in height to themselves. So if we consider a genotype that influences height, mating will not be random with respect to this genotype (see section 3.5 ‘Non-Random Mating’).

The geneticist W.J. Ewens has written of the Hardy-Weinberg principle, ‘it does not often happen that the most important theorem in any subject is the easiest and most readily derived theorem for that subject’ (1969, p. 1). The main importance of the principle, as Ewens stresses, is not the gain in mathematical simplicity that it permits, which is simply a beneficial side effect, but rather what it teaches us about the preservation of genetic variation in a sexually reproducing population.

3. Population-Genetic Models of Evolution

Population geneticists usually define ‘evolution’ as any change in a population's genetic composition over time. The four factors that can bring about such a change are: natural selection, mutation, random genetic drift, and migration into or out of the population. (A fifth factor—changes to the mating pattern—can change the genotype but not the allele frequencies; many theorists would not count this as an evolutionary change.) A brief introduction to the standard population-genetic treatment of each of these factors is given below.

3.1 Selection at One Locus

Natural selection occurs when some genotypic variants in a population enjoy a survival or reproduction advantage over others. The simplest population-genetic model of natural selection assumes one autosomal locus with two alleles, A1 and A2, as above. The three diploid genotypes A1A1, A1A2 and A2A2 have different fitnesses, denoted by w11, w12 and w22 respectively. These fitnesses are assumed to be constant across generations. A genotype's fitness may be defined, in this context, as the average number of successful gametes that an organism of that genotype contributes to the next generation, which depends on how well the organism survives, how many matings it achieves, and how fertile it is. Unless w11, w12 and w22 are all equal, natural selection will occur, possibly leading the genetic composition of the population to change.

Suppose that initially, i.e. before selection has operated, the zygote genotypes are in Hardy-Weinberg proportions and the frequencies of the A1 and A2 alleles are p and q respectively, where p + q = 1. The zygotes then grow to adulthood and reproduce, giving rise to a new generation of offspring zygotes. Our task is to compute the frequencies of A1 and A2 in the second generation; let us denote these by p′ and q′ respectively, where p′ + q′ = 1. (Note that in both generations, we consider gene frequencies at the zygotic stage; these may differ from the adult gene frequencies if there is differential survivorship.)

In the first generation, the genotypic frequencies at the zygotic stage are $p^2$, $2pq$ and $q^2$ for A1A1, A1A2, A2A2 respectively, by the Hardy-Weinberg law. The three genotypes produce successful gametes in proportion to their fitnesses, i.e. in the ratio w11:w12:w22. The average fitness in the population is $\bar{w} = p^2 w_{11} + 2pq\, w_{12} + q^2 w_{22}$, so the total number of successful gametes produced is $N\bar{w}$, where N is population size. Assuming there is no mutation, and that Mendel's law of segregation holds, then an A1A1 organism will produce only A1 gametes, an A2A2 organism will produce only A2 gametes, and an A1A2 organism will produce A1 and A2 gametes in equal proportion. Therefore, the proportion of A1 gametes, and thus the frequency of the A1 allele in the second generation at the zygotic stage, is:

$$
\begin{aligned}
p' &= \frac{N p^2 w_{11} + \tfrac{1}{2}\,(N \cdot 2pq\, w_{12})}{N \bar{w}} \\
   &= \frac{p^2 w_{11} + pq\, w_{12}}{\bar{w}}
\end{aligned}
\tag{1}
$$

Equation (1) is known as a ‘recurrence’ equation—it expresses the frequency of the A1 allele in the second generation in terms of its frequency in the first generation. The change in frequency between generations can then be written as:

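$$
\begin{aligned}
\Delta p &= p' - p \\
         &= \frac{p^2 w_{11} + pq\, w_{12}}{\bar{w}} - p \\
         &= \frac{pq\,\bigl[\,p\,(w_{11} - w_{12}) + q\,(w_{12} - w_{22})\,\bigr]}{\bar{w}}
\end{aligned}
\tag{2}
$$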

If Δp > 0, then natural selection has led the A1 allele to increase in frequency; if Δp < 0 then selection has led the A2 allele to increase in frequency. If Δp = 0 then no gene frequency change has occurred, i.e. the system is in allelic equilibrium. (Note, however, that the condition Δp = 0 does not imply that no natural selection has occurred; the condition for that is w11 = w12 = w22. It is possible for natural selection to occur but to have no effect on gene frequencies.)

Equations (1) and (2) show, in precise terms, how fitness differences between genotypes will lead to evolutionary change. This enables us to explore the consequences of various different selective regimes. Suppose firstly that w11 > w12 > w22, i.e. the A1A1 homozygote is fitter than the A1A2 heterozygote, which in turn is fitter than the A2A2 homozygote. By inspection of equation (2), we can see that Δp must be positive (so long as neither p nor q is zero, in which case Δp = 0). So in each generation, the frequency of the A1 allele will be greater than in the previous generation, until it eventually reaches fixation and the A2 allele is eliminated from the population. Once the A1 allele reaches fixation, i.e. p = 1 and q = 0, no further evolutionary change will occur, for if p = 1 then Δp = 0.
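The directional-selection case can be traced numerically. The following Python sketch implements the recurrence of equation (1) and the closed form of equation (2) and iterates them; the function names and the fitness values (1.0, 0.9, 0.8) are illustrative assumptions, not values from the text.

```python
def mean_fitness(p, w11, w12, w22):
    """Average fitness of a population in Hardy-Weinberg proportions."""
    q = 1.0 - p
    return p * p * w11 + 2 * p * q * w12 + q * q * w22


def next_p(p, w11, w12, w22):
    """Equation (1): frequency of A1 among zygotes of the next generation."""
    q = 1.0 - p
    return (p * p * w11 + p * q * w12) / mean_fitness(p, w11, w12, w22)


def delta_p(p, w11, w12, w22):
    """Equation (2): change in the frequency of A1 over one generation."""
    q = 1.0 - p
    bracket = p * (w11 - w12) + q * (w12 - w22)
    return p * q * bracket / mean_fitness(p, w11, w12, w22)


def trajectory(p0, w11, w12, w22, generations):
    """List of A1 frequencies from generation 0 to the given generation."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_p(freqs[-1], w11, w12, w22))
    return freqs


# Illustrative fitnesses with w11 > w12 > w22, starting from a rare A1 allele.
path = trajectory(0.01, 1.0, 0.9, 0.8, 200)
print([round(p, 4) for p in path[::25]])
```

With these values the frequency of A1 rises in every generation and approaches 1, as the inspection of equation (2) predicts; with all three fitnesses equal, `next_p` returns its input unchanged.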

It is obvious that analogous reasoning applies in the case where w22 > w12 > w11. Equation (2) tells us that Δp must then be negative, so long as neither p nor q is zero, so the A2 allele will sweep to fixation, eliminating the A1 allele. A more interesting situation arises when the heterozygote is superior in fitness to both of the homozygotes, i.e. w12 > w11 and w12 > w22—a phenomenon known as heterosis. Intuitively it is clear what should happen in this situation: an equilibrium situation should be reached in which both alleles are present in the population. Equation (2) confirms this intuition. It is easy to see that Δp = 0 if either allele has gone to fixation (i.e. if p = 0 or q = 0), or, thirdly, if the following condition obtains:

$$p\,(w_{11} - w_{12}) + q\,(w_{12} - w_{22}) = 0$$

which reduces to

$$p = p^* = \frac{w_{12} - w_{22}}{(w_{12} - w_{22}) + (w_{12} - w_{11})}$$

(The asterisk indicates that this is an equilibrium condition.) Since p* must lie strictly between 0 and 1 if both alleles are to be present, this condition can only be satisfied if there is heterozygote superiority or heterozygote inferiority; it represents an equilibrium state of the population in which both alleles are present. This equilibrium is known as polymorphic, by contrast with the monomorphic equilibria that arise when either of the alleles has gone to fixation. The possibility of polymorphic equilibrium is quite significant. It teaches us that natural selection will not always produce genetic homogeneity; in some cases, selection preserves the genetic variation found in a population.
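The polymorphic equilibrium can be computed directly and its behaviour probed with equation (2). The sketch below is illustrative: the function names and fitness values are assumptions, and the stability check relies on the standard result that the equilibrium attracts nearby frequencies under heterozygote superiority and repels them under heterozygote inferiority.

```python
def polymorphic_equilibrium(w11, w12, w22):
    """Equilibrium frequency p* of A1 at which equation (2) gives zero change."""
    denominator = (w12 - w22) + (w12 - w11)
    if denominator == 0:
        raise ValueError("no isolated polymorphic equilibrium for these fitnesses")
    return (w12 - w22) / denominator


def delta_p(p, w11, w12, w22):
    """Equation (2): one-generation change in the frequency of A1."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return p * q * (p * (w11 - w12) + q * (w12 - w22)) / w_bar


# Heterozygote superiority (illustrative values): p* = 0.4 / (0.4 + 0.2) = 2/3.
superior = (0.8, 1.0, 0.6)
p_sup = polymorphic_equilibrium(*superior)

# Heterozygote inferiority (illustrative values): p* = 1/3.
inferior = (1.0, 0.6, 0.8)
p_inf = polymorphic_equilibrium(*inferior)

for label, w, p_star in (("superiority", superior, p_sup),
                         ("inferiority", inferior, p_inf)):
    below = delta_p(p_star - 0.1, *w)
    above = delta_p(p_star + 0.1, *w)
    print(label, "p* =", p_star, "change just below:", below, "just above:", above)
```

A positive change just below p* together with a negative change just above it means the population is pushed back towards the equilibrium; the opposite signs mean it is pushed away, towards fixation of one allele or the other.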

Numerous further questions about natural selection can be addressed using a simple population-genetic model. For example, by incorporating a parameter which measures the fitness differences between genotypes, we can study the rate of evolutionary change, permitting us to ask questions such as: how many generations are needed for selection to increase the frequency of the A1 allele from 0.1 to 0.9? If a given deleterious allele is recessive, how much longer will it take to eliminate it from the population than if it were dominant? By permitting questions such as these to be formulated and answered, population geneticists have brought mathematical rigour to the theory of evolution, to an extent that would have seemed unimaginable in Darwin's day.
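Questions of this kind can be explored by iterating the one-locus recurrence until a target frequency is reached. The Python sketch below is one way to do so under the model's assumptions (constant fitnesses, Hardy-Weinberg proportions among zygotes, no mutation or drift); the helper `generations_until`, the fitness values and the 1% threshold for ‘elimination’ are all illustrative choices.

```python
def next_p(p, w11, w12, w22):
    """Frequency of A1 in the next generation under the one-locus recurrence."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / w_bar


def generations_until(p0, done, w11, w12, w22, max_generations=1_000_000):
    """Number of generations before done(p) first holds, starting from p0."""
    p = p0
    for t in range(max_generations + 1):
        if done(p):
            return t
        p = next_p(p, w11, w12, w22)
    raise RuntimeError("condition not reached within max_generations")


def reached_09(p):
    return p >= 0.9


def a2_nearly_gone(p):
    # Treat A2 as 'eliminated' once its frequency q = 1 - p drops to 1%.
    return 1.0 - p <= 0.01


# How long for A1 to rise from 0.1 to 0.9? Additive fitnesses, two strengths.
rise_strong = generations_until(0.1, reached_09, 1.0, 0.95, 0.90)
rise_weak = generations_until(0.1, reached_09, 1.0, 0.995, 0.99)

# How long to purge a deleterious A2 from q = 0.5, recessive versus dominant?
purge_recessive = generations_until(0.5, a2_nearly_gone, 1.0, 1.0, 0.9)
purge_dominant = generations_until(0.5, a2_nearly_gone, 1.0, 0.9, 0.9)

print("0.1 -> 0.9, stronger selection:", rise_strong, "generations")
print("0.1 -> 0.9, weaker selection:  ", rise_weak, "generations")
print("purge recessive A2:", purge_recessive, "generations")
print("purge dominant A2: ", purge_dominant, "generations")
```

The absolute numbers depend entirely on the chosen fitnesses, but the ordering is robust: weaker selection takes longer, and a recessive deleterious allele is removed far more slowly than a dominant one with the same homozygous effect.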

Of course, the one-locus model discussed above is too simple to apply to many real-life populations, for it incorporates simplifying assumptions that are unlikely to hold true. Selection is rarely the only evolutionary force in operation, genotypic fitnesses are unlikely to be constant across generations, Mendelian segregation does not always hold exactly, and so on. Much research in population genetics consists in devising more realistic evolutionary models, which rely on fewer simplifying assumptions and are thus more complicated. But the one-locus model illustrates the essence of population-genetic reasoning, and the attendant clarification of the evolutionary process that it brings.

3.2 Selection-Mutation Balance

Mutation is the ultimate source of genetic variation, preventing populations from becoming genetically homogeneous in situations where they otherwise would. Once mutation is taken into account, the conclusions drawn in the previous section need to be modified. Even if one allele is selectively superior to all others at a given locus, it will not become fixed in the population; recurrent mutation will ensure that other alleles are present at low frequency, thus maintaining a degree of polymorphism. Population geneticists have long been interested in exploring what happens when selection and mutation act simultaneously.

Continuing with our one-locus, two-allele model, let us suppose that the A1 allele is selectively superior to A2, but recurrent mutation from A1 to A2 prevents A1 from spreading to fixation. The rate of mutation from A1 to A2 per generation, i.e. the proportion of A1 alleles that mutate every generation, is denoted u. (Empirical estimates of mutation rates are typically in the region of $10^{-6}$.) Back mutation from A2 to A1 can be ignored, because we are assuming that the A2 allele is at a very low frequency in the population, thanks to natural selection. What happens to the gene frequency dynamics under these assumptions? Recall equation (1) above, which expresses the frequency of the A1 allele in terms of its frequency in the previous generation. Since a certain fraction (u) of the A1 alleles will have mutated to A2, this recurrence equation must be modified to:

$$p' = \frac{(p^2 w_{11} + pq\, w_{12})\,(1 - u)}{\bar{w}}$$

to take account of mutation. As before, equilibrium is reached when p′ = p, i.e. Δp = 0. The condition for equilibrium is therefore:

$$p = p^* = \frac{(p^2 w_{11} + pq\, w_{12})\,(1 - u)}{\bar{w}} \tag{3}$$

A useful simplification of equation (3) can be achieved by making some assumptions about the genotype fitnesses, and adopting a new notation. Let us suppose that the A2 allele is completely recessive (as is often the case for deleterious mutants). This means that the A1A1 and A1A2 genotypes have identical fitness. Therefore, genotypic fitnesses can be written w11 = 1, w12 = 1, w22 = 1 − s, where s denotes the difference in fitness of the A2A2 homozygote from that of the other two genotypes. (s is known as the selection coefficient against A2A2.) Since we are assuming that the A2 allele is deleterious, it follows that s > 0. Substituting these genotype fitnesses in equation (3) yields:

$$p^* = \frac{p\,(1 - u)}{p^2 + 2pq + q^2 (1 - s)}$$

which reduces to:

$$p^* = 1 - \sqrt{u/s}$$

or equivalently (since p + q = 1):

$$q^* = \sqrt{u/s} \qquad (4)$$

Equation (4) gives the equilibrium frequency of the A2 allele, under the assumption that it is completely recessive. Note that as u increases, q* increases too. This is highly intuitive: the greater the mutation rate from A1 to A2, the greater the frequency of A2 that can be maintained at equilibrium, for a given value of s. Conversely, as s increases, q* decreases. This is also intuitive: the stronger the selection against the A2A2 homozygote, the lower the equilibrium frequency of A2, for a given value of u.
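The selection-mutation equilibrium can also be checked numerically. The following is a minimal sketch, not part of the original analysis: it iterates the mutation-modified recurrence with the recessive fitness scheme and compares the result with the square root of u/s. The function names and the values chosen for u and s are illustrative assumptions.

```python
import math

def next_p(p, u, s):
    """One generation of selection followed by mutation from A1 to A2.

    Assumes A2 is completely recessive: w11 = w12 = 1, w22 = 1 - s.
    """
    q = 1.0 - p
    w11, w12, w22 = 1.0, 1.0, 1.0 - s
    mean_w = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) * (1.0 - u) / mean_w

def equilibrium_q(u, s, p0=0.5, generations=50_000):
    """Iterate the recurrence from p0 and return the frequency of A2."""
    p = p0
    for _ in range(generations):
        p = next_p(p, u, s)
    return 1.0 - p

u, s = 1e-6, 0.1  # illustrative mutation rate and selection coefficient
q_numeric = equilibrium_q(u, s)
q_formula = math.sqrt(u / s)
print(q_numeric, q_formula)
```

Raising u or lowering s in this sketch moves the numerical equilibrium in the same direction as equation (4) predicts.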

It is easy to see why equation (4) is said to describe selection-mutation balance—natural selection is continually removing A2 alleles from the population, while mutation is continually re-creating them. Equation (4) tells us the equilibrium frequency of A2 that will be maintained, as a function of the rate of mutation from A1 to A2 and the magnitude of the selective disadvantage suffered by the A2A2 homozygote. Importantly, equation (4) was derived under the assumption that the A2 allele is completely recessive, i.e. that the A1A2 heterozygote is phenotypically identical to the A1A1 homozygote. However, it is straightforward to derive similar equations for the cases where the A2 allele is dominant, or partially dominant. If A2 is dominant, or partially dominant, its equilibrium frequency will be lower than if it is completely recessive; for selection is more efficient at removing it from the population. A deleterious allele that is recessive can ‘hide’ in heterozygotes, and thus escape the purging power of selection, but a dominant allele cannot.

Before leaving this topic, one final point should be noted. Our discussion has focused exclusively on deleterious mutations, i.e. ones which reduce the fitness of their host organism. This may seem odd, given that beneficial mutations play so crucial a role in the evolutionary process. The reason is that in population genetics, a major concern is to understand the causes of the genetic variability found in biological populations. If a gene is beneficial, natural selection is likely to be the major determinant of its equilibrium frequency; the rate of sporadic mutation to that gene will play at most a minor role. It is only where a gene is deleterious that mutation plays a major role in maintaining it in a population.

3.3 Random Drift

Random genetic drift refers to the chance fluctuations in gene frequency that arise in finite populations; it can be thought of as a type of ‘sampling error’. In many evolutionary models, the population is assumed to be infinite, or very large, precisely in order to abstract away from chance fluctuations. But though mathematically convenient, this assumption is often unrealistic. In real life, chance factors will invariably affect organisms' survival and reproductive success, thus causing gene frequencies to change.

To understand the nature of random drift, consider a simple example. A population contains just ten organisms, five of type A and five of type B; the organisms reproduce asexually and beget offspring of the same type. Suppose that neither type is selectively superior to the other—both are equally well-adapted to the environment. However, this does not imply that the two types will produce identical numbers of offspring, for chance factors may play a role. For example, it is possible that all the type Bs might die by accident before reproducing; in which case the frequency of B in the second generation will fall to zero. If so, then the decline of the B type (and thus the spread of the A type) is the result of random drift. Evolutionists are often interested in knowing whether a given gene frequency change is the result of drift, selection, or some combination of the two.

The label ‘random drift’ is slightly misleading. In saying that the spread of the A type is due to random drift, or chance, we do not mean that no cause can be found of its spread. In theory, we could presumably discover the complete causal story about why each organism in the population left exactly the number of offspring that it did. In ascribing the evolutionary change to random drift, we are not denying that there is such a causal story to be told. Rather, we mean that the spread of the A type was not due to its adaptive superiority over the B type. Put differently, the A and the B types had the same expected number of offspring, so were equally fit; but the A types had a greater actual number of offspring. In a finite population, actual reproductive output will almost always deviate from expectation, leading to evolutionary change.

An analogy with coin tossing can illuminate random drift. Suppose a fair coin is tossed ten times. The probability of heads on any one toss is ½, and so the expected frequency of heads in the sequence of ten is 50%. But the probability of actually getting half heads and half tails is only 252/1024, or approximately 24.6%. So even though the coin is fair, we are unlikely to get equal proportions of heads and tails in a sequence of ten tosses; some deviation from expectation is more probable than not. In just the same way, even though the A and B types are equally fit in the example above, it is likely that some evolutionary change will occur. This analogy also illustrates the role of population size. If we tossed the coin a hundred times rather than ten, the proportion of heads would probably be very close to ½. In just the same way, the larger the population, the less important the effect of random drift on gene frequencies; in the infinite limit, drift has no effect.
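The binomial arithmetic behind the coin analogy can be checked directly. This is a small illustrative sketch; the helper names are assumptions introduced here. It computes the chance of an exact half-and-half split, and the chance that the number of heads falls within a given distance of half the tosses, for ten and for a hundred tosses.

```python
from math import comb

def prob_exact_half(n):
    """Probability of exactly n/2 heads in n tosses of a fair coin (n even)."""
    return comb(n, n // 2) / 2 ** n

def prob_within(n, max_dev):
    """Probability that the head count is within max_dev of n/2 (n even)."""
    half = n // 2
    favourable = sum(comb(n, k) for k in range(half - max_dev, half + max_dev + 1))
    return favourable / 2 ** n

print(comb(10, 5), prob_exact_half(10))  # 252 0.24609375
print(prob_within(10, 1))                # proportion of heads between 40% and 60%
print(prob_within(100, 10))              # same band of proportions, 100 tosses
```

The last two lines compare the same band of proportions (40% to 60% heads) at the two sample sizes, which is the sense in which a larger sample stays closer to expectation.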

Drift greatly complicates the task facing the population geneticist. In the example above, it is obviously impossible to deduce the composition of the population in the second generation from its composition in the first generation; at most, we can hope to deduce the probability distribution over all the possible compositions. So once drift is taken into account, no simple recurrence relation for gene frequencies, of the sort expressed in equation (1) above, can be derived. To analyse the evolutionary consequences of drift, population geneticists use a mathematical technique known as diffusion modelling, which is beyond the scope of this article. However, many of these consequences are quite intuitive, and can be understood without the mathematics.

One important effect of random drift is to decrease the degree of heterozygosity in a population over time. This happens because, given enough time, any finite population will eventually become homozygous through drift (though if the population is large, the approach to homozygosity will be slow). It is easy to see why: gene frequencies of 0 and 1 are ‘absorbing boundaries’, meaning that once the boundary is reached, there is no way back from it (apart from mutation). So a given allele will eventually either become fixed in the population or go extinct, the latter being the more likely fate for an allele that is initially rare. Indeed mathematical models show that a neutral allele arising by mutation has a very low probability of becoming fixed in a population; the larger the population, the lower the probability of fixation.
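The fate of a new neutral allele can be illustrated by simulation. The sketch below assumes Wright-Fisher sampling, a standard idealisation that the text above does not name: each gene copy in the next generation is drawn independently from the current generation. Under that assumption the fixation probability of a neutral allele equals its initial frequency, so a single new copy among `num_copies` should fix in roughly 1/`num_copies` of runs. All names and parameter values are illustrative.

```python
import random

def fixes(num_copies, rng):
    """Follow one new neutral allele until it is lost (False) or fixed (True).

    Assumes Wright-Fisher sampling among `num_copies` gene copies.
    """
    count = 1  # the allele starts as a single new mutant copy
    while 0 < count < num_copies:
        freq = count / num_copies
        # Each copy in the next generation carries the allele with probability freq.
        count = sum(rng.random() < freq for _ in range(num_copies))
    return count == num_copies

def fixation_fraction(num_copies, trials, seed=0):
    """Fraction of independent runs in which the new allele reaches fixation."""
    rng = random.Random(seed)
    return sum(fixes(num_copies, rng) for _ in range(trials)) / trials

for num_copies in (10, 20, 40):
    # Simulated fraction next to the theoretical value under the stated assumption.
    print(num_copies, fixation_fraction(num_copies, trials=4000), 1 / num_copies)
```

Both 0 and `num_copies` act as absorbing states in the loop, which is why every run terminates in loss or fixation.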

Another important effect of random drift is to cause the different subpopulations of a species to diverge genetically from each other, as the chance accumulation of alleles will probably proceed differently in each, particularly if the alleles confer little selective advantage or disadvantage. By chance, one population may become fixed for allele A1, while a second population becomes fixed for another allele A2. This possibility is an important one, for if we ignore it, we may mistakenly conclude that the A1 allele must have been advantageous in the environment of the first population, the A2 allele in the environment of the second, i.e. that selection was responsible for the genetic differentiation. Such an explanation might be right, but it is not the only one—random drift provides an alternative.

The question of whether drift or selection plays a more important role in molecular evolution was much debated in the 1960s and 1970s; it lay at the heart of the heated controversy between ‘selectionists’ and ‘neutralists’. The neutralist camp, headed by M. Kimura, argued that most molecular variants had no effect on phenotype, so were not subject to natural selection; random drift was the major determinant of their fate. Kimura argued that the apparently constant rate at which the amino acid sequences of proteins evolved, and the extent of genetic polymorphism observed in natural populations, could best be explained by the neutralist hypothesis. Selectionists countered that natural selection was also capable of explaining the molecular data. In recent years, the controversy has subsided somewhat, without a clear victory for either side. Most biologists believe that some molecular variants are indeed neutral, though fewer than were claimed by the original neutralists.

3.4 Migration

Migration into or out of a population is the fourth and final factor that can affect its genetic composition. Obviously, if immigrants are genetically different from the population they are entering, this will cause the population's genetic composition to be altered. The evolutionary importance of migration stems from the fact that many species are composed of a number of distinct subpopulations, largely isolated from each other but connected by occasional migration. (For an extreme example of population subdivision, think of ant colonies.) Migration between subpopulations gives rise to gene flow, which acts as a sort of ‘glue’, limiting the extent to which subpopulations can diverge from each other genetically.

The simplest model for analysing migration assumes that a given population receives a number of migrants each generation, but sends out no emigrants. Suppose the frequency of the A1 allele in the resident population is p, and the frequency of the A1 allele among the migrants arriving in the population is $p_m$. The proportion of migrants coming into the population each generation is m (i.e. as a proportion of the resident population). So post-migration, the frequency of the A1 allele in the population is:

$$p' = (1 - m)\,p + m\,p_m$$

The change in gene frequency across generations is therefore:

$$\Delta p = p' - p = m\,(p_m - p)$$

Therefore, migration will increase the frequency of the A1 allele if $p_m > p$, decrease its frequency if $p > p_m$, and leave its frequency unchanged if $p = p_m$. It is then a straightforward matter to derive an equation giving the gene frequency in generation t as a function of its initial frequency and the rate of migration. The equation is:

$$p_t = p_m + (p_0 - p_m)(1 - m)^t$$

where $p_0$ is the initial frequency of the A1 allele in the population, i.e. before any migration has taken place. Since the expression $(1 - m)^t$ tends towards zero as t grows large, it is easy to see that equilibrium is reached when $p_t = p_m$, i.e. when the gene frequency of the migrants equals the gene frequency of the resident population.
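The one-way migration model is simple enough to verify by iteration. The following sketch applies the per-generation update repeatedly and checks it against the closed-form expression for generation t; the starting frequency, migrant frequency and migration rate are arbitrary illustrative values.

```python
def migrate(p, p_m, m):
    """Frequency of A1 after one generation of one-way migration."""
    return (1 - m) * p + m * p_m

def closed_form(p0, p_m, m, t):
    """Frequency of A1 after t generations, from the closed-form solution."""
    return p_m + (p0 - p_m) * (1 - m) ** t

p0, p_m, m = 0.2, 0.8, 0.05  # illustrative values
p = p0
for t in range(1, 101):
    p = migrate(p, p_m, m)
    # The iterated recurrence and the closed form should agree at every step.
    assert abs(p - closed_form(p0, p_m, m, t)) < 1e-12

print(p, p_m)  # the resident frequency has moved close to the migrant frequency
```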

This simple model assumes that migration is the only factor affecting gene frequency at the locus, but this is unlikely to be the case. So it is necessary to consider how migration will interact with selection, drift and mutation. A balance between migration and selection can lead to the maintenance of a deleterious allele in a population, in a manner closely analogous to mutation-selection balance, discussed above. The interaction between migration and drift is especially interesting. We have seen that drift can lead the separate subpopulations of a species to diverge genetically. Migration opposes this trend: it is a homogenising force that tends to make subpopulations more alike. Mathematical models suggest that even a fairly small rate of migration will be sufficient to prevent the subpopulations of a species from diverging genetically. Some theorists have used this to argue against the evolutionary importance of group selection, on the grounds that genetic differences between groups, which are essential for group selection to operate, are unlikely to persist in the face of migration.

3.5 Non-Random Mating

Recall that the Hardy-Weinberg law, the starting point for most population-genetic analysis, was derived under the assumption of random mating. But departures from random mating are actually quite common. Organisms may tend to choose mates who are similar to them phenotypically or genotypically, a mating system known as ‘positive assortment’. Alternatively, organisms may choose mates that are dissimilar to them, which is known as ‘negative assortment’. Another type of departure from random mating is inbreeding, or preferentially mating with relatives.

Analysing the consequences of non-random mating is quite complicated, but some conclusions are fairly easily seen. Firstly and most importantly, non-random mating does not in itself affect gene frequencies (so is not an evolutionary ‘force’ on a par with selection, mutation, migration and drift); rather, it affects genotype frequencies. To appreciate this point, note that the gene frequency of a population, at the zygotic stage, is equal to the gene frequency in the pool of successful gametes from which the zygotes are formed. The pattern of mating simply determines the way in which haploid gametes are ‘packaged’ into diploid zygotes. Thus if a random mating population suddenly starts to mate non-randomly, this will have no effect on gene frequencies.

Secondly, positive assortative mating will tend to decrease the proportion of heterozygotes in the population, thus increasing the genotypic variance. To see this, consider again a single locus with two alleles, A1 and A2, with frequencies p and q in a given population. Initially the population is at Hardy-Weinberg equilibrium, so the proportion of A1A2 heterozygotes is 2pq. If the population then starts to mate completely assortatively, i.e. mating only occurs between organisms of identical genotype, it is obvious that the proportion of heterozygotes must decline. For A1A1 × A1A1 and A2A2 × A2A2 matings will produce no heterozygotes; and only half the progeny of A1A2 × A1A2 matings will be heterozygotic. So the proportion of heterozygotes in the second generation must be less than 2pq. Conversely, negative assortment will tend to increase the proportion of heterozygotes from what it would be under Hardy-Weinberg equilibrium.
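The effect of complete positive assortment can be traced with a few lines of arithmetic. This is a minimal sketch under added assumptions: all genotypes are equally fertile, and every individual mates only with its own genotype. Under these assumptions A1A1 and A2A2 parents breed true, while A1A2 × A1A2 matings yield offspring in the Mendelian ratio ¼ : ½ : ¼. The function name and starting frequencies are illustrative.

```python
def assort(P, H, Q):
    """One generation of complete positive assortative mating.

    P, H, Q are the frequencies of A1A1, A1A2 and A2A2.
    Heterozygote matings return a quarter of their offspring to each homozygote.
    """
    return P + H / 4, H / 2, Q + H / 4

p, q = 0.6, 0.4                    # illustrative allele frequencies
P, H, Q = p * p, 2 * p * q, q * q  # Hardy-Weinberg starting point
for generation in range(1, 6):
    P, H, Q = assort(P, H, Q)
    allele_p = P + H / 2           # frequency of A1 recovered from genotypes
    print(generation, round(H, 4), round(allele_p, 4))
```

The heterozygote frequency halves each generation while the frequency of A1 stays at its starting value, in line with the claim that non-random mating changes genotype frequencies but not gene frequencies.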

What about inbreeding? In general, inbreeding will tend to increase the homozygosity of a population, like positive assortment. The reason for this is obvious—relatives tend to be more genotypically similar than randomly chosen members of the population. In the majority of species, including the human species, inbreeding has negative effects on organismic fitness—a phenomenon known as ‘inbreeding depression’. The explanation for this is that deleterious alleles often tend to be recessive, so have no phenotypic effect when found in heterozygotes. Inbreeding reduces the proportion of heterozygotes, making recessive alleles more likely to be found in homozygotes where their negative phenotypic effects become apparent. The converse phenomenon—‘hybrid vigour’ resulting from outbreeding—is widely utilised by animal and plant breeders.

3.6 Two-Locus Models and Linkage

So far, our exposition has dealt with gene frequency change at a single locus, which is the simplest sort of population-genetic analysis. However, in practice it is unlikely that an organism's fitness will depend only on its genotype at a single locus, so there is a limit on the extent to which single-locus models can illuminate the evolutionary process. The aim of two-locus (and more generally, multi-locus) models is to track changes in gene frequency at more than one locus simultaneously. Such models are invariably more complicated than their single-locus counterparts, but achieve greater realism.

The simplest two-locus model assumes two autosomal loci, A and B, each with two alleles, A1 and A2, B1 and B2 respectively. Thus there are four types of haploid gamete in the population, namely A1B1, A1B2, A2B1 and A2B2, whose frequencies we will denote by $x_1$, $x_2$, $x_3$ and $x_4$ respectively. (Note that the $x_i$ are not allele frequencies; in the two-locus case, we cannot equate ‘gamete frequency’ with ‘allelic frequency’, as is possible for a single locus.) Diploid organisms are formed by the fusion of two gametes, as before. Thus there are ten possible diploid genotypes in the population, found by taking each gamete type in combination with itself and with every other.

In the one-locus case, we saw that in a large randomly mating population, there is a simple relationship between the frequencies of the gamete types and of the zygotic genotypes that they form. In the two-locus case, the same relationship holds. Thus for example, the frequency of the A1B1 / A1B1 genotype will be $x_1^2$; the frequency of the A1B1 / A2B1 genotype will be $2 x_1 x_3$, and so on. (This can be established rigorously with an argument based on random sampling of gametes, analogous to the argument used in the one-locus case.) The first aspect of the Hardy-Weinberg law, that genotypic frequencies are given by the square of the array of gametic frequencies, therefore transposes neatly to the two-locus case. However, the second aspect of the Hardy-Weinberg law, stable genotypic frequencies after one round of random mating, does not generally apply in the two-locus case, as we will see.
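The count of ten genotypes, and the rule that their frequencies come from squaring the array of gamete frequencies, can be sketched as follows. The gamete frequencies used here are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
from itertools import combinations_with_replacement

# Hypothetical gamete frequencies x1..x4 (they must sum to one).
gametes = {"A1B1": 0.4, "A1B2": 0.1, "A2B1": 0.2, "A2B2": 0.3}

def genotype_frequencies(gamete_freqs):
    """Genotype frequencies under random union of gametes."""
    freqs = {}
    for g1, g2 in combinations_with_replacement(gamete_freqs, 2):
        x, y = gamete_freqs[g1], gamete_freqs[g2]
        # Identical gametes: x squared. Different gametes: two orderings, 2xy.
        freqs[f"{g1} / {g2}"] = x * x if g1 == g2 else 2 * x * y
    return freqs

genotypes = genotype_frequencies(gametes)
print(len(genotypes))  # 10
for name, freq in genotypes.items():
    print(name, round(freq, 4))
```

Pairing each of the four gamete types with itself gives four genotypes, and pairing distinct types gives six more, for ten in all.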

A key concept in two-locus population genetics is that of linkage, or lack of independence between the two loci. To understand linkage, consider the set of gametes produced by an organism of the A1B1 / A2B2 genotype, i.e. a double heterozygote. If the two loci are unlinked, then the composition of this set will be {¼ A1B1, ¼ A1B2, ¼ A2B1, ¼ A2B2}, i.e. all four gamete types will be equally represented. (We are presuming that Mendel's first law holds at both loci.) So unlinked loci are independent: which allele a gamete has at the A locus tells us nothing about which allele it has at the B locus. The opposite extreme is perfect linkage. If the two loci are perfectly linked, then the set of gametes produced by the A1B1 / A2B2 double heterozygote has the composition {½ A1B1, ½ A2B2}; this means that if a gamete receives the A1 allele at the A locus, it necessarily receives the B1 allele at the B locus.

In physical terms, perfect linkage means that the A and B loci are located close together on the same chromosome; the alleles at the two loci are thus inherited as a single unit. Unlinked loci are either on different chromosomes, or on the same chromosome but separated by a considerable distance, hence likely to be broken up by recombination. Where the loci are on the same chromosome, perfect linkage and complete lack of linkage are two ends of a continuum. The degree of linkage is measured by the recombination fraction r, where 0 ≤ r ≤ ½. The composition of the set of gametes produced by an organism of the A1B1 / A2B2 genotype can be written in terms of r, as follows:

A1B1 ½ (1 − r)
A1B2 ½ r
A2B1 ½ r
A2B2 ½ (1 − r)

It is easy to see that r = ½ means that the loci are unlinked, so all four gamete types are produced in equal proportion, while r = 0 means that they are perfectly linked.
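The gamete table for the double heterozygote translates directly into a small function. This is an illustrative sketch; the function name is an assumption, and the range check simply enforces the stated bounds on r.

```python
def double_het_gametes(r):
    """Gamete proportions produced by an A1B1 / A2B2 double heterozygote.

    r is the recombination fraction, with 0 <= r <= 1/2.
    """
    if not 0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 1/2")
    return {
        "A1B1": (1 - r) / 2,  # parental type
        "A1B2": r / 2,        # recombinant type
        "A2B1": r / 2,        # recombinant type
        "A2B2": (1 - r) / 2,  # parental type
    }

print(double_het_gametes(0.5))  # unlinked: all four types equally frequent
print(double_het_gametes(0.0))  # perfectly linked: parental types only
```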

In a two-locus model, the gametic (and therefore genotypic) frequencies need not be constant across generations, even in the absence of selection, mutation, migration and drift, unlike in the one-locus case. (Though allelic frequencies will of course be constant, in the absence of any evolutionary forces.) It is possible to write recurrence equations for the gamete frequencies, as a function of their frequencies in the previous generation plus the recombination fraction. The equations are:

$$x_1' = x_1 + r\,(x_2 x_3 - x_1 x_4)$$
$$x_2' = x_2 - r\,(x_2 x_3 - x_1 x_4)$$
$$x_3' = x_3 - r\,(x_2 x_3 - x_1 x_4)$$
$$x_4' = x_4 + r\,(x_2 x_3 - x_1 x_4)$$

(See Ewens 1969 or Edwards 2000 for an explicit derivation of these equations.)

From the recurrence equations, it is easy to see that gametic (and thus genotypic) frequencies will be stable across generations, i.e. $x_i' = x_i$ for each i, under either of two conditions: (i) r = 0, or (ii) $x_2 x_3 - x_1 x_4 = 0$. Condition (i) means that the two loci are perfectly linked, and thus in effect behaving as one locus; condition (ii) means that the two loci are in ‘linkage equilibrium’, which means that the alleles at the A-locus are in random association with the alleles at the B-locus. More precisely, linkage equilibrium means that the population-wide frequency of the AiBj gamete is equal to the frequency of the Ai allele multiplied by the frequency of the Bj allele, for every pair of alleles.

An important result in two-locus theory shows that, given random mating and r > 0, the magnitude of the quantity $(x_2 x_3 - x_1 x_4)$ shrinks by a factor of (1 − r) every generation and so approaches zero, at which point the genotype frequencies will be in equilibrium. So a population initially in linkage disequilibrium will approach linkage equilibrium over a number of generations. The rate of approach depends on the value of r, the recombination fraction. Note the contrast with the one-locus case, where just one round of random mating is sufficient to bring the genotype frequencies into equilibrium.
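The approach to linkage equilibrium can be followed numerically. The sketch below iterates the gamete-frequency recurrences, with the correction term added to the A1B1 and A2B2 gametes and subtracted from the A1B2 and A2B1 gametes so that the four frequencies continue to sum to one. The starting frequencies and the value of r are illustrative assumptions.

```python
def next_gametes(x1, x2, x3, x4, r):
    """One generation of random mating with recombination fraction r."""
    d = x2 * x3 - x1 * x4  # zero exactly when the loci are in linkage equilibrium
    return x1 + r * d, x2 - r * d, x3 - r * d, x4 + r * d

# Start with only A1B1 and A2B2 gametes: maximal association between the loci.
x = (0.5, 0.0, 0.0, 0.5)
r = 0.1
d_start = x[1] * x[2] - x[0] * x[3]

for generation in range(50):
    x = next_gametes(*x, r)

x1, x2, x3, x4 = x
d_final = x2 * x3 - x1 * x4
print(d_start, d_final)  # the association has shrunk towards zero
print(x1 + x2, x1 + x3)  # allele frequencies of A1 and B1
```

In this sketch the allele frequencies remain at their starting values throughout, while the association between loci decays geometrically at a rate set by r.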

4. The Status of Population Genetics

The status of population genetics in contemporary biology is an interesting issue. Despite its centrality to evolutionary theory, and its historical importance, population genetics is not without its critics. Some argue that population geneticists have devoted too much energy to developing theoretical models, often with great mathematical ingenuity, and too little to actually testing the models against empirical data (cf. Wade 2005). Others argue that population-genetic models are usually too idealized to shed any real light on the evolutionary process. Still others have argued that, historically, population genetics has had a relatively minor impact on the actual practice of most evolutionary biologists, despite the lip-service often paid to it (cf. Lewontin 1980).

To some extent the underlying issue here is a general philosophical one, concerning the explanatory role of abstract models in science. Model-building involves deliberate over-simplification of reality, so is inherently risky; the aim of sacrificing realism in order to achieve general insights into nature does not always pay off. This general problem has been extensively discussed by philosophers of science, in relation to the use of models in physics, economics, biology, and other disciplines. However, the case of population genetics raises some distinctive issues of its own.

One such issue concerns the concept of the gene itself. As we have seen, population genetics came into being in the 1920s and 1930s, long before the molecular structure of genes had been discovered. In these pre-molecular days, the gene was a theoretical entity, postulated in order to explain observed patterns of inheritance in breeding experiments; what genes were made of, how they caused phenotypic changes, and how they were transmitted from parent to offspring were not known. Today we do know the answers to these questions, thanks to the spectacular success of the molecular genetics ushered in by Watson and Crick's discovery of the structure of DNA in 1953. The gene has gone from being a theoretical entity to being something that can actually be manipulated in the laboratory.

The relationship between the gene of classical (pre-molecular) genetics, and the gene of modern molecular genetics is a subtle and much discussed topic (cf. Beurton, Falk and Rheinberger (eds.) 2000). In molecular genetics, ‘gene’ refers, more or less, to a stretch of DNA that codes for a particular protein—so a gene is a unit of function. But in classical population genetics, ‘gene’ refers, more or less, to a portion of hereditary material that is inherited intact across generations—so a gene is a unit of transmission, not a unit of function. In many cases, the two concepts of gene will pick out roughly the same entities—which has led some philosophers to argue that classical genetics can be ‘reduced’ to molecular genetics. But it is clear that the two concepts do not have precisely the same extension; not every molecular gene is a classical gene, nor vice-versa. Some theorists go further than this, arguing that what molecular biology really shows is that there are no such things as classical genes.

Whatever one's view of this debate, it is striking that virtually all of the central concepts of population genetics were devised in the pre-molecular era, when so little was known about what genes were; the basic structure of population-genetic theory has changed little since the days of Fisher, Haldane and Wright. This reflects the fact that the empirical presuppositions of population-genetic models are really quite slim; the basic presupposition is simply the existence of hereditary particles which obey the Mendelian rules of transmission, and which somehow affect the phenotype. Therefore, even without knowing what these hereditary particles are made of, or how they exert their phenotypic effects, the early population geneticists were able to devise an impressive body of theory. That the theory continues to be useful today illustrates the power of abstract models in science.

Despite the continuity of modern population genetics with the work of Fisher, Haldane and Wright, it would be quite wrong to imply that molecular developments have had no effect on the discipline. Molecular biology has produced an enormous supply of data on the genetic variability of actual populations, which has enabled a link to be forged between abstract population-genetic models and empirical data. This is not in itself a new development: the selectionist-neutralist controversy of the 1960s, mentioned above, was fuelled by the then new data on protein polymorphism in fruit-fly populations (cf. Lewontin and Hubby 1966). More recently, extensive data sets on variation at the DNA rather than the protein level have become available; this has led to the rise of ‘molecular population genetics’ and an associated set of ideas known as ‘coalescent theory’ (cf. Wakeley 2004). Unlike traditional population-genetic analysis, which tries to determine how a given population will evolve in the future, coalescent theory tries to reconstruct the ancestral state of a population from its current state, based on the idea that all the genes in a population ultimately derive from a single common ancestor. Coalescent theory underpins much contemporary research in population genetics.

Population-genetic models of evolution are sometimes criticised on the grounds that few phenotypic traits are controlled by genotype at a single locus, or even two or three loci. (Multi-locus population-genetic models do exist, but they tend to be extremely complicated.) There is an alternative body of theory, known as quantitative genetics, which deals with so-called ‘polygenic’ or ‘continuous’ traits, such as height, which are thought to be affected by genes at many different loci in the genome, rather than just one or two. Quantitative genetics employs a quite different methodology from population genetics. The latter, as we have seen, aims to track gene and genotype frequencies across generations. By contrast, quantitative genetics does not directly deal with gene frequencies; the aim is to track the phenotype distribution, or moments of the distribution such as the mean or the variance, across generations. Though widely used by animal and plant breeders, quantitative genetics is usually regarded as a less fundamental body of theory than population genetics, given its ‘phenotypic’ orientation, and plays less of a role in evolutionary theorising.

A different criticism of the population-genetic approach to evolution is that it ignores embryological development; this criticism really applies to the evolutionary theory of the ‘modern synthesis’ era more generally, which had population genetics at its core. As we have seen, population-genetic reasoning assumes that an organism's genes somehow affect its phenotype, and thus its fitness, but it is silent about the details of how genes actually build organisms, i.e. about embryology. The founders of the modern synthesis treated embryology as a ‘black box’, the details of which could be ignored for the purposes of evolutionary theory; their focus was on the transmission of genes across generations, not the process by which genes make organisms. This strategy was perfectly reasonable, given how little was understood about development at the time. In recent years, great strides have been made in molecular developmental genetics, which has renewed hopes of integrating the study of embryological development with evolutionary theory; hence the emerging new discipline of ‘evolutionary developmental biology’, or evo-devo.

In a recent book, Sean Carroll, a leading evo-devo researcher, argues that population genetics no longer deserves pride-of-place on the evolutionary biology curriculum. He writes: ‘millions of biology students have been taught the view (from population genetics) that ‘evolution is change in gene frequencies’ … This view forces the explanation toward mathematics and abstract descriptions of genes, and away from butterflies and zebras, or Australopithecines and Neanderthals’ (2005 p. 294). Carroll argues that instead of defining evolution as ‘change in gene frequencies’, we should define it as ‘change in development’, in recognition of the fact that most morphological evolution is brought about through mutations that affect organismic development. Carroll may be right that evo-devo makes for a more accessible introduction to evolutionary biology than population genetics; but the latter remains indispensable to a full understanding of the evolutionary process.

Despite the criticisms levelled against it, population genetics has had a major influence on our understanding of how evolution works. For example, the well-known ‘gene's eye’ view of evolution, developed by biologists such as G.C. Williams, W.D. Hamilton and Richard Dawkins, stems directly from population-genetic reasoning; indeed, important aspects of gene's eye thinking were already present in Fisher's writings. Proponents of the gene's eye view argue that genes are the real beneficiaries of the evolutionary process; genotypes and organisms are mere temporary manifestations. Natural selection is at root a matter of competition between gene lineages for greater representation in the gene pool; creating organisms with adaptive features is a ‘strategy’ that genes have devised to secure their posterity (cf. Dawkins 1976, 1982). Gene's eye thinking has revolutionised many areas of evolutionary biology in the last thirty years, particularly in the field of animal behaviour; but in many ways it is simply a colourful gloss on the conception of evolution implicit in the formalisms of population genetics.


Related Entries

genetics: evolutionary | natural selection: units and levels of