Stanford Encyclopedia of Philosophy
This is a file in the archives of the Stanford Encyclopedia of Philosophy.

The Genotype/Phenotype Distinction

First published Fri Jan 23, 2004

The distinction between phenotype and genotype is fundamental to the understanding of heredity and development of organisms. The genotype of an organism is the class to which that organism belongs as determined by the description of the actual physical material made up of DNA that was passed to the organism by its parents at the organism's conception. For sexually reproducing organisms that physical material consists of the DNA contributed to the fertilized egg by the sperm and egg of its two parents. For asexually reproducing organisms, for example bacteria, the inherited material is a direct copy of the DNA of its parent. The phenotype of an organism is the class to which that organism belongs as determined by the description of the physical and behavioral characteristics of the organism, for example its size and shape, its metabolic activities and its pattern of movement.

It is essential to distinguish the descriptors of the organism, its genotype and phenotype, from the material objects that are being described. The genotype is the descriptor of the genome which is the set of physical DNA molecules inherited from the organism's parents. The phenotype is the descriptor of the phenome, the manifest physical properties of the organism, its physiology, morphology and behavior.

The concepts of phenotype and genotype also demand the distinction between types and tokens. As the words “genotype” and “phenotype” suggest, these are types, sets of which any given organism and its genome are members, sets defined by their physical description. Any individual organism and its genome are members of those sets, tokens of those types.

1. Heredity and Development

The distinction that is made between genotype and phenotype is made necessary by the separation of causal pathways that lead on the one hand to the passage of information about organisms between successive generations, and, on the other, to the growth and development of an organism within a generation from conception to death. The mechanism of inheritance is such that the causal pathway of inheritance is from genomes in one generation to genomes in the next without any influence on the genome of the events that occur in the development of the phenome during the life history of the organism. The mechanism of development of the phenome within a generation from the genome is such that the outcome of development is terminated at the death of the organism. While the genome is an element in the causal pathway leading from the first stage in the life of the organism to the final individual, there is no reciprocal effect of the phenome of the developed organism on the genome passed between generations. This can be represented schematically as:

                generation 1            generation 2            generation n

    heredity                heredity                heredity
  ----------->  genome    ----------->  genome    ----------->  genome
                  |                       |                       |
                  | development           | development           | development
                  v                       v                       v
                phenome                 phenome                 phenome

The distinction between genotype and phenotype was introduced by Wilhelm Johannsen in 1908 as a consequence of the realization that the hereditary and developmental pathways were causally separate. This claim had already been made explicitly by August Weismann at the end of the nineteenth century, who differentiated between the germplasm of an organism, the tissue that forms the gametes to produce the next generation, and the somatoplasm, the tissues of the rest of the body. According to Weismann the somatoplasm developed and was influenced by the environment, whereas the germplasm was segregated early in development and was not susceptible to environmental influences. Thus, there could be no inheritance of acquired characteristics. Johannsen's distinction between genotype and phenotype was, however, induced not by Weismannism but by the rediscovery in 1900 of Mendel's work on inheritance in the garden pea.

The critical feature of Mendel's work was the result he obtained in the first and second generations of crosses between pea plants with clear-cut phenotypic differences. When a pure-breeding red-flowered variety was crossed to a pure-breeding white-flowered form, all the offspring in the first generation were red-flowered. When, however, these red-flowered hybrids were crossed with each other, both red-flowered and white-flowered plants appeared in the progeny. In order to explain this extraordinary reappearance of white-flowered plants in the second generation despite the fact that the first generation cross produced only red-flowered plants, Mendel distinguished between the internal state of the plants and their outward appearance. He postulated the presence of internal discrete elements, “factors”, that were contributed by the parents to the offspring. While these factors interacted in some way to produce the external appearance of the plants, they did not physically blend or contaminate each other, but maintained their discrete individuality. Thus a pure-bred white-flowered plant had two white factors, one contributed by its maternal parent and one by its paternal parent, while a pure-bred red-flowered plant had two red factors. The hybrid between these two pure-bred varieties would then have one white factor and one red factor. The red flowers of this hybrid were a consequence of the “dominance” of red factors over white factors in their causal interaction in producing flower color. That dominance in physiological action, however, in no way affected the nature of the factors themselves, which separated again, uncontaminated, when the hybrid plants produced pollen and ovules. As a consequence, when two hybrid plants were crossed, some of the offspring would have received a white factor from both pollen and ovules and would have white flowers.
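The logic of this explanation can be made concrete with a small enumeration. The following is a minimal sketch, not a reconstruction of Mendel's own procedure: the factor labels "R" (red) and "w" (white) and the function names are hypothetical, and it assumes that each parent passes on either of its two factors with equal probability.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """Each parent passes on one of its two factors, unaltered."""
    return list(genotype)

def cross(parent_a, parent_b):
    """Count the equally likely offspring genotypes of a cross."""
    offspring = Counter()
    for a, b in product(gametes(parent_a), gametes(parent_b)):
        # Sort so that "Rw" and "wR" are counted as the same genotype.
        offspring["".join(sorted((a, b)))] += 1
    return offspring

def phenotype(genotype):
    """Assumed dominance: one red factor suffices for red flowers."""
    return "red" if "R" in genotype else "white"

# First generation: pure-breeding red crossed with pure-breeding white.
f1 = cross("RR", "ww")

# Second generation: hybrid crossed with hybrid.
f2 = cross("Rw", "Rw")

# Collapse the second-generation genotypes into phenotypic classes.
f2_phenotypes = Counter()
for genotype, count in f2.items():
    f2_phenotypes[phenotype(genotype)] += count
```

Under these assumptions every first-generation plant is a hybrid, while the second generation contains three genotypes that collapse into two phenotypes, the white class reappearing because the factors were never altered by their association in the hybrid.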

This scheme of Mendelian explanation makes a clear distinction between what we now call the “genome” and the “phenome”. The Mendelian factors are part of the genome and a description of their state is a genotypic description. The outward appearance of the plant provides the description of its phenotype. Most important, the formal description of the properties of causal pathways, encapsulated in Mendel's Laws, makes a clear separation between the hereditary and the developmental pathways. Mendel's Second Law, the Law of Segregation, asserts that the factors that come together when an egg is fertilized will separate again, unaffected by their mixture in the developed organism and unaffected by their physiological interaction that produces the phenotype of the adult. Mendel's Third Law, the Law of Independent Assortment, asserts that when there are factor pairs for different characteristics, say a factor pair for flower color and a factor pair for plant height, the segregation of the flower color factors when gametes are formed is causally independent of the segregation of the factor pair for plant height. The Second and Third Laws are then descriptions of regularities in inheritance. Mendel's First Law, the Law of Dominance, in contrast, is a developmental law, asserting that when the two factors of a factor pair in an individual organism differ in their physiological effect, one will dominate the other in the final result of development.

For the first half of the twentieth century very little progress was made in identifying the physical basis of Mendel's factors. The chief advance was the demonstration that the different factors, now renamed “genes”, were linearly arranged along bodies in the nucleus of cells, the chromosomes. The behavior of the chromosomes in the process of gamete formation was what was predicted from the formal hereditary properties of genes. Alterations in the observable physical form of specific places on chromosomes could be associated with specific alterations in phenotype, and heritable alterations in phenotype could be produced by bombarding organisms with high-energy ionizing radiation. But genes remained abstract entities whose existence as the elements of heredity and the causes of development depended entirely on inferences from the phenotypes of organisms involved in various breeding experiments. That is, genotypes had to be inferred from their phenotypic effects. The development of molecular biology began with the definitive identification of DNA as the material basis of genes in the late 1940s and early 1950s. This was then followed by the elucidation of the chemical and physical structure of DNA, the molecular mechanism of its reproduction in heredity, and a detailed molecular description of the way in which cells converted the information in the DNA of genes into the molecules of physiological and developmental function. All of these molecular details confirmed the causal independence of the hereditary behavior of the genome from its developmental functions. The DNA of the genome consists of long strings made up of a succession of nucleotides of which there are only four kinds. The differences among genes are differences in the number and particular order of the four nucleotide types in a gene string. The DNA is replicated by the cells by directly copying the DNA strings into progeny DNA molecules.
On the other hand, the reading of the genotypic information by the cells and the use of that information to produce molecules that underlie development of the characteristics of the phenotype is carried out by a different pathway. The DNA of various genes is first transcribed into a related molecule, RNA, and the different RNA molecules transcribed from the different genes carry the information that specifies the chemical structure of the various proteins from which cells are built. Information about which genes are to be transcribed in which cells, at which times in development and in what amounts is also contained in stretches of DNA called controlling or regulatory elements. It is the transcription of the genomic DNA into a special separate molecule, RNA, which, in turn, carries the genotypic information into the metabolic apparatus of the cell that is the critical element in the separation of the hereditary and the developmental functions of the genome. It is the mechanism which allows the genome to be a cause of the phenotype but which, at the same time, insulates the genome from the reverse influence of the phenome, preventing the inheritance of acquired characteristics.

2. Partial Genotype and Partial Phenotype

Real organisms are characterized by great variation one from another. Typically, individual members of any species differ in very large numbers of the nucleotides that make up their DNA. In humans there are about 3 million nucleotide differences on average between any two people taken at random. Even very closely related individuals have many genetic differences. With the exception of identical twins or individuals cloned from the same parent, no two organisms have identical genomes. Moreover there is some ambiguity in the assignment of an individual to a genotype, because many mutations occur in cells during the process of growth and development so that all the cells in the body do not contain identical genomes. Even asexually reproducing single-celled organisms like bacteria, which reproduce by the division of the parental cell, differ in their genomes because mutations of DNA are sufficiently common that at least one of the nucleotides that constitute their DNA will have undergone a spontaneous change during cell division. Thus genotypes are classes with only a single member. Moreover, even cloned individuals or identical twins, although identical in genotype, will differ from each other in phenotype because of variations in their developmental environments. Thus, phenotypes are also classes with only a single member. Taken literally, the distinction between genetic or phenotypic types and tokens, while logically correct, would seem to be of no practical import.

In practice genotypic and phenotypic descriptions are not total but partial, restricted to some subset of the characteristics of the organism that is regarded as relevant for a particular explanatory or experimental purpose. In the delimitation of partial phenotypic and genotypic descriptions two decisions must be made. First, a particular aspect of the total phenotype is chosen for description, say the rate of production of melanin pigment from the biochemical processing of small molecules by enzymes, which then dictates that the genes that code for the enzyme proteins involved in the reaction are part of the partial genotype of interest. Second, a decision must be made about what set of phenotypes and genotypes are to be regarded as indistinguishable and so are to be included in the definitions of the partial genotypic and phenotypic classes. In the case of melanin pigmentation there is a continuous distribution of the trait from very light to very dark color because of environmental variations and because of small variations from individual to individual in the actual rates of enzymatic activity. What range of melanin deposition will be regarded as belonging to the same phenotypic class? The necessity of establishing boundaries for phenotypic classes arises because, although pigment intensity is a continuous variable, genotypes are, by their nature, discrete classes. So the problem of phenotypic class boundaries must be addressed.

It is also unclear what sets of genotypes are to be included within a particular partial genotypic class, for three reasons. First, small variations in phenotype arise in part because the interconnections of metabolic pathways in the organism are so complex that variation in proteins that are not directly part of the melanin production pathway may nevertheless have an effect on the rate of melanin formation. But these variations in such peripheral proteins are, in turn, the result of genetic variations in the genes that code for them. Should the genes that code for those proteins be included in the definition of the partial genotype, in which case the number of genotypic classes in the partial genotype becomes very large? But the problem is recursive, because the proteins coded for by the “secondary” genes are, in turn, affected in their rates of activity by yet other proteins and so on until all the genes in the genome become included in the “partial” genotypic description. Second, the relation between DNA sequence and the amino acid content of proteins is many-to-one. The DNA code is redundant, so that many triplets that differ only in their third position specify the same amino acid; from this perspective all such redundant variations are included in the same partial genotypic class. On the other hand, such third position variation can affect the rate at which the cell reads the DNA code and so such variation may indeed delineate effectively different genotypes. Finally, there are variations in the DNA of the genome outside the coding regions of genes that do not affect the chemical structure of enzyme proteins but affect the rate of their synthesis. These are variations in the DNA of the so-called “controlling elements” that are part of each gene and which influence the rate at which the cell will read the gene in making proteins.
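The many-to-one relation between DNA sequence and amino acid sequence can be illustrated with a short sketch. The table below is only a small excerpt of the standard genetic code, written as DNA triplets; the three example sequences and the function name are invented for illustration.

```python
# A small excerpt of the standard genetic code (DNA coding-strand triplets).
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}

def translate(dna):
    """Read a DNA string three nucleotides at a time into amino acids."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]

# Hypothetical sequences, chosen only to illustrate the point.
seq_a = "GGTGCTGAT"
seq_b = "GGAGCCGAC"  # differs from seq_a only at third positions
seq_c = "GGTGCTGAA"  # a third-position change that does alter the protein

same_protein = translate(seq_a) == translate(seq_b)
changed_protein = translate(seq_a) != translate(seq_c)
```

Here seq_a and seq_b are distinct at the level of DNA yet specify the same amino acid string, so whether they count as one partial genotype or two depends on whether the classification is made by protein product or by nucleotide sequence. The third sequence is a reminder that not every third-position change is silent.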

The question of what parts of the genome and phenome are to be included in the partial genotypic and phenotypic descriptions of the organism in particular cases is one of the most problematic in biology. While it is undoubtedly true that every part of the genome is connected causally with the phenome by some pathway, it is simply impossible to consider all pathways of connection. Sometimes biologists choose a partial genotype and partial phenotype because of practical limitations on how much can be done experimentally and simply pretend for the sake of convenience that the rest of the organism really is constant. Sometimes they explicitly recognize the heterogeneity of the rest of the genome and phenome but claim that it is effectively causally irrelevant to the phenomena under investigation, and sometimes they explicitly recognize the causal relevance of the rest of the genome and phenome, but treat it as background experimental “noise” that can be averaged out in a sufficiently large set of observations.

The claim that the rest of the genomic and phenomic effects can be ignored or averaged out depends on the assumption that these pathways are causally orthogonal to the pathways that are being analyzed in the observations. That is, the different partial genotypes and phenotypes that are the subject of the analysis and experiment are not differentially affected by the variation in the rest of the genome and phenome. The claim for the validity of this assumption is part of a commitment to the reductionism of analytic biology, that organisms can be understood by cutting them up into intuitively reasonable small parts whose separate study will be sufficient to lead to an understanding of the whole. There is no question that organisms are a collection of effective subsystems within which causal interactions are strong and between which they are weak. The difficulty is that the boundaries of these subsystems are not fixed, but change from function to function so they must be determined from observations on an ad hoc basis from case to case. Eyes and legs are developmentally independent. Genetic variation exists that affects the size and shape of one without any impact on the size and shape of the other, and developmental traumas can make an animal blind without making it lame. However, movement across an uneven surface depends on coordination of the contraction of the leg muscles with predictive visual cues so the problem of locomotion makes these two developmentally independent subsystems part of the same functional unit.

3. Mapping Genotype Into Phenotype

If the mechanisms of development were such that every change in genotype resulted in a different phenotype and every different phenotype was the consequence of a difference in genotype, the study of the origin of organic variation would be greatly simplified. Given a knowledge of the phenotype, the underlying causal genotype could be unambiguously inferred and vice versa. The problem of understanding the manifest variation among organisms would then be reduced to providing a mechanical story of a chain of biochemical reactions, beginning with the reading of the genome by the cell and ending with the final state, much like the production of an automobile can be completely reconstructed from the blueprints, a description of the materials used, of the production machinery and of the order in which the materials pass through that machinery. However, the actual correspondence between genotype and phenotype is a many-many relation in which any given genotype corresponds to many different phenotypes and there are different genotypes corresponding to a given phenotype. The current state of the study of organismic development ignores this many-many relationship and is structured on the model of the automobile assembly plant. It is not that developmental biologists are unaware of the many-many relationship between genotype and phenotype. Rather, pragmatic considerations dictate that the understanding of the mechanisms of development will best be achieved by first concentrating on those developmental outcomes that have an unambiguous relationship between genotype and phenotype, leaving for the future the issues posed by the many-many relation. An unintended side product of this strategic decision is that the language used to describe the problematic, and the results of the research, create and reinforce an overly simple view of the relationship between genes and characters.

The many-many mapping between genotype and phenotype arises from four sources: (1) the relation between the DNA sequence and the chemical structure of proteins; (2) relations between the products of the transcription and translation of the information coded in the genome; (3) the dependence of development and physiology on both the genotype of the organism and the temporal sequence of environments in which the organism develops and functions; (4) stochastic variations of molecular processes within cells.

3.1 DNA-protein relations

A protein consists of a string of amino acids, each one of which is coded for by a triplet of nucleotides in the string of DNA constituting a gene. For the protein to have physiological activity the identity of many of these amino acids is essential. Thus, a change in any part of the gene that causes a replacement of any one of these amino acids will prevent the physiological activity of the protein. It is impossible to say from observing the phenotype, namely the lack of physiological activity of the protein, what change in the genotype has occurred. This is the most common form of many-to-one mappings of genotype onto phenotype.

3.2 Relations between genes

Mendel's observations provide the classic example of an ambiguity in the relation between genotype and phenotype. He observed that plants that carried one member of a gene pair specifying red flowers and one member specifying white flowers were indistinguishable from plants carrying two copies of the red form of the gene. He observed similar dominance of one gene form and recessiveness of the alternative gene form in other characters as well, leading him to generalize the phenomenon as a law, the Law of Dominance. While subsequent research has shown that dominance of one copy of a gene over another is far from universal, it is sufficiently common that a large fraction of the genetic variation present in populations of organisms is hidden at the level of phenotype and requires special experimental techniques to reveal it. This is especially true when one copy of the gene is defective so that protein with less than normal activity is produced from that copy, while the alternate normal copy codes for protein that is physiologically active enough to produce the normal phenotype (Fisher 1931; Haldane 1939; Wright 1934).

A second form of interaction that is extremely common is that which occurs between the products read from different genes in the genome. If the products of reading the different genes are all necessary to produce a physiological effect, then alterations in any one of the genes will block the effect. Such interactions occur when the physiological effect is the outcome of a chain of chemical steps, each step being mediated by a product of a different gene. For example, coat color in mammals is the result of the action of the products of three different genes. One determines the distribution of pigment in the hair, and another determines whether the color of the pigment is black or brown. Various combinations of different genotypes of these genes correspond to different coat colors. However, there is a third gene that codes for an enzyme that is necessary for any color at all to be expressed. If this gene is defective the coat will be white, irrespective of the genotype of the other genes (Wright 1925).
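The coat color example can be sketched as a mapping from a three-gene partial genotype to a phenotypic class. The sketch rests on several simplifying assumptions that are not in the text: each gene has two forms, the allele labels ("A"/"a", "B"/"b", "C"/"c") and the phenotype names are hypothetical, and the upper-case form of each gene is taken to be dominant.

```python
from collections import defaultdict
from itertools import product

def coat_color(distribution, pigment, enzyme):
    """Map a three-gene partial genotype to a coat-color class.

    Each argument is a pair of allele labels. The labels and the
    dominance relations are assumptions made for illustration.
    """
    if "C" not in enzyme:
        # Both copies of the enzyme gene are defective: no color at all,
        # whatever the genotype at the other two genes.
        return "white"
    pattern = "banded" if "A" in distribution else "solid"
    color = "black" if "B" in pigment else "brown"
    return f"{pattern} {color}"

# Enumerate all 27 partial genotypes and group them by phenotype.
genotypes_by_phenotype = defaultdict(list)
for d, p, e in product(("AA", "Aa", "aa"),
                       ("BB", "Bb", "bb"),
                       ("CC", "Cc", "cc")):
    genotypes_by_phenotype[coat_color(d, p, e)].append((d, p, e))

class_sizes = {ph: len(gs) for ph, gs in genotypes_by_phenotype.items()}
```

Under these assumptions nine of the twenty-seven partial genotypes fall into the single class "white", so that observing a white coat says nothing about the state of the other two genes.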

A third source of a many-to-one relation between genotype and phenotype is the phenomenon of developmental buffering, whose mechanism is poorly understood. There are many phenotypic features of organisms that show no variation between individuals belonging to the same species or are even constant among many related species. For example, all individuals of all species of the fruit-fly genus Drosophila have exactly three simple light receptors, ocelli, arranged in a symmetrical triangle on the midline of the top of their heads. The simplest assumption is that there is no genetic variation for this trait and that its development is resistant to normal environmental disturbance. If the development of the fly is sufficiently disturbed, however, some flies with two or fewer ocelli are observed. If those with fewer than three ocelli are used as parents for the next generation they produce more abnormal flies than the parental generation. When the process of selective breeding from abnormal flies is continued over many generations a line of flies is produced that consistently has two ocelli, even in the absence of any external disturbance of development, and these ocelli can be symmetrically or asymmetrically arranged (Maynard Smith and Sondhi 1960). The success of such a selection experiment proves that there was genotypic variation for ocellus number and arrangement in the original population of normal flies, but that all the different genotypes mapped onto the same phenotype. This is the phenomenon of developmental canalization, in which there is buffering of development against perturbing forces. There is genetic variation among individuals for genes that affect ocellus number, but the developmental effects of that variation are prevented by the system of buffering (Waddington 1953, 1957). If a sufficiently large perturbation is introduced, the developmental buffering capacity is overcome and the genetic variation for ocellus number is revealed.
It is then possible to select genotypes that are so extreme in their effect on development that they are beyond the buffering capacity of the normal developing system and produce unusual phenotypes even under normal circumstances. Experiments with various other constant features of various animals have shown that developmental canalization is a common feature, so that phenotypic uniformity cannot be taken as a demonstration of relevant genotypic uniformity (Rendel 1967; DeVisser et al. 1993).

3.3 Genes and environment

The complete DNA sequence of an organism does not contain the information necessary to specify the organism. The outcome of developmental processes depends both on the genotype and on the temporal sequence of environments in which the organism develops.

Moreover, the mapping of different genotypes into phenotypes in one environment is often completely unpredictable from their mapping in another environment. The classic demonstration of the complexities of this mapping is the experiment on clones of the plant Achillea (Clausen, Keck and Hiesey 1958). Individual immature plants were collected from nature and from each plant three clones were produced by the simple method of cutting them into three pieces. One piece of each plant was grown at low elevation in the Sierra Range, one at medium elevation and one at high elevation. The result of the growth at the three elevations was that the relative heights of the various plants were unpredictable from one environment to another. For example, the genotype that grew tallest at low elevation was the shortest at medium elevation and the second tallest at high elevation. Moreover, whereas this genotype flowered at low and high elevation it failed to flower at medium elevation, while other genotypes flowered at that elevation but not at high elevation. There was, in fact, no correlation among the plants in their growth in the different environments. Many experiments on many different organisms where it has been possible to produce multiple individuals of the same genotype show this same result (Lewontin and Goss 2004).

If the phenotype of the organism of a given genotype is plotted against an environmental variable the function that is produced is called the norm of reaction of the genotype (Schmalhausen 1949). It is the mapping function of environment into phenotype for that genotype. It is the common experience that norms of reaction of different genotypes are curves of irregular shape that cross each other. Thus, it is not possible to predict the phenotypes of different genotypes in new environments. There are, of course, some genotypes that are so defective that they will not survive in any natural environment. But these are not typical of natural variation among organisms. It may be taken as a general rule that the outcome of development of any genotype is a unique consequence of the interaction between genome and environment.
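A norm of reaction can be represented as a table from environments to phenotypic values, one table per genotype. In the sketch below the heights are invented for illustration and are not the Clausen, Keck and Hiesey measurements; the genotype names and function names are likewise hypothetical. The numbers are chosen so that the first genotype is tallest at low elevation, shortest at medium elevation and second tallest at high elevation, as in the case described above.

```python
ENVIRONMENTS = ("low", "medium", "high")

# Invented plant heights in cm, for illustration only; NOT experimental data.
NORMS = {
    "genotype_1": {"low": 50, "medium": 10, "high": 30},
    "genotype_2": {"low": 30, "medium": 40, "high": 35},
    "genotype_3": {"low": 20, "medium": 25, "high": 15},
}

def ranking(environment):
    """Genotypes ordered from tallest to shortest in one environment."""
    return sorted(NORMS, key=lambda g: NORMS[g][environment], reverse=True)

def norms_cross(g1, g2):
    """True if the height ordering of two genotypes reverses somewhere."""
    orderings = {NORMS[g1][e] > NORMS[g2][e] for e in ENVIRONMENTS}
    return len(orderings) == 2

rankings = {env: ranking(env) for env in ENVIRONMENTS}
```

Because the norms of genotype_1 and genotype_2 cross, the statement that one of them is "the taller genotype" has no meaning apart from a specified environment, and the ranking observed at one elevation gives no guide to the ranking at another.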

3.4 Stochastic effects

Even a complete specification of both the genotype and the temporal order of the developmental environment is insufficient to predict the phenotype. If the left and right sides of a bilaterally “symmetrical” organism are examined it will be discovered that, in general, it is asymmetrical, but that the direction and amount of asymmetry vary from individual to individual with no average difference between sides. For example, flies have small sensory bristles on their left and right sides. One individual will have, say, six bristles on the right and eight on the left, while another will have seven on the right and five on the left. On the average over many individuals the number is the same on both sides but there is fluctuating asymmetry from fly to fly. Humans do not have the same fingerprints on their left and right hands and the differences in pattern can be so great that no similarity at all can be detected. Yet the genes of the left and right sides are the same and no usual meaning of environment will allow that the left and right hands of a foetus in its mother's womb have different developmental environments.

Another important phenomenon of random variation is the asynchrony of cell divisions. A single bacterial cell inoculated in a large flask of constantly stirred medium will divide into two cells after about an hour. Those two cells will then each divide about an hour later, but not simultaneously. Their daughter cells will again divide, but each a few minutes earlier or later than others and so on until the population of cells is growing continuously in time with no synchronization of division. Yet, in the early stages of the development of the culture there have not been enough generations to accumulate mutations, so the cells are genetically identical, nor is there any possibility of different environments in the constantly stirred culture. The same asynchrony of divisions occurs at all stages in the division of cells from the fertilized egg in embryos.

The source of these asymmetries and asynchronies is the very low numbers of copies of biologically important large molecules in each cell. The Law of Mass Action of chemistry, which is based on statistical averaging over very large numbers of molecules, does not apply when there are only three of one molecule and seven of another. But that is precisely the situation of molecular numbers within cells. Each kind of molecule is in low numbers and they are distributed over space within the cell. For a reaction to occur between molecules they must be in proximity and each molecule in the reaction must be in the right vibrational state for interaction. Vibrational states are, in turn, fluctuating for each molecule, ultimately as a consequence of quantum uncertainty. As a consequence of the stochastic variation in number, spatial location and reactivity of each kind of molecule, there is considerable random variation from cell to cell in the timing of cell division and in its outcome. If there are seven molecules of a certain type present at the time of cell division, one daughter cell may receive three copies and one four copies, so that it will require different amounts of time for these cells to synthesize enough copies of the molecules for the next division (Goss and Peccoud 1998; Lewontin and Goss 2004; McAdams and Arkin 1997).
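The arithmetic of the seven-molecule example can be worked out explicitly. The sketch below assumes, as a simplification that the text does not itself state, that each molecule passes independently and with equal probability to either daughter cell, so that the number received by one daughter follows a binomial distribution; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def partition_distribution(n):
    """Probability that one daughter cell receives k of n molecules,
    assuming independent, unbiased segregation of each molecule."""
    return {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

dist = partition_distribution(7)

# A three/four split, in either direction.
p_three_four = dist[3] + dist[4]

# A very lopsided split: one daughter gets at most one molecule.
p_lopsided = dist[0] + dist[1] + dist[6] + dist[7]
```

On this assumption the most even split available, three and four, occurs in only a little more than half of divisions (35/64), and in one division in eight a daughter cell receives at most one copy. With an odd number of molecules an equal split is impossible, so some difference between the daughters is guaranteed.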

Extensive measurements of fluctuating asymmetry and of asynchrony have demonstrated that these stochastic effects are important sources of phenotypic variation.

4. Conclusion

The complex contingency of the relations between genotype and phenotype arises from the nature of organisms as physical systems. They differ from the physical systems that have been the objects of study of most physics and chemistry in two respects. Unlike atoms or planets they are intermediate in size and internally functionally heterogeneous. As a consequence they are the nexus of a very large number of weakly determining interacting causal chains and subject to the effect of random noise at all levels. The consequence for the understanding of the structure and function of organisms, including their individual and social behavior, is that there is not some small set of universals like Newton's Laws. Even Mendel's Laws have many exceptions and the Biogenetic Law of all life from life cannot always have been true or there would be no organisms. As is true for living systems in general, relations between genotype and phenotype are contingent, varying from case to case.


Related Entries

biology: philosophy of | character/trait | Darwinism | developmental biology | evolution | gene | genetics: and genomics | information: biological | innate/acquired distinction | molecular biology | natural selection | natural selection: units and levels of | types and tokens