Genomics and Postgenomics

First published Thu Oct 20, 2016

About 30 years ago researchers and other stakeholders started setting up the first genomics initiative, the Human Genome Project (HGP) (see the link to All About the Human Genome Project (HGP) in the Other Internet Resources section below). What was conceived as an audacious plan in the 1980s turned into an official multi-centre, international program in 1990 and was brought to a conclusion in 2003.

More than a decade later, genomics is still big science (and big business): the Obama administration announced in January 2015 that it intended to sequence one million human genomes (see Precision Medicine Initiative in the Other Internet Resources section below; see also Reardon 2015). Craig Venter, the commercially minded nemesis of the publicly funded HGP, is also in the mix again, this time involved in a privately funded collaboration that aims to sequence two million genomes over the next ten years (Ledford 2016). Equally important, we see not only the same players clashing again but also the same promises being made, with talk of “groundbreaking health benefits” and “new medical breakthroughs” appearing once again in press releases and other announcements (see for instance Collins & Varmus 2015 or NIH 2015).

But many things are also different now. For instance, China has emerged as a major player in the genomics field, with the BGI (formerly the Beijing Genomics Institute) already announcing in 2011 the aim to sequence one million genomes. Moreover, DNA sequencing is no longer the only goal of these large-scale initiatives: the new genomics is of course still a genome-based effort, but it is a transformed enterprise that also focuses on data about proteins, DNA methylation^[1] patterns or the physiology and the environment of the people studied; DNA sequence data now forms only part of a much larger picture in the push for what is called ‘precision’ or ‘personalised’ medicine. Developments such as these have led many to refer to the present as a ‘postgenomic’ age (Richardson & Stevens 2015). The goal of this entry is to look at this constantly developing space of genomic and postgenomic research and outline some of the central philosophical issues it raises.

Section 1 will introduce and discuss several key terms, such as ‘genome’ and ‘genomics’. Section 2 will then turn to the question of what it means to read and interpret the genome. What did the sequencing and mapping of the human genome entail, and what philosophical issues arose in the context of the Human Genome Project? How did sequencing evolve into a much larger ‘postgenomic’ enterprise, and what issues did this transformation bring about? To answer the last question, Section 3 will consider two different projects, or perhaps better, two newly emerging fields: the HapMap project and metagenomics. In the supplementary document The ENCODE Project and the ENCODE Controversy, we will look at the ENCODE project and the controversy that surrounded it. These three cases will highlight key issues that come up again and again in the context of genomics and postgenomics.

It is also important to point out what this article is not about. There are already a number of entries in the Stanford Encyclopedia of Philosophy (SEP) that deal specifically with genes, genetics and also the HGP, and the present entry will not, therefore, address in much detail the history of, or the philosophical issues surrounding, the concept of the ‘gene’ (see SEP entry gene, but also the entries molecular biology, molecular genetics, and the human genome project), or the history of the HGP (see SEP entry, the human genome project). Broader issues that also play a role in genomics, such as the notion of biological information and the issue of reductionism have also been discussed in a set of SEP entries (for more on reductionism see reductionism in biology; gene; HGP; and molecular genetics and for more on the metaphor of a ‘genetic program’ and biological information see entries on biological information; gene; molecular genetics; and molecular biology). Furthermore, and probably most importantly of all, our focus here will be on the epistemological, ontological and methodological issues raised by genomics rather than the ethical, legal and social issues that the sequencing of DNA inevitably brings up (but see HGP entry for more on these topics).

1. Terminology and Definitions
- 1.1 Gene—Genome—Genomics
- 1.2 What is a Genome?
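2. Reading the Genome
- 2.1 The Run-up to the HGP
- 2.2 The Results and Impact of the HGP
- 2.3 Genome Size, the C-value Paradox and Junk DNA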
3. Beyond Sequencing
- 3.1 The International HapMap Project
- 3.2 Metagenomics
4. Outlook
- Supplement: The ENCODE Project and the ENCODE Controversy
Bibliography
Academic Tools
Other Internet Resources
Related Entries

1. Terminology and Definitions

1.1 Gene—Genome—Genomics

The term ‘genomics’ derives from the term ‘genome’, which itself derives (in part) from the term ‘gene’. The meanings of these different terms, and the relationships between them, are by no means simple.

The term ‘gene’ was introduced in 1909 by the Danish biologist Wilhelm Johannsen, who used it to refer to the (then uncharacterised) elements that specify the inherited characteristics of an organism (see gene and molecular genetics entries for an overview of the complex history of the term ‘gene’).

The term ‘genome’ was introduced in 1920 by the German botanist Hans Winkler (1877–1945) in his publication “Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche” (Prevalence and Cause of Parthenogenesis in the Plant and Animal Kingdom). Winkler defined the term as follows:

Ich schlage vor, für den haploiden Chromosomensatz, der im Verein mit dem zugehörigen Protoplasma die materielle Grundlage der systematischen Einheit darstellt, den Ausdruck: das Genom zu verwenden […]. (Winkler 1920: 165)

I propose to use the expression ‘genome’ for the haploid set of chromosomes that, in conjunction with the associated protoplasm, represents the material foundation of the systematic unit [often translated as ‘species’]. (Translation by S.G.)

The etymology of the term is not clear, but most authors and encyclopaedia entries assume that it is a combination of the German words ‘Gen’ and ‘Chromosom’, leading to the composite ‘Genom’. In general, the origin and the different meanings of the -ome suffix are not entirely clear, and there are now several accounts that try to bring some structure and/or meaning to the ever-flourishing -omes terminology in the contemporary life sciences (see, e.g., Lederberg & McCray 2001; Fields & Johnston 2002; Yadav 2007; Eisen 2012; Baker 2013; for interesting/entertaining lists, see -omes and -omics in the Other Internet Resources section below).

The term ‘genomics’, finally, was coined in 1986 at a meeting of several scientists who were brainstorming (in a bar) to come up with a name for a new journal that Frank Ruddle (Yale University) and Victor McKusick (Johns Hopkins University) were setting up. The aim of this journal was to publish data on the sequencing, mapping and comparison of genomes. To capture these different activities, and by analogy with the well-established discipline of genetics, Thomas Roderick (Jackson Laboratory) proposed the term ‘genomics’ (Kuska 1998). Unbeknownst to the people involved, this was a significant moment in the history of the life sciences, as it is here that the -omics suffix appears for the first time.

1.2 What is a Genome?

Looking at the history and the etymology of a term does not, of course, necessarily tell us a lot about how it is used in the context of current science. So what is a genome in today’s life sciences? Is it the (haploid) set of chromosomes we find in the nucleus of a eukaryotic cell, in line with the original definition by Winkler? Or is it the totality of genes we find in an organism or the totality of DNA present in a cell? And if so, which DNA? Most definitions that are currently in circulation are an intricate mix of different ways of approaching the issue. This can be illustrated by looking at the definitions given in several key online resources (for more definitions of the term ‘genome’ see Table 1 in Keller 2011).

The term is defined on the genome.gov website glossary:

The genome is the entire set of genetic instructions found in a cell. In humans, the genome consists of 23 pairs of chromosomes, found in the nucleus, as well as a small chromosome found in the cells’ mitochondria. Each set of 23 chromosomes contains approximately 3.1 billion bases of DNA sequence. (Talking Glossary: genome, in the Other Internet Resources)

And this is how the U.S. National Library of Medicine defines it:

A genome is an organism’s complete set of DNA, including all of its genes. Each genome contains all of the information needed to build and maintain that organism. In humans, a copy of the entire genome—more than 3 billion DNA base pairs—is contained in all cells that have a nucleus. (NIH 2016)

Similarly the education portal of the journal Nature:

A genome is the complete set of genetic information in an organism. It provides all of the information the organism requires to function. In living organisms, the genome is stored in long molecules of DNA called chromosomes. (Scitable: genome, in the Other Internet Resources)

All of these definitions refer both to information and to instructions for the development and/or functioning of an organism. In the first two, the genome is also identified with a material entity, in the first case the chromosomes, in the second a sequence of base pairs. Nature allows only that the information is “stored in” the chromosomes.

The combination of these two aspects is highly problematic. The definition from the U.S. National Library of Medicine implies that “all of the information needed to build and maintain that organism” is contained in the DNA,^[2] which is certainly false: many environmental factors, not to mention factors in the maternal cytoplasm, are required for the first task, and even more obviously (food, light, etc.) for the second. Moreover, when, as is almost always the case, an organism requires symbiotic partners for its proper functioning, such a definition will imply that the DNA of these symbionts is part of the genome of the first organism, a result that few would welcome.^[3] The Nature definition commits the same error in its second sentence. The genome.gov definition appears to identify the genome both with a set of instructions and with a material entity (the chromosomes), which rather problematically conflates a material object with an abstract entity.

The problem is not hard to see. Definitions that attempt to combine the material basis of the genome with its informational content, as all of these do, inevitably assume some simple relation between the two; but in fact the relationship is extremely complex. Because the informational content of the genome depends in multiple ways on elements that are not, on any account, part of the genome, an account purely in terms of informational content seems a hopeless project.

One commonly held view that can be quickly dismissed is the idea that the genome is just the sum total of an organism’s genes. The problem is simply that, even setting aside the well-known difficulties of saying what a gene is (see Barnes & Dupré 2008; SEP entry on the gene), on any tenable account of genes there is far more to the genome than genes. If the genome were just the genes, only a fraction of the actual DNA contained in the chromosomes would be part of the genome, at least in the case of humans and other organisms that have a relatively large amount of non-coding DNA (Barnes & Dupré 2008: 76).^[4] Even if ‘gene’ is interpreted in the widest possible sense, to include any section of the genome that has some identifiable function, no one denies that a significant amount of DNA is not functional. That DNA would then not form part of the genome, an outcome that contradicts all definitions of the genome of which we are aware and makes nonsense of such familiar concepts as ‘whole-genome sequencing’, which refers to the analysis of all the DNA found in the chromosomes.

There are, we suggest, two initially tenable approaches to the problem.^[5] The first, and one that is often implicitly or explicitly assumed to be correct, is to define the genome as the sequence of nucleotides. This may or may not include extranuclear DNA, as in mitochondria or chloroplasts; the genome.gov definition explicitly includes the former. This last question figured prominently in debates over the moral permissibility of so-called mitochondrial transplants (a designation that speaks volumes, incidentally, about the almost magical importance attached to DNA as opposed to the remaining contents of the cell), but it is not one of great philosophical significance. The alternative approach is to understand the genome strictly as a material object, presumably, in most cases, the nuclear chromosomes.

The problem with the first approach is that it is largely motivated by the assumption that the nucleotide sequence contains all the important information in the genome. But it has become increasingly clear that this is not the case, especially as a result of the growing understanding of epigenetics. Epigenetics is the study of material modifications of the genome that affect which parts of the genome sequence are or are not transcribed into RNA, the first stage of the process by which the genome influences the containing organism. The two best-studied classes of epigenetic modification are methylation, the attachment of a methyl group (-CH₃) to cytosine, one of the four nucleotides; and various chemical modifications of the histones, the proteins that form the core structure of the chromosomes and around which the DNA double helix is wrapped (Bickmore & van Steensel 2013; Cutter & Hayes 2015). The nucleotide sequence, then, provides the (extremely large) set of possible transcripts that the genome can produce, but the epigenetic state of the genome determines which transcripts are actually produced (Jones 2012). Both features of the genome (qua material object) must therefore be specified if we want to understand the biologically relevant behavior of the whole system.

So if the motivation for defining the genome in terms of sequence is to capture its informational content, the definition fails to serve its goal. Indeed, the definition that will come closest to this goal is that which identifies the genome as the material object, the set of chromosomes (this interpretation of the genome is defended in detail in Barnes & Dupré 2008). An implication of this definition that is often taken to be counterintuitive by biologists is that the genome will on this account encompass not only DNA, but the histone proteins that are material parts of the chromosomes. But of course the point of the preceding discussion is that the variable chemical states of the histones are, in fact, essential bases for some of the information inherent in the genome.

The phenomenon of methylation makes a similar point in a slightly different way. The nucleotides that make up the familiar sequence are cytosine, thymine, adenine and guanine. When a methyl group attaches to a cytosine, the resulting nucleotide is not, strictly speaking, cytosine, but 5-methylcytosine. So unless one takes the letter ‘C’ in the standard representation of sequence to mean, rather counterintuitively, “cytosine or 5-methylcytosine”, it is only a partially accurate representation of the feature of the genome it purports to represent. More importantly, it is a representation that fails to capture crucial functional aspects of the genome.
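The point about representation can be made concrete with a small Python sketch. It assumes a hypothetical five-letter encoding in which ‘M’ stands for 5-methylcytosine (an illustrative convention chosen for this example only, not a standard alphabet) and shows that projecting such a string onto the familiar four letters maps two differently methylated states of the same stretch of DNA onto one and the same sequence, so the methylation information is simply lost.

```python
# Hypothetical encoding for this sketch: 'M' stands for 5-methylcytosine.
# This is an illustrative convention, not a standard sequence alphabet.
FOUR_LETTER = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "C"}


def to_standard_sequence(bases: str) -> str:
    """Project a methylation-aware string onto the four-letter alphabet.

    Both 'C' and 'M' map to 'C', so a 'C' in the output can only mean
    "cytosine or 5-methylcytosine".
    """
    return "".join(FOUR_LETTER[b] for b in bases)


def methylated_positions(bases: str) -> list[int]:
    """Return the 0-based positions that carry 5-methylcytosine ('M')."""
    return [i for i, b in enumerate(bases) if b == "M"]


# Two hypothetical states of the same stretch of DNA: identical sequence,
# but different methylation at the two CpG sites.
unmethylated = "ACGTTCGA"
methylated = "AMGTTMGA"

print(to_standard_sequence(unmethylated) == to_standard_sequence(methylated))  # True
print(methylated_positions(unmethylated))  # []
print(methylated_positions(methylated))    # [1, 5]
```

The four-letter projection is many-to-one: the difference between the two states survives only in the richer representation, which is the sense in which the standard sequence is a partial representation of the genome.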

A final telling point is that it has recently become clear that the genome, as a material object, has functions that go well beyond even the broadest interpretation of the genetic (Bustin & Misteli 2016). The genome appears to play an essential role in a range of cellular processes. First, its physical arrangement into domains of varying sizes plays a central role in the coordination of gene expression. But, much further from the genetic, it is a large object whose mechanical forces are involved in various cellular processes and in cellular homeostasis, and the chromatin fiber provides a scaffolding for both proteins and membranes (Bustin & Misteli 2016). Unless we are to introduce a new word to refer to this biologically vital entity, only a material conception of the genome can capture the full range of its activities.

One might be tempted to object to the argument above concerning methylation that, whereas methylation is a somewhat transitory state, the underlying four-letter sequence is extremely durable, lasting across many generations. Richard Dawkins (1976) famously emphasized this durability in arguing for its importance in evolution, even going so far as to describe genes as “immortal”. So perhaps there is good reason for understanding “C” as referring to a disjunction.

This is not the place to address the quasi-theological view of gene immortality. However, this does point to a fundamental issue about the nature of genes. Even if genes were, somehow, unchanging immortal substances, the genome is nothing of the sort. It is an extremely dynamic entity, constantly changing its properties in generally adaptive response to its environment. Moreover, even the constancy of its nucleotide sequence is maintained only by the continuous application of various editing and repair mechanisms. Indeed, far from being an eternal substance, the genome, we suggest, is much better seen as a process: a highly complex set of dynamic activities crucial in maintaining the structural and functional stability not only of the organism but also, through its role in reproduction, of the lineage. Importantly, these relations are bi-directional and, specifically, the organism is also crucial to maintaining the necessary aspects of stability of the genome.^[6]

2. Reading the Genome

The first genome to be sequenced was that of a virus, namely bacteriophage ΦX174, sequenced by Frederick Sanger in 1977 (Sanger et al. 1977). By about 1985, work on several other viruses had been initiated in laboratories across the world, and even the sequencing of model organisms such as the bacterium Escherichia coli or the roundworm Caenorhabditis elegans was being tackled.^[7]

Of all the different sequencing efforts at the time, the human genome project (HGP) of course stands out. Not only is the human genome relatively large (roughly 3.2 billion base pairs, bp) and of key interest to us as human beings, but the HGP itself was envisioned as a diverse large-scale research project with various strands and aims. Obtaining the sequence was the goal that attracted the most attention in the wider media, but many would surely agree that other findings and practices developed within the HGP were of equal or even greater importance.

In what follows we will treat the HGP as a pivot around which genomics developed as a field of research and as a set of techniques. For ease of exposition we will talk here of a pre-HGP and a post-HGP phase. Obviously, this is a simplification; there is not just one single trajectory along which the story of genomics runs and there is not one clear break between a pre- and a post-genome era (Richardson & Stevens 2015). Nevertheless, as a way of structuring the discussion this distinction will be a helpful tool.

2.1 The Run-up to the HGP

About a decade after Sanger, and Maxam and Gilbert, published their DNA sequencing methods in the mid-1970s, the first concrete talk of a human genome project started to appear in writing (Dulbecco 1986) and at various workshops (Sinsheimer 1989; Palca 1986). The Human Genome Project (HGP) itself became a reality in 1990, when it was officially launched as a US federal program (see 1990 in a brief history and timeline [NHGRI] in the Other Internet Resources section below).

In the run-up to the HGP, high expectations (some would say “hype”) developed, which inevitably also brought critics of the project onto the scene (Koshland 1989; Luria et al. 1989). As so often, funding had a key role to play. When the HGP was initiated, no ‘big science’ projects were being pursued in the life sciences; the HGP was therefore a true first for biology. But pushing for such a large project, one that would absorb a significant proportion of the funding allocated to the biological sciences, met with considerable resistance from other scientists.

There were three key criticisms: 1) some claimed that the HGP was a waste of money because much useless (read: junk) DNA would be sequenced; the focus should instead be on the functional parts of the genome, i.e., the genes or regulatory elements, which could be studied using simpler and less expensive methods (Brenner 1990; Weinberg 1991; Rechsteiner 1991; Lewontin 1992; Rosenberg 1994). Others claimed 2) that the HGP was a waste of money as it was merely a descriptive and not a hypothesis-driven project. This issue became much more prominent ten years after the project was finished, when it became clear that big data science was here to stay (see, e.g., Weinberg 2010).^[8]

And last but not least, there was 3) the critique that the HGP was fundamentally misguided because it assumed that sequence knowledge alone would enable us to understand how our bodies work and how they develop disease, and that this understanding would eventually lead to cures for many diseases (Lewontin 1992; Tauber & Sarkar 1992; Kitcher 1994). This more general critique of a narrowly sequence-focused approach to biomedical issues resurfaced some 20 years later in discussions about the use of common genetic variants to learn more about common diseases and traits (see Section 3.1.2).

It is difficult to evaluate criticisms of the last kind. There is no doubt that enthusiasm for the HGP and many successor projects in genomics has often been grounded in simplistic assumptions about the power of DNA and its pre-eminent role in biological systems. On the other hand, it is arguable that many unanticipated benefits have derived from genomics quite independently of such assumptions. For instance, the ability to make very precise comparisons of genome sequences has led to major advances in unraveling the details of evolutionary history, not to mention its application in technologies such as forensic DNA testing. Moreover, it can be argued with Waters (2007b) that what makes genomes so central to biological research is not the erroneous belief that they are the ultimate causes of everything, but rather the unique possibilities they present for precise intervention in organisms or cells.

2.2 The Results and Impact of the HGP

The main output of the HGP is usually seen as ‘the’ human genome sequence. A draft human genome sequence (about 90% complete) was announced in June 2000, followed in 2001 by the publication of the draft sequences produced by the HGP (International Human Genome Sequencing Consortium 2001) and by the privately funded initiative (Venter et al. 2001). The complete, or almost complete (about 99%), sequence of the human genome was released in 2003, which also marked the official end of the HGP (International Human Genome Sequencing Consortium 2004).

But the view that the sequence of ‘the’ human genome was the key output is wrong in several ways. First of all, there is in general no such thing as ‘the’ human genome, as each individual (except for monozygotic twins) carries their own set of small and large variations in their genome (and even twins accumulate many differences in their genomes during their lifetimes). The sequence that was produced in the HGP is therefore nothing more than one particular example, meaning it can only serve as a reference genome. Importantly, the reference sequences that both the HGP and Venter’s project delivered did not correspond to the genome of a single person, as the DNA used to produce them was derived from several individuals.^[9] The genomes that came out of the two sequencing efforts were therefore composite reference sequences. But the HGP also produced much more than just a DNA sequence. Here we will highlight three outcomes or aspects of the HGP that are of particular importance, not least for the period that followed the completion of the project.

One key feature of the HGP was that it involved the sequencing of a range of different model organisms, an aspect of the HGP that was often overlooked in discussions of the project in the philosophical literature and elsewhere (Ankeny 2001; for a searchable list of sequenced genomes see genome information by organism in the Other Internet Resources section below). The HGP provided not only a first reference genome of Homo sapiens but also the first bacterial genome (Haemophilus influenzae, Fleischmann et al. 1995), the first eukaryotic genome (Saccharomyces cerevisiae, Goffeau et al. 1996), and the genomes of key model organisms (Escherichia coli, Blattner et al. 1997; Caenorhabditis elegans, C. elegans Sequencing Consortium 1998; Arabidopsis thaliana, Arabidopsis Genome Initiative 2000; Drosophila melanogaster, Adams et al. 2000, Myers et al. 2000).^[10]

A further crucial output was the acceleration in technology development that the HGP brought about. It is safe to say that without the HGP (and subsequent initiatives such as the Advanced Sequencing Technology Awards created in 2004 by the National Human Genome Research Institute (NHGRI) (NIH 2004)) there would not have been such a rapid development of next-generation sequencing (NGS) approaches, and the cost of whole-genome sequencing would not have dropped as quickly as it has (see Mardis 2011 for a review of the development of NGS). These improvements in sequencing technology had further consequences, for example allowing scientists to sample DNA in different ways and from different sources, as new sequencing methods could process more DNA more quickly and work with less starting material. This, finally, made possible whole new sub-disciplines, such as metagenomics (see Section 3.2).

A final noteworthy output of the HGP is what scientists learned about the structure of the genome. Beginning with the HGP, and building on further studies, researchers have gained a much more detailed picture of the fine structure, the dynamics and the functioning of the human genome. Not only were there far fewer genes than expected, but there was also much more repetitive DNA, including many more transposable elements (it is estimated that about 45% of human DNA consists of transposable elements or their inactive remnants). These findings relate to a more general and older discussion about genome size and complexity, to which we now turn.

2.3 Genome Size, the C-value Paradox and Junk DNA

It has been known since the 1950s that genome size varies greatly between different organisms (Mirsky & Ris 1951; see also Gregory 2001), but from the very beginning it was also clear that this diversity has some surprising features. One of these features is the absence of correlation between the complexity of an organism and the size of its genome.

2.3.1 The C-value Paradox

Assuming an informational account of the genome one would expect that the more complex an organism is, the more DNA its genome should contain (this is in fact what many biologists assumed at least until about the 1960s). How to define and assess the complexity of an organism is a tricky issue, but intuitively it seems reasonable to assume that a single-celled amoeba is less complex than an onion, which in turn is less complex than a large metazoan such as a human being, both in terms of the complexity of the workings and the structure of the organism. The expectation was that the DNA content of human cells should be much larger than that of onions or amoebae. As it turns out, however, both the onion and the amoeba have much larger genomes than human beings. The onion, for instance, has a genome of about 16 billion base pairs, meaning it is about five times the size of the human genome (Gregory 2007). The same lack of correlation between genome size and complexity can be found in many other instances (for an overview of different genome sizes see the animal genome size database in the Other Internet Resources section below).

It was also found early on that very similar species in the same genus show large variation in genome size, despite having similar phenotypes and karyotypes (i.e., the number and shape of the chromosomes in a genome). Within the family of buttercups, for instance, DNA content varied up to 80-fold (Rothfels et al. 1966). Holm-Hansen (1969) also showed that species of unicellular algae display a 2000-fold difference in DNA content despite all being of similar developmental complexity. It was findings such as these that gave real urgency to addressing the discrepancy, now labelled the C-value paradox (Thomas 1971). The term ‘C-value’ refers to the constant (‘C’) amount (‘value’) of haploid DNA per nucleus and is measured in picograms of DNA per nucleus. The C-value is thus a measure of the amount of DNA each genome contains (we can see here Winkler’s original definition of the genome at work).

2.3.2 Junk DNA

These discussions of genome size were closely related to concerns about gene numbers, and this consideration of genome size versus gene numbers is what originally gave rise to the concept of ‘junk DNA’ (Ohno 1972).^[11] The reasoning behind this concept was the following: if one assumes a) that more complex organisms will have more DNA than less complex organisms and b) that gene numbers increase in proportion to genome size, then the genome of the more complex organism should have more genes than the less complex one.^[12] Human cells, for instance, contain about 750 times more DNA than E. coli, meaning that, as E. coli has about 5000 genes, human cells should turn out to have in the range of 3.7 million genes. This is clearly not the case; even in the 1970s it was generally supposed that the human genome might contain no more than 150,000 genes (Crollius et al. 2000). This discrepancy led to the conclusion that the vast majority of the DNA in our genome cannot be genes and is therefore what Ohno referred to as ‘junk’.^[13]
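The arithmetic behind this reasoning can be laid out in a few lines of Python. The figures are those given in the paragraph above, and the proportionality between DNA content and gene number is assumption b) of the argument, not an empirical claim. The product 750 × 5,000 gives 3.75 million, the ‘range of 3.7 million’ mentioned above; the sketch also computes what fraction of that expected number a 150,000-gene ceiling would represent.

```python
# Figures as given in the text (approximate).
E_COLI_GENES = 5_000             # approximate gene count of E. coli
DNA_RATIO_HUMAN_TO_ECOLI = 750   # human cells hold ~750 times more DNA than E. coli

# Assumption b) of the argument: gene number scales in proportion to DNA content.
expected_human_genes = E_COLI_GENES * DNA_RATIO_HUMAN_TO_ECOLI

# Upper bound on human gene number generally supposed in the 1970s, per the text.
ASSUMED_MAX_HUMAN_GENES_1970S = 150_000

# Under the same proportionality assumption, the share of human DNA that
# genes could account for at most.
max_fraction_genic = ASSUMED_MAX_HUMAN_GENES_1970S / expected_human_genes

print(f"{expected_human_genes:,}")  # 3,750,000
print(f"{max_fraction_genic:.0%}")  # 4%
```

On assumption b), a ceiling of 150,000 genes means that genes could account for at most about 4% of human DNA, which is one way of seeing why the remainder came to be called ‘junk’.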

The problem that the junk DNA discussion brings up has also been referred to as the ‘G-value paradox’ (‘G’ stands for ‘gene’), which directly concerns the discrepancy between the number of genes in an organism and its complexity (Hahn & Wray 2002). This paradox has been reinforced by the findings of the HGP. As Gregory (2005) and other commentators have pointed out, the finding that the human genome contains far fewer genes than expected was one of the most surprising outcomes of the HGP. Initial estimates from before the project were in the range of 50,000 to 150,000. These were reduced to about 30,000–35,000 after the publication of the first sequence draft in 2001 and have since been further revised to the order of 20,000 (Gregory 2001).

Some researchers assumed that the C-value paradox was fully resolved by the recognition that there is non-coding DNA in genomes (Gregory 2001). Larger genome size in ‘simpler’ organisms merely means that they have large quantities of non-coding DNA. But as Gregory points out, the fact that the majority of DNA in our genomes is non-coding might make the C-value discrepancies less of a paradox, but it gives rise to a whole range of further puzzles (Where does this extra DNA come from? What is its function? Etc.), which is why he proposes to talk of the C-value as an enigma rather than a paradox (Gregory 2001). The C-value enigma consists of many different and layered problems and these require a pluralistic approach to answering them, or so Gregory claims.

The publication of the draft genome sequence in 2001 and the conclusion of the HGP in 2003 did not give researchers all the tools and insights they needed to tackle these long-standing problems. But after the HGP, building on the initial sequencing effort, researchers could start to go beyond the mere sequence and gain a deeper understanding of the workings of the genome. This put them in a position to tackle issues such as the significance of junk DNA and the C-value paradox more directly (or at least from a different angle). The post-HGP phase is also characterized by an intense debate about the best way of doing research: the question of whether biological research is best done on a small or a large scale has come up again and again in the post-HGP era, especially with the rise of other large-scale post-HGP projects. The next section will address two projects or research fields that symbolize the various efforts and aspirations that were characteristic of the post-HGP era and which will help to illuminate some of the philosophical issues these developments raised.

3. Beyond Sequencing

The post-HGP phase is marked by a flourishing of different projects, closely connected in their origins to the HGP, but going beyond it in many different ways. This section discusses two such post-HGP projects, namely the International HapMap project and a new field of research called ‘metagenomics’. These examples indicate some important directions in which the postgenomic era is heading and identify some, though certainly not all, of the key characteristics and issues that mark this new period.

3.1 The International HapMap Project

The International HapMap project was a multi-centre project launched in 2002 that came to an initial conclusion in 2005 (NIH 2002).^[14] The acronym ‘HapMap’ stands for ‘haplotype map’ and (indirectly) refers to the main goal of the project, namely to map the common genetic variation in the human genome.

3.1.1 The HapMap Project and Genomic Variation

It is a well-known fact that everyone’s genome is different. There are, however, several ways in which genomes of individuals can vary from each other, ranging from the deletion, insertion or rearrangement of longer stretches of DNA to differences in single nucleotides at specific locations on a chromosome. The latter form of variation was the focus of the HapMap project. If we align the DNA sequences of two individuals they will be identical for hundreds of nucleotides at a stretch; the DNA of two human beings typically displays about 99.9% sequence identity (Li & Sadler 1991; Wang et al. 1998; Cargill et al. 1999). But the 0.1% difference means that, on average, about one in every 1000 nucleotides will differ between any two individuals.
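The step from 99.9% identity to roughly one difference per 1000 nucleotides is a simple proportion. The sketch below counts mismatches between two aligned sequences and computes the expected number of differences at a given identity. The function names are illustrative. The genome length of about 3 billion nucleotides in the last line is an assumption added here (an approximate size of the haploid human genome), not a figure from this section.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def expected_differences(length: int, identity: float = 0.999) -> float:
    """Expected number of differing positions for a given fraction of identical sites."""
    return length * (1 - identity)


# A toy 20-nucleotide alignment with a single mismatch (0-based position 10)
print(count_differences("ACGTACGTACGTACGTACGT", "ACGTACGTACCTACGTACGT"))

# Roughly one difference per 1000 nucleotides at 99.9% identity
print(round(expected_differences(1_000)))
# Assumption: a haploid genome of about 3 billion nucleotides
print(round(expected_differences(3_000_000_000)))
```

The last figure, about 3 million differences, is of the same order as the number of SNPs per individual quoted in the next paragraph.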

Each variant form of the DNA sequence found at a specific genomic locus is referred to as an ‘allele’. If there are two different versions of a specific gene that can be found in a population at a specific locus on a chromosome, then that means that there are two different alleles of that gene present in that population.^[15] If one of these single nucleotide alleles is found in more than 1% of a specific population it is treated as a ‘common’ variant and researchers speak of a ‘polymorphism’ or, more precisely, a ‘single nucleotide polymorphism’ (abbreviated ‘SNP’; pronounced ‘snip’). If a variation is found in less than 1% of the population researchers simply call it a ‘mutation’ (or also a ‘point mutation’).^[16] On average there are about 3 million SNPs found in each individual and there is a pool of more than 10 million SNPs present in the human population as a whole (HapMap 2005).
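The 1% convention described above amounts to a threshold on allele frequency. In this sketch the allele counts are hypothetical, and the function names are illustrative. Treating a frequency of exactly 1% as a ‘mutation’ is an arbitrary choice, since the convention as stated does not cover that boundary case.

```python
SNP_THRESHOLD = 0.01  # the 1% convention described in the text


def allele_frequency(alleles: list[str], variant: str) -> float:
    """Fraction of sampled chromosomes carrying `variant` at one locus."""
    return alleles.count(variant) / len(alleles)


def classify_variant(frequency: float, threshold: float = SNP_THRESHOLD) -> str:
    """Label a single-nucleotide variant as common ('SNP') or rare ('mutation')."""
    # Exactly 1% is not covered by the convention; here it is treated as rare.
    return "SNP" if frequency > threshold else "mutation"


# Hypothetical sample of 1000 chromosomes at two loci
locus_1 = ["C"] * 950 + ["T"] * 50   # T allele at 5%
locus_2 = ["G"] * 995 + ["A"] * 5    # A allele at 0.5%

for name, alleles, variant in [("locus_1", locus_1, "T"), ("locus_2", locus_2, "A")]:
    freq = allele_frequency(alleles, variant)
    print(name, f"{freq:.1%}", classify_variant(freq))
```

The label depends on the population sampled: the same allele can be ‘common’ in one population and ‘rare’ in another.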

Many of these alleles are (or have an increased likelihood of being) inherited together, meaning that they do not easily become separated through recombination events during meiosis.^[17] This leads to the non-random association of different alleles at two or more loci, a phenomenon that has been dubbed ‘linkage disequilibrium’ or ‘LD’. The concept of LD is key for the HapMap project as the fact that some SNPs stay associated (whereas the clusters themselves might get separated from each other over time by recombination events) explains the haplotype structure of the genome (Daly et al. 2001). The term ‘haplotype’ simply refers to a particular cluster of alleles (in this case SNPs) that a) are on the same chromosome and b) are commonly inherited as one. The aim of the HapMap project was to characterize human SNPs, their frequency in different populations and the correlations between them (HapMap 2003). The first haplotype map was published in 2005, reporting on data from 269 samples derived from four different populations (HapMap 2005). Five years later, a follow up was published, now reporting on data from 1184 individuals sampled from 11 different populations (HapMap 2010).
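LD is usually quantified with a few standard statistics. A common pair is assumed here rather than taken from the HapMap publications. The first is the coefficient D = p_AB − p_A·p_B, the observed frequency of a two-locus haplotype minus the frequency expected if the loci were independent. The second is its normalised form r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch computes both for two biallelic loci from phased haplotypes written as two-letter strings. This notation is hypothetical: ‘A’/‘a’ at the first locus, ‘B’/‘b’ at the second.

```python
def linkage_disequilibrium(haplotypes: list[str]) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    Each haplotype is a two-character string: 'A' or 'a' at the first locus,
    'B' or 'b' at the second (hypothetical notation for this sketch).
    """
    n = len(haplotypes)
    p_A = sum(h[0] == "A" for h in haplotypes) / n    # frequency of allele A
    p_B = sum(h[1] == "B" for h in haplotypes) / n    # frequency of allele B
    p_AB = sum(h == "AB" for h in haplotypes) / n     # frequency of haplotype AB
    d = p_AB - p_A * p_B                              # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0          # undefined if a locus is monomorphic
    return d, r2


# Alleles always inherited together: maximal association
print(linkage_disequilibrium(["AB", "AB", "ab", "ab"]))
# All four combinations equally common: no association
print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"]))
```

Alleles that are nearly always inherited together give r² close to 1, while alleles that recombine freely give values near 0.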

The realization that the structure of genetic variation in the genome can be understood in terms of haplotypes was important for at least two reasons. First, it opened the door for a relatively easy and efficient analysis of (single nucleotide) genetic variation in populations: the clustering of SNPs meant that in principle only one or a few of the SNPs in each cluster (so-called ‘tag SNPs’) would have to be tested to verify the presence of the cluster of variants as a whole. This made the analysis of genetic variation at the level of whole genomes from a large number of subjects feasible at a time when whole-genome sequencing was still too expensive for such a task (HapMap 2003). The development of a haplotype map was therefore a crucial step to enable what are now called ‘genome-wide association studies’ (GWAS) (see Section 3.1.2).
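The economy that tag SNPs offer can be illustrated by choosing, from phased haplotype data, a small set of SNPs such that every SNP is strongly correlated with at least one chosen tag. The sketch makes two illustrative assumptions: the greedy selection procedure and the r² ≥ 0.8 cut-off. Neither is a description of the HapMap consortium’s actual tagging method, and all names are hypothetical.

```python
def r_squared(haplotypes: list[tuple[int, ...]], i: int, j: int) -> float:
    """r^2 between SNPs i and j; alleles coded 0/1 on phased haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0
    d = p_ij - p_i * p_j
    return d * d / denom


def greedy_tag_snps(haplotypes: list[tuple[int, ...]], threshold: float = 0.8) -> list[int]:
    """Pick a small set of SNPs such that every SNP has r^2 >= threshold with some tag."""
    n_snps = len(haplotypes[0])

    def covered_by(tag: int, pending: set[int]) -> set[int]:
        return {s for s in pending if s == tag or r_squared(haplotypes, tag, s) >= threshold}

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP that stands in for the most still-untagged SNPs (lowest index on ties)
        best = max(range(n_snps), key=lambda s: (len(covered_by(s, untagged)), -s))
        tags.append(best)
        untagged -= covered_by(best, untagged)
    return sorted(tags)


# Hypothetical block structure: SNPs 0 and 1 always co-occur, as do SNPs 2 and 3
haps = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
print(greedy_tag_snps(haps))  # two tags suffice for four SNPs
```

Genotyping only the two tags recovers the state of all four SNPs here, because each tag is perfectly correlated with its partner.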

Secondly, as the distribution of haplotypes varies between different populations, the HapMap project had a strong focus on sampling DNA from different populations. This is an important aspect of this type of research as it brought, unwittingly perhaps, the issue of race and the question of its biological basis right back into genomics. This point will be revisited in Section 3.1.4.

3.1.2 The HapMap, GWAS and the Idea of Personalized Medicine

A key point driving the HapMap project was the fact that SNPs can be used to uncover connections between an individual’s DNA sequence and specific conditions or traits. At face value a SNP is simply a distinguishing mark in the genome of a person. Such marks allow researchers to screen groups of a population with different phenotypes, for instance those with a condition (e.g., high blood pressure) and those without. Looking at the frequency of specific SNPs or haplotypes in either group the researchers can use statistical analysis to get insight into the association between a particular SNP or haplotype and a trait (Cardon & Bell 2001). As mentioned above, this analysis can be focused on tag SNPs that are treated as proxies for a whole cluster of SNPs (if the cluster has a high LD).
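The kind of statistical analysis meant here can be sketched for a single SNP. Allele counts in cases and controls form a 2×2 table. The sketch assumes a basic Pearson chi-square test of association with one degree of freedom and reports an odds ratio. This is only one simple option among the methods used in practice, and the counts are hypothetical.

```python
import math


def allelic_association(case_risk: int, case_other: int,
                        control_risk: int, control_other: int) -> tuple[float, float, float]:
    """Pearson chi-square test (1 df) and odds ratio on a 2x2 table of allele counts."""
    table = [[case_risk, case_other], [control_risk, control_other]]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total  # count expected under no association
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 50 cases and 50 controls (100 chromosomes each)
chi2, p, odds = allelic_association(60, 40, 40, 60)
print(f"chi2={chi2:.2f}, p={p:.4f}, OR={odds:.2f}")
```

A genome-wide study repeats such a test for hundreds of thousands of SNPs. Real analyses therefore apply much stricter significance thresholds than a single test would, and they adjust for confounders such as population structure.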

Once a haplotype has been associated with a particular condition, other people can be screened for the presence of that haplotype and therefore gain some understanding of the risk groups they belong to. Although the test will not tell carriers of disease-linked SNPs whether they will develop the condition or not, it can nevertheless give them some information about their chances. Furthermore, even though the tag SNP itself might not be the genetic variation that causes or contributes to the variation in phenotype, it might be linked to so-called ‘causal SNPs’. Learning about SNPs associated with a condition or trait therefore can give the researcher clues as to which genes or regulatory DNA regions might be causally involved in the development of that condition. Findings from association studies can therefore in some cases contribute to the analysis of the condition itself.

The HapMap initially only looked for common variants (SNPs include by definition only common variants). This was in line with the so-called common disease/common variant (CD/CV) hypothesis formulated by Lander (1996), Cargill et al. (1999), and Chakravarti (1999).^[18] This hypothesis postulates, roughly, that common conditions are linked to genetic variations that are common in a population.

This link between common variants and common diseases also explains why the HapMap project could be promoted from the very beginning as the ‘next big thing’ after the sequence of the human genome had been determined: it was with the haplotype map that genomics should really start to have an impact on biomedical research and ultimately our understanding of disease.^[19]

3.1.3 The HapMap and its Critics

But the HapMap project was not without its critics; indeed the biologist David Botstein called it a “magnificent failure” (cited in Hall 2010).^[20] Some commentators, for instance, were worried that the project was nothing more than a make-work project filling a gap that the finished HGP had left behind, and therefore a waste of precious funds (Couzin 2002). But more often, criticism of the HapMap project was part of wider debates about the way post-HGP research should be conducted. The HapMap project can therefore provide a useful window on some of the key tendencies and disputes that marked (or marred) the post-HGP era.

One such indirect criticism of the HapMap derives from the apparent failure of GWAS to lead researchers to clearer information about the links between our genetic makeup and the different conditions to which our bodies can succumb. In the eyes of these critics the CD/CV hypothesis was the key problem, as the common variants simply do not explain much of the heritability of common diseases. This observation gave rise to the concept of ‘missing heritability’ (Eichler et al. 2010).

The general focus on common variants in genomics was criticized by other authors who claimed that the focus of geneticists should rather be on rare variants (McClellan & King 2010). These rare variants, they claim, are where the missing heritability will be found. The problem with the rare variants is that they cannot be picked up in GWAS that use SNP databases, as SNPs are by definition common variants. Also, finding rare variants is a technical challenge as researchers have to analyse the genomic data of a very large number of individuals to do so reliably. This hunt for rare variants is a major reason behind the current push for the sequencing of millions (rather than a couple of hundreds or thousands) of genomes. As discussed earlier, such large-scale approaches have become feasible in recent years due to the reduced cost and increased speed of next-generation DNA sequencing.
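Why rare variants call for very large samples can be shown with elementary probability. Suppose a variant has population frequency f. The chance of seeing it at least once among n diploid individuals (2n chromosomes) is 1 − (1 − f)^(2n). The sketch assumes random, independent sampling of chromosomes and perfect sequencing accuracy, both simplifications, and the function names are illustrative. Merely observing a variant is also far easier than demonstrating its association with a condition, which requires larger samples still.

```python
import math


def detection_probability(freq: float, n_individuals: int) -> float:
    """Chance of seeing a variant at least once among 2n sampled chromosomes."""
    return 1 - (1 - freq) ** (2 * n_individuals)


def individuals_needed(freq: float, confidence: float = 0.95) -> int:
    """Smallest sample size whose detection probability reaches `confidence`."""
    return math.ceil(math.log(1 - confidence) / (2 * math.log(1 - freq)))


for freq in (0.05, 0.01, 0.001, 0.0001):
    n = individuals_needed(freq)
    print(f"variant frequency {freq:.2%}: ~{n:,} individuals for 95% detection")
```

The required sample grows roughly in inverse proportion to the variant's frequency. Each tenfold drop in frequency calls for about ten times as many genomes.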

The current shift to whole-genome sequencing will also help to address another critique of the GWAS/SNP/HapMap approach, namely its strict focus on single base pair changes in the genome. Other changes in the genome, such as variations in the numbers of copies of repeated elements or rearrangements, deletions or insertions of larger chunks of genomic DNA, might in many cases be what is at the core of a disorder, necessitating (again) a shift in focus away from point mutations and single genes to the genome as a whole (Lupski 1998, 2009).

As one of the first follow-ups to the original HGP, the HapMap project was a topic that often came up in discussions of the legacy of the HGP. Such discussions became especially prominent at the tenth anniversary of the publication of the draft genome sequence. In general, there was an overwhelming sense of disappointment at what had come out of the HGP, at least in the medical context. Given the grand promises that were made both around the start of the project in the 1980s and then again in the year 2000 at the presentation at the White House,^[21] it is not surprising that people were unimpressed by what had been delivered by 2010/2011. Interestingly, it was not only the usual suspects, such as Lewontin (2011), but also key proponents of the HGP itself who were critical and pointed out the minimal medical advances that had been achieved in the first post-HGP decade (Collins 2010; Venter 2010).

However, one thing that all critics, including the above-mentioned, agreed on was that even though its effect on medical practice had been negligible, the HGP had transformed biological research (see for instance Wade 2010; Varmus 2010; Hall 2010; Butler 2010; Green et al. 2011). One area in which genomic research had fundamentally changed both concepts and practices was in the understanding of what a gene is and how gene expression works and is regulated (Keller 2000; Moss 2003; Dupré 2005; Griffiths & Stotz 2006; Stotz et al. 2006; Check 2010). With great foresight, Evelyn Fox Keller pointed out already in 2000 that the HGP was interesting not so much because of the raw sequence it produced, but more because of the transformations it brought about in our expectations when it comes to ‘genes’ and DNA (Keller 2000).

3.1.4 The HapMap, Genomics and Race

As mentioned above, HapMap’s use of samples from different populations brought the concept of race into discussions of the project. Studies that looked into the genetic variation between population groups (of which the HapMap was a key representative) are among several recent developments (Duster 2015) that reignited discussion about a) the biological reality of race and b) the question whether racial classifications should be used in biomedical research at all. Several authors have picked up the relation between the HapMap project and a renewed concern with race (see, e.g., Ossorio 2005; Duster 2005; Hamilton 2008). The question that dominates these discussions is whether racial classifications reflect a ‘biological reality’.

Race has of course been an important topic in epidemiology and clinical research for a long time (Witzig 1996; Stolley 1999), but it has been widely perceived as a socially constructed category that has no biological basis.^[22] And many researchers imagined that as the HGP demonstrated how highly similar any two human beings are to each other at the DNA level, any idea of race as a serious biological concept would be disposed of once and for all (see, e.g., Gilbert 1992; Venter 2000). But the concept of biological race was if anything rejuvenated rather than laid to rest by the developments in genomics (Kaufman & Cooper 2001; Foster & Sharp 2002; Hamilton 2008; Roberts 2011). This is exemplified by the fact that more and more scientists have claimed in recent years that there is a biological basis to our traditional notions of race, basing their claims on elaborate statistical analyses of data on genetic variation derived from a large number of human DNA samples. These developments led for many to what Troy Duster has called a ‘post-genomic surprise’ (Duster 2015).

An important point here is that linking genomics and race does not mean that researchers search for, or even that there are, any ‘genes for race’, even if we consider the many different ways in which this term can be interpreted (Dupré 2008). The discussion about the possible genetic basis for race is now more subtle, as it is not simply concerned with the presence or absence of specific genes or DNA elements and hence some sort of biological essence of races, but rather with the variation in the frequencies of alleles in the population of interest (Gannett 2001, 2004). The question is therefore not whether DNA element X is absent or present in one population or the other, but rather which variant of X is present at what frequency in a population (in the context of the HapMap researchers will talk of SNP frequencies).

Data from population genetics shows that the global distribution of allele frequencies in the human population is not discontinuous (Jorde & Wooding 2004; Feldman & Lewontin 2008) but clinal, meaning that human DNA sequences vary in a gradual manner over geographic space (Livingstone 1962; Serre & Pääbo 2004; Barbujani & Colonna 2010). Moreover, both genetic and phenotypic traits display what is called ‘nonconcordant’ clinal variation, meaning that different traits do not necessarily co-vary with each other; the pattern of how trait A varies across geographic space might be very different from the pattern displayed by trait B (Livingstone 1962; Goodman 2000; Jorde & Wooding 2004).

But despite these widely accepted findings, it is in the discussion of these distributions that the idea of a biological basis for our traditional understanding of race classifications has re-emerged. Based on the analysis of large sets of genetic variants in samples derived from various locations around the globe, a number of researchers have made the claim that human genetic variation displays geographical clustering (see, e.g., Rosenberg et al. 2002; Edwards 2003; Burchard et al. 2003; Bamshad et al. 2003; Leroi 2005; Tang et al. 2005). Importantly, these findings often also gave rise to, or were interpreted to support, the claim that this geographical distribution matches our traditional racial classifications.

Such findings also led a number of authors to claim that race still has a valid place in biomedical research: since these classifications are supposed to describe groups that are internally genetically similar, but genetically different from other groups, they can serve as useful proxies in estimating, for instance, the group member’s average risk of developing a particular condition (see, e.g., Xie et al. 2001; Wood 2001; Risch et al. 2002; Rosenberg et al. 2002; Shiao et al. 2012). Some authors are more cautious and claim that race should only serve as a loose and temporary proxy (Foster & Sharp 2002; Jorde & Wooding 2004) that should be abandoned as soon as we know the actual genetic variations that are linked to a particular condition or trait (Jorde & Wooding 2004; Leroi 2005; Dupré 2008). Such critics may note that the most that these genetic studies show is that there is a correlation between a person’s genetic variants and their geographical origin, if only because variants originate in a specific place; and there is a loose relation between the socially constructed concept of race and geographic origin. But given the tenuous connection that this generates between perceived or self-identified racial categories and genetic constitution, race is a poor substitute for any actually salient genetic information that may eventually be related to disease.

But there is also a significant group of researchers who are not convinced by these analyses and who don’t think that there is any biological basis to the race concept (see, e.g., Schwartz 2001; Duster 2005, 2006; Krieger 2000; Ossorio 2005). All of these authors criticise the above studies and the geographic clusters of genetic variation they identify, mainly because of flaws in the way samples are collected (see, e.g., Duster 2015) and how the data is ultimately analysed. The latter criticism has mainly focused on the program ‘Structure’ that is used by a majority of the studies mentioned above to churn out clusters of genetic variation (Bolnick 2008; Kalinowski 2011; Fujimura et al. 2014). A telling criticism is that while Structure can be made to report that there are five main geographical clusters that show distinctive allele frequencies and which roughly match traditional notions of race (African, Asian, European, etc.), the program can equally be set up to report any arbitrarily selected number of genetically different groups, as the user has to specify the number of clusters they are looking for before the Structure program is applied to any actual dataset.
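The point about a pre-specified number of clusters can be illustrated with any clustering method that takes that number as an input. The sketch below does not reproduce Structure, which fits a Bayesian model of allele frequencies. It substitutes a much simpler technique, one-dimensional k-means. The data are hypothetical values that change smoothly along a single geographic axis, as in the clinal pattern described earlier in this section. Whatever value of K is supplied, the procedure returns exactly K groups, even though the data contain no discrete breaks.

```python
def kmeans_1d(points: list[float], k: int, iterations: int = 100) -> list[list[float]]:
    """Partition 1-D points into exactly k groups with Lloyd's k-means algorithm."""
    pts = sorted(points)
    # Deterministic start: one centre at the midpoint of each of k equal-sized slices
    centres = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    clusters: list[list[float]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - centres[c]))
            clusters[nearest].append(p)
        updated = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return clusters


# Hypothetical clinal data: an allele frequency rising smoothly along a geographic transect
cline = [i / 99 for i in range(100)]

for k in (2, 3, 5, 7):
    groups = kmeans_1d(cline, k)
    sizes = [len(g) for g in groups]
    print(f"K={k}: {len(groups)} clusters, sizes {sizes}")
```

The output alone cannot tell us whether the groups reflect discontinuities in the data or only the K that was requested.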

Two interesting aspects of these discussions are that they a) usually only deal with one way of analyzing the biological reality of race classifications (as genetic) and b) adhere to a sharp distinction between race as biological reality or as social construct. Regarding a) several philosophers of biology have come up with alternative ways of thinking about a biological basis for race (for instance race as clades (Andreasen 1998), inbred lines (Kitcher 1999), or ecotypes (Pigliucci & Kaplan 2003)). This expansion of concepts brought with it the question of classificatory monism vs. pluralism, i.e., the question whether there is one privileged way of classifying race that somehow captures the ‘true nature’ of races (natural kinds) or whether there are several ways of doing so, depending on theoretical or practical interests/context (Gannett 2010). As Gannett argues, however, this focus on the monism/pluralism debate and on natural kinds comes at a cost, as it can mean that questions of practical significance are systematically ignored (2010). Regarding b), Gannett points out that drawing a sharp distinction between race as social construct or biological reality has not only been proven meaningless by recent work in population genetics but can also mean that the much messier reality of human history and diversity on this planet (and the complex interactions between scientific and social concepts of race) is being overlooked, leading to an impoverished analysis of the problems at hand (Gannett 2010).

3.2 Metagenomics

Metagenomics (also referred to as ‘environmental’ or ‘community’ genomics) is a research field that aims to analyse the collective genomes of microbial communities. These communities are usually extracted from environmental samples, ranging from soil to water or even air samples. A major advantage of metagenomics is that it does not rely on techniques for culturing microbes. This is important because only an estimated 1%–5% of all microbes can be cultured at all (Amann et al. 1995), an issue that has been referred to as the ‘great plate count anomaly’ (Staley & Konopka 1985).^[23]

The term ‘metagenomics’ was first coined in 1998 (Handelsman et al. 1998). The prefix ‘meta’ in ‘metagenomics’ can be read in at least three different ways (O’Malley 2013): 1) As referring to the fact that metagenomics transcends culturing limitations. 2) As emphasising the aggregate-level approach to biology that characterises metagenomics (looking beyond single entities (cells or genomes)). And 3) as referring to the goal of creating an overarching understanding of the genomic diversity of the microbial realm.

The methodology of metagenomics can be described as a four-step process, consisting of: 1) the collection of environmental samples, 2) the isolation of microbial DNA from these samples, 3a) the direct analysis of the DNA or 3b) the creation of a genomic DNA library by fragmentation and insertion of the sampled DNA into suitable vectors (for instance plasmids that can be propagated in laboratory bacterial strains). These genomic libraries can then be used to 4a) sequence or 4b) perform a functional screen of the sampled genomic DNA. As the distinction between steps 4a) and 4b) already implies, metagenomics can be divided into a sequence- and a function-based approach (Gabor 2007; Sleator et al. 2008). In the former the collected DNA is sequenced so that potential genes present in the sample can be identified and, if feasible, the genomes of all the microbes that were present in the sample can be reconstructed.

The sequence-based approach is feasible due to the vastly reduced costs of sequencing and the increased computing power available. The goal of the approach is to get an idea of the diversity and distribution of microbes present in the sample and to also get an insight into their functioning (for instance by identifying metabolism-related enzymes that can give clues about the metabolic pathways active in the different microbes). This can give insights into the workings of the microbial ecosystem present in the sampled environment more generally.
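One common way to summarise the diversity and distribution of microbes from sequence data is to assign reads to taxa and then compute relative abundances and a diversity index. The sketch assumes that the hard step, assigning each read to a taxon, has already been done; the labels and function names are hypothetical. It uses the Shannon index, a standard ecological measure chosen here for illustration rather than one named in the studies discussed.

```python
import math
from collections import Counter


def relative_abundance(taxon_labels: list[str]) -> dict[str, float]:
    """Share of reads assigned to each taxon."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def shannon_diversity(taxon_labels: list[str]) -> float:
    """Shannon index H = -sum(p * ln p) over taxa; 0 for a single taxon."""
    return -sum(p * math.log(p) for p in relative_abundance(taxon_labels).values())


# Hypothetical read assignments for two samples
even_sample = ["taxon_1"] * 25 + ["taxon_2"] * 25 + ["taxon_3"] * 25 + ["taxon_4"] * 25
skewed_sample = ["taxon_1"] * 97 + ["taxon_2"] * 3

print(relative_abundance(skewed_sample))
print(round(shannon_diversity(even_sample), 3), round(shannon_diversity(skewed_sample), 3))
```

Higher values indicate more taxa, or a more even spread of reads among them. A low-diversity community such as the biofilm discussed below would score near the bottom of this scale.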

In the functional approach the fragments of DNA that are stored in the library are used in what is often called a ‘functional screen’. To perform such a screen the researchers introduce the library plasmids into specific bacterial strains which then read and express any protein-coding sequence that might be present on the fragments, thereby producing the protein(s) the fragment codes for.^[24] The key to a functional screen is to create conditions in which only those bacteria that express a protein with the function of interest can be singled out (for instance by making sure that only those cells survive). Once the cells are singled out the library plasmid they contain can be recovered and sequenced allowing the researcher to identify the protein(s) encoded by that fragment. Functional metagenomics is often used to identify novel microbial proteins that can be used in biotechnological and pharmaceutical contexts and it is not surprising that metagenomics was and still is of great interest to the biotechnological sector (Streit & Schmitz 2004; Lorenz & Eck 2005; Culligan et al. 2014; Ekkers et al. 2012).

One of the first actual (sequence-based) metagenomics projects was performed (yet again) by one of the pioneers of genomics, Craig Venter. The goal of Venter and his team was to sample microbes from the surface of the nutrient-poor Sargasso Sea (Venter et al. 2004). This particular environment was chosen for this pilot study because it was expected to have a microbial community with relatively low diversity. This assumption turned out to be wrong and the project identified more than a million putative protein-coding sequences derived from at least 1800 different genomic species extracted from the sea water.

Another early metagenomics study consisted of the analysis of an acidophilic biofilm with low microbial diversity from an acid mine drainage site in California (Tyson et al. 2004). The analysed biofilm survives in one of the most extreme environments known, characterized by very low pH (i.e., high acidity), relatively high temperature and high concentrations of metals. Importantly, this specific biofilm truly displays low complexity as it is composed of only three bacterial and two archaeal species. This simplicity greatly aided the analysis effort and allowed the researchers an almost complete recovery of two of the genomes and a partial recovery of the other three.

There have been many other metagenomics studies conducted since and there is little point in listing them here, as the list is growing by the month. One aspect of the ongoing research that is important to point out, however, is that the projects are becoming increasingly ambitious. The trend now is not just to have an integrated view on the genomes but to combine metagenomics with other techniques such as metabolomics (the assay of small molecules present in a system), metatranscriptomics (the analysis of all RNA transcripts of a community of microbes) and viromics (the analysis of all the viral genomes present in the system of interest) (see Turnbaugh & Gordon 2008; Bikel et al. 2015). In a sense the field is moving towards a highly integrated meta-Metagenomics approach (Dupré & O’Malley (2007) talk of “metaorganismal metagenomics”). This is also in line with the general trend towards big-data and discovery-based approaches in the life sciences (Ankeny & Leonelli 2015; Dolinski & Troyanskaya 2015; Leonelli 2014, 2016).

The rise of metagenomics is also linked to other changes in biological sciences more generally, especially the rise of systems biology starting around the year 2000 (which is itself closely linked to the development of genomics since the 1990s). O’Malley and Dupré (2005) point out that there is an important distinction to be made when looking at fields like systems biology, because there is not only a change in epistemology but also one in ontology. They therefore distinguish between pragmatic and systems-theoretic biologists. For the former, the idea of a ‘system’ is merely an epistemic tool. For the latter, the system becomes the new fundamental ontological unit. Doolittle and Zhaxybayeva (2010) claim that the same can be seen in metagenomics where there is a drive to see the community or the ecosystem as the new fundamental unit, and not the single species (see also Dupré & O’Malley 2007).

Moving away from a focus on single organisms or monogenomic species allows us to make better sense of many recent findings in microbiology (in which metagenomics has played a key role). Central to all of this are mobile DNA elements that can travel horizontally, meaning between different members of a community (including between different kinds of organisms). Obtaining such mobile DNA elements can have a crucial effect on the survival and reproduction capacity of the recipient cell. Mobile DNA can therefore be a key element in the evolutionary processes as it becomes a ‘communal resource’ (McFall-Ngai et al. 2013). Acquired antibiotic resistance is only one of many benefits cells are known to obtain through acquired DNA elements.

It is then the composition of functional elements that the community as a whole contains which is preserved over evolutionary time. And the community could be seen as an assembly of biochemical activities and not of distinct microbial lineages (see for instance Turnbaugh et al. 2009 and also Burke et al. 2011). The metagenome then becomes a ‘genome of communities’ and not a ‘community of genomes’ (Doolittle & Zhaxybayeva 2010). All of this also feeds into the more general, and currently very active, discussion about the problem of individuality in biology (Clarke 2010; Bouchard & Huneman 2013; Ereshefsky & Pedroso 2013; Guay & Pradeu 2015; SEP entry on the biological notion of individual).

Apart from these issues in biological ontology, there are also epistemological issues raised by metagenomics, namely the discrepancy between our ability to sequence DNA and to interpret it. These challenges of DNA sequence interpretation are not just a problem for (meta)genomics and other -omics approaches, but also for biomedicine more generally and its push towards a truly personalised medicine. A key issue for this push is the discrepancy between the (ever-decreasing) costs of obtaining a personal genome sequence (Bennett et al. 2005; Mardis 2006; Check 2014a,b) and the high costs of making sure the data can be appropriately interpreted (Mardis 2006; Sboner et al. 2011; Phillips et al. 2015). This problem is related to the so-called ‘bioinformatic bottleneck’: it is now the handling and interpretation of large amounts of sequence data, rather than their generation, that provides the main obstacle to progress (Green et al. 2011; Desai et al. 2012; Scholz et al. 2012; Marx 2013). With next-generation sequencing, the sequencing step itself is no longer the rate-limiting step.

4. Outlook

Genomics is now an integral part of all of the life sciences. Not that every life scientist is now a genomicist: there are still researchers who focus on the biochemistry, development, or the molecular networks of human cells and other organisms. But the DNA sequences of the human genome and of the numerous model organisms that came out of the HGP enter every laboratory, if not on a daily basis then at least at some stage of every research project. The same applies to the maps of genetic variation that were discussed in Section 3 and to the (somewhat controversial) data on functional DNA elements that the ENCODE project generated (see the supplementary document The ENCODE Project and the ENCODE Controversy).

And it is not just the quantity of data and the many new “-omes” that researchers now work with that have transformed the science. As we have pointed out in several places, insights into the genome and its functioning have transformed researchers’ understanding of the entities and processes they are working with in the course of the last few decades. Part of this was also a transformation in our understanding of what it means to do ‘good’ science. What the HGP and its various offshoots have achieved, therefore, is to change the life sciences at the epistemological, the ontological and also the methodological level.

As so often, an interesting and even pressing question is where all of this is going. Predicting the future might not be possible, but there are trends that can be identified and which can be expected to follow a similar trajectory in the near future. One such trend is the drive for big data. ‘Big’ here refers not only to the quantity but also to the different types of data collected. A derivative of this big-data drive is the goal to integrate all of the diverse data and mould it into models that can further our understanding of biological systems and the prediction of their behaviour. The relatively young discipline of systems biology, which could not be discussed in detail in this entry, will certainly play a key role in this endeavour.

Bibliography

Adams, M.D., et al., 2000, “The Genome Sequence of Drosophila melanogaster”, Science, 287(5461): 2185–2195. pmid:10731132
Amann, Rudolf I., Wolfgang Ludwig, and Karl-Heinz Schleifer, 1995, “Phylogenetic identification and in situ detection of individual microbial cells without cultivation”, Microbiological Reviews, 59(1): 143–169. pmcid:PMC239358
Amundson, Ron and George V. Lauder, 1994, “Function Without Purpose: The Uses of Causal Role Function in Evolutionary Biology”, Biology and Philosophy, 9(4): 443–469. doi:10.1007/BF00850375
Andreasen, Robin O., 1998, “A New Perspective on the Race Debate”, The British Journal for the Philosophy of Science, 49(2): 199–225. doi:10.1093/bjps/49.2.199
Ankeny, Rachel A., 2001, “Model organisms as models: understanding the ‘Lingua Franca’ of the human genome project”, Philosophy of Science, 68(3): S251–S261.
Ankeny, Rachel A. and Sabina Leonelli, 2015, “Valuing Data in Postgenomic Biology: How Data Donation and Curation Practices Challenge the Scientific Publication System”, in Postgenomics: Perspectives on Biology After the Genome, S.S. Richardson, and H. Stevens (eds.), Chapel Hill, NC: Duke University Press, pp. 126–149.
Arabidopsis Genome Initiative, 2000, “Analysis of the Genome Sequence of the Flowering Plant Arabidopsis thaliana”, Nature, 408(6814): 796–815. doi:10.1038/35048692
Baker, Monya, 2013, “Big Biology: The ‘omes Puzzle”, Nature, 494(7438): 416–419. doi:10.1038/494416a
Bamshad, Michael J., Stephen Wooding, W. Scott Watkins, Christopher T. Ostler, Mark A. Batzer, and Lynn B. Jorde, 2003, “Human Population Genetic Structure and Inference of Group Membership”, The American Journal of Human Genetics, 72(3): 578–589. doi:10.1086/368061
Barbujani, Guido and Vincenza Colonna, 2010, “Human Genome Diversity: Frequently Asked Questions”, Trends in Genetics, 26(7): 285–295. doi:10.1016/j.tig.2010.04.002
Barnes, Barry and John Dupré, 2008, Genomes and What to Make of Them, Chicago: University of Chicago Press.
Bennett, Simon T., Colin Barnes, Anthony Cox, Lisa Davies, and Clive Brown, 2005, “Toward the $1000 Human Genome”, Pharmacogenomics, 6(4): 373–382. doi:10.1517/14622416.6.4.373
Bickmore, Wendy A. and Bas van Steensel, 2013, “Genome Architecture: Domain Organization of Interphase Chromosomes”, Cell, 152(6): 1270–1284. doi:10.1016/j.cell.2013.02.001
Bikel, Shirley, Alejandra Valdez-Lara, Fernanda Cornejo-Granados, Karina Rico, Samuel Canizales-Quinteros, Xavier Soberón, Luis Del Pozo-Yauner, and Adrián Ochoa-Leyva, 2015, “Combining Metagenomics, Metatranscriptomics and Viromics to Explore Novel Microbial Interactions: Towards a Systems-level Understanding of Human Microbiome”, Computational and Structural Biotechnology Journal, 13: 390–401. doi:10.1016/j.csbj.2015.06.001
Blattner, Frederick R., et al., 1997, “The Complete Genome Sequence of Escherichia coli K-12”, Science, 277(5331): 1453–1462. doi:10.1126/science.277.5331.1453
Bolnick, Deborah A., 2008, “Individual Ancestry Inference and the Reification of Race as a Biological Phenomenon”, in Koenig et al. 2008: 70–85.
Bouchard, Frédéric and Philippe Huneman (eds.), 2013, From Groups to Individuals: Evolution and Emerging Individuality, Cambridge, MA: MIT Press.
Brenner, Sidney, 1990, “The Human Genome: The Nature of the Enterprise”, Human Genetic Information: Science, Law and Ethics, 149: 6–12. doi:10.1002/9780470513903.ch2
Burchard, Esteban González, Elad Ziv, Natasha Coyle, Scarlett Lin Gomez, Hua Tang, Andrew J. Karter, Joanna L. Mountain, Eliseo J. Pérez-Stable, Dean Sheppard, and Neil Risch, 2003, “The Importance of Race and Ethnic Background in Biomedical Research and Clinical Practice”, New England Journal of Medicine, 348(12): 1170–1175. doi:10.1056/NEJMsb025007
Burian, Richard M., 1997, “Exploratory Experimentation and the Role of Histochemical Techniques in the Work of Jean Brachet, 1938–1952”, History and Philosophy of the Life Sciences, 19(1): 27–45.
–––, 2007, “On MicroRNA and the Need for Exploratory Experimentation in Post-Genomic Molecular Biology”, History and Philosophy of the Life Sciences, 29(3): 285–312. pmid:18822659
Burke, Catherine, Peter Steinberg, Doug Rusch, Staffan Kjelleberg, and Torsten Thomas, 2011, “Bacterial Community Assembly Based on Functional Genes Rather Than Species”, Proceedings of the National Academy of Sciences, 108(34): 14288–14293.
Bustin, Michael and Tom Misteli, 2016, “Nongenetic Functions of the Genome”, Science, 352(6286): 671, aad6933 (7 pages). doi:10.1126/science.aad6933
Butler, Declan, 2010, “Human Genome at Ten: Science After the Sequence”, Nature, 465(7301): 1000–1001. doi:10.1038/4651000a
C. elegans Sequencing Consortium, 1998, “Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology”, Science, 282(5396): 2012–2018. doi:10.1126/science.282.5396.2012
Cardon, Lon R. and John I. Bell, 2001, “Association Study Designs for Complex Diseases”, Nature Reviews Genetics, 2(2): 91–99. doi:10.1038/35052543
Cargill, Michele, et al., 1999, “Characterization of Single-nucleotide Polymorphisms in Coding Regions of Human Genes”, Nature Genetics, 22(3): 231–238. doi:10.1038/10290
Chakravarti, Aravinda, 1999, “Population Genetics—Making Sense Out of Sequence”, Nature Genetics, 21(1 Suppl): 56–60. doi:10.1038/4482
Check Hayden, Erica, 2010, “Human Genome at Ten: Life is Complicated”, Nature, 464(7289): 664–667. doi:10.1038/464664a
–––, 2014a, “Is the $1,000 Genome for Real?”, Nature News, 15 January 2014. doi:10.1038/nature.2014.14530
–––, 2014b, “The $1,000 Genome”, Nature, 507(7492): 294–295. doi:10.1038/507294a
Clark, Michael B., Anupma Choudhary, Martin A. Smith, Ryan J. Taft, and John S. Mattick, 2013, “The Dark Matter Rises: the Expanding World of Regulatory RNAs”, Essays in Biochemistry, 54: 1–16. doi:10.1042/bse0540001
Clarke, Ellen, 2010, “The Problem of Biological Individuality”, Biological Theory, 5(4): 312–325. doi:10.1162/BIOT_a_00068
Collins, Francis S., 2010, “Has the Revolution Arrived?” Nature, 464(7289): 674–675. doi:10.1038/464674a
Collins, Francis S. and Harold Varmus, 2015, “A New Initiative on Precision Medicine”, New England Journal of Medicine, 372(9): 793–795. doi:10.1056/NEJMp1500523
Couzin, Jennifer, 2002, “New Mapping Project Splits the Community”, Science, 296(5572): 1391. doi:10.1126/science.296.5572.1391
Crollius, Hugues Roest, et al., 2000, “Estimate of Human Gene Number Provided by Genome-wide Analysis Using Tetraodon nigroviridis DNA Sequence”, Nature Genetics, 25(2): 235–238. doi:10.1038/76118
Culligan, Earmon P., Roy D. Sleator, Julian R. Marchesi, and Colin Hill, 2014, “Metagenomics and Novel Gene Discovery: Promise and Potential for Novel Therapeutics”, Virulence, 5(3): 399–412. doi:10.4161/viru.27208
Cummins, Robert, 1975, “Functional Analysis”, The Journal of Philosophy, 72(20): 741–765. doi:10.2307/2024640
Cutter, Amber R. and Jeffrey J. Hayes, 2015, “A Brief Review of Nucleosome Structure”, FEBS Letters, 589(20): 2914–2922. doi:10.1016/j.febslet.2015.05.016
Daly, Mark J., John D. Rioux, Stephen F. Schaffner, Thomas J. Hudson, and Eric S. Lander, 2001, “High-resolution Haplotype Structure in the Human Genome”, Nature Genetics, 29(2): 229–232. doi:10.1038/ng1001-229
Dawkins, Richard, 1976, The Selfish Gene, Oxford: Oxford University Press.
Desai, Narayan, Dion Antonopoulos, Jack A. Gilbert, Elizabeth M. Glass, and Folker Meyer, 2012, “From Genomics to Metagenomics”, Current Opinion in Biotechnology, 23(1): 72–76.
Dolinski, Kara and Olga G. Troyanskaya, 2015, “Implications of Big Data for Cell Biology”, Molecular Biology of the Cell, 26(14): 2575–2578. doi:10.1091/mbc.E13-12-0756
Doolittle, W. Ford, 2013, “Is Junk DNA Bunk? A Critique of ENCODE”, Proceedings of the National Academy of Sciences, 110(14): 5294–5300. doi:10.1073/pnas.1221376110
Doolittle, W. Ford and Carmen Sapienza, 1980, “Selfish Genes, the Phenotype Paradigm and Genome Evolution”, Nature, 284(5757): 601–603. doi:10.1038/284601a0
Doolittle, W. Ford and Olga Zhaxybayeva, 2010, “Metagenomics and the Units of Biological Organization”, Bioscience, 60(2): 102–112. doi:10.1525/bio.2010.60.2.5
Dulbecco, Renato, 1986, “A Turning Point in Cancer Research: Sequencing the Human Genome”, Science, 231(4742): 1055–1056. doi:10.1126/science.3945817
Dupré, John, 2005, “Are There Genes?” Royal Institute of Philosophy Supplement, 56: 193–211. doi:10.1017/S1358246105056092
–––, 2008, “What Genes Are, and Why There Are No ‘Genes for Race’”, in Revisiting Race in a Genomic Age, Barbara A. Koenig, Sandra Soo-Jin Lee and Sarah S. Richardson (eds.), New Brunswick, N.J.: Rutgers University Press, 2008, pp. 39–55.
–––, 2010, “The Polygenomic Organism”, The Sociological Review, 58(s1): 19–31. doi:10.1111/j.1467-954X.2010.01909.x
–––, 2012, Processes of Life: Essays in the Philosophy of Biology, Oxford: Oxford University Press.
Dupré, John and Maureen A. O’Malley, 2007, “Metagenomics and Biological Ontology”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 38(4): 834–846. doi:10.1016/j.shpsc.2007.09.001
Duster, Troy, 2005, “Race and Reification in Science”, Science, 307(5712): 1050–1051. doi:10.1126/science.1110303
–––, 2006, “The Molecular Reinscription of Race: Unanticipated Issues in Biotechnology and Forensic Science”, Patterns of Prejudice, 40(4–5): 427–441.
–––, 2015, “A Post-genomic Surprise. The Molecular Reinscription of Race in Science, Law and Medicine”, The British Journal of Sociology, 66(1): 1–27. doi:10.1111/1468-4446.12118
Ecker, Joseph R., Wendy A. Bickmore, Inês Barroso, Jonathan K. Pritchard, Yoav Gilad, and Eran Segal, 2012, “Genomics: ENCODE Explained”, Nature, 489(7414): 52–55. doi:10.1038/489052a
Eddy, Sean R., 2012, “The C-value Paradox, Junk DNA and ENCODE”, Current Biology, 22(21): R898–R899. doi:10.1016/j.cub.2012.10.002
–––, 2013, “The ENCODE Project: Missteps Overshadowing a Success”, Current Biology, 23(7): R259–R261.
Edwards, A.W.F., 2003, “Human Genetic Diversity: Lewontin’s Fallacy”, BioEssays, 25(8): 798–801. doi:10.1002/bies.10315
Ehret, Charles F. and Gérard De Haller, 1963, “Origin, Development, and Maturation of Organelles and Organelle Systems of the Cell Surface in Paramecium”, Journal of Ultrastructure Research, 9(Suppl 6): 1–42. doi:10.1016/S0022-5320(63)80088-X
Eichler, E.E., J. Flint, G. Gibson, A. Kong, S.M. Leal, J.H. Moore, and J.H. Nadeau, 2010, “Missing Heritability and Strategies for Finding the Underlying Causes of Complex Disease”, Nature Reviews Genetics, 11(6): 446–450. doi:10.1038/nrg2809
Eisen, Jonathan A., 2012, “Badomics Words and the Power and Peril of the ome-meme”, GigaScience, 1(1): 6. doi:10.1186/2047-217X-1-6
Ekkers, David Matthias, Mariana Silvia Cretoiu, Anna Maria Kielak, and Jan Dirk van Elsas, 2012, “The Great Screen Anomaly—A New Frontier in Product Discovery Through Functional Metagenomics”, Applied Microbiology and Biotechnology, 93(3): 1005–1020. doi:10.1007/s00253-011-3804-3
Elliott, Kevin C., 2007, “Varieties of Exploratory Experimentation in Nanotoxicology”, History and Philosophy of the Life Sciences, 29(3): 313–336.
Elliott, Tyler A., Stefan Linquist, and T. Ryan Gregory, 2014, “Conceptual and Empirical Challenges of Ascribing Functions to Transposable Elements”, The American Naturalist, 184(1): 14–24. doi:10.1086/676588
ENCODE Project Consortium, 2004, “The ENCODE (ENCyclopedia of DNA elements) Project”, Science, 306(5696): 636–640. doi:10.1126/science.1105136
–––, 2007, “Identification and Analysis of Functional Elements in 1% of the Human Genome by the ENCODE Pilot Project”, Nature, 447(7146): 799–816. doi:10.1038/nature05874
–––, 2012, “An Integrated Encyclopedia of DNA Elements in the Human Genome”, Nature, 489(7414): 57–74. doi:10.1038/nature11247
Ereshefsky, Marc and Makmiller Pedroso, 2013, “Biological Individuality: the Case of Biofilms”, Biology & Philosophy, 28(2): 331–349. doi:10.1007/s10539-012-9340-4
Feldman, Marcus W. and Richard C. Lewontin, 2008, “Race, Ancestry, and Medicine”, in Koenig et al. 2008: 89–101.
Fields, Stanley and Mark Johnston, 2002, “A Crisis in Postgenomic Nomenclature”, Science, 296(5568): 671–672. doi:10.1126/science.1070208
Fleischmann, R.D., et al., 1995, “Whole-genome Random Sequencing and Assembly of Haemophilus influenzae Rd.”, Science, 269(5223): 496–512. doi:10.1126/science.7542800
Foster, Morris W. and Richard R. Sharp, 2002, “Race, Ethnicity, and Genomics: Social Classifications as Proxies of Biological Heterogeneity”, Genome Research, 12(6): 844–850. doi:10.1101/gr.99202
Franklin, L.R., 2005, “Exploratory Experiments”, Philosophy of Science, 72(5): 888–899. doi:10.1086/508117
Fujimura, Joan H., D.A. Bolnick, R. Rajagopalan, J.S. Kaufman, R.C. Lewontin, T. Duster, P. Ossorio, and J. Marks, 2014, “Clines Without Classes: How to Make Sense of Human Variation”, Sociological Theory, 32(3): 208–227. doi:10.1177/0735275114551611
Gabor, Esther, Klaus Liebeton, Frank Niehaus, Juergen Eck, and Patrick Lorenz, 2007, “Updating the Metagenomics Toolbox”, Biotechnology Journal, 2(2): 201–206. doi:10.1002/biot.200600250
Gannett, Lisa, 2001, “Racism and Human Genome Diversity Research: the Ethical Limits of ‘Population Thinking’”, Philosophy of Science, 68(3): S479–S492. doi:10.1086/392930
–––, 2004, “The Biological Reification of Race”, The British Journal for the Philosophy of Science, 55(2): 323–345. doi:10.1093/bjps/55.2.323
–––, 2010, “Questions Asked and Unasked: How by Worrying Less About the ‘Really Real’ Philosophers of Science Might Better Contribute to Debates About Genetics and Race”, Synthese, 177(3): 363–385. doi:10.1007/s11229-010-9788-1
Germain, Pierre-Luc, Emanuela Ratti, and Frederico Boem, 2014, “Junk or Functional DNA? ENCODE and the Function Controversy”, Biology & Philosophy, 29(6): 807–831. doi:10.1007/s10539-014-9441-3
Gilbert, Walter, 1992, “A Vision of the Grail”, in The Code of Codes: Scientific and Social Issues in the Human Genome Project, Daniel Kevles, and Leroy Hood (eds.), Cambridge, MA: Harvard University Press, pp. 83–97.
Godfrey-Smith, Peter, 1994, “A Modern History Theory of Functions”, Noûs, 28(3): 344–362. doi:10.2307/2216063
Goffeau, A., et al., 1996, “Life with 6000 Genes”, Science, 274(5287): 546–567. doi:10.1126/science.274.5287.546
Goodman, Alan H., 2000, “Why Genes Don’t Count (For Racial Differences in Health)”, American Journal of Public Health, 90(11): 1699. pmcid:PMC1446406
Graur, Dan, 2013, “The Origin of the Term ‘Junk DNA’: A Historical Whodunnit”, Judge Starling (blog), October 19, 2013, Graur 2013 available online.
Graur, D., Y. Zheng, N. Price, R.B. Azevedo, R.A. Zufall, and E. Elhaik, 2013, “On the Immortality of Television Sets: ‘Function’ in the Human Genome According to the Evolution-free Gospel of ENCODE”, Genome Biology and Evolution, 5(3): 578–590. doi:10.1093/gbe/evt028
Green, Eric D., Mark S. Guyer, and National Human Genome Research Institute, 2011, “Charting a Course for Genomic Medicine from Base Pairs to Bedside”, Nature, 470(7333): 204–213. doi:10.1038/nature09764
Gregory, T. Ryan, 2001, “Coincidence, Coevolution, or Causation? DNA Content, Cell Size, and the C-value Enigma”, Biological Reviews, 76(1): 65–101. pmid:11325054
–––, 2005, “Synergy Between Sequence and Size in Large-scale Genomics”, Nature Reviews Genetics, 6(9): 699–708. doi:10.1038/nrg1674
–––, 2007, “The Onion Test”, Evolver Zone Genomicron, April 25, 2007, Gregory 2007 available online.
Griffiths, Paul E., 1992, “Adaptive Explanation and the Concept of a Vestige”, in Trees of Life. Essays in Philosophy of Biology, Paul E. Griffiths (ed.), Dordrecht: Kluwer Academic Publishers, pp. 111–131. doi:10.1007/978-94-015-8038-0_5
–––, 1993, “Functional Analysis and Proper Functions”, British Journal for the Philosophy of Science, 44(3): 409–422. doi:10.1093/bjps/44.3.409
–––, 1994, “Cladistic Classification and Functional Explanation”, Philosophy of Science, 61(2): 206–227. doi:10.1086/289796
–––, 2006, “Function, Homology, and Character Individuation”, Philosophy of Science, 73(1): 1–25. doi:10.1086/510172
Griffiths, Paul E. and Karola Stotz, 2006, “Genes in the Postgenomic Era”, Theoretical Medicine and Bioethics, 27(6): 499–521. doi:10.1007/s11017-006-9020-y
Guay, Alexandre and Thomas Pradeu (eds.), 2015, Individuals Across the Sciences, Oxford: Oxford University Press.
Hahn, Matthew W. and Gregory A. Wray, 2002, “The g-value Paradox”, Evolution and Development, 4(2): 73–75. doi:10.1046/j.1525-142X.2002.01069.x
Hall, Stephen S., 2010, “Revolution Postponed”, Scientific American, 303(4): 60–67. doi:10.1038/scientificamerican1010-60
Hamilton, Jennifer A., 2008, “Revitalizing Difference in the HapMap: Race and Contemporary Human Genetic Variation Research”, The Journal of Law, Medicine & Ethics, 36(3): 471–477. doi:10.1111/j.1748-720X.2008.293.x
Handelsman, Jo, Michelle R. Rondon, Sean F. Brady, Jon Clardy, and Robert M. Goodman, 1998, “Molecular Biological Access to the Chemistry of Unknown Soil Microbes: A New Frontier for Natural Products”, Chemistry & Biology, 5(10): R245–R249. doi:10.1016/S1074-5521(98)90108-9
[HapMap] The International HapMap Consortium, 2003, “The International HapMap Project”, Nature, 426(6968): 789–796. doi:10.1038/nature02168
–––, 2005, “A Haplotype Map of the Human Genome”, Nature, 437(7063): 1299–1320. doi:10.1038/nature04226
–––, 2010, “Integrating Common and Rare Genetic Variation in Diverse Human Populations”, Nature, 467(7311): 52–58. doi:10.1038/nature09298
Harrow, Jennifer, A. Frankish, J.M. Gonzalez, E. Tapanari, M. Diekhans, F. Kokocinski, B.L. Aken et al., 2012, “GENCODE: The Reference Human Genome Annotation for the ENCODE Project”, Genome Research, 22(9): 1760–1774. doi:10.1101/gr.135350.111
Holm-Hansen, Osmund, 1969, “Algae: Amounts of DNA and Organic Carbon in Single Cells”, Science, 163(3862): 87–88. doi:10.1126/science.163.3862.87
International Human Genome Sequencing Consortium, 2001, “Initial Sequencing and Analysis of the Human Genome”, Nature, 409(6822): 860–921. doi:10.1038/35057062
–––, 2004, “Finishing the Euchromatic Sequence of the Human Genome”, Nature, 431(7011): 931–945. doi:10.1038/nature03001
Jones, Peter A., 2012, “Functions of DNA Methylation: Islands, Start Sites, Gene Bodies and Beyond”, Nature Reviews Genetics, 13(7): 484–492. doi:10.1038/nrg3230
Jorde, Lynn B. and Stephen P. Wooding, 2004, “Genetic Variation, Classification and ‘Race’”, Nature Genetics, 36: S28–S33. doi:10.1038/ng1435
Kalinowski, S.T., 2011, “The Computer Program STRUCTURE Does Not Reliably Identify the Main Genetic Clusters Within Species: Simulations and Implications for Human Population Structure”, Heredity, 106(4): 625–632. doi:10.1038/hdy.2010.95
Karaca, Koray, 2013, “The Strong and Weak Senses of Theory-ladenness of Experimentation: Theory-driven Versus Exploratory Experiments in the History of High-energy Particle Physics”, Science in Context, 26(1): 93–136. doi:10.1017/S0269889712000300
Kaufman, Jay S. and Richard S. Cooper, 2001, “Commentary: Considerations for Use of Racial/Ethnic Classification in Etiologic Research”, American Journal of Epidemiology, 154(4): 291–298. doi:10.1093/aje/154.4.291
Keller, Evelyn Fox, 2000, The Century of the Gene, Cambridge, MA: Harvard University Press.
–––, 2011, “Genes, Genomes, and Genomics”, Biological Theory, 6(2): 132–140. doi:10.1007/s13752-012-0014-x
Kellis, Manolis, et al., 2014, “Defining Functional DNA Elements in the Human Genome”, Proceedings of the National Academy of Sciences, 111(17): 6131–6138. doi:10.1073/pnas.1318948111
Kitcher, Philip, 1994, “Who’s Afraid of the Human Genome Project?”, in PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 2: 313–321. doi:10.1086/psaprocbienmeetp.1994.2.192941
–––, 1999, “Race, Ethnicity, Biology, Culture”, in Racism (Key Concepts in Critical Theory), Leonard Harris (ed.), New York: Humanity Books, pp. 87–120.
Koenig, Barbara A., Sandra Soo-Jin Lee, and Sarah S. Richardson (eds.), 2008, Revisiting Race in a Genomic Age, New Brunswick: Rutgers University Press.
Koshland, Daniel E. Jr, 1989, “Sequences and Consequences of the Human Genome”, Science, 246(4927): 189. doi:10.1126/science.2799380
Krieger, Nancy, 2000, “Refiguring ‘Race’: Epidemiology, Racialized Biology, and Biological Expressions of Race Relations”, International Journal of Health Services, 30(1): 211–216. doi:10.2190/672J-1PPF-K6QT-9N7U
Kuska, Bob, 1998, “Beer, Bethesda, and Biology: How ‘Genomics’ Came into Being”, Journal of the National Cancer Institute, 90(2): 93. doi:10.1093/jnci/90.2.93
Lander, Eric S., 1996, “The New Genomics: Global Views of Biology”, Science, 274(5287): 536. doi:10.1126/science.274.5287.536
Lederberg, Joshua and Alexa T. McCray, 2001, “’Ome Sweet ’Omics—A Genealogical Treasury of Words”, The Scientist, 15(7): 8.
Ledford, Heidi, 2016, “AstraZeneca Launches Project to Sequence 2 Million Genomes”, Nature, 532(7600): 427. doi:10.1038/nature.2016.19797
Leonelli, Sabina, 2014, “What Difference Does Quantity Make? On the Epistemology of Big Data in Biology”, Big Data & Society, 1(1): 2053951714534395. doi:10.1177/2053951714534395
–––, 2016, Data-Centric Biology: A Philosophical Study, Chicago, IL: University of Chicago Press.
Leroi, Armand Marie, 2005, “A Family Tree in Every Gene”, New York Times, March 14, A23.
Levy, Samuel, et al., 2007, “The Diploid Genome Sequence of An Individual Human”, PLoS Biology, 5(10): e254. doi:10.1371/journal.pbio.0050254
Lewontin, Richard C., 1992, Biology as Ideology, New York: Harper Collins.
–––, 2011, “It’s Even Less in Your Genes”, The New York Review of Books, 58(9).
Li, W.H. and L.A. Sadler, 1991, “Low Nucleotide Diversity in Man”, Genetics, 129(2): 513–523.
Lindblad-Toh, Kerstin, et al., 2011, “A High-resolution Map of Human Evolutionary Constraint Using 29 Mammals”, Nature, 478(7370): 476–482. doi:10.1038/nature10530
Livingstone, Frank B., 1962, “On the Non-Existence of Human Races”, Current Anthropology, 3(3): 279–281. doi:10.1086/200290
Lorenz, Patrick and Jürgen Eck, 2005, “Metagenomics and Industrial Applications”, Nature Reviews Microbiology, 3(6): 510–516. doi:10.1038/nrmicro1161
Lupski, James R., 1998, “Genomic Disorders: Structural Features of the Genome Can Lead to DNA Rearrangements and Human Disease Traits”, Trends in Genetics, 14(10): 417–422. doi:10.1016/S0168-9525(98)01555-8
–––, 2009, “Genomic Disorders Ten Years On”, Genome Medicine, 1(4): 42. doi:10.1186/gm42
Luria, S.E., Dan M. Cooper, and Ari Berkowitz, 1989, “Human Genome Project”, Science, 246(4932): 873–874. doi:10.1126/science.246.4932.873-b doi:10.1126/science.246.4932.873-d doi:10.1126/science.2814503
Mardis, Elaine R., 2006, “Anticipating the $1,000 Genome”, Genome Biology, 7(7): 112. doi:10.1186/gb-2006-7-7-112
–––, 2011, “A Decade’s Perspective on DNA Sequencing Technology”, Nature, 470(7333): 198–203. doi:10.1038/nature09796
Marks, Jonathan, 2008, “Race: Past, Present, and Future”, in Koenig et al. 2008: 21–38.
Marx, Vivian, 2013, “Biology: the Big Challenges of Big Data”, Nature, 498(7453): 255–260. doi:10.1038/498255a
McClellan, Jon and Mary-Claire King, 2010, “Genetic Heterogeneity in Human Disease”, Cell, 141(2): 210–217. doi:10.1016/j.cell.2010.03.032
McFall-Ngai, M., et al., 2013, “Animals in a Bacterial World, a New Imperative for the Life Sciences”, Proceedings of the National Academy of Sciences, 110(9): 3229–3236. doi:10.1073/pnas.1218525110
Millikan, Ruth Garrett, 1984, Language, Thought, and Other Biological Categories: New Foundations for Realism, Cambridge, MA: MIT Press.
–––, 1989a, “In Defense of Proper Functions”, Philosophy of Science, 56(2): 288–302. doi:10.1086/289488
–––, 1989b, “An Ambiguity in the Notion ‘Function’”, Biology and Philosophy, 4(2): 172–176.
Mirsky, A.E. and Hans Ris, 1951, “The Desoxyribonucleic Acid Content of Animal Cells and Its Evolutionary Significance”, The Journal of General Physiology, 34(4): 451–462. doi:10.1085/jgp.34.4.451
Moss, Lenny, 2003, What Genes Can’t Do, Cambridge, MA: MIT Press.
–––, 2006, “Redundancy, Plasticity, and Detachment: the Implications of Comparative Genomics for Evolutionary Thinking”, Philosophy of Science, 73(5): 930–946. doi:10.1086/518778
Myers, Eugene W., et al., 2000, “A Whole-Genome Assembly of Drosophila”, Science, 287(5461): 2196–2204. doi:10.1126/science.287.5461.2196
National Research Council (Committee on Metagenomics: Challenges and Functional Applications), 2007, The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet, National Research Council Report 13, Washington DC: National Academies Press. doi:10.17226/11902
Neander, Karen, 1991, “Functions as Selected Effects: the Conceptual Analyst’s Defense”, Philosophy of Science, 58(2): 168–184. doi:10.1086/289610
Nicholson, D. and John Dupré (eds), forthcoming, Everything Flows: Towards a Processual Philosophy of Biology, Oxford: Oxford University Press.
NIH: National Institutes of Health, 2002, “International Consortium Launches Genetic Variation Mapping Project: HapMap Will Help Identify Genetic Contributions to Common Diseases”, NIH News Advisory, October 2002, NIH 2002 available online.
–––, 2004, “NHGRI Seeks Next Generation of Sequencing Technologies: New Grants Support Development of Faster, Cheaper DNA Sequencing”, NIH News Release, October 14, 2004, NIH 2004 available online.
–––, 2015, “NIH Framework Points the Way Forward for Building National, Large-scale Research Cohort, a Key Component of the President’s Precision Medicine Initiative”, NIH News Releases, September 17, 2015. NIH 2015 available online.
–––, 2016, “What is a genome?”, Genetics Home Reference: Your Guide to Understanding Genetic Conditions, NIH: U.S. National Library of Medicine, NIH 2016 available online, accessed October 10, 2016.
Niu, Deng-Ke and Li Jiang, 2013, “Can ENCODE Tell Us How Much Junk DNA We Carry in Our Genome?” Biochemical and Biophysical Research Communications, 430(4): 1340–1343. doi:10.1016/j.bbrc.2012.12.074
Ohno, Susumu, 1972, “So Much ‘Junk’ DNA in Our Genome”, in Brookhaven Symposium on Biology, 23, Routledge, pp. 366–370.
O’Malley, Maureen A., 2007, “Exploratory Experimentation and Scientific Practice: Metagenomics and the Proteorhodopsin Case”, History and Philosophy of the Life Sciences, 29(3): 337–358.
–––, 2013, “Metagenomics”, in Encyclopedia of Systems Biology, W. Dubitzky, O. Wolkenhauer, H. Yokota, and K.-H. Cho (eds.), Springer, p. 1283.
O’Malley, Maureen A. and John Dupré, 2005, “Fundamental Issues in Systems Biology”, BioEssays, 27(12): 1270–1276. doi:10.1002/bies.20323
O’Malley, Maureen A. and Orkun S. Soyer, 2012, “The Roles of Integration in Molecular Systems Biology”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 43(1): 58–68. doi:10.1016/j.shpsc.2011.10.006
O’Malley, Maureen A., Kevin C. Elliott, and Richard M. Burian, 2010, “From Genetic to Genomic Regulation: Iterativity in MicroRNA Research”, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 41(4): 407–417. doi:10.1016/j.shpsc.2010.10.011
Orgel, L.E. and F.H. Crick, 1980, “Selfish DNA: the Ultimate Parasite”, Nature, 284(5757): 604–607. doi:10.1038/284604a0
Ossorio, Pilar N., 2005, “Race, Genetic Variation, and the Haplotype Mapping Project”, Louisiana Law Review, 66(5): 131–143.
Palca, Joseph, 1986, “The Numbers Game”, Nature, 321(6068): 371. doi:10.1038/321371b0
Pennisi, Elizabeth, 2012, “ENCODE Project Writes Eulogy for Junk DNA”, Science, 337(6099): 1159–1161. doi:10.1126/science.337.6099.1159
Perini, Laura, 2011, “Sequence Matters: Genomic Research and the Gene Concept”, Philosophy of Science, 78(5): 752–762. doi:10.1086/662565
Phillips, Kathryn A., Mark J. Pletcher, and Uri Ladabaum, 2015, “Is the ‘$1000 Genome’ Really $1000? Understanding the Full Benefits and Costs of Genomic Sequencing”, Technology and Health Care: Official Journal of the European Society for Engineering and Medicine, 23(3): 373–379. doi:10.3233/THC-150900
Pigliucci, Massimo and Jonathan Kaplan, 2003, “On the Concept of Biological Race and Its Applicability to Humans”, Philosophy of Science, 70(5): 1161–1172.
Piotrowska, Monika, 2009, “What Does it Mean to be 75% Pumpkin? The Units of Comparative Genomics”, Philosophy of Science, 76(5): 838–850. doi:10.1086/605813
Qu, Hongzhu and Xiangdong Fang, 2013, “A Brief Review on the Human Encyclopedia of DNA Elements (ENCODE) Project”, Genomics, Proteomics & Bioinformatics, 11(3): 135–141. doi:10.1016/j.gpb.2013.05.001
Reardon, Sara, 2015, “US Precision-medicine Proposal Sparks Questions”, Nature, 517(7536): 540. doi:10.1038/nature.2015.16774
Rechsteiner, Martin C., 1991, “The Human Genome Project: Misguided Science Policy”, Trends in Biochemical Sciences, 16: 455–461. doi:10.1016/0968-0004(91)90178-X
Richardson, Sarah S. and Hallam Stevens (eds.), 2015, Postgenomics: Perspectives on Biology After the Genome, Chapel Hill, NC: Duke University Press.
Risch, Neil, Esteban Burchard, Elad Ziv, and Hua Tang, 2002, “Categorization of Humans in Biomedical Research: Genes, Race and Disease”, Genome Biology, 3(7): 1–12.
Roberts, Dorothy, 2011, Fatal Invention: How Science, Politics, and Big Business Re-create Race in the Twenty-first Century, New York: The New Press.
Rosenberg, Alex, 1994, “Subversive Reflections on the Human Genome Project”, in PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 2: 329–335. doi:10.1086/psaprocbienmeetp.1994.2.192943
Rosenberg, Noah A., J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A. Zhivotovsky, and M.W. Feldman, 2002, “Genetic Structure of Human Populations”, Science, 298(5602): 2381–2385. doi:10.1126/science.1078311
Rothfels, Klaus, Elizabeth Sexsmith, Margaret Heimburger, and Margarida O. Krause, 1966, “Chromosome Size and DNA Content of Species of Anemone L. and Related Genera (Ranunculaceae)”, Chromosoma, 20(1): 54–74. doi:10.1007/BF00331898
Sanger, F. et al., 1977, “Nucleotide Sequence of Bacteriophage ΦX174 DNA”, Nature, 265(5596): 687–695. doi:10.1038/265687a0
Sboner, A., X.J. Mu, D. Greenbaum, R.K. Auerbach, and M.B. Gerstein, 2011, “The Real Cost of Sequencing: Higher Than You Think”, Genome Biology, 12(8): 125. doi:10.1186/gb-2011-12-8-125
Scholz, Matthew B., Chien-Chi Lo, and Patrick S.G. Chain, 2012, “Next Generation Sequencing and Bioinformatic Bottlenecks: The Current State of Metagenomic Data Analysis”, Current Opinion in Biotechnology, 23(1): 9–15. doi:10.1016/j.copbio.2011.11.013
Schwartz, Robert S., 2001, “Racial Profiling in Medical Research”, New England Journal of Medicine, 344(18): 1392–1393. doi:10.1056/NEJM200105033441810
Serre, David and Svante Pääbo, 2004, “Evidence for Gradients of Human Genetic Diversity Within and Among Continents”, Genome Research, 14(9): 1679–1685. doi:10.1101/gr.2529604
Shiao, Jiannbin Lee, Thomas Bode, Amber Beyer, and Daniel Selvig, 2012, “The Genomic Challenge to the Social Construction of Race”, Sociological Theory, 30(2): 67–88. doi:10.1177/0735275112448053
Sinsheimer, Robert L., 1989, “The Santa Cruz Workshop—May 1985”, Genomics, 5(4): 954–956. doi:10.1016/0888-7543(89)90142-0
Sleator, Roy D., C. Shortall, and C. Hill, 2008, “Metagenomics”, Letters in Applied Microbiology, 47(5): 361–366. doi:10.1111/j.1472-765X.2008.02444.x
Staley, James T. and Allan Konopka, 1985, “Measurement of in Situ Activities of Nonphotosynthetic Microorganisms in Aquatic and Terrestrial Habitats”, Annual Review of Microbiology, 39(1): 321–346. doi:10.1146/annurev.mi.39.100185.001541
Steinle, Friedrich, 1997, “Entering New Fields: Exploratory Uses of Experimentation”, Philosophy of Science, 64(Proceedings): S65–S74. doi:10.1086/392587
Stolley, Paul D., 1999, “Race in Epidemiology”, International Journal of Health Services, 29(4): 905–909. doi:10.2190/QAAH-P5DT-WMP8-8HNL
Stotz, Karola C., A. Bostanci, and Paul E. Griffiths, 2006, “Tracking the Shift to ‘Postgenomics’”, Public Health Genomics, 9(3): 190–196. doi:10.1159/000092656
Streit, Wolfgang R. and Ruth A. Schmitz, 2004, “Metagenomics—The Key to the Uncultured Microbes”, Current Opinion in Microbiology, 7(5): 492–498. doi:10.1016/j.mib.2004.08.002
Tang, Hua, et al., 2005, “Genetic Structure, Self-identified Race/Ethnicity, and Confounding in Case-Control Association Studies”, The American Journal of Human Genetics, 76(2): 268–275. doi:10.1086/427888
Tauber, Alfred I. and Sahotra Sarkar, 1992, “The Human Genome Project: Has Blind Reductionism Gone Too Far?”, Perspectives in Biology and Medicine, 35(2): 220–235. doi:10.1353/pbm.1992.0015
Thomas, C.A. Jr., 1971, “The Genetic Organization of Chromosomes”, Annual Review of Genetics, 5(1): 237–256. doi:10.1146/annurev.ge.05.120171.001321
Touchman, Jeffrey, 2010, “Comparative Genomics”, Nature Education Knowledge, 3(10): 13.
Turnbaugh, Peter J. and Jeffrey I. Gordon, 2008, “An Invitation to the Marriage of Metagenomics and Metabolomics”, Cell, 134(5): 708–713. doi:10.1016/j.cell.2008.08.025
Turnbaugh, Peter J., et al., 2009, “A Core Gut Microbiome in Obese and Lean Twins”, Nature, 457(7228): 480–484. doi:10.1038/nature07540
Tyson, G.W., J. Chapman, P. Hugenholtz, E.E. Allen, R.J. Ram, P.M. Richardson, V.V. Solovyev, E.M. Rubin, D.S. Rokhsar, and J.F. Banfield, 2004, “Community Structure and Metabolism Through Reconstruction of Microbial Genomes from the Environment”, Nature, 428(6978): 37–43.
Varmus, Harold, 2010, “Ten Years On—The Human Genome and Medicine”, New England Journal of Medicine, 362(21): 2028–2029. doi:10.1056/NEJMe0911933
Venter, J. Craig, 2000, “Remarks at the Human Genome Announcement”, Functional & Integrative Genomics, 1(3): 154–155. doi:10.1007/s101420000026
–––, 2010, “Multiple Personal Genomes Await”, Nature, 464(7289): 676–677. doi:10.1038/464676a
Venter, J. Craig, et al., 2001, “The Sequence of the Human Genome”, Science, 291(5507): 1304–1351. doi:10.1126/science.1058040
Venter, J. Craig, et al., 2004, “Environmental Genome Shotgun Sequencing of the Sargasso Sea”, Science, 304(5667): 66–74. doi:10.1126/science.1093857
Visscher, Peter M., Matthew A. Brown, Mark I. McCarthy, and Jian Yang, 2012, “Five Years of GWAS Discovery”, The American Journal of Human Genetics, 90(1): 7–24. doi:10.1016/j.ajhg.2011.11.029
Wade, Nicholas, 2010, “A Decade Later, Genetic Map Yields Few New Cures”, New York Times, June 13, 2010, page 1.
Wang, David G., et al., 1998, “Large-Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the Human Genome”, Science, 280(5366): 1077–1082. doi:10.1126/science.280.5366.1077
Ward, Lucas D. and Manolis Kellis, 2012, “Evidence of Abundant Purifying Selection in Humans for Recently Acquired Regulatory Functions”, Science, 337(6102): 1675–1678. doi:10.1126/science.1225057
Waters, C. Kenneth, 2007a, “The Nature and Context of Exploratory Experimentation: An Introduction to Three Case Studies of Exploratory Research”, History and Philosophy of the Life Sciences, 29(3): 275–284.
–––, 2007b, “Causes that Make a Difference”, The Journal of Philosophy, 104(11): 551–579. doi:10.5840/jphil2007104111
Weinberg, Robert A., 1991, “The Human Genome Initiative. There Are Two Large Questions”, The FASEB Journal, 5(1): 78.
–––, 2010, “Point: Hypotheses First”, Nature, 464(7289): 678. doi:10.1038/464678a
Wheeler, David A., et al., 2008, “The Complete Genome of an Individual by Massively Parallel DNA Sequencing”, Nature, 452(7189): 872–876. doi:10.1038/nature06884
White House, 2000, “Remarks Made by the President, Prime Minister Tony Blair of England (via satellite), Dr. Francis Collins, Director of the National Human Genome Research Institute, and Dr. Craig Venter, President and Chief Scientific Officer, Celera Genomics Corporation, on the Completion of the First Survey of the Entire Human Genome Project”, June 26, White House 2000 available online.
Whitfield, T.W., J. Wang, P.J. Collins, E.C. Partridge, S.F. Aldred, N.D. Trinklein, R.M. Myers, and Z. Weng, 2012, “Functional Analysis of Transcription Factor Binding Sites in Human Promoters”, Genome Biology, 13(9): R50. doi:10.1186/gb-2012-13-9-r50
Winkler, Hans, 1920, Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche, Jena: Fischer Verlag.
Witzig, Ritchie, 1996, “The Medicalization of Race: Scientific Legitimization of a Flawed Social Construct”, Annals of Internal Medicine, 125(8): 675–679. doi:10.7326/0003-4819-125-8-199610150-00008
Wood, Alastair J., 2001, “Racial Differences in the Response to Drugs—Pointers to Genetic Differences”, New England Journal of Medicine, 344(18): 1394–1396. doi:10.1056/NEJM200105033441811
Wright, Larry, 1976, Teleological Explanation: An Etiological Analysis of Goals and Functions, Berkeley: University of California Press.
Wright, Mathew W. and Elspeth A. Bruford, 2011, “Naming ‘Junk’: Human Non-protein Coding RNA (NcRNA) Gene Nomenclature”, Human Genomics, 5(2): 90–98. doi:10.1186/1479-7364-5-2-90
Xie, Hong-Guang, Richard B. Kim, Alastair J.J. Wood, and C. Michael Stein, 2001, “Molecular Basis of Ethnic Differences in Drug Disposition and Response”, Annual Review of Pharmacology and Toxicology, 41(1): 815–850. doi:10.1146/annurev.pharmtox.41.1.815
Yadav, Satya P., 2007, “The Wholeness in Suffix -omics, -omes, and the Word Om”, Journal of Biomolecular Techniques, 18(5): 277. pmcid:PMC2392988
Yudell, Michael, 2011, “A Short History of the Race Concept”, in Race and the Genetic Revolution: Science, Myth, and Culture, Sheldon Krimsky and Kathleen Sloan (eds.), New York: Columbia University Press, pp. 13–30.

Other Internet Resources

National Human Genome Research Institute (NHGRI) [https://www.genome.gov/]
- A brief guide to genomics
- All about the Human Genome Project (HGP)
- About NHGRI: A Brief History and Timeline
- Talking Glossary: genome, accessed: 2016-Oct-10
- Talking Glossary: allele, accessed: 2016-Oct-10
White House, Precision Medicine Initiative
National Institutes of Health
Genome Information by organism, National Center for Biotechnology Information (NCBI)
Animal Genome Size Database, maintained by T. Ryan Gregory, University of Guelph, Canada.
-omes and -omics lists
- -Omes and -omics glossary & taxonomy, Cambridge Healthtech Institute
- Alphabetically ordered list of omes and omics, Omics.org.
A primer on DNA sequencing, Genome News Network
A video explaining the Sanger method of DNA sequencing
The International Sheep HapMap
Nature magazine ENCODE publications explorer
Scitable, Nature Education, Nature Publishing Group,
- Definition: allele, accessed: 2016-Oct-10
- Definition: genome, accessed: 2016-Oct-10
Department of Energy Metagenome Program

This is a file in the archives of the Stanford Encyclopedia of Philosophy.
Please note that some links may no longer be functional.