## Notes to Phylogenetic Inference

1. By convention, phylogenetic trees are usually drawn with time passing from left to right or from bottom to top, though there are many exceptions. The oldest part of a tree is called the “root”, and its branches and nodes trace the genealogical paths to the “terminal taxa” at the tree’s tips. In figure 1 the terminal taxa are extant groups, though they need not be. Likewise, in figure 1 the branch lengths represent the amount of molecular evolution along each branch, whereas on other trees branch lengths may represent time or other units or, in many cases, have no biological meaning at all.

2. Systematics is the study of biological diversity and its origins, and generally encompasses biological classification, nomenclature, and taxonomy.

3. Based on the work of Zuckerkandl and Schroeder (1961), Zuckerkandl and Pauling believed that there were two differences in this chain, whereas today we know there is only one residue difference (Morgan 1998). We continue to use this example for exposition because of its historical importance.

4. The neutral theory of molecular evolution arose in the 1960s. It holds that a substantial portion of molecular variation results from mutation and genetic drift rather than natural selection, and it was especially intended to explain variation in parts of the genome such as non-coding DNA sequences. The relative importance of natural selection and genetic drift remains the subject of a rich and ongoing controversy (Dietrich 1994, 1998; Duret 2008).

5. DNA is made up of four nucleotides: adenine, thymine, cytosine, and guanine (A, T, C, G). Their bases come in two shapes: adenine and guanine are purines (double-ring structures), while cytosine and thymine are pyrimidines (single-ring structures). Transitions are substitutions between nucleotides of the same shape, e.g., $$\text{A} \leftrightarrow \text{G}$$ or $$\text{C}\leftrightarrow\text{T}$$. Transversions are substitutions between nucleotides of different shapes, i.e., between a purine and a pyrimidine.
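The distinction can be made concrete with a small Python sketch that labels each difference between two sequences. It assumes the sequences are already aligned, have equal length, and contain only the letters A, C, G, T (no gaps or ambiguity codes); the function names are hypothetical and chosen for illustration.

```python
# Classify single-nucleotide substitutions as transitions or transversions.
# Assumption: sequences are aligned, equal-length, and gap-free (A/C/G/T only).
PURINES = frozenset("AG")      # adenine, guanine: double-ring bases
PYRIMIDINES = frozenset("CT")  # cytosine, thymine: single-ring bases


def classify_substitution(a: str, b: str) -> str:
    """Return 'transition' if a and b share a ring shape, else 'transversion'."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA nucleotide: {base!r}")
    if a == b:
        raise ValueError("identical bases: no substitution")
    same_shape = (a in PURINES) == (b in PURINES)
    return "transition" if same_shape else "transversion"


def count_changes(seq1: str, seq2: str) -> dict:
    """Tally transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x != y:  # only sites that differ count as substitutions
            counts[classify_substitution(x, y)] += 1
    return counts


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("C", "G"))  # transversion
print(count_changes("ATCGA", "GTTCA"))  # {'transition': 2, 'transversion': 1}
```

Tallies like these are the kind of input that substitution models distinguishing transitions from transversions would work with; the sketch only counts observed differences and does not correct for multiple hits at the same site.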

6. These numbers count rooted, bifurcating trees. “Rooted” trees indicate a direction of ancestry by identifying a node or branch as the most basal ancestor, i.e., the oldest part of the tree; unrooted trees do not designate which branch is basal. For a given number of taxa there are fewer unrooted trees than rooted ones. On the other hand, if we relax the constraint that the trees must be bifurcating, the numbers grow even faster. This becomes especially important once we begin asking how (or whether) phylogenetics might incorporate processes like lateral gene transfer, which produce reticulate (i.e., non-bifurcating) tree structures (see §3.1).
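For labeled taxa, the standard closed-form counts are $$(2n-3)!!$$ rooted bifurcating trees and $$(2n-5)!!$$ unrooted bifurcating trees on $$n$$ taxa, where $$k!!$$ is the product of every other integer from $$k$$ down to 1. The Python sketch below computes both; the function names are hypothetical, and it covers only bifurcating trees, so multifurcating or reticulate structures are not counted.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k-2) * (k-4) * ..., stopping at 1 or 2; by convention (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n: int) -> int:
    """Number of rooted, bifurcating trees with n >= 2 labeled tips: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n - 3)


def unrooted_trees(n: int) -> int:
    """Number of unrooted, bifurcating trees with n >= 3 labeled tips: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


for n in (4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```

Adding taxon $$n+1$$ multiplies the number of rooted trees by $$2n-1$$, which is why exhaustively examining every tree quickly becomes infeasible as taxa are added.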

7. A problem is NP-complete when it is in NP (so a proposed solution can be verified in polynomial time) and every problem in NP can be reduced to it in polynomial time. No efficient (polynomial-time) algorithm is known for any NP-complete problem, and if $$\text{P} \neq \text{NP}$$, as is generally suspected, then no such algorithm is possible.
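The asymmetry between verifying and finding a solution can be illustrated with subset sum, a textbook NP-complete problem chosen here only for illustration (it is not itself a phylogenetic problem). In this Python sketch, with hypothetical function names, checking a proposed answer takes time linear in the input, while the naive search tries up to $$2^n$$ subsets. It illustrates the definition and proves nothing about whether faster algorithms exist.

```python
from itertools import combinations


def verify_subset_sum(numbers, chosen_indices, target):
    """Polynomial-time check of a proposed certificate: do the chosen entries sum to target?"""
    if len(set(chosen_indices)) != len(chosen_indices):
        return False  # each entry may be used at most once
    if any(not 0 <= i < len(numbers) for i in chosen_indices):
        return False  # certificate refers to entries that do not exist
    return sum(numbers[i] for i in chosen_indices) == target


def search_subset_sum(numbers, target):
    """Naive search: tries subsets in order of size, up to 2**n of them in total."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if verify_subset_sum(numbers, indices, target):
                return list(indices)
    return None  # no subset sums to target


nums = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(nums, [2, 4], 9))  # True  (4 + 5 == 9)
print(search_subset_sum(nums, 9))          # [2, 4]
```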