Phylogenetic tree
A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing the inferred evolutionary relationships among various biological species or other entities, based upon similarities and differences in their physical and/or genetic characteristics. The taxa joined together in the tree are implied to have descended from a common ancestor.

In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units (HTUs), as they cannot be directly observed. Trees are useful in fields of biology such as bioinformatics, systematics and comparative phylogenetics.
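To make the rooted-tree vocabulary concrete, the following Python sketch stores a small rooted tree as a child-to-parent map and finds the most recent common ancestor of two leaves by walking toward the root. The leaf labels A to D and the internal labels HTU1, HTU2 and root are hypothetical placeholders, not data from any real analysis.

```python
from typing import Dict, List

# Hypothetical rooted tree (((A,B),C),D) stored as child -> parent.
# Internal nodes (HTU1, HTU2, root) stand for inferred ancestors, i.e.
# hypothetical taxonomic units; the root has no parent entry.
PARENT: Dict[str, str] = {
    "A": "HTU2",
    "B": "HTU2",
    "HTU2": "HTU1",
    "C": "HTU1",
    "HTU1": "root",
    "D": "root",
}


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Most recent common ancestor: the first node on b's path to the root
    that also lies on a's path to the root."""
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


print(mrca("A", "B", PARENT))  # HTU2
print(mrca("B", "C", PARENT))  # HTU1
print(mrca("A", "D", PARENT))  # root
```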

History

The idea of a "tree of life" arose from ancient notions of a ladder-like progression from lower to higher forms of life (such as in the Great Chain of Being). Early representations of branching phylogenetic trees include a "Paleontological chart" showing the geological relationships among plants and animals in the book Elementary Geology, by Edward Hitchcock (first edition: 1840).

Charles Darwin (1859) also produced one of the first illustrations and crucially popularized the notion of an evolutionary "tree" in his seminal book The Origin of Species. Over a century later, evolutionary biologists still use tree diagrams to depict evolution because such diagrams effectively convey the concept that speciation occurs through the adaptive and random splitting of lineages. Over time, species classification has become less static and more dynamic.

Rooted tree

A rooted phylogenetic tree is a directed tree with a unique node, the root, corresponding to the (usually imputed) most recent common ancestor of all the entities at the leaves of the tree. The most common method for rooting trees is the use of an uncontroversial outgroup: a taxon close enough to the ingroup to allow inference from sequence or trait data, but far enough to be clearly outside it.
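Outgroup rooting can be sketched in a few lines of Python, assuming the unrooted tree is stored as an undirected adjacency map and the outgroup is a single leaf. The sketch places the root on the edge joining the outgroup to the rest of the tree and returns the rooted tree as nested tuples; the node names x and y and the function name `root_with_outgroup` are illustrative only.

```python
from typing import Dict, Set, Union

Tree = Union[str, tuple]  # a leaf name or a tuple of child subtrees

# Hypothetical unrooted tree with leaves O, A, B, C and internal nodes x, y:
#   O - x - y - B
#       |   |
#       A   C
EXAMPLE_ADJ: Dict[str, Set[str]] = {
    "O": {"x"}, "A": {"x"}, "B": {"y"}, "C": {"y"},
    "x": {"O", "A", "y"}, "y": {"x", "B", "C"},
}


def root_with_outgroup(adj: Dict[str, Set[str]], outgroup: str) -> Tree:
    """Root an unrooted tree on the edge leading to a single-leaf outgroup."""
    (neighbor,) = adj[outgroup]  # a leaf has exactly one neighbor

    def orient(node: str, came_from: str) -> Tree:
        # Children are all neighbors except the one we arrived from;
        # a node with no remaining neighbors is a leaf.
        children = [orient(n, node) for n in sorted(adj[node]) if n != came_from]
        return tuple(children) if children else node

    # The new root has two children: the outgroup and the ingroup subtree.
    return (outgroup, orient(neighbor, outgroup))


print(root_with_outgroup(EXAMPLE_ADJ, "O"))  # ('O', ('A', ('B', 'C')))
```

Choosing a different outgroup leaf on the same adjacency map yields a different rooted tree, which is why the choice of outgroup matters.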

Unrooted tree

Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about ancestry at all. While unrooted trees can always be generated from rooted ones by simply omitting the root, a root cannot be inferred from an unrooted tree without some means of identifying ancestry; this is normally done by including an outgroup in the input data or by introducing additional assumptions about the relative rates of evolution on each branch, such as an application of the molecular clock hypothesis. Figure 2 depicts an unrooted phylogenetic tree for myosin, a superfamily of proteins.
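Going the other way, omitting the root of a rooted binary tree merges the root's two edges into a single edge. The Python sketch below assumes the rooted tree is given as nested 2-tuples with string leaves and uses hypothetical internal labels n0, n1, and so on; in the resulting adjacency map every leaf has one neighbor and every internal node has three.

```python
from itertools import count
from typing import Dict, Set, Union

Tree = Union[str, tuple]  # a leaf name or a 2-tuple of child subtrees


def unroot(tree: tuple) -> Dict[str, Set[str]]:
    """Drop the root of a rooted binary tree, joining its two children by a
    single edge, and return the unrooted tree as an adjacency map."""
    adj: Dict[str, Set[str]] = {}
    ids = count()

    def link(u: str, v: str) -> None:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def build(node: Tree) -> str:
        # Leaves keep their names; internal nodes get hypothetical labels.
        if isinstance(node, str):
            adj.setdefault(node, set())
            return node
        name = f"n{next(ids)}"
        for child in node:
            link(name, build(child))
        return name

    left, right = tree
    link(build(left), build(right))  # the root's two edges become one
    return adj


example = unroot((("A", "B"), ("C", "D")))
print({node: len(neighbors) for node, neighbors in example.items()})
```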

Bifurcating tree

Both rooted and unrooted phylogenetic trees can be either bifurcating or multifurcating, and either labeled or unlabeled. A rooted bifurcating tree has exactly two descendants arising from each interior node (that is, it forms a binary tree), and an unrooted bifurcating tree takes the form of an unrooted binary tree, a free tree with exactly three neighbors at each internal node. In contrast, a rooted multifurcating tree may have more than two children at some nodes, and an unrooted multifurcating tree may have more than three neighbors at some nodes. A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, defines a topology only. The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but apart from trivially small cases there are more multifurcating than bifurcating trees, more labeled than unlabeled trees, and more rooted than unrooted trees. The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root. For labeled bifurcating trees, there are

$$(2n-3)!! = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} \quad (n \ge 2)$$

total rooted trees and

$$(2n-5)!! = \frac{(2n-5)!}{2^{n-3}\,(n-3)!} \quad (n \ge 3)$$

total unrooted trees, where $n$ represents the number of leaf nodes. Among labeled bifurcating trees, the number of unrooted trees with $n$ leaves is therefore equal to the number of rooted trees with $n-1$ leaves.
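These counts can be checked numerically. The Python sketch below computes the double factorials and, independently, enumerates labeled rooted bifurcating trees by stepwise addition, attaching each new leaf to every possible edge (including a new edge above the root). The nested-tuple representation and the function names are assumptions of this sketch.

```python
from math import prod
from typing import Iterator, List, Union

Tree = Union[str, tuple]  # a leaf name or a 2-tuple of child subtrees


def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * ... * 1, taken as 1 for k <= 0."""
    return prod(range(k, 0, -2))  # the empty product is 1


def n_rooted(n: int) -> int:
    """Labeled rooted bifurcating trees with n >= 2 leaves: (2n-3)!!."""
    return double_factorial(2 * n - 3)


def n_unrooted(n: int) -> int:
    """Labeled unrooted bifurcating trees with n >= 3 leaves: (2n-5)!!."""
    return double_factorial(2 * n - 5)


def add_leaf(tree: Tree, leaf: str) -> Iterator[Tree]:
    """Yield each tree formed by attaching `leaf` to one edge of `tree`,
    counting the edge above every node (including the root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in add_leaf(left, leaf):
            yield (t, right)
        for t in add_leaf(right, leaf):
            yield (left, t)


def enumerate_rooted(labels: List[str]) -> List[Tree]:
    """Stepwise addition: each labeled rooted binary tree appears exactly once."""
    trees: List[Tree] = [labels[0]]
    for leaf in labels[1:]:
        trees = [t for old in trees for t in add_leaf(old, leaf)]
    return trees


for n in range(3, 8):
    labels = [f"t{i}" for i in range(n)]
    print(n, n_rooted(n), n_unrooted(n), len(enumerate_rooted(labels)))
```

A rooted tree with k leaves has 2k - 1 nodes, hence 2k - 1 places to attach the next leaf; multiplying 1 · 3 · 5 · ... reproduces the (2n-3)!! formula.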

Special tree types

A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree.

A cladogram is a phylogenetic tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths do not represent time or relative amount of character change.

A phylogram is a phylogenetic tree that has branch lengths proportional to the amount of character change.

A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.
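The difference between a phylogram and a chronogram shows up when branch lengths are summed from the root to each leaf. In the Python sketch below, the nested-list representation and all branch lengths are hypothetical. The chronogram places every leaf at the same distance from the root (it is ultrametric), which is what one would expect when branch lengths measure time and all leaves are sampled at the present; the phylogram's leaves sit at different distances because, in this made-up example, lineages accumulated different amounts of change.

```python
from typing import Dict, List, Tuple, Union

# A node is a leaf name or a list of (child, branch_length) pairs.
Node = Union[str, List[Tuple["Node", float]]]


def root_to_tip(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Sum of branch lengths from the root down to every leaf."""
    if isinstance(node, str):
        return {node: depth}
    out: Dict[str, float] = {}
    for child, length in node:
        out.update(root_to_tip(child, depth + length))
    return out


def is_ultrametric(node: Node, tol: float = 1e-9) -> bool:
    """True if all leaves lie at the same distance from the root."""
    depths = root_to_tip(node).values()
    return max(depths) - min(depths) <= tol


# Hypothetical trees with the topology ((A,B),C); branch lengths are made up.
PHYLOGRAM: Node = [([("A", 0.1), ("B", 0.3)], 0.2), ("C", 0.25)]  # change
CHRONOGRAM: Node = [([("A", 2.0), ("B", 2.0)], 3.0), ("C", 5.0)]  # time

print(root_to_tip(PHYLOGRAM), is_ultrametric(PHYLOGRAM))
print(root_to_tip(CHRONOGRAM), is_ultrametric(CHRONOGRAM))
```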

Construction

Phylogenetic trees among a nontrivial number of input sequences are constructed using computational phylogenetics methods. Distance-matrix methods such as neighbor-joining or UPGMA, which calculate genetic distance from multiple sequence alignments, are the simplest to implement, but do not invoke an evolutionary model. Many sequence alignment programs, such as ClustalW, also create trees by using these simpler, distance-based algorithms of tree construction. Maximum parsimony is another simple method of estimating phylogenetic trees, but it relies on an implicit model of evolution (namely, parsimony). More advanced methods use the likelihood of the data, either as an optimality criterion (maximum likelihood) or within a Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation. Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.
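As an example of the simplest, distance-based family, the following Python sketch implements UPGMA: it repeatedly merges the closest pair of clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the original distances. The five-taxon distance matrix is illustrative, and the dictionary-of-frozensets representation is an assumption of this sketch, not the input format of ClustalW or any other program.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf name or a 2-tuple of child subtrees
Dist = Dict[FrozenSet[str], float]


def upgma(names: List[str], dist: Dist) -> Tuple[Tree, float]:
    """Return the UPGMA tree (nested tuples) and the height of its root."""
    clusters = {n: (n, 1) for n in names}  # label -> (subtree, leaf count)
    d = dict(dist)
    height = 0.0
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2  # node height on an ultrametric tree
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:  # size-weighted average distances
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        # Forget every distance that involves one of the merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[merged] = ((ta, tb), na + nb)
    [(tree, _size)] = clusters.values()
    return tree, height


NAMES = ["a", "b", "c", "d", "e"]
D: Dist = {
    frozenset(pair): value
    for pair, value in [
        (("a", "b"), 17), (("a", "c"), 21), (("a", "d"), 31), (("a", "e"), 23),
        (("b", "c"), 30), (("b", "d"), 34), (("b", "e"), 21),
        (("c", "d"), 28), (("c", "e"), 39), (("d", "e"), 43),
    ]
}

print(upgma(NAMES, D))
```

Because every merge places the new node at half the merged distance, UPGMA always returns an ultrametric tree; this is an assumption about the data (roughly equal rates) rather than something the method tests.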

Tree-building methods can be assessed on the basis of several criteria:
  • efficiency (how long does it take to compute the answer, how much memory does it need?)
  • power (does it make good use of the data, or is information being wasted?)
  • consistency (will it converge on the same answer repeatedly, if each time given different data for the same model problem?)
  • robustness (does it cope well with violations of the assumptions of the underlying model?)
  • falsifiability (does it alert us when it is not good to use, i.e. when assumptions are violated?)


Tree-building techniques have also gained the attention of mathematicians. Trees can also be built using T-theory.
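A classical question in the mathematics of trees and discrete metric spaces, the area T-theory studies, is when a distance matrix can be drawn exactly as a tree with branch lengths. The Python sketch below checks the standard answer, the four-point condition: for every four taxa, the two largest of the sums d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k) must be equal. It is a small illustration, not an implementation of T-theory itself; it assumes the input is already a metric, and the distances and the helper `make_dist` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Dist = Dict[FrozenSet[str], float]


def four_point(names: List[str], dist: Dist, tol: float = 1e-9) -> bool:
    """True if every quadruple of distinct taxa satisfies the four-point
    condition (assumes `dist` is already a metric)."""
    def d(x: str, y: str) -> float:
        return dist[frozenset((x, y))]

    for p, q, r, s in combinations(names, 4):
        sums = sorted([d(p, q) + d(r, s), d(p, r) + d(q, s), d(p, s) + d(q, r)])
        if sums[2] - sums[1] > tol:  # the two largest sums must coincide
            return False
    return True


def make_dist(pairs: Dict[str, float]) -> Dist:
    """Build a distance map from two-letter keys such as "AB" (hypothetical)."""
    return {frozenset(key): value for key, value in pairs.items()}


# Path lengths on the unrooted tree ((A,B),(C,D)) with every branch of length 1.
TREE_LIKE = make_dist({"AB": 2, "CD": 2, "AC": 3, "AD": 3, "BC": 3, "BD": 3})
# The same matrix with one entry changed, so that no tree fits it exactly.
NOT_TREE_LIKE = make_dist({"AB": 2, "CD": 2, "AC": 4, "AD": 3, "BC": 3, "BD": 3})

print(four_point(list("ABCD"), TREE_LIKE), four_point(list("ABCD"), NOT_TREE_LIKE))
```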

Limitations

Although phylogenetic trees produced on the basis of sequenced genes or genomic data in different species can provide evolutionary insight, they have important limitations. They do not necessarily accurately represent the species' evolutionary history. The data on which they are based are noisy; the analysis can be confounded by horizontal gene transfer, hybridisation between species that were not nearest neighbors on the tree before hybridisation takes place, convergent evolution, and conserved sequences.

Also, there are problems in basing the analysis on a single type of character, such as a single gene or protein, or only on morphological analysis, because trees constructed from another, unrelated data source often differ from the first; therefore great care is needed in inferring phylogenetic relationships among species. This is most true of genetic material that is subject to lateral gene transfer and recombination, where different haplotype blocks can have different histories. In general, the output tree of a phylogenetic analysis is an estimate of the characters' phylogeny (i.e. a gene tree) and not the phylogeny of the taxa (i.e. a species tree) from which these characters were sampled, though ideally both should be very close. For this reason, serious phylogenetic studies generally use a combination of genes that come from different genomic sources (e.g., from mitochondrial or plastid vs. nuclear genomes), or genes that would be expected to evolve under different selective regimes, so that homoplasy (false homology) would be unlikely to result from natural selection.
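One simple way to see how much two gene trees disagree is to compare the clades (the leaf sets below each internal node) that each contains. The Python sketch below does this for two hypothetical rooted gene trees on the same four taxa; counting the clades found in only one tree is a rooted variant of the Robinson-Foulds distance, used here as an illustrative choice rather than a method prescribed by the text.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf name or a tuple of child subtrees


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes, including the root's full leaf set."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def clade_distance(t1: Tree, t2: Tree) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


# Hypothetical gene trees that disagree about which taxon is closest to A.
GENE_1 = ((("A", "B"), "C"), "D")
GENE_2 = ((("A", "C"), "B"), "D")

print(clade_distance(GENE_1, GENE_2))
```

A distance of zero means the two trees have identical topologies; any positive value flags conflicting clades that deserve the extra care described above.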

When extinct species are included in a tree, they are terminal nodes, as it is unlikely that they are direct ancestors of any extant species. Scepticism might be applied when extinct species are included in trees that are wholly or partly based on DNA sequence data, because little useful "ancient DNA" is preserved for longer than 100,000 years, and except in the most unusual circumstances no DNA sequences long enough for use in phylogenetic analyses have yet been recovered from material over 1 million years old.

In some organisms, endosymbionts have a genetic history independent of that of the host.

Phylogenetic networks are used when bifurcating trees are not suitable, because these complications suggest a more reticulate evolutionary history of the organisms sampled.

The "tree of life"

  • Evolutionary history of life - An overview of the major time periods of life on Earth
  • Life - The top level for Wikipedia articles on living species, reflecting a diversity of classification systems
  • Three-domain system (cell types)
  • Wikispecies - An external Wikimedia Foundation project to construct a "tree of life" appropriate for use by scientists

Fields of study

  • Archaeopteryx
  • Cladistics
  • Comparative phylogenetics
  • Computational phylogenetics
  • Evolutionary biology
  • Generalized tree alignment
  • Phylogenetics


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.