Molecular phylogeny

Molecular phylogenetics (məˈlɛkjʊlər faɪlɵdʒɪˈnɛtɪks) is the analysis of hereditary molecular differences, mainly in DNA sequences, to gain information on an organism's evolutionary relationships. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.


History of molecular phylogenetics

The theoretical frameworks for molecular systematics were laid in the 1960s in the works of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling, and Walter M. Fitch. Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, Robert K. Selander, and John C. Avise (who studied various groups). Work with protein electrophoresis began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classification of birds, for example, needed substantial revision. In the period 1974–1986, DNA-DNA hybridization was the dominant technique.

Techniques and applications

Every living organism contains DNA, RNA, and proteins. In general, closely related organisms have a high degree of agreement in the molecular structure of these substances, while the molecules of distantly related organisms usually show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time and, assuming a constant rate of mutation, provide a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. Not until recent decades, however, has it been possible to isolate and identify these molecular structures.
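
The molecular clock idea can be illustrated with a small calculation. The sketch below assumes a strict clock, that is, a single constant substitution rate on both lineages; the function name and the example numbers are hypothetical and chosen only for illustration.

```python
def divergence_time(distance, rate_per_site_per_year):
    """Estimate the time since two lineages split under a strict molecular clock.

    distance: observed substitutions per site between the two sequences.
    rate_per_site_per_year: assumed constant substitution rate per lineage.
    Both lineages accumulate changes independently, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: 2% divergence and 1e-8 substitutions per site per year.
years = divergence_time(0.02, 1e-8)
print(f"approximately {years:.0f} years")
```

With these made-up inputs the estimate is on the order of one million years. In practice the rate has to be calibrated, for example against fossil evidence, and may not be constant.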

The most common approach is the comparison of homologous sequences for genes using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is in DNA barcoding, wherein the species of an individual organism is identified using small sections of mitochondrial DNA. Another application of the techniques that make this possible can be seen in the very limited field of human genetics, such as the ever-more-popular use of genetic testing to determine a child's paternity, as well as the emergence of a new branch of criminal forensics focused on evidence known as genetic fingerprinting.
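
As a rough illustration of the barcoding idea, the sketch below assigns an unknown sample to whichever reference sequence it differs from least. The reference "barcodes", the species labels, and the helper names are all invented for this example; real barcoding relies on curated reference libraries and on alignment, both of which are skipped here.

```python
def mismatches(a, b):
    """Count positions at which two pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def identify(query, references):
    """Return (label, mismatch count) for the closest reference sequence."""
    scored = ((name, mismatches(query, seq)) for name, seq in references.items())
    return min(scored, key=lambda pair: pair[1])


# Made-up reference barcodes, far shorter than real ones.
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
    "species_C": "TCGAACGTTC",
}

best = identify("ACGTTCGTGA", references)
print(best)
```

Here the query differs from the hypothetical species_B at a single position and from the other two references at several, so species_B is returned as the best match.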


Theoretical background

Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have been replaced in recent times largely by DNA sequencing, which produces the exact sequences of nucleotides or bases in either DNA or RNA segments extracted using different techniques. In general, these are considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. At present, it is still a long and expensive process to sequence the entire DNA of an organism (its genome), and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs. At any location within such a sequence, the bases found in a given position may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, with 1000 base pairs, we could have 4^1000 distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all and most of the variations that are found are correlated, so that the number of distinct haplotypes that are found is relatively small.
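
The gap between the theoretical number of haplotypes and the number actually observed can be made concrete with a toy sample. The sequences below are invented and only eight bases long, and the function name is hypothetical; the point is only to show how variable sites and distinct haplotypes might be counted.

```python
def variable_sites(haplotypes):
    """Return indices of aligned positions where at least two sequences differ."""
    return [i for i, column in enumerate(zip(*haplotypes)) if len(set(column)) > 1]


# Made-up sample of five individuals, eight aligned bases each.
sample = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGAACCT", "ACGTACGT"]

possible = 4 ** len(sample[0])   # every combination of four bases
observed = len(set(sample))      # distinct haplotypes actually present
sites = variable_sites(sample)

print(possible, observed, sites)

# For 1000 base pairs the theoretical count is 4**1000,
# a number with several hundred decimal digits.
digits = len(str(4 ** 1000))
print(digits)
```

In this toy sample only two of the eight positions vary and three distinct haplotypes occur, out of 65536 theoretically possible sequences of that length.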

In a molecular systematic analysis, the haplotypes are determined for a defined area of genetic material. A substantial sample of individuals of the target species or other taxon is used; however, many current studies are based on single individuals. Haplotypes of individuals of closely related, but different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a definitely different taxon are determined; these are referred to as an outgroup. The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of locations where they have different bases; this is referred to as the number of substitutions (other kinds of differences between haplotypes can also occur, for example the insertion of a section of nucleic acid in one haplotype that is not present in another). The difference between organisms is usually re-expressed as a percentage divergence, by dividing the number of substitutions by the number of base pairs analysed and multiplying by 100; the hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.
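
The percentage divergence described above can be sketched in a few lines. This assumes the two haplotypes are already aligned and of equal length, and it ignores insertions and deletions entirely; the function name and the sequences are made up for illustration.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of aligned positions at which two haplotypes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("haplotypes must be aligned to the same length")
    if not seq_a:
        raise ValueError("haplotypes must not be empty")
    # Count substitutions: positions where the bases differ.
    substitutions = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * substitutions / len(seq_a)


# Two invented ten-base haplotypes that differ at two positions.
divergence = percent_divergence("ACGTACGTAC", "ACGTTCGTAA")
print(divergence)
```

Two substitutions over ten compared bases give a divergence of 20 percent in this toy case; real analyses compare far longer sequences.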

An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA-DNA hybridisation. The advantage claimed for using hybridisation rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences.

Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined to see whether the samples cluster in the way that would be expected from current ideas about the taxonomy of the group. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade. Statistical techniques such as bootstrapping and jackknifing help in providing reliability estimates for the positions of haplotypes within the evolutionary trees.
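
The text does not say which clustering method is used. As one possible illustration, the sketch below applies average-linkage clustering (UPGMA) to a small matrix of pairwise divergences and returns the dendrogram as nested tuples. The taxon labels and distances are invented, and UPGMA is chosen here only because it is simple to show, not because the source prescribes it.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by average linkage and return a nested-tuple tree.

    names: list of taxon labels.
    dist: dict mapping frozenset({a, b}) to the divergence between a and b.
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves
    d = dict(dist)                        # working copy, extended as clusters merge
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest divergence.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        for other in sizes:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        sizes[merged] = n_a + n_b
    return next(iter(sizes))


# Invented percentage divergences between four samples.
pairs = {
    ("A", "B"): 2.0, ("A", "C"): 8.0, ("A", "D"): 10.0,
    ("B", "C"): 8.0, ("B", "D"): 10.0, ("C", "D"): 4.0,
}
distances = {frozenset(pair): value for pair, value in pairs.items()}

tree = upgma(["A", "B", "C", "D"], distances)
print(tree)
```

In this toy matrix A and B are grouped first, then C and D, and the two pairs are joined last. A bootstrap estimate would repeat such a clustering on resampled alignment columns and record how often each grouping recurs.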

Limitations of molecular systematics

Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic.

Molecular phylogenies can be affected by myriad problems, including long-branch attraction, saturation, and taxon sampling problems. As a result, strikingly different results can be obtained by applying different models to the same dataset.
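
One way to see how the choice of model changes the result is to compare the raw proportion of differing sites with a model-corrected distance. The sketch below uses the Jukes-Cantor correction, a standard one-parameter substitution model that is not named in the text above and is brought in here purely as an illustration; the input proportions are made up.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the observed proportion p of differing sites.

    The correction accounts for multiple substitutions at the same site and is
    undefined once p reaches 0.75, the value expected for unrelated sequences.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Compare uncorrected and corrected distances for a few invented values.
for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor(p), 3))
```

For small divergences the two measures nearly agree, but as the observed proportion approaches saturation the corrected distance grows much faster, so trees built from raw and corrected distances can differ.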

See also

  • Molecular evolution
  • Computational phylogenetics
  • PhyloCode
  • Microbial phylogenetics

Further reading

  • Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates Incorporated. ISBN 0-87893-177-5.
  • Hillis, D. M. & Moritz, C. 1996. Molecular systematics. 2nd ed. Sinauer Associates Incorporated. ISBN 0-87893-282-8.
  • Page, R. D. M. & Holmes, E. C. 1998. Molecular evolution: a phylogenetic approach. Blackwell Science, Oxford. ISBN 0-86542-889-1.
  • Soltis, P. S., Soltis, D. E. & Doyle, J. J. 1992. Molecular systematics of plants. Chapman & Hall, New York. ISBN 0-41202-231-1.
  • Soltis, P. S., Soltis, D. E. & Doyle, J. J. 1998. Molecular systematics of plants II: DNA sequencing. Kluwer Academic Publishers, Boston, Dordrecht, London. ISBN 0-41211-131-4.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.