Molecular clock
The molecular clock is a technique in molecular evolution that uses fossil constraints and rates of molecular change to deduce the time in geologic history when two species or other taxa diverged. It is used to estimate the time of occurrence of events called speciation or radiation. The molecular data used for such calculations is usually nucleotide sequences for DNA or amino acid sequences for proteins. It is sometimes called a gene clock or evolutionary clock.
Early discovery and genetic equidistance
The notion of a "molecular clock" was first attributed to Emile Zuckerkandl and Linus Pauling who, in 1962, noticed that the number of amino acid differences in hemoglobin between different lineages changes roughly linearly with time, as estimated from fossil evidence. They generalized this observation to assert that the rate of evolutionary change of any specified protein was approximately constant over time and over different lineages.
The genetic equidistance phenomenon was first noted in 1963 by E. Margoliash, who wrote: "It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein." For example, the difference between the cytochrome c of a carp and that of a frog, turtle, chicken, rabbit, or horse is nearly constant at 13% to 14%. Similarly, the difference between the cytochrome c of a bacterium and that of yeast, wheat, moth, tuna, pigeon, or horse ranges from 64% to 69%. Together with the work of Emile Zuckerkandl and Linus Pauling, the genetic equidistance result led directly to the formal postulation of the molecular clock hypothesis in the early 1960s. Genetic equidistance has often been used to infer equal time of separation of different sister species from an outgroup.
Later, Allan Wilson and Vincent Sarich built upon this work.
Relationship with neutral theory
The observation of a clock-like rate of molecular change was originally purely phenomenological. Later, Motoo Kimura developed the neutral theory of molecular evolution, which predicted a molecular clock. Let there be N individuals, and to keep this calculation simple, let the individuals be haploid (i.e. have one copy of each gene). Let the rate of neutral mutations (i.e. mutations with no effect on fitness) in a new individual be μ. The probability that this new mutation will become fixed in the population is then 1/N, since each copy of the gene is as good as any other. Every generation, each individual can have new mutations, so there are μN new neutral mutations in the population as a whole. That means that each generation, μN × 1/N = μ new neutral mutations will become fixed. If most changes seen during molecular evolution are neutral, then fixations in a population will accumulate at a clock-rate that is equal to the rate of neutral mutations in an individual.
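The fixation argument above can be checked numerically. The following is a minimal sketch, assuming a haploid Wright-Fisher model of random drift (a standard textbook model chosen here for illustration, not named in the text). The population size, mutation rate, trial count, and the function name fixation_probability are all hypothetical.

```python
import random


def fixation_probability(pop_size, trials, seed=0):
    """Estimate the chance that one new neutral mutation drifts to fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        copies = 1  # the mutation starts as a single copy
        while 0 < copies < pop_size:
            freq = copies / pop_size
            # Neutral drift: each of the N offspring inherits the mutation
            # with probability equal to its current frequency.
            copies = sum(rng.random() < freq for _ in range(pop_size))
        if copies == pop_size:
            fixed += 1
    return fixed / trials


N = 20      # hypothetical haploid population size
mu = 1e-3   # hypothetical neutral mutation rate per individual per generation

p_fix = fixation_probability(N, trials=20000)
# New neutral mutations per generation (mu * N) times the chance each one fixes.
substitution_rate = (mu * N) * p_fix

print(f"estimated fixation probability: {p_fix:.4f} (theory 1/N = {1 / N:.4f})")
print(f"estimated substitution rate: {substitution_rate:.2e} (theory mu = {mu:.2e})")
```

Under these assumptions the estimated fixation probability should lie close to 1/N, so the product with μN comes out close to μ regardless of the population size chosen.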
Calibration
The molecular clock alone can only say that one time period is twice as long as another: it cannot assign concrete dates. To assign dates, the molecular clock must first be calibrated against independent evidence about dates, such as the fossil record. Alternatively, for viral phylogenetics and ancient DNA studies, two areas of evolutionary biology where it is possible to sample sequences over an evolutionary timescale, the dates of the samples themselves can be used to calibrate the molecular clock.
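As a rough illustration of calibration, the sketch below derives a rate from one pair of taxa with an independently known divergence date and applies it to a second pair. It assumes a strict clock and measures the rate as pairwise divergence per million years. All numbers and function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def calibrate_rate(pairwise_divergence, calibration_age_myr):
    """Pairwise divergence (fraction of differing sites) per million years."""
    if calibration_age_myr <= 0:
        raise ValueError("calibration age must be positive")
    return pairwise_divergence / calibration_age_myr


def estimate_age_myr(pairwise_divergence, rate_per_myr):
    """Divergence time implied by a strict clock running at rate_per_myr."""
    if rate_per_myr <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence / rate_per_myr


# Hypothetical calibration pair: fossils date the split to 10 Myr ago,
# and the two sequences differ at 8% of sites.
rate = calibrate_rate(0.08, 10.0)

# Hypothetical undated pair differing at 4% of sites.
age = estimate_age_myr(0.04, rate)

print(f"calibrated rate: {rate * 100:.2f}% per Myr")
print(f"estimated divergence time: {age:.1f} Myr")
```

Without the calibration step, the same data would only show that the second split is half as old as the first, which is the relative statement the uncalibrated clock can make.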
Non-constant rate of molecular clock
Sometimes only a single divergence date can be estimated from fossils, with all other dates inferred from that. Other sets of species have abundant fossils available, allowing the molecular clock hypothesis (MCH) of constant divergence rates to be tested. In one study, DNA sequences experiencing low levels of negative selection showed divergence rates of 0.7-0.8% per Myr in bacteria, mammals, invertebrates, and plants. In the same study, genomic regions experiencing very high negative or purifying selection (encoding rRNA) were considerably slower (1% per 50 Myr).
In addition to such variation in rate with genomic position, since the early 1990s variation among taxa has proven fertile ground for research too, even over comparatively short periods of evolutionary time (for example, mockingbirds). Tube-nosed seabirds have molecular clocks that on average run at half the speed of many other birds, possibly due to long generation times, and many turtles have a molecular clock running at one-eighth the speed it does in small mammals, or even slower. Effects of small population size are also likely to confound molecular clock analyses; cheetahs, for example, having gone through at least two population bottlenecks, could not be adequately studied based on a molecular clock model alone. Researchers such as Ayala have more fundamentally challenged the molecular clock hypothesis. According to Ayala's 1999 study, five factors combine to limit the application of molecular clock models:
- Changing generation times (if the rate of new mutations depends at least partly on the number of generations rather than the number of years)
- Population size (genetic drift is stronger in small populations, and so more mutations are effectively neutral)
- Species-specific differences (due to differing metabolism, ecology, evolutionary history, ...)
- Change in function of the protein studied (can be avoided in closely related species by utilizing non-coding DNA sequences or emphasizing silent mutations)
- Changes in the intensity of natural selection
Molecular clock users have developed workaround solutions using a number of statistical approaches, including maximum likelihood techniques and, later, Bayesian modeling. In particular, models that take into account rate variation across lineages have been proposed in order to obtain better estimates of divergence times. These models are called relaxed molecular clocks because they represent an intermediate position between the 'strict' molecular clock hypothesis and Felsenstein's many-rates model. They are made possible through Markov chain Monte Carlo (MCMC) techniques that explore a weighted range of tree topologies and simultaneously estimate parameters of the chosen substitution model. Divergence dates inferred using a molecular clock are based on statistical inference and not on direct evidence.
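To make the contrast between a strict and a relaxed clock concrete, the sketch below draws one rate per branch from a lognormal distribution whose mean is a shared base rate. This is only one possible way to model rate variation across lineages and is an assumption made here for illustration. The sketch shows the rate model alone, not the maximum likelihood or MCMC inference, and the branch durations, base rate, spread, and function name are all hypothetical.

```python
import math
import random


def branch_rates(n_branches, mean_rate, sigma, seed=0):
    """Draw one substitution rate per branch.

    sigma is the spread of the log-rates; sigma == 0 gives a strict clock
    in which every branch shares mean_rate.
    """
    if sigma == 0:
        return [mean_rate] * n_branches
    rng = random.Random(seed)
    # Shift the log-mean so that the expected rate stays at mean_rate.
    mu_log = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu_log, sigma) for _ in range(n_branches)]


branch_times_myr = [2.0, 2.0, 5.0, 3.0]   # hypothetical branch durations
base_rate = 0.008                         # hypothetical substitutions per site per Myr

strict = branch_rates(len(branch_times_myr), base_rate, sigma=0.0)
relaxed = branch_rates(len(branch_times_myr), base_rate, sigma=0.5)

# Expected branch length (substitutions per site) = rate x time.
strict_lengths = [r * t for r, t in zip(strict, branch_times_myr)]
relaxed_lengths = [r * t for r, t in zip(relaxed, branch_times_myr)]

for t, s, x in zip(branch_times_myr, strict_lengths, relaxed_lengths):
    print(f"{t:4.1f} Myr  strict {s:.4f}  relaxed {x:.4f}")
```

With the spread set to zero every branch length is proportional to its duration, and increasing the spread lets each lineage depart from that proportionality.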
The molecular clock runs into particular challenges at very short and very long timescales. At long timescales, the problem is saturation. When enough time has passed, many sites have undergone more than one change, but it is impossible to detect more than one. This means that the observed number of changes is no longer linear with time, but instead flattens out.
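The flattening can be illustrated with a simple substitution model. The sketch below assumes the Jukes-Cantor model for nucleotide sequences, a standard choice brought in here for illustration and not named in the text. Under that model the observed fraction of differing sites approaches 3/4 however many substitutions have actually occurred, and the function names are hypothetical.

```python
import math


def observed_fraction(true_subs_per_site):
    """Expected fraction of differing sites under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * true_subs_per_site / 3.0))


def corrected_distance(observed):
    """Invert the Jukes-Cantor formula to recover substitutions per site."""
    if not 0.0 <= observed < 0.75:
        raise ValueError("observed fraction must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * observed / 3.0)


# True substitutions per site keep growing, but the observed fraction levels off.
for d in (0.05, 0.5, 1.0, 2.0, 4.0):
    p = observed_fraction(d)
    print(f"true {d:4.2f}  observed {p:.3f}  corrected {corrected_distance(p):.3f}")
```

The correction works well while the observed fraction is far from the 3/4 ceiling. Close to the ceiling, small sampling errors in the observed fraction translate into very large errors in the corrected distance, which is why saturated data are hard to date.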
At very short timescales, many differences between samples do not represent fixation of different sequences in the different populations. Instead, they represent alternative alleles that were both present as part of a polymorphism in the common ancestor. The inclusion of differences that have not yet become fixed leads to a potentially dramatic inflation of the apparent rate of the molecular clock at very short timescales.
Uses
The molecular clock technique is an important tool in molecular systematics, the use of molecular genetics information to determine the correct scientific classification of organisms or to study variation in selective forces.
Knowledge of an approximately constant rate of molecular evolution in particular sets of lineages also facilitates establishing the dates of phylogenetic events, including those not documented by fossils, such as the divergence of living taxa and the formation of the phylogenetic tree. But in these cases, especially over long stretches of time, the limitations of the molecular clock hypothesis (MCH) discussed above must be considered; such estimates may be off by 50% or more.
See also
- Circadian clock
- Gene orders
- Human mitochondrial molecular clock
- Mitochondrial Eve and Y-chromosomal Adam
- Neutral theory of molecular evolution