Paleopolyploidy
Paleopolyploidy refers to ancient genome duplications that occurred at least several million years ago (mya). The genome-doubling event may be either an autopolyploidy or an allopolyploidy. Because of functional redundancy, genes are rapidly silenced and/or lost from the duplicated genomes. Most paleopolyploids have lost their polyploid status over evolutionary time through a process called diploidization and are currently regarded as diploids (e.g. baker's yeast, Arabidopsis, and perhaps humans).
Paleopolyploidy is extensively studied in plant lineages. Almost all flowering plants have undergone at least one round of genome duplication at some point in their evolutionary history. Ancient genome duplications have also been reported in the early ancestor of vertebrates (which includes the human lineage) and near the origin of the bony fishes. Evidence suggests that baker's yeast (Saccharomyces cerevisiae), which has a compact genome, experienced polyploidization during its evolutionary history.
Eukaryotes
Ancient genome duplications are widespread throughout eukaryotic lineages, particularly in plants. Almost all important cereal crops are paleopolyploids. Studies suggest that the common ancestor of Poaceae, the grass family, had a genome duplication 50–70 mya. Subsequent genome doublings occurred in maize, and twice in wheat. A duplication shared by all eudicots occurred 50–70 mya, and an earlier duplication may have affected the ancestor of all the world's flowering plants over 200 mya.
Furthermore, Arabidopsis thaliana, which has a small genome for a plant, experienced at least two rounds of paleopolyploidy. The most recent event took place before the divergence of the Arabidopsis and Brassica lineages, 25–40 mya.
Compared with plants, paleopolyploidy is much rarer in the animal kingdom. It has been identified mainly in amphibians and bony fishes. Some studies suggest that one (or, by some accounts, two) genome duplications are shared by all vertebrates, including humans, but the evidence is weaker than in the other cases and remains under debate. Why animal lineages experienced fewer paleopolyploidization events than plants is a question of interest to many researchers.
Lastly, a well-supported paleopolyploidy has been found in baker's yeast (Saccharomyces cerevisiae), despite its small, compact genome (~13 Mbp). The duplication occurred after the divergence from K. waltii. Through genome streamlining, yeast has lost 90% of the duplicated genome over evolutionary time and is now recognized as a diploid organism.
Detection method
Duplicated genes can be identified through sequence homology at the DNA or protein level. Paleopolyploidy can be identified as massive gene duplication at one time using a molecular clock. To distinguish between whole-genome duplication and a collection of single gene duplication events (which are a common phenomenon in the genome), the following rules are often applied:
- Duplicated genes are located in large duplicated blocks. Single gene duplication is a random process and tends to scatter duplicated genes throughout the genome.
- Duplicated blocks are non-overlapping because they were created simultaneously. Segmental duplication within the genome can satisfy the first rule, but multiple independent segmental duplications could overlap each other.
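The two rules can be sketched as a pair of simple checks. This is only an illustration under stated assumptions: the `Region` record, the `min_pairs` threshold and the example coordinates are all hypothetical, each duplicated block is assumed to have been reduced to the chromosomal regions it covers, and positions are given as gene indices rather than base pairs.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """One chromosomal region covered by a duplicated block (hypothetical record)."""
    chrom: str
    start: int    # index of the first gene in the region
    end: int      # index of the last gene in the region (inclusive)
    n_pairs: int  # number of duplicated gene pairs anchoring the block


def large_blocks(regions, min_pairs=5):
    """Rule 1: keep only regions supported by many duplicated gene pairs."""
    return [r for r in regions if r.n_pairs >= min_pairs]


def overlapping_pairs(regions):
    """Rule 2: report pairs of regions that share positions on one chromosome."""
    hits = []
    ordered = sorted(regions, key=lambda r: (r.chrom, r.start))
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Sorted order means no later region can overlap `a` once this holds.
            if b.chrom != a.chrom or b.start > a.end:
                break
            hits.append((a, b))
    return hits


# Made-up example: three well-supported regions and one small, overlapping one.
regions = [
    Region("chr1", 0, 40, 18),
    Region("chr1", 45, 90, 22),
    Region("chr2", 10, 60, 25),
    Region("chr2", 50, 80, 3),
]

supported = large_blocks(regions)
conflicts = overlapping_pairs(supported)
```

In this toy data the small `chr2` region overlaps a larger one, but it is filtered out by the first rule, so the remaining regions are non-overlapping, as would be expected after a single whole-genome duplication.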
In theory, the two duplicated genes should have the same "age"; that is, all gene pairs duplicated by a paleopolyploidy event (homeologs) should show the same sequence divergence. The synonymous substitution rate, Ks (the number of synonymous substitutions per synonymous site), is often used as a molecular clock to estimate the time of gene duplication. Thus, paleopolyploidy is identified as a "peak" on the plot of duplicate number vs. Ks (shown on the right).
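A minimal sketch of the peak idea follows, assuming a list of Ks values has already been computed for the duplicated gene pairs. The bin width, the `min_count` threshold and the synthetic values are illustrative assumptions, not parameters from any published analysis, and real studies typically fit mixture models rather than look for raw local maxima.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count duplicate pairs per Ks bin; values outside [0, max_ks) are dropped."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            idx = min(int(ks / bin_width), n_bins - 1)
            counts[idx] += 1
    return counts


def local_peaks(counts, min_count=5):
    """Return indices of interior bins that exceed both neighbouring bins."""
    peaks = []
    for i in range(1, len(counts) - 1):
        if (counts[i] >= min_count
                and counts[i] > counts[i - 1]
                and counts[i] > counts[i + 1]):
            peaks.append(i)
    return peaks


# Synthetic data: a decaying background of recent single-gene duplicates
# plus a cluster of pairs near Ks = 0.75 standing in for an ancient event.
ks_values = (
    [0.05] * 12 + [0.15] * 8 + [0.25] * 6 + [0.35] * 4 + [0.45] * 3
    + [0.55] * 3 + [0.65] * 5 + [0.75] * 14 + [0.85] * 6 + [0.95] * 2
    + [1.25]
)

bin_width = 0.1
counts = ks_histogram(ks_values, bin_width=bin_width)
peaks = local_peaks(counts)
peak_ks = [(i + 0.5) * bin_width for i in peaks]  # bin centres of the peaks
```

The first bin is deliberately excluded from the peak search, because the youngest duplicates usually dominate the distribution whether or not a whole-genome duplication took place.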
Duplication events that occurred a long time ago in the history of various evolutionary lineages can be difficult to detect because of subsequent diploidization (in which a polyploid comes to behave cytogenetically as a diploid over time), as mutations and gene translocations gradually make one copy of each chromosome unlike its counterpart. This usually results in low confidence when identifying a very ancient paleopolyploidy.
Evolutionary importance
Paleopolyploidization events lead to massive cellular changes, including doubling of the genetic material, changes in gene expression and increased cell size. Gene loss during diploidization is not completely random, but heavily selected. Genes from large gene families are duplicated. On the other hand, individual genes are not duplicated. Overall, paleopolyploidy can have both short-term and long-term evolutionary effects on an organism's fitness in the natural environment.
- Genome diversity: genome doubling provides the organism with redundant alleles that can evolve freely with little selection pressure. The duplicated genes can undergo neofunctionalization or subfunctionalization, which could help the organism adapt to a new environment or survive different stress conditions.
- Heterosis: polyploids often have larger cells and even larger organs. Many important crops, including wheat, maize and cotton, are paleopolyploids which were selected for domestication by ancient peoples.
- Speciation: it has been suggested that many polyploidization events created new species, via a gain of adaptive traits or through sexual incompatibility with their diploid counterparts. An example is the recent speciation of the allopolyploid Spartina anglica; the polyploid plant is so successful that it is listed as an invasive species in many regions.
Human as paleopolyploid
The hypothesis of human paleopolyploidy originated as early as the 1970s, when it was proposed by the biologist Susumu Ohno. He reasoned that the vertebrate genome could not have achieved its complexity without large-scale whole-genome duplications. The "two rounds of genome duplication" hypothesis (2R hypothesis) came about and gained popularity, especially among developmental biologists.
However, the 2R hypothesis has been questioned by many researchers. According to the theory, the human genome should have a 4:1 gene ratio compared with invertebrate genomes. This is not supported by findings from the 48 vertebrate genome projects available in mid-2011: for example, the human genome consists of ~21,000 protein-coding genes according to June 2011 counts at the UCSC and Ensembl genome analysis centers, while an average invertebrate genome contains about 15,000 genes. Further, the recently completed amphioxus genome sequence does not support any such whole-genome duplication with large-scale retention, as predicted by the hypothesis. Additional arguments against 2R were based on the lack of the (AB)(CD) tree topology among four members of a gene family in vertebrates. However, if the two genome duplications occurred close together, we would not expect to find this topology.
These recent findings have largely supported the 2R hypothesis.
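The gene-count argument in this section can be restated as simple arithmetic. The sketch below uses only the figures quoted above (~21,000 human protein-coding genes and about 15,000 genes in an average invertebrate genome) and assumes, purely for illustration, that every duplicate from both rounds would be retained; the variable names are illustrative.

```python
# Figures quoted in the text above.
invertebrate_genes = 15_000
human_genes = 21_000
rounds = 2  # the "2R" in the hypothesis

# Assumption for illustration only: no duplicate is lost after either round.
expected_if_all_retained = invertebrate_genes * 2 ** rounds

observed_ratio = human_genes / invertebrate_genes
retained_fraction = human_genes / expected_if_all_retained

print(f"expected with full retention: {expected_if_all_retained}")
print(f"observed human:invertebrate ratio: {observed_ratio:.1f}:1")
print(f"fraction of the full-retention count: {retained_fraction:.2f}")
```

With these numbers the observed ratio works out to 1.4:1 rather than 4:1.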
See also
- Gene duplication
- Genomics
- Karyotype
- Ploidy
- Polyploidy
- Speciation