Comparative genomics
Comparative genomics is the study of the relationship between genome structure and function across different biological species or strains. It attempts to use the signatures that selection leaves in genomes to understand the function of genomic elements and the evolutionary processes that act on them. Although it is still a young field, it promises insights into many aspects of the evolution of modern species. The sheer amount of information in modern genomes (about 3.2 gigabases in the case of humans) means that the methods of comparative genomics must be automated. Gene finding is an important application of comparative genomics, as is the discovery of new non-coding functional elements of the genome.
Comparative genomics exploits both similarities and differences in the proteins, RNA, and regulatory regions of different organisms to infer how selection has acted upon these elements. Elements responsible for similarities between species should be conserved through time (stabilizing selection), while elements responsible for differences among species should be divergent (positive selection). Finally, elements that are unimportant to the evolutionary success of the organism will be unconserved (selection is neutral).
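The three categories above can be illustrated with a small sketch. It assumes a measure the text does not name: the ratio of non-synonymous to synonymous substitution rates (dN/dS, often written omega), which is commonly used for protein-coding genes. Values well below 1 suggest conservation, values near 1 suggest neutrality, and values well above 1 suggest positive selection. The function name `classify_selection` and the `tolerance` band are hypothetical choices made for illustration, and the dN and dS values are assumed to have been estimated elsewhere.

```python
def classify_selection(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a coding element by its dN/dS ratio (omega).

    dn, ds    -- precomputed non-synonymous and synonymous substitution rates
    tolerance -- hypothetical half-width of the band around 1 treated as neutral
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega < 1 - tolerance:
        # Few amino-acid changes relative to silent changes: sequence is conserved.
        return "conserved"
    if omega > 1 + tolerance:
        # Amino-acid changes outpace silent changes: sequence is divergent.
        return "divergent"
    # Both kinds of change accumulate at about the same rate.
    return "neutral"


if __name__ == "__main__":
    for name, dn, ds in [("geneA", 0.02, 0.40), ("geneB", 0.90, 0.30), ("geneC", 0.50, 0.50)]:
        print(name, classify_selection(dn, ds))
```

In practice the threshold around 1 would be replaced by a statistical test, since estimates of both rates carry sampling error.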
One of the important goals of the field is to identify the mechanisms of eukaryotic genome evolution. This is often complicated, however, by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. For this reason, comparative genomics studies of small model organisms (for example, yeast) are of great importance in advancing our understanding of general mechanisms of evolution.
Having come a long way from its initial use in finding functional proteins, comparative genomics now concentrates on finding regulatory regions and siRNA molecules. It has recently been discovered that distantly related species often share long conserved stretches of DNA that do not appear to code for any protein (see conserved non-coding sequence). One such ultra-conserved region, which was stable from chicken to chimp, has undergone a sudden burst of change in the human lineage and is active in the developing brain of the human embryo.
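As a rough illustration of how long conserved stretches might be located, the sketch below scans two sequences that are assumed to be already aligned (equal length, with `-` marking gaps) and reports runs of identical columns. The function name `conserved_runs` and the `min_length` cutoff are hypothetical. Real analyses typically work on multi-species alignments and use statistical conservation scores rather than exact identity.

```python
from typing import List, Tuple


def conserved_runs(seq_a: str, seq_b: str, min_length: int = 8) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column intervals where two aligned
    sequences carry the same non-gap base for at least min_length columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs: List[Tuple[int, int]] = []
    start = None  # column where the current run of matches began, if any
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        match = a == b and a != "-"
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(seq_a) - start >= min_length:
        runs.append((start, len(seq_a)))
    return runs


if __name__ == "__main__":
    a = "ACGTACGTAC" + "TT" + "GGCCAATTGGCC"
    b = "ACGTACGTAC" + "AA" + "GGCCAATTGGCC"
    print(conserved_runs(a, b))
```

Whether a reported run is non-coding would have to be checked separately against gene annotations, which this sketch does not attempt.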
Computational approaches to genome comparison have recently become a common research topic in computer science. A public collection of case studies and demonstrations is growing, ranging from whole-genome comparisons to gene expression analysis. This has brought in ideas from other areas, including systems and control, information theory, string analysis, and data mining. Computational approaches are expected to become and remain a standard topic for research and teaching, and multiple courses will begin training students to be fluent in both genomics and computation.
See also
- Evolution
- Molecular evolution
- Genetic drift
- Selection
- Molecular clock
- Evolutionary biology
- Comparative anatomy
- Model organism
- Homology
External links
- Genomes OnLine Database (GOLD)
- Genome News Network
- JCVI Comprehensive Microbial Resource
- Pathema: A Clade Specific Bioinformatics Resource Center
- CBS Genome Atlas Database
- The UCSC Genome Browser
- The U.S. National Human Genome Research Institute
- Ensembl, the Ensembl Genome Browser
- Genolevures, comparative genomics of the Hemiascomycetous yeasts
- Phylogenetically Inferred Groups (PhIGs), a recently developed method that incorporates phylogenetic signals in building gene clusters for use in comparative genomics.
- Metazome, a resource for the phylogenomic exploration and analysis of Metazoan gene families.
- IMG, the Integrated Microbial Genomes system, for comparative genome analysis by the DOE-JGI.
- Dcode.org Comparative Genomics Center.
- SUPERFAMILY Protein annotations for all completely sequenced organisms
- Comparative Genomics
- Blastology and Open Source: Needs and Deeds