HomoloGene
HomoloGene, a tool of the National Center for Biotechnology Information (NCBI), is a system for the automated detection of homologs (genes whose similarity is attributable to descent from a common ancestor) among the annotated genes of several completely sequenced eukaryotic genomes.
HomoloGene processing begins with an analysis of the proteins from the input organisms. Sequences are compared using blastp, then matched up and placed into groups using a taxonomic tree built from sequence similarity: closely related organisms are matched first, and more distant organisms are then added to the tree. The protein alignments are mapped back to their corresponding DNA sequences, from which distance metrics such as the Jukes and Cantor (1969) molecular distance and the Ka/Ks ratio can be calculated.
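As an illustration of the kind of distance metric mentioned above, the following is a minimal sketch of the Jukes and Cantor (1969) correction applied to two already-aligned DNA sequences. It is not HomoloGene's own code; the function names and the example sequences are assumptions made for illustration, and gap columns are simply skipped.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns that contain a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Hypothetical aligned fragments differing at one site out of ten.
example_distance = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT")
```

The corrected distance is slightly larger than the raw proportion of differences because the model accounts for multiple substitutions at the same site.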
The sequences are matched using a heuristic algorithm that maximizes the score globally, rather than locally, in a bipartite matching (see complete bipartite graph). The statistical significance of each match is then calculated. Per-position cutoffs and Ks-value cutoffs are set to prevent false "orthologs" from being grouped together. "Paralogs" are identified by finding sequences that are closer to each other within a species than to sequences in other species.
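The difference between maximizing the score globally and locally can be shown with a small sketch. This is not the heuristic HomoloGene uses, which the text does not specify; it is an exhaustive search over a tiny, hypothetical square score matrix, compared with a greedy strategy that always takes the best remaining pair.

```python
from itertools import permutations


def greedy_matching(scores):
    """Repeatedly take the highest-scoring pair still available (local choices)."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    used_rows, used_cols, matching = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_rows and j not in used_cols:
            matching[i] = j
            used_rows.add(i)
            used_cols.add(j)
    return matching


def global_matching(scores):
    """Try every one-to-one assignment and keep the one with the highest total score.

    Assumes a square matrix; only practical for very small inputs.
    """
    n = len(scores)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(scores[i][perm[i]] for i in range(n)),
    )
    return dict(enumerate(best))


def total_score(scores, matching):
    """Sum of the scores of all matched pairs."""
    return sum(scores[i][j] for i, j in matching.items())


# Hypothetical similarity scores: rows are genes of species A, columns genes of species B.
scores = [
    [90, 85],
    [88, 10],
]
greedy = greedy_matching(scores)
best = global_matching(scores)
```

Here the greedy strategy locks in the single best pair and is left with a poor second pair, while the global search accepts two slightly lower scores for a higher total. For realistic input sizes an assignment algorithm or a heuristic would replace the exhaustive search.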
Input organisms
Homo sapiens, Pan troglodytes, Canis lupus familiaris, Bos taurus, Mus musculus, Danio rerio, Rattus norvegicus, Arabidopsis thaliana, Gallus gallus, Oryza sativa, Anopheles gambiae, Drosophila melanogaster, Magnaporthe grisea, Neurospora crassa, Caenorhabditis elegans, Saccharomyces cerevisiae, Kluyveromyces lactis, Eremothecium gossypii, Schizosaccharomyces pombe and Plasmodium falciparum.
Interface
HomoloGene is linked to all Entrez databases and draws on homology and phenotype information from the following resources:
- Mouse Genome Informatics (MGI)
- Zebrafish Information Network (ZFIN)
- Saccharomyces Genome Database (SGD)
- Clusters of Orthologous Groups (COG)
- FlyBase
- Online Mendelian Inheritance in Man (OMIM)
As a result, HomoloGene displays information about genes, proteins, phenotypes, and conserved domains.
External links
- HomoloGene at the National Center for Biotechnology Information
- Bioinformatic Harvester, a meta search engine that uses HomoloGene
- OMIM
- ZFIN
- SGD