OrthoDB
OrthoDB presents a catalog of eukaryotic orthologous protein-coding genes across vertebrates, arthropods, and fungi. Orthology is defined relative to the last common ancestor of the species under consideration, so OrthoDB explicitly delineates orthologs at each radiation along the species phylogeny. The database of orthologs presents available protein descriptors, together with Gene Ontology and InterPro attributes. These attributes provide general descriptive annotations of the orthologous groups and make comprehensive querying of the orthology database easier.

Methodology

Orthology is defined relative to the last common ancestor of the species being considered, which makes orthologous classifications inherently hierarchical. OrthoDB addresses this explicitly by applying the orthology delineation procedure at each radiation point of the species phylogeny. The phylogeny itself is computed empirically by a maximum-likelihood approach applied to a super-alignment of single-copy orthologs. The OrthoDB implementation employs a Best-Reciprocal-Hit (BRH) clustering algorithm based on all-against-all Smith–Waterman protein sequence comparisons. Before clustering, the gene sets are pre-processed to select the longest protein-coding transcript of each alternatively spliced gene and, likewise, the longest of any group of very similar gene copies. The procedure triangulates BRHs to build the clusters progressively. It also requires an overall minimum sequence alignment overlap to avoid domain walking, which is the chaining together of unrelated proteins through different shared domains. These core clusters are then expanded to include all within-species in-paralogs that are more closely related to a cluster member than that member is to its orthologs in other species. The very similar gene copies set aside during pre-processing are also added back at this stage.
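The clustering steps described above can be illustrated with a short Python sketch. This is a simplified, hypothetical illustration, not OrthoDB's actual code. It makes the following assumptions:

* The all-against-all Smith–Waterman alignments are already computed and supplied as `(gene_a, gene_b, score, overlap)` tuples.
* Gene IDs have the form `"species:gene"`.
* The minimum overlap is 0.5.
* Triangulation uses a simple rule: a gene joins a growing cluster when it is a BRH of at least two members.
* In-paralogs are identified by score comparison.

All function names (`longest_isoforms`, `build_hits`, `best_hits`, `reciprocal_hits`, `triangulate`, `add_inparalogs`, `orthogroups`) and the `copies` mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch (assumed input format): gene IDs are "species:gene"
# strings; alignments are precomputed (gene_a, gene_b, score, overlap) tuples,
# where overlap is the fraction of the proteins covered by the alignment.


def species(gene):
    return gene.split(":", 1)[0]


def longest_isoforms(transcripts):
    """Pre-processing: keep the longest protein sequence for each gene."""
    return {gene: max(isoforms, key=len) for gene, isoforms in transcripts.items()}


def build_hits(alignments, min_overlap=0.5):
    """Symmetric score table; short alignments are dropped to avoid domain walking."""
    hits = defaultdict(dict)
    for a, b, score, overlap in alignments:
        if overlap >= min_overlap:
            hits[a][b] = score
            hits[b][a] = score
    return hits


def best_hits(hits, gene):
    """Best-scoring hit of `gene` in every other species (scores assumed positive)."""
    best = {}
    for other, score in hits[gene].items():
        sp = species(other)
        if sp != species(gene) and score > best.get(sp, (None, -1))[1]:
            best[sp] = (other, score)
    return {sp: hit for sp, (hit, _) in best.items()}


def reciprocal_hits(hits):
    """All Best-Reciprocal-Hit pairs, stored as frozensets."""
    best = {gene: best_hits(hits, gene) for gene in hits}
    brh = set()
    for gene, per_species in best.items():
        for other in per_species.values():
            if best[other].get(species(gene)) == gene:
                brh.add(frozenset((gene, other)))
    return brh


def triangulate(brh):
    """Grow core clusters from BRH pairs (assumed rule: join if BRH to >= 2 members)."""
    partners = defaultdict(set)
    for pair in brh:
        a, b = tuple(pair)
        partners[a].add(b)
        partners[b].add(a)
    clustered, clusters = set(), []
    for pair in sorted(brh, key=sorted):
        if pair & clustered:
            continue  # seed pair already absorbed by an earlier cluster
        cluster, grown = set(pair), True
        while grown:
            grown = False
            candidates = set().union(*(partners[m] for m in cluster)) - cluster - clustered
            for cand in sorted(candidates):
                if len(partners[cand] & cluster) >= 2:  # closes a BRH triangle
                    cluster.add(cand)
                    grown = True
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def add_inparalogs(cluster, hits, assigned):
    """Add unassigned same-species genes closer to a member than its orthologs are."""
    group = set(cluster)
    for member in cluster:
        threshold = max((hits[member].get(o, 0) for o in cluster
                         if species(o) != species(member)), default=0)
        for other, score in hits[member].items():
            if (species(other) == species(member) and other not in assigned
                    and score > threshold):
                group.add(other)
    return group


def orthogroups(alignments, copies=None, min_overlap=0.5):
    """BRH core clusters, expanded with in-paralogs and set-aside near-identical copies."""
    copies = copies or {}  # hypothetical: representative -> list of collapsed copies
    hits = build_hits(alignments, min_overlap)
    core = triangulate(reciprocal_hits(hits))
    assigned = set().union(*core)
    groups = []
    for cluster in core:
        group = add_inparalogs(cluster, hits, assigned)
        for gene in list(group):
            group.update(copies.get(gene, []))  # restore copies removed in pre-processing
        groups.append(group)
    return groups
```

The sketch applies the overlap filter before best hits are chosen. As a result, a strong but short, domain-only alignment cannot displace a full-length ortholog as a gene's best hit. This is how the sketch approximates the guard against domain walking.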

Data content

At the time of writing, the database contains over 100 species. These comprise 44 vertebrate genomes sourced from Ensembl, 46 fungal genomes from UniProt, and 25 arthropod genomes from several databases. As more eukaryotic genomes are sequenced, the denser sampling gives a clearer account of most gene genealogies. This in turn supports better-informed hypotheses about gene function in newly sequenced genomes.

Examples of studies that have employed data from OrthoDB include:

* comparative analyses of gene repertoire evolution
* comparisons of fruit fly and mosquito developmental genes
* analyses of bloodmeal- or infection-induced changes in gene expression in mosquitoes
* analysis of the evolution of mammalian milk production

Other studies citing OrthoDB can be found at PubMed.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.