Bioinformatic Harvester
The Bioinformatic Harvester is a bioinformatic meta search engine at KIT (Karlsruhe Institute of Technology) for gene and protein-associated information. Harvester currently works for human, mouse, rat, zebrafish, Drosophila and Arabidopsis thaliana based information. Harvester cross-links more than 50 popular bioinformatic resources and allows cross searches. Harvester serves tens of thousands of pages every day to scientists and physicians.
How Harvester works
Harvester collects information from protein and gene databases along with information from so-called "prediction servers". Prediction servers provide, for example, online sequence analysis for a single protein. Harvester's search index is based on the IPI (International Protein Index) and UniProt protein information collections. The collection consists of:
- ~72,000 human, ~57,000 mouse, ~41,000 rat, ~51,000 zebrafish and ~35,000 arabidopsis protein pages, which cross-link ~50 major bioinformatic resources.
Text based information
...from the following databases:
- UniProt, the world's largest protein database
- SOURCE, convenient gene information overview
- Simple Modular Architecture Research Tool (SMART)
- SOSUI, predicts transmembrane domains
- PSORT, predicts protein localisation
- HomoloGene, compares proteins from different species
- gfp-cdna, protein localisation with fluorescence microscopy
- International Protein Index (IPI)
Databases rich in graphical elements
...are not collected, but crosslinked via iframes. Iframes are transparent windows within an HTML page. The iframe windows allow up-to-date viewing of the "iframed", linked databases. Several such iframes are combined on a Harvester protein page. This method allows convenient comparison of information from several databases.
- NCBI-BLAST, an algorithm for comparing biological sequences from the NCBI
- Ensembl, automatic gene annotation by the EMBL-EBI and Sanger Institute
- FlyBase, a database of the model organism Drosophila melanogaster
- GoPubMed, a knowledge-based search engine for biomedical texts
- iHOP, information hyperlinked over proteins via gene/protein synonyms
- Mendelian Inheritance in Man (OMIM), a project that catalogues all the known diseases with a genetic component
- RZPD, German Resource Center for Genome Research in Berlin/Heidelberg
- STRING, Search Tool for the Retrieval of Interacting Genes/Proteins, developed by EMBL, SIB and UZH
- Zebrafish Information Network (ZFIN)
- LOCATE, subcellular localization database (mouse)
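The iframe approach described above can be sketched roughly as follows. This is a minimal illustration, not Harvester's actual implementation: the function name `build_protein_page`, the `EXAMPLE_SOURCES` mapping, and the `example.org` URL templates are all hypothetical placeholders, since the real link formats are not given here.

```python
from html import escape
from typing import Mapping
from urllib.parse import quote

# Hypothetical placeholder URL templates (assumption); the real link
# formats used by Harvester are not described in this article.
EXAMPLE_SOURCES = {
    "Ensembl": "https://example.org/ensembl?protein={acc}",
    "STRING": "https://example.org/string?protein={acc}",
}


def build_protein_page(accession: str, sources: Mapping[str, str]) -> str:
    """Return an HTML page that stacks one iframe per external database."""
    parts = ["<html><body>", f"<h1>{escape(accession)}</h1>"]
    for name, template in sources.items():
        # Each iframe loads the live remote page, so the view is always current.
        url = template.format(acc=quote(accession))
        parts.append(f"<h2>{escape(name)}</h2>")
        parts.append(
            f'<iframe src="{escape(url, quote=True)}" width="100%" height="400"></iframe>'
        )
    parts.append("</body></html>")
    return "\n".join(parts)


page = build_protein_page("Q9NPD3", EXAMPLE_SOURCES)
```

Because each iframe loads its remote page directly, nothing from the graphical databases has to be copied or re-indexed; the page only needs to know how to build the link for a given protein.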
"linkouts"
- Genome browser, working draft assemblies for genomes (UCSC)
- Google Scholar
- Mitocheck
- PolyMeta, meta search engine for Google, Yahoo, MSN, Ask, Exalead, AllTheWeb, GigaBlast
What one can find
Harvester allows a combination of different search terms and single words.
Search Examples:
- Gene-name: "golga3"
- Gene-alias: "ADAP-S ADAS ADHAPS ADPS" (one gene name is sufficient)
- Gene-Ontologies: "Enzyme linked receptor protein signaling pathway"
- Unigene-Cluster: "Hs.449360"
- Go-annotation: "intra-Golgi transport"
- Molecular function: "protein kinase binding"
- Protein: "Q9NPD3"
- Protein domain: "SH2 sar"
- Protein Localisation: "endoplasmic reticulum"
- Chromosome: "2q31"
- Disease relevant: use the word "diseaselink"
- Combinations: "golgi diseaselink" (finds all golgi proteins associated with a disease)
- mRNA: "AL136897"
- Word: "Cancer"
- Comment: "highly expressed in heart"
- Author: "Merkel, Schmidt"
- Publication or project: "cDNA sequencing project"
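How combined terms such as "golgi diseaselink" narrow a result set can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: the records in `PAGES` are made up, and the whitespace tokenization and AND semantics are assumptions inferred from the "Combinations" example, not a description of Harvester's real index.

```python
from collections import defaultdict

# Made-up records for illustration only (assumption); not real Harvester data.
PAGES = {
    "P1": "golgi membrane protein diseaselink",
    "P2": "golgi transport protein",
    "P3": "nuclear kinase diseaselink",
}


def build_index(pages):
    """Map each lowercased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the pages for the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(PAGES)
hits = search(index, "golgi diseaselink")
```

In this toy data set only page `P1` carries both words, so the combined query returns fewer pages than either term alone.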
See also
- Biological databases
- Entrez
- European Bioinformatics Institute
- Human Protein Reference Database
- Metadata
- Sequence profiling tool
External links
- http://harvester.kit.edu Bioinformatic Harvester V at KIT Karlsruhe Institute of Technology
- Harvester42 at KIT - integrating 50 general search engines