FlyBase
FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats. Information in FlyBase originates from a variety of sources, ranging from large-scale genome projects to the primary research literature. Data types include mutant phenotypes, molecular characterization of mutant alleles and other deviations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models, and molecular classification of gene product functions. Query tools allow navigation of FlyBase by DNA or protein sequence, by gene or mutant name, or through terms from the several ontologies used to capture functional, phenotypic, and anatomical data. These tools are intended to provide efficient access to the available data and to facilitate the discovery of significant relationships within the database. Links between FlyBase and external resources, such as BDGP or modENCODE, provide opportunities for further exploration of other model organism databases and other sources of biological and molecular information.
The FlyBase project is carried out by a consortium of Drosophila researchers and computer scientists at Harvard University and Indiana University in the United States, and the University of Cambridge in the United Kingdom. FlyBase is one of the organizations contributing to the Generic Model Organism Database (GMOD).
Background
Drosophila melanogaster has been an experimental organism since the early 1900s and has since been at the forefront of many areas of research. In 1992, data on the genetics and genomics of D. melanogaster and related species were available electronically over the Internet through the funded FlyBase, BDGP (Berkeley Drosophila Genome Project), and EDGP (European Drosophila Genome Project) informatics groups. These groups recognized that most genome project and community data types overlapped, and decided that it would be of great value to present the scientific community with an integrated view of the data. In October 1992, the National Center for Human Genome Research of the NIH funded the FlyBase project with the objective of designing, building, and releasing a database of genetic and molecular information concerning Drosophila melanogaster. FlyBase also receives support from the Medical Research Council, London. In 1998, the FlyBase consortium integrated the information into a single Drosophila genomics server.
Contents
FlyBase contains a complete annotation of the Drosophila melanogaster genome that is updated several times per year. It also includes a searchable bibliography of research on Drosophila genetics in the last century. Information on current researchers, and a partial pedigree of relationships between them, is searchable, based on registration of the participating scientists. The site also provides a large database of images illustrating the full genome, and several movies detailing embryogenesis.
Search Strategies
Gene reports for genes from all twelve sequenced Drosophila genomes are available in FlyBase. There are four main ways these data can be browsed: precomputed files, BLAST, GBrowse, and gene report pages. GBrowse and precomputed files are suited to genome-wide analysis, bioinformatics, and comparative genomics. BLAST and gene report pages are suited to examining a specific gene, protein, or region across the species.
For cytological data, two main tools are available. Use Cytosearch when looking for cytologically mapped genes or deficiencies that have not been molecularly mapped to the sequence. Use GBrowse when looking for molecularly mapped sequences, insertions, or Affymetrix probes.
There are two main query tools in FlyBase. The first, Jump to Gene (J2G), is found at the top right of the blue navigation bar on every FlyBase page. It is useful when you know exactly what you are looking for and want to go straight to the report page with that data. The second, QuickSearch, is located on the FlyBase homepage. It is most useful for quickly looking up something about which you may know only a little. Searches can be restricted to D. melanogaster or run across all species. Data other than genes can be searched using the ‘data class’ menu.
FlyBase also provides a Site Map to help navigate the content of the website.
Related Research
The following are two of many examples of research that is related to or uses FlyBase:
1. The first is a study of expressed genes from alate Toxoptera citricida, more commonly known as the brown citrus aphid. The brown citrus aphid is considered the primary vector of citrus tristeza virus, a severe pathogen that causes losses to citrus industries worldwide. The winged form of this aphid can fly long distances with the wind, enabling it to spread citrus tristeza virus across citrus-growing regions. To better understand the biology of the brown citrus aphid and the genes expressed during wing development, researchers undertook a large-scale 5′ end sequencing project of cDNA clones from winged aphids. Similar large-scale expressed sequence tag (EST) sequencing projects in other insects have provided a vehicle for answering biological questions relating to development and physiology. Although there is a growing database of insect ESTs in GenBank, most are from Drosophila melanogaster, with relatively few derived from aphids. The researchers provided a large data set of ESTs from the alate (winged) brown citrus aphid and began to analyze it, drawing on the information on Drosophila melanogaster in FlyBase. Putative sequence identity was determined using BLAST searches. Sequence matches with E-values of 10^−10 or lower were considered significant and were categorized according to the Gene Ontology (GO) classification system, based on annotation of the five ‘best hit’ matches in BLASTX searches. All D. melanogaster matches were cataloged using FlyBase, and nearly all of these ‘best hit’ matches were characterized with respect to the functionally annotated genes in D. melanogaster. Genetic information is crucial to advancing the understanding of aphid biology and will play a major role in the development of future non-chemical, gene-based control strategies against these insect pests.
2. Enhancing Drosophila Gene Ontology annotation: What gene products do and where they do it are important questions for biologists. The Gene Ontology (GO) project was established in 1998 to summarize these data consistently across different databases by using a common set of defined vocabulary terms. The vocabularies also encode relationships between terms. The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project also provides gene product annotation data from GO Consortium members. This is where FlyBase comes in. FlyBase was one of the three founding members of the Gene Ontology Consortium (GOC). GO annotation comprises at least three components: a GO term that describes molecular function, biological role, or subcellular location; an ‘evidence code’ that describes the type of analysis used to support the GO term; and an attribution to a specific reference. GO annotation is useful for both small-scale and large-scale analyses. It can provide a first indication of the nature of a gene product and, in conjunction with evidence codes, point directly to papers with pertinent experimental data. The current priorities for annotation are homologs of human disease genes, genes that are highly conserved across species, genes involved in biochemical or signaling pathways, and topical genes shown to be of significant interest in recent publications. FlyBase has been contributing GO annotations to the project since it started in August 2006. GO annotations appear on the Gene Report page in FlyBase. GO data are searchable in FlyBase using both TermLink and QueryBuilder. The GO is dynamic and can change on a daily basis, for example through the addition of new terms. To keep up, FlyBase loads a new version of the GO every one or two FlyBase releases. The GO annotation set is submitted to the GOC at the same time as a new version of FlyBase is released.
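The two examples above can be tied together in a minimal Python sketch. It is an illustration only, not the method of either study and not a FlyBase or BLAST API. It assumes that the significance threshold in the first example means an E-value of 1e-10 or lower, keeps at most five best hits per query sequence, and then looks up GO terms through annotation records that carry the three components described in the second example (GO term, evidence code, reference). All class names, function names, identifiers, and sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: the cutoff in the text is read as E-value <= 1e-10.
E_VALUE_CUTOFF = 1e-10
BEST_HITS_PER_QUERY = 5


@dataclass(frozen=True)
class BlastHit:
    query_id: str    # hypothetical EST identifier
    subject_id: str  # hypothetical identifier of the matched gene
    e_value: float


@dataclass(frozen=True)
class GoAnnotation:
    gene_id: str
    go_term: str        # what the gene product does or where it acts
    evidence_code: str  # type of analysis supporting the term, e.g. "IDA"
    reference: str      # attribution to a specific reference


def significant_best_hits(hits, cutoff=E_VALUE_CUTOFF, top_n=BEST_HITS_PER_QUERY):
    """Keep hits at or below the cutoff, then the top_n lowest E-values per query."""
    by_query = defaultdict(list)
    for hit in hits:
        if hit.e_value <= cutoff:
            by_query[hit.query_id].append(hit)
    return {
        query: sorted(query_hits, key=lambda h: h.e_value)[:top_n]
        for query, query_hits in by_query.items()
    }


def go_terms_for_queries(best_hits, annotations):
    """Collect the GO terms annotated to the genes matched by each query."""
    terms_by_gene = defaultdict(set)
    for annotation in annotations:
        terms_by_gene[annotation.gene_id].add(annotation.go_term)
    result = {}
    for query, query_hits in best_hits.items():
        terms = set()
        for hit in query_hits:
            terms |= terms_by_gene.get(hit.subject_id, set())
        result[query] = sorted(terms)
    return result


# Made-up sample data, for illustration only.
example_hits = [
    BlastHit("EST_001", "gene_A", 1e-30),
    BlastHit("EST_001", "gene_B", 1e-5),   # above the cutoff, so dropped
    BlastHit("EST_002", "gene_B", 1e-12),
]
example_annotations = [
    GoAnnotation("gene_A", "GO:EXAMPLE1", "IDA", "REF_1"),
    GoAnnotation("gene_B", "GO:EXAMPLE2", "IEA", "REF_2"),
]

example_best = significant_best_hits(example_hits)
example_terms = go_terms_for_queries(example_best, example_annotations)
print(example_terms)
```

Keeping the evidence code and reference on each annotation record, rather than storing only the GO term, is what would let a later step restrict the categorization to experimentally supported annotations or trace a term back to its source paper.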
External links
- FlyBase: A Database of Drosophila Genes & Genomes