GeneNetwork
GeneNetwork is a database and open source bioinformatics software resource for systems genetics. It is used to study gene regulatory networks that link DNA sequence variants to corresponding differences in gene and protein expression and to differences in traits such as health and disease risk. Data sets in GeneNetwork typically consist of large collections of genotypes (e.g., SNPs) and phenotypes obtained from groups of related individuals, including human families, experimental crosses of strains of mice and rats, and organisms as diverse as Drosophila melanogaster, Arabidopsis thaliana, and barley. Because genotypes are available for all individuals, users can carry out web-based gene mapping to discover the regions of the genome that contribute to differences in gene expression, cell function, anatomy, physiology, and behavior among individuals.

History

GeneNetwork was created at the University of Tennessee Health Science Center in Memphis, USA, in 2000-2001. It was initially developed as a web-adapted version of Kenneth F. Manly's Map Manager QT and QTX programs and was called WebQTL. Gene mapping data were incorporated for several mouse recombinant inbred strains. By early 2003, the first large Affymetrix gene expression data sets (whole mouse brain mRNA and hematopoietic stem cells) had been incorporated and the system was renamed GeneNetwork. GeneNetwork is now developed by an international group of developers and has mirror and development sites in Europe, Asia, and Australia. The production service is hosted on the Amazon Elastic Compute Cloud (EC2).

Organization and Use

GeneNetwork consists of two major components:
  • Massive collections of genetic, genomic, and phenotype data for large families
  • Sophisticated statistical analysis and gene mapping software that enables the analysis of regulatory networks and genotype-to-phenotype relations


Four levels of data are usually obtained for each family or population:
  1. DNA sequences and genotypes
  2. Gene expression values measured using microarrays, RNA-seq, or proteomic methods (molecular phenotypes)
  3. Standard phenotypes of the type that are part of a typical medical record (e.g., blood chemistry, body weight)
  4. Annotation files and metadata

The combined data types are housed together in a single relational database but are conceptually organized and divided by species and family. The system is implemented as a LAMP stack. Code and a simplified version of the MySQL database are available at Sourceforge.net/projects/genenetwork/.
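The idea of keeping several data types in one relational database while partitioning them by species and family can be illustrated with a small schema. The following is a minimal sketch using Python's built-in `sqlite3` module rather than MySQL; every table and column name (`species`, `family`, `strain`, `genotype`, `trait`, `trait_value`), the strain names, and the toy values are hypothetical and are not taken from GeneNetwork's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; it is NOT GeneNetwork's schema.
# Every strain and trait is scoped to a family, and every family to a species.
SCHEMA = """
CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE family (
    id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    name TEXT NOT NULL
);
CREATE TABLE strain (
    id INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES family(id),
    name TEXT NOT NULL
);
-- Level 1: genotypes of each strain at named markers
CREATE TABLE genotype (
    strain_id INTEGER NOT NULL REFERENCES strain(id),
    marker TEXT NOT NULL,
    allele TEXT NOT NULL
);
-- Levels 2 and 3: molecular and standard phenotypes share one trait table;
-- the free-text description stands in for level 4 (annotation/metadata)
CREATE TABLE trait (
    id INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES family(id),
    name TEXT NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('expression', 'phenotype')),
    description TEXT
);
CREATE TABLE trait_value (
    trait_id INTEGER NOT NULL REFERENCES trait(id),
    strain_id INTEGER NOT NULL REFERENCES strain(id),
    value REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one toy family."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO species (id, name) VALUES (1, 'mouse')")
    conn.execute("INSERT INTO family (id, species_id, name) VALUES (1, 1, 'BXD')")
    conn.executemany(
        "INSERT INTO strain (id, family_id, name) VALUES (?, 1, ?)",
        [(1, "strain_A"), (2, "strain_B")],
    )
    conn.executemany(
        "INSERT INTO genotype (strain_id, marker, allele) VALUES (?, ?, ?)",
        [(1, "marker_1", "B"), (2, "marker_1", "D")],
    )
    conn.executemany(
        "INSERT INTO trait (id, family_id, name, kind, description) "
        "VALUES (?, 1, ?, ?, ?)",
        [
            (1, "gene_x_expr", "expression", "toy mRNA level"),
            (2, "body_weight", "phenotype", "toy body weight (g)"),
        ],
    )
    conn.executemany(
        "INSERT INTO trait_value (trait_id, strain_id, value) VALUES (?, ?, ?)",
        [(1, 1, 8.1), (1, 2, 9.4), (2, 1, 21.0), (2, 2, 24.5)],
    )
    return conn


def traits_for_family(conn, species, family):
    """List (trait name, kind) pairs belonging to one family of one species."""
    return conn.execute(
        """
        SELECT t.name, t.kind
        FROM trait AS t
        JOIN family AS f ON t.family_id = f.id
        JOIN species AS s ON f.species_id = s.id
        WHERE s.name = ? AND f.name = ?
        ORDER BY t.name
        """,
        (species, family),
    ).fetchall()


conn = build_demo_db()
print(traits_for_family(conn, "mouse", "BXD"))
# [('body_weight', 'phenotype'), ('gene_x_expr', 'expression')]
```

Scoping traits through `family_id` lets one database serve many species and families while each query stays restricted to a single family; a production system would likely also add indexes and a more compact layout for very large expression data sets.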

GeneNetwork is primarily used by researchers but has also been adopted successfully for undergraduate courses in genetics (see YouTube example), bioinformatics, physiology, and psychology. Researchers and students typically retrieve sets of genotypes and phenotypes from one or more families and use built-in statistical and mapping functions to explore relations among variables and to assemble networks of associations. Key steps include the analysis of these factors:
  1. The range of variation of traits
  2. Covariation among traits (scatterplots and correlations)
  3. Architecture of larger networks of traits
  4. Quantitative trait locus (QTL) mapping and causal models of the linkage between sequence differences and phenotype differences
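The first three steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GeneNetwork's code: each trait is assumed to be a list of strain means in a shared strain order with no missing values, covariation is measured with Pearson and Spearman (rank-order) correlations, and the hypothetical helper `correlation_network` builds a network by linking trait pairs whose absolute Pearson correlation meets an arbitrary threshold.

```python
from itertools import combinations
from math import sqrt


def trait_range(values):
    """Step 1: range of variation as (minimum, maximum, maximum - minimum)."""
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo


def pearson(x, y):
    """Step 2: Pearson product-moment correlation of two equal-length traits."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("traits need the same length and at least two values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a trait with zero variance has no defined correlation")
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Step 2 (rank order): Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_network(traits, threshold=0.7):
    """Step 3: edges (a, b, r) for trait pairs with |Pearson r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(traits), 2):
        r = pearson(traits[a], traits[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Toy strain means for three traits (illustrative numbers only).
traits = {
    "trait_a": [1, 2, 3, 4, 5],
    "trait_b": [2, 4, 6, 8, 10],
    "trait_c": [5, 3, 4, 1, 2],
}
print(trait_range(traits["trait_c"]))              # (1, 5, 4)
print(correlation_network(traits, threshold=0.9))  # [('trait_a', 'trait_b', 1.0)]
```

Pearson correlation is sensitive to outliers, which is one reason a rank-order alternative is useful alongside it; the 0.9 threshold above is arbitrary and only serves to keep the example small.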

Data Sources

Large expression data sets are submitted directly by researchers or extracted from repositories such as the Gene Expression Omnibus of the National Center for Biotechnology Information (NCBI). A wide variety of cells and tissues are included, from populations of single cell types of the immune system and specific tissues (retina, prefrontal cortex) to entire systems (whole brain, lung, muscle, heart, fat, kidney, flower, even whole plant embryos). A typical data set is based on hundreds of fully genotyped individuals and may also include biological replicates. Genotypes and phenotypes are taken from peer-reviewed papers. GeneNetwork includes annotation files for several RNA profiling platforms (Affymetrix, Illumina, and Agilent). RNA-seq data are also available for BXD recombinant inbred mice. Content and nomenclature are reviewed and edited by curators. Updates on coverage of species, families, tissues, and measurement types are available at http://www.genenetwork.org/whats_new.html.

Topics of annotation include the following:
  • DNA sequence (SNPs, CNVs, indels)
  • Transcriptomes (arrays, RNA-seq)
  • Gene regulatory networks
  • Phenomes

Tools and Features

The site provides tools for a wide range of functions: simple graphical displays of variation in gene expression or other phenotypes, scatter plots of pairs of traits (with Pearson or rank-order correlations), construction of both simple and complex network graphs, analysis of principal components and synthetic traits, and QTL mapping using marker regression, interval mapping, and pair scans for epistatic interactions. Most functions work with up to 100 traits, and several work with an entire transcriptome.
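Marker regression, the simplest of the QTL mapping methods listed above, can be sketched as follows. This is an illustrative implementation under assumptions, not GeneNetwork's code: strains are assumed to be inbred, with genotypes coded 0 or 1 for the two parental alleles and no missing data. At each marker the trait is regressed on genotype, which for a two-group predictor amounts to fitting one mean per genotype group. The LOD score uses the standard normal-model formula LOD = (n/2) log10(RSS0/RSS1), and the additive effect is reported as half the difference between the two group means, a common convention for inbred strains. The function name `marker_regression` is hypothetical.

```python
from math import log10


def _rss(values):
    """Residual sum of squares around the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def marker_regression(genotypes, phenotype):
    """Single-marker regression scan.

    genotypes: {marker name: [0 or 1 per strain]}
    phenotype: [trait value per strain], same strain order
    Returns {marker name: (LOD score, additive effect)}.
    """
    n = len(phenotype)
    rss0 = _rss(phenotype)  # null model: one overall mean
    if rss0 == 0:
        raise ValueError("phenotype has no variation to map")
    results = {}
    for marker, genos in genotypes.items():
        if len(genos) != n or any(g not in (0, 1) for g in genos):
            raise ValueError(f"marker {marker} needs {n} genotypes coded 0 or 1")
        group0 = [y for y, g in zip(phenotype, genos) if g == 0]
        group1 = [y for y, g in zip(phenotype, genos) if g == 1]
        if not group0 or not group1:
            continue  # monomorphic marker: nothing to compare
        rss1 = _rss(group0) + _rss(group1)  # alternative: one mean per genotype
        lod = (n / 2) * log10(rss0 / rss1) if rss1 > 0 else float("inf")
        effect = (sum(group1) / len(group1) - sum(group0) / len(group0)) / 2
        results[marker] = (lod, effect)
    return results


# Toy data: six strains, three markers (illustrative numbers only).
phenotype = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
genotypes = {
    "marker_1": [0, 0, 0, 1, 1, 1],  # tracks the trait closely
    "marker_2": [0, 1, 0, 1, 0, 1],  # unrelated to the trait
    "marker_3": [0, 0, 0, 0, 0, 0],  # monomorphic, skipped
}
for marker, (lod, effect) in marker_regression(genotypes, phenotype).items():
    print(f"{marker}: LOD = {lod:.2f}, additive effect = {effect:.2f}")
```

Interval mapping and pair scans for epistasis extend the same comparison of models to positions between markers and to pairs of loci, and genome-wide significance is commonly judged with permutation tests; none of these are shown in this sketch.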

The database can be browsed and searched from the main search page, and an online tutorial is available. Users can also download the primary data sets as text files or Excel files, or, in the case of network graphs, as SBML (Systems Biology Markup Language).

Code

GeneNetwork is an open source project released under the Affero General Public License (AGPLv3). Most of the code is written in Python, with some modules and other code written in C and JavaScript. GeneNetwork also calls statistical procedures written in the R programming language. The source code and a compact database are available on GeneNetwork sites and at SourceForge.

See also

  • Computational genomics
  • KEGG (The Kyoto Encyclopedia of Genes and Genomes)
  • WikiPathways
  • Reactome
  • Cytoscape
  • GeneNetwork, Netherlands

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.