Biological database
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations as well as similarities of biological sequences and structures.
Relational database concepts of computer science and information retrieval concepts of digital libraries are important for understanding biological databases. Biological database design, development, and long-term management is a core area of the discipline of bioinformatics. Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures. Cross-references among databases are common, using database accession numbers.
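The three representations mentioned above (tables, key-delimited records, and XML structures) can be illustrated with a minimal Python sketch. The record, its field names, and the accession value are invented for illustration and do not follow the format of any particular database.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the accession and field values are invented for illustration.
record = {
    "accession": "EX000001",
    "description": "example gene sequence",
    "sequence": "ATGGCCTAA",
    "citation": "Example et al. (illustrative)",
}

def to_table_row(rec, columns):
    """Tabular view: field values in a fixed column order."""
    return [rec.get(col, "") for col in columns]

def to_key_delimited(rec):
    """Key-delimited view: one 'KEY: value' line per field."""
    return "\n".join(f"{key.upper()}: {value}" for key, value in rec.items())

def to_xml(rec):
    """XML view: one child element per field under a <record> root."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def from_key_delimited(text):
    """Parse the key-delimited view back into a dictionary."""
    parsed = {}
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        parsed[key.lower()] = value
    return parsed

if __name__ == "__main__":
    print(to_table_row(record, ["accession", "sequence"]))
    print(to_key_delimited(record))
    print(to_xml(record))
```

The same content survives a round trip through each form, which is one reason such data are described as semi-structured: the fields are named, but no single layout is mandatory.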
Overview
Biological databases are an important tool in assisting scientists to understand and explain a host of biological phenomena, from the structure of biomolecules and their interaction, to the whole metabolism of organisms and to understanding the evolution of species. This knowledge helps facilitate the fight against diseases, assists in the development of medications and in discovering basic relationships amongst species in the history of life.
Biological knowledge is distributed amongst many different general and specialized databases. This sometimes makes it difficult to ensure the consistency of information. Biological databases cross-reference other databases with accession numbers as one way of linking their related knowledge together.
An important resource for finding biological databases is a special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes many of the publicly available online databases related to biology and bioinformatics.
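Cross-referencing by accession number, and the consistency problem it can expose, can be sketched in a few lines of Python. The two in-memory "databases", their names, and all accession values below are hypothetical; real databases each define their own accession formats and access methods.

```python
# Two hypothetical databases keyed by accession number (all values invented).
sequence_db = {
    "SEQ0001": {"name": "gene A", "xrefs": [("protein_db", "PRT0001")]},
    "SEQ0002": {"name": "gene B", "xrefs": [("protein_db", "PRT9999")]},
}
protein_db = {
    "PRT0001": {"name": "protein A", "xrefs": [("sequence_db", "SEQ0001")]},
}
databases = {"sequence_db": sequence_db, "protein_db": protein_db}

def resolve(db_name, accession):
    """Return the record for an accession, or None if it cannot be found."""
    return databases.get(db_name, {}).get(accession)

def dangling_xrefs():
    """List cross-references whose target record is missing."""
    problems = []
    for db_name, db in databases.items():
        for accession, rec in db.items():
            for target_db, target_acc in rec["xrefs"]:
                if resolve(target_db, target_acc) is None:
                    problems.append((db_name, accession, target_db, target_acc))
    return problems

if __name__ == "__main__":
    print(resolve("protein_db", "PRT0001"))
    print(dangling_xrefs())
```

Here the second sequence record points at a protein accession that does not exist, a simple instance of the inconsistencies that can arise when related knowledge is spread across several independently maintained databases.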
Output
Biological data comes in many formats. These formats include text, sequence data, protein structure and links. Each of these can be found from certain sources, for example:
- Text formats are provided by PubMed and OMIM.
- Sequence data are provided by GenBank, in terms of DNA, and UniProt, in terms of protein.
- Protein structures are provided by PDB, SCOP, and CATH.
Problems associated with protein databases
Discovery in the area of protein structure has not advanced as quickly as discovery in the area of sequence data, owing to the 3D nature of protein structure, so less structural information is available. Nonetheless, data can be accessed through the members of the wwPDB (PDBe, PDBj and RCSB PDB), SCOP (Structural Classification of Proteins, http://scop.berkeley.edu/), and CATH (http://www.cathdb.info/).
Species-specific databases
Species-specific databases are available for some species, mainly those that are often used in research. For example, Colibase (http://colibase.bham.ac.uk/) is an E. coli database. Other popular species-specific databases include Flybase (http://flybase.bio.indiana.edu/) for Drosophila, and WormBase (http://www.wormbase.org/) for the nematodes Caenorhabditis elegans and Caenorhabditis briggsae.
See also
- List of biological databases
- Biobank
- Gene bank
- NCBI
- dbSNP
- PubMed
- Interactome
- Biological data
- MetaBase
- Quertle
- SNPSTR
External links
- Wiki of biological databases
- Interactive list of biological databases, classified by categories, from Nucleic Acids Research, 2010
- Genome Proteome Search Engine to search across biological databases
- DBD: Database of Biological Databases
- CAMERA Cyberinfrastructure for Metagenomics, free data repository and bioinformatics tools for metagenomics.