Biodiversity Informatics
Biodiversity Informatics is the application of informatics techniques to biodiversity information for improved management, presentation, discovery, exploration and analysis. It typically builds on a foundation of taxonomic, biogeographic, or ecological information stored in digital form, which, with the application of modern computer techniques, can yield new ways to view and analyse existing information, as well as predictive models for information that does not yet exist (see niche modelling). Biodiversity informatics is a relatively young discipline (the term was coined in or around 1992) but has hundreds of practitioners worldwide, including the numerous individuals involved with the design and construction of taxonomic databases. The term "Biodiversity Informatics" is generally used in the broad sense to apply to computerized handling of any biodiversity information; the somewhat broader term "bioinformatics" is often used synonymously with the computerized handling of data in the specialized area of molecular biology.

Overview

Biodiversity informatics has been defined as "the creation, integration, analysis, and understanding of information regarding biological diversity", and "[the] field that brings information science and technologies to bear on the data and information generated by the study of organisms, their genes, and their interactions". Broadly speaking, it seeks to draw upon and integrate information held in various taxonomic databases and other digital sources to answer biodiversity questions at scales ranging from global to local. Such questions might range from "How many described species exist in the world?" (answer: still not known for certain, as the relevant data are not yet all compiled in any coherent manner) to "Predict the effects of a global temperature rise of X degrees C on the geographic range of species Y", a question which involves not only biodiversity in the basic sense but also related domains of ecology, geographic distributions of environmental parameters, global climate models, and more. In addition to handling formally named taxa, biodiversity informatics may also have to cope with managing information from unnamed taxa such as that produced by environmental sampling and sequencing of mixed-field samples. The term biodiversity informatics is also used to cover the computational problems specific to the names of biological entities, such as the development of algorithms to cope with variant representations of identifiers such as species names and authorities, and the multiple classification schemes within which these entities may reside according to the preferences of different workers in the field. It likewise covers the syntax and semantics by which the content in taxonomic databases can be made machine queryable and interoperable for biodiversity informatics purposes.

History of the discipline of Biodiversity Informatics

Biodiversity Informatics can be considered to have commenced with the construction of the first computerized taxonomic databases in the early 1970s. It progressed through the subsequent development of distributed search tools in the late 1990s, including the Species Analyst from the University of Kansas, the North American Biodiversity Information Network (NABIN), CONABIO in Mexico, and others; the establishment of the Global Biodiversity Information Facility in 2001; and the parallel development of a variety of niche modelling and other tools to operate on digitized biodiversity data from the mid 1980s onwards. In September 2000, the U.S. journal Science devoted a special issue to "Bioinformatics for Biodiversity", the journal "Biodiversity Informatics" commenced publication in 2004, and several international conferences through the 2000s have brought together Biodiversity Informatics practitioners, most recently the London e-Biosphere conference in June 2009. A recent supplement to the journal BMC Bioinformatics (Volume 10 Suppl 14) published in November 2009 also deals with Biodiversity Informatics.

History of the term "Biodiversity Informatics"

According to correspondence reproduced by Walter Berendsohn, the term "Biodiversity Informatics" was coined by John Whiting in 1992 to cover the activities of an entity known as the Canadian Biodiversity Informatics Consortium, a group involved with fusing basic biodiversity information with environmental economics and geospatial information in the form of GPS and GIS. Subsequently it appears to have lost any obligate connection with the GPS/GIS world and to have become associated with the computerized management of any aspect of biodiversity information.

Global list of all species

One major issue for biodiversity informatics at a global scale is the present absence of a machine queryable (or even non-digital) master list of currently recognised species of the world. Compiling such a list is an aim of the Catalogue of Life project, which has been quoted as aiming to achieve this goal (for extant species only) by 2012; its 2009 Annual Checklist edition included a total of 1.16 million valid species names and 0.76 million synonyms, out of an estimated target of 1.8 million extant described species. A similar effort for fossil taxa, the Paleobiology Database, documents some 100,000+ names for fossil species, out of an unknown total number.
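
As a rough worked example, the 2009 figures quoted above imply the following coverage. This is only arithmetic on the numbers given in the text; the variable names are chosen for illustration, and the 1.8 million target is itself an estimate.

```python
# Figures quoted in the text for the 2009 Annual Checklist.
valid_species = 1.16e6        # valid species names included
synonyms = 0.76e6             # synonyms included
estimated_described = 1.8e6   # estimated extant described species (an estimate)

# Share of the estimated target covered by valid names.
coverage = valid_species / estimated_described

# Average number of recorded synonyms per valid name.
synonyms_per_valid = synonyms / valid_species

print(f"coverage of estimated target: {coverage:.0%}")
print(f"synonyms per valid name: {synonyms_per_valid:.2f}")
```

On these figures the checklist covered a little under two thirds of the estimated target, with roughly two synonyms recorded for every three valid names.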

Problems with genus and species scientific names as unique and persistent identifiers

Application of the Linnaean system of binomial nomenclature for species, and uninomials for genera and higher ranks, has led to many advantages but also problems with homonyms (the same name being used for multiple taxa, either inadvertently or legitimately across multiple kingdoms), synonyms (multiple names for the same taxon), as well as variant representations of the same name due to orthographic differences, minor spelling errors, variation in the manner of citation of author names and dates, and more. In addition, names can change through time on account of changing taxonomic opinions (for example, the correct generic placement of a species, or the elevation of a subspecies to species rank or vice versa), and also the circumscription of a taxon can change according to different authors' taxonomic concepts. One proposed solution to this problem is the usage of Life Science Identifiers (LSIDs) for machine-machine communication purposes, although there are both proponents and opponents of this approach.
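
To make the name-variant and synonym problems concrete, here is a minimal Python sketch, not a production name matcher. The function names `normalise_name` and `resolve_name` and the one-entry synonym table are hypothetical and exist only for illustration. The sketch assumes that a name string starts with the genus, followed by lowercase epithets, followed by optional author and date text.

```python
import re
import unicodedata

# Illustrative synonym table (hypothetical, one entry only):
# maps a normalised synonym to the normalised accepted name.
SYNONYMS = {"Pinus abies": "Picea abies"}


def normalise_name(raw: str) -> str:
    """Reduce a name string to a 'Genus epithet' key, dropping authors and dates."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = text.split()  # also collapses repeated whitespace
    if not tokens:
        return ""
    genus = tokens[0].capitalize()
    epithets = []
    for token in tokens[1:]:
        # Assumption: epithets are all-lowercase; the first other token
        # (e.g. "(L.)", "H.Karst.", "1753") starts the author citation.
        # Lowercase author particles such as "de" would be misread here.
        if re.fullmatch(r"[a-z-]+", token):
            epithets.append(token)
        else:
            break
    return " ".join([genus] + epithets)


def resolve_name(raw: str) -> str:
    """Return the accepted name for a raw string, if it is a known synonym."""
    key = normalise_name(raw)
    return SYNONYMS.get(key, key)


print(resolve_name("PINUS abies L., 1753"))
print(normalise_name("Picea  abies (L.) H.Karst."))
```

Both calls print the same key, so two differently written strings are recognised as the same taxon. The sketch does nothing about homonyms or misspellings, which need extra context such as the kingdom, or approximate string matching.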

Achieving a consensus classification of organisms

Organisms can be classified in a multitude of ways (see main page Biological classification), which can create design problems for Biodiversity Informatics systems aimed at incorporating either a single or multiple classification to suit the needs of users, or to guide them towards a single "preferred" system. Whether a single consensus classification system can ever be achieved is probably an open question; however, in an attempt to provide at least a degree of consensus, the Catalogue of Life project has recently released a document that attempts to list some of the issues in this area, and may lead to a more coherent classification that can be promoted, at least via that project's future products.

Mobilizing primary biodiversity information

"Primary" biodiversity information can be considered the basic data on the occurrence and diversity of species (or indeed, any recognizable taxa), commonly in association with information regarding their distribution in either space, time, or both. Such information may be in the form of retained specimens and associated information, for example as assembled in the natural history collections of museums and herbaria, or as observational records, for example either from formal faunal or floristic surveys undertaken by professional biologists and students, or as amateur and other planned or unplanned observations including those increasingly coming under the scope of citizen science. Providing online, coherent digital access to this vast collection of disparate primary data is a core Biodiversity Informatics function that is at the heart of regional and global biodiversity data networks, examples of the latter including OBIS and GBIF.

As a secondary source of biodiversity data, relevant scientific literature can be parsed either by humans or (potentially) by specialized information retrieval algorithms to extract the relevant primary biodiversity information that is reported therein, sometimes in aggregated / summary form but frequently as primary observations in narrative or tabular form. Elements of such activity (such as extracting key taxonomic identifiers, keywording / index terms, etc.) have been practiced for many years at a higher level by selected academic databases and search engines. However, for the maximum Biodiversity Informatics value, the actual primary occurrence data should ideally be retrieved and then made available in a standardized form or forms; for example both the Plazi and INOTAXA projects are transforming taxonomic literature into XML formats that can then be read by client applications, the former using TaxonX-XML and the latter using the taXMLit format. The Biodiversity Heritage Library is also making significant progress in its aim to digitize substantial portions of the out-of-copyright taxonomic literature, which is then subjected to OCR (optical character recognition) so as to be amenable to further processing using Biodiversity Informatics tools.
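
As a hedged illustration of the kind of processing that OCR text makes possible, the Python sketch below picks out candidate binomial names from running text. It is not the method used by Plazi, INOTAXA or the Biodiversity Heritage Library. The pattern, the function name `candidate_names` and the small genus list are assumptions made for this example.

```python
import re

# Crude pattern for "Genus epithet": a capitalised word followed by a
# lowercase word of at least three letters. This is an assumption of the
# sketch and will also match ordinary phrases such as "The authors".
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def candidate_names(text: str, known_genera: set[str]) -> list[str]:
    """Return candidate binomials whose genus appears in a reference list."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        # Filtering against known genera removes most false positives.
        if genus in known_genera:
            name = f"{genus} {epithet}"
            if name not in found:  # keep first occurrence, preserve order
                found.append(name)
    return found


sample = (
    "Specimens of Picea abies were collected near the lake. "
    "The authors also recorded Quercus robur."
)
print(candidate_names(sample, {"Picea", "Quercus"}))
```

Filtering against a list of known genera is one simple way to suppress false matches. A real system would also need to handle OCR errors, abbreviated genera (for example "P. abies") and names split across line breaks.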

Biodiversity Informatics standards and protocols

In common with other data-related disciplines, Biodiversity Informatics benefits from the adoption of appropriate standards and protocols in order to support machine-machine transmission and interoperability of information within its particular domain. Examples of relevant standards include the Darwin Core XML schema for specimen- and observation-based biodiversity data, developed from 1998 onwards, together with its extensions; the Taxonomic Concept Transfer Schema; and standards for Structured Descriptive Data and Access to Biological Collection Data (ABCD). Data retrieval and transfer protocols include DiGIR (now mostly superseded) and TAPIR (TDWG Access Protocol for Information Retrieval). Many of these standards and protocols are currently maintained, and their development overseen, by the Taxonomic Databases Working Group (TDWG).
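
To show what a standardized occurrence record might look like in practice, the Python sketch below builds a record keyed by Darwin Core term names and runs a few basic checks. The term names (`occurrenceID`, `basisOfRecord`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) are Darwin Core terms. The choice of which terms to treat as required, the function name `check_occurrence` and the sample values are assumptions of this sketch, not part of the standard.

```python
# Terms this sketch treats as mandatory (an assumption, not a rule of the standard).
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def check_occurrence(record: dict) -> list[str]:
    """Return a list of problems found in a Darwin Core style record."""
    problems = []
    for term in REQUIRED_TERMS:
        if not record.get(term):
            problems.append(f"missing {term}")
    # Coordinates are optional here, but must be plausible if present.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is not None and not -90 <= float(lat) <= 90:
        problems.append("decimalLatitude out of range")
    if lon is not None and not -180 <= float(lon) <= 180:
        problems.append("decimalLongitude out of range")
    return problems


# Hypothetical specimen record; all values are invented for illustration.
record = {
    "occurrenceID": "example:occ:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Picea abies (L.) H.Karst.",
    "eventDate": "1998-06-14",
    "decimalLatitude": 59.33,
    "decimalLongitude": 18.07,
}
print(check_occurrence(record))
```

A shared vocabulary like this is what lets records from different museums and surveys be merged by a network without field-by-field negotiation. The same terms can be serialized as XML, CSV or other formats.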

Current Biodiversity Informatics activities

At the large-scale e-Biosphere conference held in the U.K. in 2009, contributions (e.g. as posters) were grouped into the following themes, which indicate the broad range of current Biodiversity Informatics activities and how they might be categorized:
  • Application: Conservation / Agriculture / Fisheries / Industry / Forestry
  • Application: Invasive Alien Species
  • Application: Systematic and Evolutionary Biology
  • Application: Taxonomy and Identification Systems
  • New Tools, Services and Standards for Data Management and Access
    • New Modeling Tools
    • New Tools for Data Integration
    • New Approaches to Biodiversity Infrastructure
    • New Approaches to Species Identification
    • New Approaches to Mapping Biodiversity
  • National and Regional Biodiversity Databases and Networks


A post-conference workshop of key persons with current significant Biodiversity Informatics roles also resulted in a Workshop Resolution that stressed, among other aspects, the need to create durable, global registries for the resources that are basic to biodiversity informatics (e.g., repositories, collections); complete the construction of a solid taxonomic infrastructure; and create ontologies for biodiversity data.

Biodiversity Informatics projects of the world

Significant current global-scale biodiversity informatics projects include the following:
  • The Global Biodiversity Information Facility (GBIF), and the Ocean Biogeographic Information System (OBIS) (for marine species)
  • The Species 2000, ITIS (Integrated Taxonomic Information System), and Catalogue of Life projects
  • EOL, the Encyclopedia of Life project
  • The Consortium for the Barcode of Life project
  • The uBio Universal Biological Indexer and Organizer, from the Woods Hole Marine Biological Laboratory
  • The Index to Organism Names (ION) from Thomson Reuters, providing access to scientific names of taxa from numerous journals as indexed in the Zoological Record
  • ZooBank, the registry for nomenclatural acts and relevant systematic literature in zoology
  • The Index Nominum Genericorum, a compilation of generic names published for organisms covered by the International Code of Botanical Nomenclature, maintained at the Smithsonian Institution in the U.S.A.
  • The International Plant Names Index
  • MycoBank, documenting new names and combinations for fungi
  • The List of Prokaryotic names with Standing in Nomenclature (LPSN) - Official register of valid names for bacteria and archaea, as governed by the International Code of Nomenclature of Bacteria
  • The Biodiversity Heritage Library project - digitising biodiversity literature
  • Wikispecies, open source (community-editable) compilation of taxonomic information, companion project to Wikipedia
  • TaxonConcept.org, a Linked Data project that connects disparate species databases
  • Instituto de Ciencias Naturales. Universidad Nacional de Colombia. Virtual Collections and Biodiversity Informatics Unit
  • ANTABIF. The Antarctic Biodiversity Information Facility gives free and open access to Antarctic Biodiversity data, in the spirit of the Antarctic Treaty.

Notable regional and national scale syntheses include the following:
  • LifeWatch is proposed by ESFRI as a pan-European research (e-)infrastructure to support biodiversity research and policy-making.

A listing of over 600 current biodiversity informatics related activities can be found at the TDWG "Biodiversity Information Projects of the World" database.

See also

  • Biodiversity
  • Taxonomic database
  • Web-based taxonomy
  • List of biodiversity databases

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 