SBML
The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of users and developers. SBML can represent many different classes of biological phenomena, including metabolic networks, cell-signaling pathways, regulatory networks, infectious diseases, and many others. It is the de facto standard for representing computational models in systems biology today.
History
From late 1999 through early 2000, with funding from the Japan Science and Technology Corporation (JST), Hiroaki Kitano and John C. Doyle assembled a small team of researchers to work on developing better software infrastructure for computational modeling in systems biology. Hamid Bolouri was the leader of the development team, which consisted of Andrew Finney, Herbert Sauro, and Michael Hucka. Bolouri identified the need for a framework to enable interoperability and sharing between the different simulation software systems for biology in existence during the late 1990s, and he organized an informal workshop in December 1999 at the California Institute of Technology to discuss the matter. In attendance at that workshop were the groups responsible for the development of DBSolve, E-Cell, Gepasi, Jarnac, StochSim and The Virtual Cell. Separately, earlier in 1999, some members of these groups had also discussed the creation of a portable file format for metabolic network models in the BioThermoKinetics (BTK) group. The same groups who attended the first Caltech workshop met again on April 28-29, 2000, at the first of a newly-created meeting series called Workshop on Software Platforms for Systems Biology. It became clear during the second workshop that a common model representation format was needed to enable the exchange of models between software tools as part of any functioning interoperability framework, and the workshop attendees decided the format should be encoded in XML.
The Caltech ERATO team developed a proposal for this XML-based format and circulated the draft definition to the attendees of the 2nd Workshop on Software Platforms for Systems Biology in August 2000. This draft underwent extensive discussion over mailing lists and during the 2nd Workshop on Software Platforms for Systems Biology, held in Tokyo, Japan, in November 2000 as a satellite workshop of the ICSB 2000 conference. After further revisions, discussions and software implementations, the Caltech team issued a specification for SBML Level 1, Version 1 in March 2001.
SBML Level 2 was conceived at the 5th Workshop on Software Platforms for Systems Biology, held in July 2002 at the University of Hertfordshire, UK. By this time, far more people were involved than the original group of SBML collaborators, and the continued evolution of SBML became a larger community effort, with many new tools having been enhanced to support SBML. The workshop participants in 2002 collectively decided to revise the form of SBML in Level 2. The first draft of the Level 2 Version 1 specification was released in August 2002, and the set of features was finalized in May 2003 at the 7th Workshop on Software Platforms for Systems Biology in Ft. Lauderdale, Florida.
The next iteration of SBML took two years in part because software developers requested time to absorb and understand the larger and more complex SBML Level 2. The inevitable discovery of limitations and errors led to the development of SBML Level 2 Version 2, issued in September 2006. By this time, the team of SBML Editors (who reconcile proposals for changes and write a coherent final specification document) had changed and now consisted of Andrew Finney, Michael Hucka and Nicolas Le Novère.
SBML Level 2 Version 3 was published in 2007 after extensive contributions from, and discussions with, the SBML community. That year also saw the election of two more SBML Editors as part of the introduction of the modern SBML Editor organization within the SBML development process.
SBML Level 2 Version 4 was published in 2008 after certain changes in Level 2 were requested by popular demand. (For example, an electronic vote by the SBML community in late 2007 indicated a majority preferred not to require strict unit consistency before an SBML model is considered valid.) Version 4 was finalized after the SBML Forum meeting held in Gothenburg, Sweden, as a satellite workshop of ICSB 2008 in the fall of 2008.
SBML Level 3 Version 1 Core was published in final form in 2010, after prolonged discussion and revision by the SBML Editors and the SBML community. It contains numerous significant changes in syntax and constructs from Level 2 Version 4, but also represents a new modular base for continued expansion of SBML's features and capabilities going into the future.
The language
SBML is sometimes incorrectly assumed to be limited in scope only to biochemical network models because the original publications and early software focused on this domain. In reality, although the central features of SBML are indeed oriented towards representing chemical reaction-like processes that act on participants, this same formalism serves analogously for many other types of processes; moreover, SBML has language features supporting the direct expression of mathematical formulas and discontinuous events separate from reaction processes, allowing SBML to represent much more than only biochemical reactions. Evidence for SBML's ability to be used for more than merely descriptions of biochemistry can be seen in the variety of models available from BioModels Database.
Purposes
SBML has three main purposes:
- enabling the use of multiple software tools without having to rewrite models to conform to every tool's idiosyncratic file format;
- enabling models to be shared and published in a form that other researchers can use even when working with different software environments;
- ensuring the survival of models beyond the lifetime of the software used to create them.
SBML is not an attempt to define a universal language for quantitative models. SBML's purpose is to serve as a lingua franca, an exchange format used by different present-day software tools to communicate the essential aspects of a computational model.
Main capabilities
SBML can encode models consisting of entities (called species in SBML) acted upon by processes (called reactions). An important principle is that models are decomposed into explicitly-labeled constituent elements, the set of which resembles a verbose rendition of chemical reaction equations (if the model uses reactions) together with optional explicit equations (again, if the model uses these); the SBML representation deliberately does not cast the model directly into a set of differential equations or other specific interpretation of the model. This explicit, modeling-framework-agnostic decomposition makes it easier for a software tool to interpret the model and translate the SBML form into whatever internal form the tool actually uses.
A software package can read an SBML model description and translate it into its own internal format for model analysis. For example, a package might provide the ability to simulate the model by constructing differential equations and then perform numerical time integration on the equations to explore the model's dynamic behavior. Or, alternatively, a package might construct a discrete stochastic representation of the model and use a Monte Carlo simulation method such as the Gillespie algorithm.
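As a rough illustration of these two interpretations, the following Python sketch takes a single reaction, A -> B, and simulates it both ways: once as a rate equation integrated with forward Euler, and once with Gillespie's direct method. The `REACTIONS` structure, the function names and the rate constant are all hypothetical choices made for this example; they are not SBML constructs or the API of any SBML tool, and a real package would use a proper ODE solver.

```python
import math
import random

# Hypothetical in-memory form of a one-reaction model, A -> B, with a
# mass-action rate k * A. The layout is illustrative, not an SBML structure.
REACTIONS = [
    {"reactants": {"A": 1}, "products": {"B": 1}, "k": 0.5},
]


def rate(reaction, state):
    """Mass-action rate: k times the product of reactant amounts.

    This is exact for stoichiometry 1; the stochastic case would need
    combinatorial factors for higher-order reactions.
    """
    value = reaction["k"]
    for species, stoich in reaction["reactants"].items():
        value *= state[species] ** stoich
    return value


def simulate_ode(initial, t_end, dt=0.001):
    """Deterministic reading: forward-Euler integration of the rate equations."""
    state = dict(initial)
    for _ in range(int(round(t_end / dt))):
        derivs = {species: 0.0 for species in state}
        for reaction in REACTIONS:
            v = rate(reaction, state)
            for species, n in reaction["reactants"].items():
                derivs[species] -= n * v
            for species, n in reaction["products"].items():
                derivs[species] += n * v
        for species in state:
            state[species] += dt * derivs[species]
    return state


def simulate_gillespie(initial, t_end, rng):
    """Stochastic reading: Gillespie's direct method on integer counts."""
    state = dict(initial)
    t = 0.0
    while True:
        rates = [rate(reaction, state) for reaction in REACTIONS]
        total = sum(rates)
        if total == 0:
            break  # nothing left that can fire
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its rate.
        pick = rng.random() * total
        cumulative = 0.0
        chosen = None
        for reaction, a in zip(REACTIONS, rates):
            if a == 0:
                continue
            chosen = reaction
            cumulative += a
            if pick < cumulative:
                break
        for species, n in chosen["reactants"].items():
            state[species] -= n
        for species, n in chosen["products"].items():
            state[species] += n
    return state


if __name__ == "__main__":
    print(simulate_ode({"A": 100.0, "B": 0.0}, t_end=2.0))
    print(simulate_gillespie({"A": 100, "B": 0}, t_end=2.0, rng=random.Random(1)))
```

Both functions start from the same reaction list, which is the point of the framework-agnostic decomposition: the model description stays the same and only the interpretation changes.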
SBML allows models of arbitrary complexity to be represented. Each type of component in a model is described using a specific type of data structure that organizes the relevant information. The data structures determine how the resulting model is encoded in XML.
In addition to the elements above, another important feature of SBML is that every entity can have machine-readable annotations attached to it. These annotations can be used to express relationships between the entities in a given model and entities in external resources such as databases. A good example of the value of this is in BioModels Database, where every model is annotated and linked to relevant data resources such as publications, databases of compounds and pathways, controlled vocabularies, and more. With annotations, a model becomes more than simply a rendition of a mathematical construct—it becomes a semantically-enriched framework for communicating knowledge.
Levels and versions
SBML is defined in Levels: upward-compatible specifications that add features and expressive power. Software tools that do not need or cannot support the complexity of higher Levels can go on using lower Levels; tools that can read higher Levels are assured of also being able to interpret models defined in the lower Levels. Thus new Levels do not supersede previous ones. However, each Level can have multiple Versions within it, and new Versions of a Level do supersede old Versions of that same Level.
There are currently three Levels of SBML defined. The current Versions within those Levels are the following:
- Level 3 Version 1 Core, for which the final Release 1 specification was issued 6 October 2010
- Level 2 Version 4 Release 1
- Level 1 Version 2
Open-source software infrastructure such as libSBML and JSBML allows developers to support all Levels of SBML in their software with a minimum amount of effort.
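To make the Level and Version rules concrete, the sketch below reads the `level` and `version` attributes from the root `sbml` element using only Python's standard library, then reports whether a tool with a given maximum supported Level could read the document and whether the document uses a superseded Version. The function names and the `CURRENT_VERSION` table (filled in from the list above) are assumptions made for illustration; this is not how libSBML or JSBML expose the information.

```python
import xml.etree.ElementTree as ET

# Current Version within each Level, as listed in this article.
CURRENT_VERSION = {1: 2, 2: 4, 3: 1}


def level_and_version(sbml_text):
    """Return (level, version) from the root <sbml> element's attributes."""
    root = ET.fromstring(sbml_text)
    return int(root.attrib["level"]), int(root.attrib["version"])


def describe(sbml_text, max_supported_level):
    """Summarize how a hypothetical tool relates to a given SBML document."""
    level, version = level_and_version(sbml_text)
    return {
        "level": level,
        "version": version,
        # A tool that reads a higher Level can also interpret lower Levels.
        "readable": level <= max_supported_level,
        # Newer Versions supersede older Versions of the same Level.
        "superseded": version < CURRENT_VERSION.get(level, version),
    }


if __name__ == "__main__":
    doc = (
        '<sbml xmlns="http://www.sbml.org/sbml/level2/version3" '
        'level="2" version="3"><model id="m"/></sbml>'
    )
    print(describe(doc, max_supported_level=3))
```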
The SBML Team maintains a public issue tracker where readers may report errors or other issues in the SBML specification documents. Reported issues are eventually put on the list of official errata associated with each specification release. (An example is the list of errata for Level 2 Version 4.)
Structure
A model definition in SBML Levels 2 and 3 consists of lists of one or more of the following components:
- Function definition: A named mathematical function that may be used throughout the rest of a model.
- Unit definition: A named definition of a new unit of measure, or a redefinition of an existing SBML default unit. Named units can be used in the expression of quantities in a model.
- Compartment Type (only in SBML Level 2): A type of location where reacting entities such as chemical substances may be located.
- Species type (only in SBML Level 2): A type of entity that can participate in reactions. Examples of species types include ions such as Ca2+, molecules such as glucose or ATP, binding sites on a protein, and more.
- Compartment: A well-stirred container of a particular type and finite size where species may be located. A model may contain multiple compartments of the same compartment type. Every species in a model must be located in a compartment.
- Species: A pool of entities of the same species type located in a specific compartment.
- Parameter: A quantity with a symbolic name. In SBML, the term parameter is used in a generic sense to refer to named quantities regardless of whether they are constants or variables in a model.
- Initial Assignment: A mathematical expression used to determine the initial conditions of a model. This type of structure can only be used to define how the value of a variable can be calculated from other values and variables at the start of simulated time.
- Rule: A mathematical expression used in combination with the differential equations constructed based on the set of reactions in a model. It can be used to define how a variable's value can be calculated from other variables, or used to define the rate of change of a variable. The set of rules in a model can be used with the reaction rate equations to determine the behavior of the model with respect to time. The set of rules constrains the model for the entire duration of simulated time.
- Constraint: A mathematical expression that defines a constraint on the values of model variables. The constraint applies at all instants of simulated time. The set of constraints in a model should not be used to determine the behavior of the model with respect to time.
- Reaction: A statement describing some transformation, transport or binding process that can change the amount of one or more species. For example, a reaction may describe how certain entities (reactants) are transformed into certain other entities (products). Reactions have associated kinetic rate expressions describing how quickly they take place.
- Event: A statement describing an instantaneous, discontinuous change in a set of variables of any type (species concentration, compartment size or parameter value) when a triggering condition is satisfied.
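A few of the components listed above can be seen in a small hand-written document. The Python sketch below embeds a toy model in the style of SBML Level 3 Version 1 Core and walks its lists with the standard library's `xml.etree.ElementTree`. The model itself (a single `phosphorylation` reaction with species `glucose` and `g6p` and parameter `k1`) and the `summarize` helper are invented for illustration, and the document has not been checked with an SBML validator; production code would more likely rely on libSBML or JSBML.

```python
import xml.etree.ElementTree as ET

# Illustrative toy model: one compartment, two species, one parameter and
# one reaction whose kinetic law is k1 * glucose, written in MathML.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cell" initialAmount="10"
               hasOnlySubstanceUnits="true" boundaryCondition="false" constant="false"/>
      <species id="g6p" compartment="cell" initialAmount="0"
               hasOnlySubstanceUnits="true" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.1" constant="true"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="phosphorylation" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="glucose" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="g6p" stoichiometry="1" constant="true"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply><times/><ci>k1</ci><ci>glucose</ci></apply>
          </math>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarize(sbml_text):
    """Collect the ids found in the main component lists of a model."""
    root = ET.fromstring(sbml_text.strip())
    # The root tag looks like "{namespace}sbml"; reuse that namespace below.
    ns = {"s": root.tag[1:root.tag.index("}")]}
    model = root.find("s:model", ns)

    def refs(reaction, list_name):
        path = f"s:{list_name}/s:speciesReference"
        return [ref.get("species") for ref in reaction.findall(path, ns)]

    return {
        "level": root.get("level"),
        "version": root.get("version"),
        "compartments": [c.get("id") for c in
                         model.findall("s:listOfCompartments/s:compartment", ns)],
        "species": {sp.get("id"): sp.get("compartment") for sp in
                    model.findall("s:listOfSpecies/s:species", ns)},
        "parameters": [p.get("id") for p in
                       model.findall("s:listOfParameters/s:parameter", ns)],
        "reactions": {r.get("id"): (refs(r, "listOfReactants"), refs(r, "listOfProducts"))
                      for r in model.findall("s:listOfReactions/s:reaction", ns)},
    }


if __name__ == "__main__":
    print(summarize(SBML_TEXT))
```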
Community
As of December 2010, more than 200 software systems advertise support for SBML. A current list is available in the form of the SBML Software Guide, hosted at sbml.org.
SBML has been and continues to be developed by the community of people making software platforms for systems biology, through active email discussion lists and biannual workshops. The meetings are often held in conjunction with other biology conferences, especially the International Conference on Systems Biology (ICSB). The community effort is coordinated by an elected editorial board made up of five members. Each editor is elected for a 3-year non-renewable term.
Tools such as an on-line model validator, as well as open-source libraries for incorporating SBML into software programmed in C, C++, Java, Python, Mathematica, MATLAB and other languages, are developed partly by the SBML Team and partly by the broader SBML community.
SBML is an official IETF MIME type, specified by RFC 3823.
See also
- BioModels Database
- BioPAX
- CellML
- MIASE
- MIRIAM
- Systems Biology Ontology (SBO)
- Systems Biology Graphical Notation (SBGN)
External links
- The SBML home page
- Recent presentations and posters about SBML available from Nature Precedings
- The COmputational Modeling in BIology NEtwork home page