Ontology based data integration
Ontology based data integration involves the use of one or more ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of several data integration approaches and may be classified as Local-As-View (LAV). The effectiveness of ontology based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.
Background
Data from multiple sources are characterized by multiple types of heterogeneity. The following hierarchy is often used:
- Syntactic heterogeneity: results from differences in the representation format of data.
- Schematic or structural heterogeneity: the native model or structure used to store data differs across data sources, leading to structural heterogeneity. Schematic heterogeneity, which appears particularly in structured databases, is also an aspect of structural heterogeneity.
- Semantic heterogeneity: differences in the interpretation of the "meaning" of data are the source of semantic heterogeneity.
- System heterogeneity: the use of different operating systems and hardware platforms leads to system heterogeneity.
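The first three kinds of heterogeneity can be illustrated with a minimal sketch. The two sources, their field names, and the target record layout below are hypothetical and chosen only for illustration: one source is flat CSV reporting temperature in Fahrenheit, the other is nested JSON reporting it in Celsius.

```python
import csv
import io
import json

# Hypothetical source A: flat CSV text, temperature in degrees Fahrenheit.
SOURCE_A = "patient_id,temp_f\n17,98.6\n"

# Hypothetical source B: nested JSON text, temperature in degrees Celsius.
SOURCE_B = '{"patient": {"id": 17, "vitals": {"temperature": 37.0}}}'


def read_source_a(text):
    """Parse CSV (syntactic) and convert Fahrenheit to Celsius (semantic)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "patient_id": int(row["patient_id"]),
            "body_temp_celsius": round((float(row["temp_f"]) - 32) * 5 / 9, 1),
        }
        for row in rows
    ]


def read_source_b(text):
    """Parse JSON (syntactic) and flatten the nested record (structural)."""
    patient = json.loads(text)["patient"]
    return [
        {
            "patient_id": int(patient["id"]),
            "body_temp_celsius": float(patient["vitals"]["temperature"]),
        }
    ]


records_a = read_source_a(SOURCE_A)
records_b = read_source_b(SOURCE_B)
```

Once each reader has resolved its source's format, structure, and unit, both sources yield records with the same shape and meaning. In an ontology based system, the shared field names and units would come from the ontology rather than being hard-coded as they are here.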
Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies (http://www.bioontology.org/repositories.html#obo) has made it possible for the data integration community to leverage them for semantic integration of data and information.
The Role of Ontologies
Ontologies enable the unambiguous identification of entities in heterogeneous information systems and the assertion of applicable named relationships that connect these entities. Specifically, ontologies play the following roles:
- Content explication: The ontology enables accurate interpretation of data from multiple sources through the explicit definition of terms and relationships in the ontology.
- Query model: In some systems, such as SIMS, the query is formulated using the ontology as a global query schema.
- Verification: The ontology verifies the mappings used to integrate data from multiple sources. These mappings may either be user specified or generated by a system.
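The three roles can be sketched with a toy ontology held in a Python dictionary. This is an illustrative assumption, not how SIMS or any particular system represents its ontology; the concept names, source field names, and helper functions are all hypothetical.

```python
# Hypothetical toy ontology: concept -> definition and named relationships.
ONTOLOGY = {
    "Patient": {
        "definition": "A person receiving care.",
        "relations": {"hasDiagnosis": "Disease"},
    },
    "Disease": {
        "definition": "A disorder of structure or function.",
        "relations": {},
    },
}

# Hypothetical mappings from source fields to ontology concepts.
MAPPINGS = {
    "clinic_db.pat": "Patient",
    "clinic_db.dx": "Disease",
    "lab_db.specimen": "Sample",  # "Sample" is not defined in the ontology
}


def explicate(concept):
    """Content explication: return the explicit definition of a concept."""
    return ONTOLOGY[concept]["definition"]


def sources_for(concept):
    """Query model: resolve a query posed in ontology terms to source fields."""
    return sorted(field for field, target in MAPPINGS.items() if target == concept)


def verify(mappings):
    """Verification: list source fields mapped to concepts the ontology lacks."""
    return sorted(field for field, target in mappings.items() if target not in ONTOLOGY)


invalid_mappings = verify(MAPPINGS)
```

Here `verify` flags `lab_db.specimen`, because its target concept does not exist in the ontology. A real system would typically also check relationship and type constraints, which this sketch omits.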
Approaches using ontologies for data integration
There are three main architectures that are implemented in ontology-based data integration applications:
- Single ontology approach: A single ontology is used as a global reference model in the system. This is the simplest approach, as it can be simulated by the other approaches. SIMS is a prominent example of this approach.
- Multiple ontologies: Multiple ontologies, each modeling an individual data source, are used in combination for integration. Although this approach is more flexible than the single ontology approach, it requires the creation of mappings between the multiple ontologies. Ontology mapping is a challenging issue and is the focus of a large number of research efforts in computer science (http://www.ontologymatching.org/). The OBSERVER system is an example of this approach.
- Hybrid approaches: The hybrid approach involves the use of multiple ontologies that subscribe to a common, top-level vocabulary. The top-level vocabulary defines the basic terms of the domain. Thus, the common vocabulary makes it easier to use multiple ontologies for integration.
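The benefit of the hybrid approach can be sketched as follows. The shared vocabulary, the two local ontologies, and the function names are hypothetical. Each local term is defined by reference to a shared term, so local terms can be aligned through the shared vocabulary instead of through hand-built pairwise mappings.

```python
# Hypothetical top-level vocabulary defining the basic terms of the domain.
SHARED_VOCABULARY = {"Gene", "Protein", "Organism"}

# Hypothetical local ontologies: local term -> shared term it is defined by.
LOCAL_ONTOLOGY_A = {"gene_record": "Gene", "species": "Organism"}
LOCAL_ONTOLOGY_B = {"GeneEntry": "Gene", "ProteinEntry": "Protein", "Taxon": "Organism"}


def check_subscribes(local_ontology, vocabulary):
    """Raise ValueError if a local term refers outside the shared vocabulary."""
    unknown = sorted(
        term for term, base in local_ontology.items() if base not in vocabulary
    )
    if unknown:
        raise ValueError(f"terms not grounded in the shared vocabulary: {unknown}")


def align(local_a, local_b):
    """Pair local terms from A and B that are defined by the same shared term."""
    terms_by_base = {}
    for term, base in local_b.items():
        terms_by_base.setdefault(base, []).append(term)
    return sorted(
        (term_a, term_b)
        for term_a, base in local_a.items()
        for term_b in terms_by_base.get(base, [])
    )


check_subscribes(LOCAL_ONTOLOGY_A, SHARED_VOCABULARY)
check_subscribes(LOCAL_ONTOLOGY_B, SHARED_VOCABULARY)
alignment = align(LOCAL_ONTOLOGY_A, LOCAL_ONTOLOGY_B)
```

In the multiple ontologies approach, by contrast, the pairs in `alignment` would have to be produced by a separate ontology mapping step for every pair of local ontologies.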
See also
- Schema matching
- Data integration
- Enterprise application integration
- Enterprise information integration
- Data mapping
- Ontology mapping
- Semantic integration