Relationship extraction
A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.
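The two steps named above, detecting a relationship mention and classifying it, can be illustrated with a minimal pattern-based sketch. The relation labels, the regular expressions, and the `extract_relations` helper are all hypothetical and chosen only for illustration; practical systems usually rely on learned patterns or statistical classifiers rather than two hand-written rules.

```python
import re
from typing import List, NamedTuple


class RelationMention(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical surface patterns, one per relation type (assumption:
# the relation is expressed by a fixed phrase between two arguments).
PATTERNS = [
    ("associated_with",
     re.compile(r"\b(\w+) is associated with ([\w ]+?)(?=[.,;]|$)")),
    ("interacts_with",
     re.compile(r"\b(\w+) interacts with (\w+)")),
]


def extract_relations(text: str) -> List[RelationMention]:
    """Detect relation mentions sentence by sentence and label each one."""
    mentions = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for label, pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                mentions.append(
                    RelationMention(match.group(1), label, match.group(2).strip())
                )
    return mentions


if __name__ == "__main__":
    sample = "BRCA1 is associated with breast cancer. MDM2 interacts with TP53."
    for mention in extract_relations(sample):
        print(mention)
```

Each match yields a (subject, relation, object) tuple, which is the detection step; the pattern that fired supplies the relation label, which is the classification step.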
Applications
Application domains where relationship extraction is useful include gene-disease relationships and protein-protein interactions.
Approaches
One approach to this problem involves the use of domain ontologies.
Another approach involves the visual detection of meaningful relationships among the parametric values of objects listed in a data table, which shift position as the table is permuted automatically under the control of the software user.
The poor coverage, rarity and development cost of structured resources such as semantic lexicons (e.g. WordNet, UMLS) and domain ontologies (e.g. the Gene Ontology) have given rise to new approaches based on broad, dynamic background knowledge on the Web. For instance, the ARCHILES technique uses only Wikipedia and search engine page counts to acquire coarse-grained relations for constructing lightweight ontologies.
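To give a feel for how search engine page counts can signal that two terms are related, here is a minimal sketch of the Normalized Google Distance, a well-known page-count measure. It is not the ARCHILES algorithm, whose details are not given here; the function name and all counts in the usage example are hypothetical placeholders rather than real search results.

```python
import math


def normalized_web_distance(count_x: int, count_y: int,
                            count_xy: int, total_pages: int) -> float:
    """Normalized Google Distance computed from page counts.

    count_x, count_y: pages containing each term alone
    count_xy:         pages containing both terms
    total_pages:      assumed size of the search index
    Lower values suggest a stronger association between the terms.
    """
    if min(count_x, count_y, count_xy) <= 0:
        return math.inf  # no evidence of co-occurrence
    log_x, log_y = math.log(count_x), math.log(count_y)
    log_xy = math.log(count_xy)
    return (max(log_x, log_y) - log_xy) / (
        math.log(total_pages) - min(log_x, log_y)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(normalized_web_distance(1000, 1000, 10, 10**6))
```

Term pairs whose distance falls below some chosen threshold could then be treated as candidates for a coarse-grained relation; the threshold itself would be a further assumption.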
The relationships can be represented using a variety of formalisms and languages. One such representation language for data on the Web is RDF (Resource Description Framework).
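As a rough illustration of an RDF representation, the sketch below serializes extracted (subject, relation, object) tuples as N-Triples lines, one of the plain-text RDF syntaxes. The `http://example.org/` namespace, the helper names, and the sample relation are assumptions made for the example; a real deployment would map entities and relations to IRIs from an agreed vocabulary.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for minted IRIs


def to_iri(name: str) -> str:
    """Turn a plain name into an IRI reference in angle brackets."""
    return f"<{BASE}{quote(name.replace(' ', '_'))}>"


def to_ntriples(relations: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (subject, relation, object) tuples, one triple per line."""
    return "\n".join(
        f"{to_iri(subj)} {to_iri(rel)} {to_iri(obj)} ."
        for subj, rel, obj in relations
    )


if __name__ == "__main__":
    print(to_ntriples([("BRCA1", "associated_with", "breast cancer")]))
```

Every extracted relation becomes one subject-predicate-object statement, so the output of an extraction step can be loaded into any store that reads N-Triples.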
See also
- Text analytics
- Semantic analytics
- Semantic role labeling
- Information extraction
- Business Intelligence 2.0
External References
- Gabor Melli's page on Relation Recognition Algorithms describes the algorithmic approaches and performance measures of the task.