Ontology learning
Ontology learning is a subtask of information extraction. The goal of ontology learning is to semi-automatically extract relevant concepts and relations from a given corpus or other kinds of data sets to form an ontology.
The automatic creation of ontologies involves many disciplines. Typically, the process starts by extracting terms and concepts, or noun phrases, from plain text using a method from terminology extraction. This usually involves linguistic processors (e.g. part-of-speech tagging, phrase chunking). Statistical or symbolic techniques are then used to extract relation signatures. For instance, these approaches try to detect that "to eat" denotes a relation between a concept denoted by "animal" and a concept denoted by "food". The ontology formalizes the intensional aspects of a domain, while the extensional part is provided by a knowledge base containing the instances of the concepts and relations defined in the ontology. More recently, a graph-based approach has been proposed that extracts a domain taxonomy (i.e., the backbone of an ontology) from scratch.
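The steps described above (tagging, finding noun and verb patterns, then generalizing to a relation signature) can be illustrated with a minimal sketch. It is not a real ontology learning system: the tiny lexicon POS, the lemma table LEMMA, the term-to-concept mapping CONCEPT_OF and the three-sentence corpus are all hypothetical stand-ins for a trained part-of-speech tagger, a lemmatizer, a concept hierarchy and a real corpus, and simple counting is used as the "statistical technique".

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
POS = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "cow": "NOUN",
    "fish": "NOUN", "meat": "NOUN", "grass": "NOUN",
    "eats": "VERB", "eat": "VERB",
}

# Hypothetical lemma table and term-to-concept mapping (assumed, not learned here).
LEMMA = {"eats": "eat"}
CONCEPT_OF = {
    "cat": "animal", "dog": "animal", "cow": "animal",
    "fish": "food", "meat": "food", "grass": "food",
}

def tag(sentence):
    """Lowercase, strip the final period, and look each token up in POS."""
    tokens = sentence.lower().rstrip(".").split()
    return [(t, POS.get(t, "X")) for t in tokens]

def extract_triples(tagged):
    """Return (subject, verb, object) for each NOUN VERB NOUN pattern,
    ignoring determiners (a very crude form of phrase chunking)."""
    content = [(w, p) for w, p in tagged if p != "DET"]
    triples = []
    for i in range(len(content) - 2):
        (s, ps), (v, pv), (o, po) = content[i:i + 3]
        if (ps, pv, po) == ("NOUN", "VERB", "NOUN"):
            triples.append((s, LEMMA.get(v, v), o))
    return triples

def relation_signatures(corpus):
    """Count how often each verb links a pair of concepts."""
    counts = Counter()
    for sentence in corpus:
        for s, v, o in extract_triples(tag(sentence)):
            if s in CONCEPT_OF and o in CONCEPT_OF:
                counts[(v, CONCEPT_OF[s], CONCEPT_OF[o])] += 1
    return counts

corpus = [
    "The cat eats fish.",
    "The dog eats meat.",
    "A cow eats the grass.",
]
signatures = relation_signatures(corpus)
print(signatures.most_common(1))  # [(('eat', 'animal', 'food'), 3)]
```

In this toy setting, all three sentences support the same signature, eat(animal, food). A realistic pipeline would replace the lookup tables with trained linguistic processors and would need a threshold or significance test before accepting a signature.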
See also
- Information extraction
- Semantic Web
- Computational linguistics
- Natural language processing
- Domain ontology
- Taxonomy
- Glossary
- Text simplification
- Text mining