Semantic search
Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, in order to generate more relevant results. Author Seth Grimes lists "11 approaches that join semantics to search", and Hildebrand et al. provide an overview that lists semantic search systems and identifies other uses of semantics in the search process.
Guha et al. distinguish two major forms of search: navigational and research. In navigational search, the user uses the search engine as a navigation tool to reach a particular intended document; semantic search is not applicable to navigational searches. In research search, the user gives the search engine a phrase intended to denote an object about which the user wants to gather information. There is no particular known document the user is trying to reach; rather, the user is trying to locate a number of documents that together will provide the information sought. Semantic search lends itself well to this case.
Rather than using ranking algorithms such as Google's PageRank to predict relevancy, semantic search uses semantics, the science of meaning in language, to produce highly relevant search results. In most cases, the goal is to deliver the information the user queried for rather than have the user sort through a list of loosely related keyword results.
Other authors primarily regard semantic search as a set of techniques for retrieving knowledge from richly structured data sources, such as the ontologies found on the Semantic Web. Such technologies enable the formal articulation of domain knowledge at a high level of expressiveness and could let users specify their intent in more detail at query time.
Disambiguation
In order to understand what a user is searching for, word sense disambiguation must occur. When a term is ambiguous, meaning it can have several senses (for example, the lemma "bark" can be understood as "the sound of a dog," "the skin of a tree," or "a three-masted sailing ship"), a disambiguation process is started, through which the most probable meaning is chosen from all the possible ones.
Such processes make use of other information present in a semantic analysis system and take into account the meanings of the other words in the sentence and in the rest of the text. In effect, the sense chosen for each word influences the disambiguation of the others, until the most plausible and coherent reading of the sentence is reached. All the fundamental information for the disambiguation process, that is, all the knowledge used by the system, is represented in the form of a semantic network organized on a conceptual basis.
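The article does not name a specific disambiguation algorithm. As a rough illustration only, the sketch below swaps in the simplified Lesk technique, which picks the sense whose gloss shares the most words with the surrounding text. The sense labels (`bark/dog` and so on), the glosses and the stop-word list are hypothetical stand-ins for a real lexical resource, and the sketch disambiguates one word at a time rather than letting all the words of a sentence influence each other as described above.

```python
import re

# Hypothetical sense inventory for the lemma "bark". The glosses are
# hand-written for illustration, not taken from any real dictionary.
SENSES = {
    "bark/dog": "the sound a dog makes when it barks loudly",
    "bark/tree": "the outer skin covering the trunk and branches of a tree",
    "bark/ship": "a sailing ship with three masts",
}

# Tiny stop-word list so that function words do not count as overlap.
STOP_WORDS = {"a", "an", "the", "of", "and", "it", "when", "with", "on", "at", "to"}


def content_words(text: str) -> set[str]:
    """Lower-case the text, split it into words and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def disambiguate(context: str, senses: dict[str, str]) -> str:
    """Simplified Lesk: return the sense whose gloss shares the most
    content words with the context (ties go to the first sense listed)."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


if __name__ == "__main__":
    print(disambiguate("The dog began to bark at the mailman", SENSES))  # bark/dog
    print(disambiguate("Moss grew on the bark of the old oak tree trunk", SENSES))  # bark/tree
    print(disambiguate("The bark sailed with all three masts under canvas", SENSES))  # bark/ship
```

Context words such as "dog", "trunk" or "masts" tip the choice here; when nothing overlaps, the function simply falls back to the first listed sense, which is one reason practical systems draw on richer context and on the relations stored in a semantic network.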
In a structure of this type, each lexical concept therefore corresponds to a node of the semantic network and is linked to other nodes by specific semantic relationships in a hierarchical structure with inheritance. In this way, each concept is enriched with the characteristics and meaning of its neighboring nodes.
Each node of the network, called a synset, groups a set of synonyms that represent the same lexical concept, and can contain:
- single lemmata ('seat', 'vacation'; 'work', 'quick'; 'quickly', 'more', etc.)
- compounds ('non-stop', 'abat-jour', 'policeman')
- collocations ('credit card', 'university degree', 'treasury stock', 'go forward', etc.).
The links, which identify the semantic relationships between synsets, are the organizing principles of the concepts in the semantic network.
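A minimal sketch of such a network, assuming a simple in-memory model: each `Synset` holds a set of lemmas (single words, compounds or collocations) and one hypothetical `hypernym` ("is-a") link to a more general synset. The `features` field is an invented illustration of how a concept can be enriched by the nodes above it in the hierarchy; real lexical resources such as WordNet define many more relation types than the single link modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Synset:
    """A node of the semantic network: a set of synonymous lemmas,
    compounds or collocations that express one lexical concept."""
    name: str
    lemmas: set[str]
    features: set[str] = field(default_factory=set)
    # "is-a" link to a more general concept (hypernym); None at a root.
    hypernym: Optional["Synset"] = None

    def inherited_features(self) -> set[str]:
        """Collect this node's features plus those of every ancestor,
        so each concept is enriched by the nodes above it."""
        node, collected = self, set()
        while node is not None:
            collected |= node.features
            node = node.hypernym
        return collected


class SemanticNetwork:
    """Holds synsets and an index from each lemma to its candidate senses."""

    def __init__(self) -> None:
        self.synsets: dict[str, Synset] = {}
        self.index: dict[str, list[Synset]] = {}

    def add(self, synset: Synset) -> Synset:
        self.synsets[synset.name] = synset
        for lemma in synset.lemmas:
            self.index.setdefault(lemma, []).append(synset)
        return synset

    def senses(self, lemma: str) -> list[str]:
        """Names of all synsets containing the lemma, in insertion order."""
        return [s.name for s in self.index.get(lemma, [])]


# Example network: three senses of "bark", each under a different hypernym.
net = SemanticNetwork()
tissue = net.add(Synset("tissue.plant", {"plant tissue"}, {"part of a plant"}))
vessel = net.add(Synset("vessel.ship", {"vessel", "sailing ship"}, {"travels on water"}))
sound = net.add(Synset("sound.noise", {"sound", "noise"}, {"audible"}))
bark_tree = net.add(Synset("bark.tree", {"bark"}, {"covers trunk and branches"}, tissue))
bark_ship = net.add(Synset("bark.ship", {"bark", "barque"}, {"has three masts"}, vessel))
bark_dog = net.add(Synset("bark.dog", {"bark"}, {"made by a dog"}, sound))

if __name__ == "__main__":
    print(net.senses("bark"))  # ['bark.tree', 'bark.ship', 'bark.dog']
    print(sorted(bark_ship.inherited_features()))  # ['has three masts', 'travels on water']
```

Looking up a lemma returns every candidate sense, which is exactly the ambiguity a disambiguation step then has to resolve; the inherited features give that step extra evidence to compare against the context.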
Commonly used searching methodologies
Mäkelä describes five commonly used methodologies:
- RDF Path Traversal - traversing the graph formed by RDF data.
- Keyword to Concept Mapping
- Graph Patterns - used to formulate patterns for locating interesting connecting paths between resources; also commonly used in data visualization.
- Logics - using inference based on OWL.
- Fuzzy Concepts, Fuzzy Relations, Fuzzy Logics
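Three of these methodologies can be sketched over a toy set of RDF-style triples in plain Python, assuming no RDF library: keyword to concept mapping through labels, path traversal along chosen predicates, and matching a graph pattern with variables (a simplified analogue of a SPARQL basic graph pattern). All `ex:` resources and predicates are hypothetical; production systems would use an RDF store and a query language such as SPARQL instead.

```python
# Toy RDF-style graph as (subject, predicate, object) triples.
# All "ex:" resources and predicates are hypothetical examples.
TRIPLES = {
    ("ex:Aalto", "rdfs:label", "Aalto University"),
    ("ex:Aalto", "ex:locatedIn", "ex:Helsinki"),
    ("ex:Helsinki", "rdfs:label", "Helsinki"),
    ("ex:Helsinki", "ex:locatedIn", "ex:Finland"),
    ("ex:Finland", "rdfs:label", "Finland"),
    ("ex:Finland", "ex:partOf", "ex:Europe"),
    ("ex:Europe", "rdfs:label", "Europe"),
}


def keyword_to_concepts(keyword: str) -> set[str]:
    """Keyword to concept mapping: resources whose rdfs:label contains the keyword."""
    kw = keyword.lower()
    return {s for s, p, o in TRIPLES if p == "rdfs:label" and kw in o.lower()}


def traverse(start: str, predicates: set[str], max_depth: int = 5) -> set[str]:
    """Path traversal: follow the given predicates outward from `start`
    breadth-first and return every resource reached (excluding `start`)."""
    reached: set[str] = set()
    frontier = {start}
    for _ in range(max_depth):
        frontier = {o for s, p, o in TRIPLES if s in frontier and p in predicates}
        frontier -= reached | {start}
        if not frontier:
            break
        reached |= frontier
    return reached


def _unify(term: str, value: str, binding: dict[str, str]) -> bool:
    """Match one pattern term against a triple value, binding ?variables."""
    if not term.startswith("?"):
        return term == value
    if term in binding:
        return binding[term] == value
    binding[term] = value
    return True


def match(patterns: list[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Graph pattern matching: every variable binding under which all
    triple patterns hold at once."""
    solutions: list[dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in TRIPLES:
                candidate = dict(binding)  # fresh copy for each triple tried
                if all(_unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions


if __name__ == "__main__":
    print(keyword_to_concepts("finland"))  # {'ex:Finland'}
    print(sorted(traverse("ex:Aalto", {"ex:locatedIn", "ex:partOf"})))
    # ['ex:Europe', 'ex:Finland', 'ex:Helsinki']
    print(match([("?org", "ex:locatedIn", "?city"),
                 ("?city", "ex:locatedIn", "ex:Finland")]))
    # [{'?org': 'ex:Aalto', '?city': 'ex:Helsinki'}]
```

Here `traverse` follows `ex:locatedIn` and `ex:partOf` repeatedly, which loosely resembles what a reasoner could conclude if such properties were declared transitive in an OWL ontology; in this sketch the behavior is hard-coded rather than inferred.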
Semantic search portals
- GoPubMed - first semantic search engine on the internet - launched in 2002
- Research Gate - The professional network for scientists
- Google
- Hakia
- iGlue - semantic search engine with real-time annotator plugin/bookmarklet which adds a smart layer to every website
- Kosmix - social media semantic search
- Lexxe - (beta in early 2011)
- Swoogle
- Yummly - food & recipe semantic search
- Bing
Enterprise semantic search engines
- Endeca
- Exalead
- Invention Machine
- Sinequa
- Smartlogic
- Sophia
- OpenText
- Inbenta
See also
- List of Semantic Search Engines
- Semantic Web
- Semantic Unification
- Resource Description Framework
- Natural language search engine