Semantic search - AbsoluteAstronomy.com

Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, in order to generate more relevant results. Author Seth Grimes lists "11 approaches that join semantics to search", and Hildebrand et al. provide an overview that lists semantic search systems and identifies other uses of semantics in the search process.

Guha et al. distinguish two major forms of search: navigational and research. In navigational search, the user uses the search engine as a navigation tool to reach a particular document that he or she already has in mind. Semantic search is not applicable to navigational searches. In research search, the user gives the search engine a phrase intended to denote an object about which the user wants to gather information. There is no particular document that the user knows about and is trying to reach; rather, the user is trying to locate a number of documents that together will provide the information being sought. Semantic search lends itself well to this kind of search.

Rather than using ranking algorithms such as Google's PageRank to predict relevancy, semantic search uses semantics, the science of meaning in language, to produce more relevant search results. In most cases, the goal is to deliver the information the user asked for, rather than have the user sort through a list of loosely related keyword results.
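To make the contrast with keyword matching concrete, here is a minimal sketch that scores a document by the concepts it shares with the query instead of the literal words. The lexicon `TERM_TO_CONCEPT` and the functions `keyword_score` and `concept_score` are hypothetical names used only for illustration; real systems would derive such mappings from ontologies, thesauri, or statistical models rather than a hand-written dictionary.

```python
# Hypothetical lexicon mapping surface words to concept identifiers.
TERM_TO_CONCEPT = {
    "car": "automobile", "auto": "automobile", "automobile": "automobile",
    "fix": "repair", "repair": "repair", "mend": "repair",
    "doctor": "physician", "physician": "physician",
}


def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokenization with trailing punctuation stripped."""
    return [t.strip(".,;!?").lower() for t in text.split()]


def keyword_score(query: str, doc: str) -> int:
    """Number of query tokens that literally occur in the document."""
    doc_tokens = set(tokenize(doc))
    return sum(1 for t in tokenize(query) if t in doc_tokens)


def concepts(text: str) -> set[str]:
    """Concepts expressed by the text; words missing from the lexicon are ignored."""
    return {TERM_TO_CONCEPT[t] for t in tokenize(text) if t in TERM_TO_CONCEPT}


def concept_score(query: str, doc: str) -> int:
    """Number of query concepts that the document also expresses."""
    return len(concepts(query) & concepts(doc))
```

With the query "fix my car", the document "How to repair an automobile." shares no keywords with the query (`keyword_score` is 0) but expresses both of its concepts (`concept_score` is 2), so literal matching would miss it while concept matching finds it.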

Other authors primarily regard semantic search as a set of techniques for retrieving knowledge from richly structured data sources, such as the ontologies found on the Semantic Web. Such technologies enable the formal articulation of domain knowledge at a high level of expressiveness and could enable users to specify their intent in more detail at query time.
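As a rough illustration of how formally articulated domain knowledge can be used at query time, the sketch below stores a tiny class hierarchy as subject-predicate-object triples and answers "all instances of X" by following subclass links transitively. The predicate names only loosely mirror `rdfs:subClassOf` and `rdf:type`; the data and function names are hypothetical, and a real system would use an RDF store and a reasoner rather than a Python list.

```python
from collections import deque

# Hypothetical toy ontology as (subject, predicate, object) triples.
TRIPLES = [
    ("Dog", "subClassOf", "Mammal"),
    ("Whale", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Sparrow", "subClassOf", "Bird"),
    ("Bird", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
    ("moby", "type", "Whale"),
    ("tweety", "type", "Sparrow"),
]


def subclasses(cls: str) -> set[str]:
    """Return cls plus every class that is, directly or transitively, a subclass of it."""
    result = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for s, p, o in TRIPLES:
            if p == "subClassOf" and o == current and s not in result:
                result.add(s)
                queue.append(s)
    return result


def instances_of(cls: str) -> set[str]:
    """Resources typed with cls or with any of its subclasses."""
    classes = subclasses(cls)
    return {s for s, p, o in TRIPLES if p == "type" and o in classes}
```

A query for `"Mammal"` returns `rex` and `moby` even though neither is typed as a mammal directly; the answer comes from the hierarchy, not from matching the word.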

Disambiguation

In order to understand what a user is searching for, word sense disambiguation must occur. When a term is ambiguous, meaning it can have several senses (for example, the lemma "bark" can be understood as "the sound of a dog", "the skin of a tree", or "a three-masted sailing ship"), a disambiguation process is started that selects the most probable meaning from all the possible ones.
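One classic and deliberately simple way to choose among the senses of "bark" is the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the surrounding context. It is shown here only as an illustration of the idea, not as the method any particular search engine uses; the glosses and the stopword list are hypothetical.

```python
# Hypothetical glosses for three senses of "bark", paraphrased from the text above.
SENSES = {
    "bark/dog": "the sound made by a dog",
    "bark/tree": "the skin or outer layer of a tree trunk",
    "bark/ship": "a sailing ship with three masts",
}

# A tiny, illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "or", "by", "with", "made", "and", "on", "at", "to"}


def content_words(text: str) -> set[str]:
    """Lower-cased words with trailing punctuation and stopwords removed."""
    return {w.strip(".,;!?").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    """Return the sense whose gloss overlaps most with the context words.

    Ties go to the sense listed first, since max() keeps the first maximum.
    """
    ctx = content_words(context)
    return max(senses, key=lambda sense: len(content_words(senses[sense]) & ctx))
```

For "The dog began to bark at the mailman" the only overlap is "dog", so the dog sense wins; for "The bark had three masts" the words "three" and "masts" select the ship sense.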

Such processes make use of other information present in a semantic analysis system and take into account the meanings of the other words in the sentence and in the rest of the text. In substance, the choice of each meaning influences the disambiguation of the others, until the sentence reaches a state of maximum plausibility and coherence. All the information fundamental to the disambiguation process, that is, all the knowledge used by the system, is represented as a semantic network organized on a conceptual basis.
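The idea that each chosen meaning influences the others until the sentence is maximally coherent can be sketched as a greedy, iterative relabelling: start from a default sense for every word, then repeatedly switch a word to the sense that agrees best with the current senses of the other words, stopping when nothing changes. The sense inventory, the domain labels used as a stand-in for semantic-network relatedness, and the function names are all hypothetical; this is one simple search strategy, and it can stop at a local rather than a global optimum.

```python
# Hypothetical sense inventory: each sense carries a set of domain labels that
# stand in for its neighbourhood in a semantic network.
SENSES: dict[str, dict[str, set[str]]] = {
    "bark": {"bark/dog": {"animal", "sound"}, "bark/tree": {"plant", "forest"}},
    "trunk": {
        "trunk/tree": {"plant", "forest"},
        "trunk/car": {"vehicle", "storage"},
        "trunk/elephant": {"animal", "body"},
    },
    "oak": {"oak/tree": {"plant", "forest"}},
    "elephant": {"elephant/animal": {"animal", "body"}},
}


def relatedness(a: set[str], b: set[str]) -> int:
    """Toy relatedness: number of shared domain labels."""
    return len(a & b)


def coherence(word: str, sense: str, choice: dict[str, str]) -> int:
    """How well `sense` of `word` fits the senses currently chosen for the other words."""
    return sum(
        relatedness(SENSES[word][sense], SENSES[other][chosen])
        for other, chosen in choice.items()
        if other != word
    )


def disambiguate(words: list[str], max_rounds: int = 10) -> dict[str, str]:
    # Start from the first listed sense of every word.
    choice = {w: next(iter(SENSES[w])) for w in words}
    for _ in range(max_rounds):
        changed = False
        for w in words:
            best = max(SENSES[w], key=lambda s: coherence(w, s, choice))
            # Switch only on a strict improvement, so ties cannot cause oscillation.
            if coherence(w, best, choice) > coherence(w, choice[w], choice):
                choice[w] = best
                changed = True
        if not changed:  # no word changed: a locally coherent assignment
            break
    return choice
```

Next to "oak", both "bark" and "trunk" settle on their tree senses; next to "elephant", "trunk" switches to the elephant sense while "bark" keeps the dog sense.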

In a structure of this type, every lexical concept corresponds to a node of the semantic network and is linked to other nodes by specific semantic relationships within a hierarchical structure with inheritance. In this way, each concept is enriched with the characteristics and meaning of the nearby nodes.
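A minimal sketch of inheritance along "is-a" links, assuming each concept stores its own characteristics and a single, more general parent (a hypernym): a concept's full set of characteristics is its own plus everything inherited from its ancestors. The `Concept` class and the example features are hypothetical, and the sketch ignores multiple inheritance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    features: set[str] = field(default_factory=set)
    hypernym: Concept | None = None  # "is-a" link to a more general concept

    def all_features(self) -> set[str]:
        """Own features plus everything inherited along the is-a chain."""
        inherited = self.hypernym.all_features() if self.hypernym is not None else set()
        return inherited | self.features


# Hypothetical three-level hierarchy: dog is-a mammal is-a animal.
animal = Concept("animal", {"alive", "can move"})
mammal = Concept("mammal", {"has fur", "nurses young"}, hypernym=animal)
dog = Concept("dog", {"barks"}, hypernym=mammal)
```

Here `dog.all_features()` includes "alive" and "has fur" even though only "barks" is stored on the dog node itself.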

Each node of the network, called a synset, groups a set of synonyms that represent the same lexical concept, and can contain:

single lemmata ('seat', 'vacation'; 'work', 'quick'; 'quickly', 'more', etc.)
compounds ('non-stop', 'abat-jour', 'policeman')
collocations ('credit card', 'university degree', 'treasury stock', 'go forward', etc.).

The semantic relationships (links) between the synsets are the principles by which the concepts of the semantic network are organized.
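The sketch below models synsets along the lines described above: each one groups synonymous word forms (single lemmata, compounds, or collocations) and carries typed links to other synsets. The identifiers, glosses, and relation names ("hypernym", "hyponym") are illustrative assumptions, not taken from any specific lexical database.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    gloss: str
    members: list[str]  # single lemmata, compounds, or collocations
    relations: dict[str, list[str]] = field(default_factory=dict)  # relation -> target ids


# Hypothetical miniature network; ids, glosses, and relation names are illustrative.
NETWORK = {
    s.synset_id: s
    for s in [
        Synset("card#1", "a flat piece of stiff paper", ["card"],
               {"hyponym": ["credit_card#1"]}),
        Synset("credit_card#1", "a card used to buy goods on credit",
               ["credit card", "charge card"], {"hypernym": ["card#1"]}),
        Synset("policeman#1", "a member of a police force",
               ["policeman", "police officer", "cop"]),
    ]
}


def synsets_for(form: str) -> list[str]:
    """Ids of every synset that contains the given word form."""
    return [s.synset_id for s in NETWORK.values() if form in s.members]


def synonyms(form: str) -> set[str]:
    """Other members of every synset containing the word form."""
    return {m for sid in synsets_for(form) for m in NETWORK[sid].members} - {form}


def related(synset_id: str, relation: str) -> list[str]:
    """Ids of the synsets linked to `synset_id` by `relation` (empty if none)."""
    return NETWORK[synset_id].relations.get(relation, [])
```

A search system could use `synonyms("cop")` to expand a query with "policeman" and "police officer", and `related` to move up or down the hierarchy.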

Commonly used searching methodologies

Mäkelä describes five commonly used methodologies:

RDF Path Traversal - traversing the graph formed by data in the RDF (Resource Description Framework) format.
Keyword to Concept Mapping
Graph Patterns - used to formulate patterns for locating interesting connecting paths between resources; also commonly used in data visualization.
Logics - using inference based on OWL (Web Ontology Language)
Fuzzy Concepts, Fuzzy Relations, Fuzzy Logics
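Two of these methodologies, path traversal and graph patterns, can be sketched over a handful of RDF-style triples. `connecting_path` does a breadth-first search for a chain of links between two resources, and `match` finds every variable binding that satisfies a small conjunctive pattern, in the spirit of a SPARQL basic graph pattern. The data and function names are hypothetical; production systems would run such queries against an RDF store.

```python
from __future__ import annotations

from collections import deque

# Hypothetical RDF-like data as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("bob", "authorOf", "paper1"),
    ("paper1", "about", "semantic_search"),
    ("carol", "authorOf", "paper2"),
    ("paper2", "about", "semantic_search"),
]


def connecting_path(start: str, goal: str) -> list[str] | None:
    """Path traversal: shortest chain of resources linking start to goal.

    Links are followed in either direction; returns None if no path exists.
    """
    neighbours: dict[str, list[str]] = {}
    for s, _, o in TRIPLES:
        neighbours.setdefault(s, []).append(o)
        neighbours.setdefault(o, []).append(s)
    previous: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the predecessor links back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def match(patterns: list[tuple[str, str, str]], binding: dict | None = None) -> list[dict]:
    """Graph pattern matching; terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant term does not match
        else:
            results.extend(match(rest, new))
    return results
```

For example, `connecting_path("alice", "carol")` links the two people through Alice's employer, Bob, his paper, and its topic, and `match([("?person", "authorOf", "?paper"), ("?paper", "about", "semantic_search")])` binds `?person` to `bob` and to `carol`.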

Semantic search portals

GoPubMed - first semantic search engine on the Internet, launched in 2002
ResearchGate - the professional network for scientists
Google
Hakia
iGlue - semantic search engine with a real-time annotator plugin/bookmarklet that adds a smart layer to every website
Kosmix - social media semantic search
Lexxe - beta in early 2011
Swoogle
Yummly - food and recipe semantic search
Bing

Enterprise semantic search engines

Endeca
Exalead
Invention Machine
Sinequa
Smartlogic
Sophia
OpenText
Inbenta
