RetrievalWare
RetrievalWare is an enterprise search engine emphasizing natural language processing and semantic networks. It was commercially available from 1992 to 2007 and is especially known for its use by government intelligence agencies.

History

RetrievalWare was initially created by Paul Nelson, Kenneth Clark, and Edwin Addison as part of ConQuest Software. Development began in 1989, but the software was not commercially available on a wide scale until 1992. Early funding was provided by Rome Laboratory via a Small Business Innovation Research grant.

On July 6, 1995, ConQuest Software merged with Excalibur Technologies and the product was rebranded as RetrievalWare. On December 21, 2000, Excalibur Technologies was combined with Intel Corporation's Interactive Media Services division to form the Convera Corporation. Finally, on April 9, 2007, the RetrievalWare software and business were purchased by Fast Search & Transfer, at which point the product was officially retired. Microsoft Corporation continues to maintain the product for its existing customer base.

Annual revenues for RetrievalWare peaked in 2001 at around US$40 million.

Use of natural language techniques

RetrievalWare is a relevancy ranking text search system with processing enhancements drawn from the fields of natural language processing (NLP) and semantic networks. NLP algorithms include dictionary-based stemming (also known as lemmatisation) and dictionary-based phrase identification. Semantic networks are used by RetrievalWare to expand the query words entered by the user to related terms, with term weights determined by the distance from the user's original terms. In addition to automatic expansion, a feedback mode was available in which users could choose the meaning of a word before the expansion was performed. The first semantic networks were built using WordNet.
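The distance-weighted expansion described above can be illustrated with a minimal sketch. The toy network, the function name expand_query, the decay factor, and the distance limit are all assumptions made for illustration; the article does not document RetrievalWare's actual weighting formula.

```python
from collections import deque

# Hypothetical toy semantic network: each term maps to directly related terms.
NETWORK = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car"],
    "vehicle": ["car", "truck"],
    "truck": ["vehicle"],
}

def expand_query(terms, network, max_distance=2, decay=0.5):
    """Return {term: weight}, where weight = decay ** (links from nearest query term).

    The exponential decay is an assumed weighting, chosen only so that
    terms further from the user's original words count for less.
    """
    weights = {}
    queue = deque((term, 0) for term in terms)
    while queue:
        term, distance = queue.popleft()
        if term in weights:
            # Breadth-first order means the first visit used the shortest path.
            continue
        weights[term] = decay ** distance
        if distance < max_distance:
            for neighbour in network.get(term, []):
                queue.append((neighbour, distance + 1))
    return weights

expanded = expand_query(["car"], NETWORK)
```

In this sketch the original query term keeps full weight, its direct neighbours receive half, and terms two links away receive a quarter.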

In addition, RetrievalWare implemented a form of n-gram search (branded as APRP, Adaptive Pattern Recognition Processing), designed to search over documents with OCR errors. Query terms are divided into sets of 2-grams, which are used to locate similar terms in the inverted index. The resulting matches are weighted based on similarity measures and then used to search for documents.
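As a rough illustration of 2-gram matching, the sketch below splits a query term into character 2-grams, looks up index terms that share at least one of them, and scores each candidate. The Dice coefficient, the threshold, and all function names are assumptions; the article says only that matches are weighted by similarity measures and does not describe APRP's internals.

```python
from collections import defaultdict

def bigrams(term):
    """Return the set of character 2-grams in a term."""
    return {term[i:i + 2] for i in range(len(term) - 1)}

def build_bigram_index(vocabulary):
    """Map each 2-gram to the vocabulary terms that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in bigrams(word):
            index[gram].add(word)
    return index

def fuzzy_matches(query_term, index, threshold=0.5):
    """Return (term, score) pairs sorted by descending Dice similarity."""
    query_grams = bigrams(query_term)
    candidates = set()
    for gram in query_grams:
        candidates |= index.get(gram, set())
    scored = []
    for word in candidates:
        word_grams = bigrams(word)
        # Dice coefficient: an assumed similarity measure for this sketch.
        score = 2 * len(query_grams & word_grams) / (len(query_grams) + len(word_grams))
        if score >= threshold:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# "retrieva1" imitates an OCR error (digit 1 read in place of the letter l).
index = build_bigram_index(["retrieval", "retrieva1", "semantic"])
matches = fuzzy_matches("retrieval", index)
```

Here the misrecognized term still shares seven of its eight 2-grams with the query, so it is retained as a weighted match, while the unrelated term shares none and is never considered.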

All of these features were available no later than 1993, and ConQuest Software claimed that it was the first commercial text-search system to implement these techniques.

Other notable features

Other notable features of RetrievalWare include distributed search servers, synchronizers for indexing external content management systems and relational databases, a heterogeneous security model, document categorization, real-time document-query matching (profiling), multi-lingual searches (queries containing terms from multiple languages searching for documents containing terms from multiple languages), and cross-lingual searches (queries in one language searching for documents in a different language).

Participation in TREC

RetrievalWare participated in the Text REtrieval Conference in 1992 (TREC-1), 1993 (TREC-2), and 1995 (TREC-4).

In TREC-1 and TREC-4, the RetrievalWare runs for manually entered queries produced the best results, based on 11-point averages, of all search engines that participated in the Ad-Hoc category, in which search engines are allowed a single opportunity to process previously unknown queries against an existing database.
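For readers unfamiliar with the measure, the following sketch computes an 11-point interpolated average precision for a single query using the standard textbook definition. The function name and sample data are hypothetical, and the sketch is not a reproduction of the official TREC evaluation software, which also averages over many queries.

```python
def eleven_point_average(ranked_relevance, total_relevant):
    """Average interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    ranked_relevance: booleans, one per retrieved document in rank order.
    total_relevant: number of relevant documents that exist for the query.
    """
    points = []  # (recall, precision) at each rank where a relevant document appears
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    total = 0.0
    for level in range(11):
        recall_level = level / 10
        # Interpolated precision: best precision at any recall >= this level.
        reachable = [p for r, p in points if r >= recall_level]
        total += max(reachable, default=0.0)
    return total / 11

# Hypothetical ranking: relevant documents at ranks 1 and 3, two relevant overall.
score = eleven_point_average([True, False, True, False], total_relevant=2)
```

A ranking that places every relevant document first scores 1.0; relevant documents that appear lower in the ranking pull the average down.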


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.