Stop words
In computing, stop words are words that are filtered out before or after the processing of natural language data (text). The choice of stop words is made by people rather than determined automatically. There is no single definitive list of stop words that all tools use, and some tools do not use one at all. Some tools specifically avoid removing stop words in order to support phrase search.
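
The filtering step can be sketched as follows. This is a minimal illustration, assuming a small hand-picked stop list and a naive whitespace tokenizer; the function name `remove_stop_words` and the list itself are hypothetical and not taken from any particular tool.

```python
# Illustrative stop list (an assumption); real tools use different lists, or none.
STOP_WORDS = {"the", "is", "at", "which", "on"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased tokens of `text` that are not in `stop_words`."""
    # Naive tokenization: lowercase, then split on whitespace.
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']
```

Because the list is passed as a parameter, the same function works with any set of words chosen for a given purpose.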

Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as the, is, at, which and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'. Other search engines remove some of the most common words, including lexical words such as "want", from a query in order to improve performance.
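
The phrase-search problem can be seen in a short sketch. It assumes a hypothetical query filter, `strip_stop_words`, and a stop list that extends the function words named above with "who" and "that" purely for illustration; no real search engine's list is implied.

```python
# Assumed stop list: the function words named in the text, plus "who" and
# "that", which are added here only to illustrate the problem.
QUERY_STOP_WORDS = {"the", "is", "at", "which", "on", "who", "that"}


def strip_stop_words(query, stop_words=QUERY_STOP_WORDS):
    """Return the query terms that survive stop-word removal."""
    return [term for term in query.lower().split() if term not in stop_words]


for name in ["The Who", "The The", "Take That"]:
    # A band name made up of stop words loses most or all of its terms.
    print(name, "->", strip_stop_words(name))
```

Under these assumptions, 'The Who' and 'The The' reduce to empty queries and 'Take That' reduces to the single term "take", so the original phrase can no longer be matched.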

Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design.

See also

  • Text mining
  • Concept mining
  • Information extraction
  • Natural language processing
  • Query expansion
  • Stemming
  • Search engine indexing
  • Poison words
  • Function words

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 