Stop words
In computing, stop words are words that are filtered out before or after the processing of natural language data (text). The stop list is chosen by human input rather than generated automatically. There is no single definitive list of stop words used by all tools, and some tools do not use one at all. Some tools specifically avoid removing stop words in order to support phrase search.
Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common, short function words, such as the, is, at, which and on. In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'.
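As a rough illustration of the filtering described above, the following Python sketch removes stop words from a piece of text. The stop list and the function name remove_stop_words are assumptions made for this example only; they are not taken from any particular search engine, and real tools use their own lists and tokenization rules.

```python
# Hypothetical stop list for illustration; each real tool defines its own.
STOP_WORDS = {"the", "is", "at", "which", "on"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased, whitespace-separated tokens of `text`
    that are not in `stop_words`."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


# An ordinary sentence keeps its content words.
print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']

# Names built from stop words lose part or all of the query.
print(remove_stop_words("The Who"))  # ['who']
print(remove_stop_words("The The"))  # []
```

With this list, a query for 'The Who' is reduced to the single word "who", and a query for 'The The' becomes empty, which is why phrase searches for such names can fail when stop words are removed.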
Other search engines remove some of the most common words, including lexical words such as "want", from a query in order to improve performance.
Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design.
See also
- Text mining
- Concept mining
- Information extraction
- Natural language processing
- Query expansion
- Stemming
- Search engine indexing
- Poison words
- Function words
External links
- List of English Stop Words (PHP array, CSV)
- English Stop Words (CSV)
- Hindi Stop Words
- German Stop Words, another list of German stop words
- Polish Stop Words