Web search query
A web search query is a query that a user enters into a web search engine to satisfy their information needs. Web search queries are distinctive in that they are unstructured and often ambiguous; they differ greatly from standard query languages, which are governed by strict syntax rules.
Types
There are three broad categories that cover most web search queries:
- Informational queries – Queries that cover a broad topic (e.g., colorado or trucks) for which there may be thousands of relevant results.
- Navigational queries – Queries that seek a single website or web page of a single entity (e.g., youtube or delta air lines).
- Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.
Search engines often support a fourth type of query that is used far less frequently:
- Connectivity queries – Queries that report on the connectivity of the indexed web graph (e.g., "Which links point to this URL?" and "How many pages are indexed from this domain name?").
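Both example connectivity queries can be answered from the link graph that a crawler records. The following is a minimal Python sketch, assuming the index is nothing more than an in-memory list of (source URL, target URL) pairs and that every URL appearing in it counts as an indexed page; the function names and placeholder URLs are hypothetical. Inverting the link list answers the backlink question, and grouping pages by host name answers the per-domain count.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical toy index: (source URL, target URL) pairs found while crawling.
LINKS = [
    ("https://a.example/home", "https://b.example/docs"),
    ("https://c.example/blog", "https://b.example/docs"),
    ("https://b.example/docs", "https://a.example/home"),
    ("https://c.example/blog", "https://a.example/about"),
]

def build_backlink_index(links):
    """Invert the link graph: map each target URL to the set of pages linking to it."""
    index = defaultdict(set)
    for source, target in links:
        index[target].add(source)
    return index

def backlinks(index, url):
    """Answer 'Which links point to this URL?' (sorted for stable output)."""
    return sorted(index.get(url, set()))

def pages_per_domain(links, domain):
    """Answer 'How many pages are indexed from this domain name?'.

    Assumption: every URL that appears anywhere in the link list is an indexed page.
    """
    pages = {url for pair in links for url in pair}
    return sum(1 for url in pages if urlsplit(url).hostname == domain)

if __name__ == "__main__":
    idx = build_backlink_index(LINKS)
    print(backlinks(idx, "https://b.example/docs"))
    print(pages_per_domain(LINKS, "a.example"))
```

A real engine would keep such an inverted link index on disk and shard it across machines; the dictionary here only illustrates the lookup pattern.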
Characteristics
Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by. Nevertheless, a 2001 study that analyzed the queries from the Excite search engine showed some interesting characteristics of web search:
- The average length of a search query was 2.4 terms.
- About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
- Close to half of the users examined only the first one or two pages of results (10 results per page).
- Fewer than 5% of users used advanced search features (e.g., Boolean operators like AND, OR, and NOT).
- The top four most frequently used terms were the empty string (an empty search), "and", "of", and "sex".
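The statistics above are simple aggregates over a query log. The sketch below shows how they might be computed, assuming a hypothetical log of (user ID, query string) pairs and plain whitespace tokenization; the toy data, the helper names, and the rule for spotting Boolean operators are illustrative assumptions and do not reproduce the Excite study's methodology.

```python
from collections import defaultdict

# Hypothetical toy log of (user ID, query) records; it is NOT the Excite data.
LOG = [
    ("u1", "colorado"),
    ("u2", "delta air lines"),
    ("u2", "delta air lines baggage"),
    ("u3", "trucks AND used"),
    ("u3", "used trucks"),
    ("u3", "truck dealers denver"),
]

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

def average_query_length(log):
    """Mean number of whitespace-separated terms per query."""
    return sum(len(query.split()) for _, query in log) / len(log)

def unique_queries_per_user(log):
    """Map each user ID to the number of distinct queries that user issued."""
    per_user = defaultdict(set)
    for user, query in log:
        per_user[user].add(query)
    return {user: len(queries) for user, queries in per_user.items()}

def share_of_users(counts, predicate):
    """Fraction of users whose unique-query count satisfies `predicate`."""
    return sum(1 for n in counts.values() if predicate(n)) / len(counts)

def boolean_usage_rate(log):
    """Fraction of queries using AND/OR/NOT as an upper-case term (an assumed rule)."""
    return sum(1 for _, query in log if BOOLEAN_OPERATORS & set(query.split())) / len(log)

if __name__ == "__main__":
    counts = unique_queries_per_user(LOG)
    print("average terms per query:", round(average_query_length(LOG), 2))
    print("users with a single query:", share_of_users(counts, lambda n: n == 1))
    print("users with 3+ unique queries:", share_of_users(counts, lambda n: n >= 3))
    print("queries with Boolean operators:", boolean_usage_rate(LOG))
```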
A study of the same Excite query logs revealed that 19% of the queries contained a geographic term (e.g., place names, zip codes, or geographic features).
A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that 87% of the time the user would click on the same result. This suggests that many users use repeat queries to revisit or re-find information. This analysis is corroborated by a Bing search engine blog post reporting that about 30% of queries are navigational queries.
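Repeat queries and repeat clicks can be detected with a single pass over a time-ordered log. The following sketch is one plausible operationalization, not the definition used in the Yahoo study: it assumes a hypothetical log of (user ID, query, clicked URL) records, treats a query as a repeat when the same user has issued the same case-folded, whitespace-normalized string before, and compares its click with the click on that query's previous occurrence.

```python
# Hypothetical toy log of (user ID, query, clicked URL), in time order.
LOG = [
    ("u1", "bank login", "https://bank.example/login"),
    ("u1", "weather", "https://wx.example/today"),
    ("u1", "Bank Login", "https://bank.example/login"),
    ("u2", "pasta recipe", "https://food.example/carbonara"),
    ("u2", "pasta recipe", "https://food.example/pesto"),
    ("u2", "pasta recipe", "https://food.example/pesto"),
]

def normalize(query):
    """Assumed normalization: case-fold and collapse runs of whitespace."""
    return " ".join(query.lower().split())

def repeat_query_stats(log):
    """Return (share of queries that are repeats, share of repeats with the same click)."""
    last_click = {}  # (user, normalized query) -> URL clicked on the previous occurrence
    repeats = same_click = 0
    for user, query, clicked in log:
        key = (user, normalize(query))
        if key in last_click:
            repeats += 1
            if last_click[key] == clicked:
                same_click += 1
        last_click[key] = clicked
    repeat_rate = repeats / len(log)
    same_click_rate = same_click / repeats if repeats else 0.0
    return repeat_rate, same_click_rate

if __name__ == "__main__":
    print(repeat_query_stats(LOG))
```

In this toy log, three of six queries are repeats and two of those three lead to the same click as before, which is the kind of re-finding behavior the study describes.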
In addition, much research has shown that query term frequency distributions conform to a power law, or long-tail distribution curve. That is, a small portion of the terms observed in a large query log (e.g., more than 100 million queries) are used most often, while each of the remaining terms is used far less often. This instance of the Pareto principle (or 80-20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching, and pre-fetching.
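The skew described above can be measured directly: rank terms by frequency and compute what share of all term occurrences the most frequent fifth of distinct terms accounts for. The sketch below assumes whitespace tokenization and uses a synthetic log in which the term of rank r occurs about 1000 / r² times; both the data and the helper name `head_share` are illustrative assumptions, not measurements from a real search engine.

```python
from collections import Counter
import math

def head_share(queries, head_fraction=0.2):
    """Fraction of all term occurrences contributed by the most frequent
    `head_fraction` of distinct terms (always at least one term)."""
    counts = Counter(term for q in queries for term in q.lower().split())
    if not counts:
        return 0.0
    ranked = [n for _, n in counts.most_common()]  # frequencies, largest first
    head_size = max(1, math.ceil(head_fraction * len(ranked)))
    return sum(ranked[:head_size]) / sum(ranked)

# Synthetic log (an assumption, not real data): term "t<r>" of rank r
# occurs about 1000 / r**2 times, a steep power law chosen for illustration.
SYNTHETIC_LOG = [f"t{r}" for r in range(1, 11) for _ in range(1000 // r**2)]

if __name__ == "__main__":
    print(f"top 20% of terms cover {head_share(SYNTHETIC_LOG):.0%} of occurrences")
```

On this synthetic log the top two of ten distinct terms account for about 81% of term occurrences. That concentration is what makes caching attractive: a small cache keyed on the head terms or queries can serve a large share of traffic, although real query logs have far longer tails than this ten-term toy.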
Structured queries
With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as
vehicles OR cars OR automobiles
A faceted query is a conjunction of such facets. For example, the query
(electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral)
is likely to find documents about electronic voting even if they omit one of the words "electronic" and "voting", or even both.
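A faceted query can be evaluated as a conjunction of disjunctions: a document matches when, for every facet, it contains at least one of that facet's words. The sketch below applies the electronic-voting query to a few made-up document titles. It assumes simple lower-case word matching with no stemming, phrase handling, or ranking, so it illustrates the Boolean logic rather than how any particular search engine executes such queries.

```python
import re

# The faceted query from the text: an AND of two OR-groups (facets).
FACETS = [
    {"electronic", "computerized", "dre"},
    {"voting", "elections", "election", "balloting", "electoral"},
]

def tokenize(text):
    """Lower-case alphanumeric word tokens (a simplifying assumption; no stemming)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(document, facets):
    """True if every facet shares at least one word with the document."""
    words = tokenize(document)
    return all(words & facet for facet in facets)

# Made-up document titles for illustration.
DOCS = {
    "d1": "Security review of electronic voting machines",
    "d2": "Computerized balloting in county elections",    # omits both "electronic" and "voting"
    "d3": "Electronic poll books for the 2008 election",   # omits "voting"
    "d4": "Voting rights history",                         # no word from the first facet
}

if __name__ == "__main__":
    print([doc_id for doc_id, text in DOCS.items() if matches(text, FACETS)])
```

Documents d1, d2, and d3 match while d4 does not. A single disjunction such as vehicles OR cars OR automobiles is the special case of a list containing one facet.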