Web search query
A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are unstructured and often ambiguous; they vary greatly from standard query languages, which are governed by strict syntax rules.

Types

There are three broad categories that cover most web search queries:
  • Informational queries – Queries that cover a broad topic (e.g., colorado or trucks) for which there may be thousands of relevant results.

  • Navigational queries – Queries that seek a single website or web page of a single entity (e.g., youtube or delta air lines).

  • Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.


Search engines often support a fourth type of query that is used far less frequently:
  • Connectivity queries – Queries that report on the connectivity of the indexed web graph (e.g., Which links point to this URL?, and How many pages are indexed from this domain name?).

Characteristics

Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by. Nevertheless, a 2001 study that analyzed the queries from the Excite search engine showed some interesting characteristics of web search:
  • The average length of a search query was 2.4 terms.
  • About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
  • Close to half of the users examined only the first one or two pages of results (10 results per page).
  • Less than 5% of users used advanced search features (e.g., Boolean operators like AND, OR, and NOT).
  • The four most frequently used terms were the empty search, "and", "of", and "sex".
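
Statistics of this kind can be computed directly from a query log. The following is a minimal sketch, assuming the log is simply a list of query strings and that terms are separated by whitespace; the function name `query_log_stats` and the sample queries are illustrative and are not taken from the Excite study.

```python
from collections import Counter


def query_log_stats(queries):
    """Return (mean number of terms per query, Counter of term frequencies)."""
    term_counts = Counter()
    total_terms = 0
    for query in queries:
        # Assumption: terms are whitespace-separated and case-insensitive.
        terms = query.lower().split()
        total_terms += len(terms)
        term_counts.update(terms)
    mean_length = total_terms / len(queries) if queries else 0.0
    return mean_length, term_counts


# Hypothetical sample log, for illustration only.
sample_log = [
    "colorado",
    "delta air lines",
    "used trucks for sale",
    "youtube",
    "history of colorado",
]
mean_length, term_counts = query_log_stats(sample_log)
print(f"mean terms per query: {mean_length:.1f}")
print("most frequent terms:", term_counts.most_common(3))
```

A real log analysis would also have to decide how to treat empty searches, punctuation, and operators such as AND or OR, which this sketch counts as ordinary terms.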


A study of the same Excite query logs revealed that 19% of the queries contained a geographic term (e.g., place names, zip codes, or geographic features).

A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that 87% of the time the user would click on the same result. This suggests that many users use repeat queries to revisit or re-find information. This analysis is supported by a Bing search engine blog post reporting that about 30% of queries are navigational queries.
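
As an illustration of how a repeat-query rate might be measured, the sketch below counts a query as a repeat when the same user has already issued the same text earlier in the log. This definition, the `(user_id, query)` log format, and the name `repeat_query_fraction` are assumptions made for the example; the definition used in the Yahoo study may differ.

```python
def repeat_query_fraction(log):
    """Fraction of log entries whose query the same user has issued before.

    `log` is an iterable of (user_id, query) pairs in chronological order.
    """
    seen = set()
    repeats = 0
    total = 0
    for user_id, query in log:
        # Assumption: queries are compared after trimming and lowercasing.
        key = (user_id, query.strip().lower())
        total += 1
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / total if total else 0.0


# Hypothetical log, for illustration only.
sample = [
    ("u1", "youtube"),
    ("u1", "weather denver"),
    ("u1", "youtube"),
    ("u2", "youtube"),
    ("u2", "trucks"),
    ("u1", "YouTube"),
]
fraction = repeat_query_fraction(sample)
print(f"repeat queries: {fraction:.0%}")
```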

In addition, much research has shown that query term frequency distributions conform to a power law, or long tail distribution curve. That is, a small portion of the terms observed in a large query log (e.g. > 100 million queries) are used most often, while the remaining terms are used less often individually. This example of the Pareto principle (or 80-20 rule) allows search engines to employ optimization techniques such as index or database partitioning, caching and pre-fetching.
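
To see why such a skewed distribution makes caching attractive, the sketch below estimates how much traffic the most frequent items would account for. It assumes synthetic frequencies in which the item at rank r occurs 1/r as often as the top item (a power law with exponent 1); the exponent, the figure of 10,000 distinct items, and the name `head_coverage` are illustrative choices, not measurements from a real query log.

```python
def head_coverage(frequencies, k):
    """Fraction of all occurrences accounted for by the k most frequent items."""
    ordered = sorted(frequencies, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Synthetic power-law frequencies (assumption: exponent 1).
n_distinct = 10_000
frequencies = [1.0 / rank for rank in range(1, n_distinct + 1)]

# Share of traffic covered by the top 1% of items.
coverage = head_coverage(frequencies, k=n_distinct // 100)
print(f"top 1% of items cover {coverage:.0%} of occurrences")
```

Under these assumptions the top 1% of items account for a little over half of all occurrences, so a cache holding only that small head could in principle serve a large share of requests.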

Structured queries

With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can be applied. A user who is looking for documents that cover several topics or facets may want to describe each of them by a disjunction of characteristic words, such as vehicles OR cars OR automobiles. A faceted query is a conjunction of such facets; e.g. a query such as (electronic OR computerized OR DRE) AND (voting OR elections OR election OR balloting OR electoral) is likely to find documents about electronic voting even if they omit one of the words "electronic" and "voting", or even both.
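
The faceted query above can be sketched in code as a list of facets, each a list of alternative words. The example below builds the query string and also checks documents against it directly; the function names and the sample documents are hypothetical, and matching is done on whole lowercase words, which is a simplification of what a real search engine does.

```python
import re


def to_query_string(facets):
    """Render facets as a conjunction (AND) of disjunctions (OR)."""
    return " AND ".join("(" + " OR ".join(facet) + ")" for facet in facets)


def matches_faceted_query(text, facets):
    """True if the text contains at least one word from every facet."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(any(term.lower() in words for term in facet) for facet in facets)


facets = [
    ["electronic", "computerized", "DRE"],
    ["voting", "elections", "election", "balloting", "electoral"],
]

# Hypothetical documents, for illustration only.
docs = [
    "Computerized balloting was trialled in several counties.",
    "The election used paper ballots only.",
    "DRE machines and electoral reform",
]

print(to_query_string(facets))
hits = [doc for doc in docs if matches_faceted_query(doc, facets)]
print(hits)
```

The first and third documents match even though neither contains the words "electronic" or "voting", which is the behaviour the faceted query is meant to achieve.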
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 