Googlebot - AbsoluteAstronomy.com

Googlebot is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google search engine.

If a webmaster wishes to restrict the information on their site available to Googlebot, or to another well-behaved spider, they can do so with the appropriate directives in a robots.txt file, or by adding the appropriate meta tag to the web page. Googlebot requests to web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".
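As a rough illustration of the two mechanisms described above, the following Python sketch parses a robots.txt file the way a well-behaved spider might, and applies the user-agent and host-name test to a request. The robots.txt content, the example.com URLs, and the helper names `build_parser` and `looks_like_googlebot` are all hypothetical. The host name is assumed to have been obtained already (for example by a reverse DNS lookup, which is not performed here).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def looks_like_googlebot(user_agent: str, host_name: str) -> bool:
    """Apply the two substring tests described in the text.

    Assumption: host_name was resolved elsewhere; this function
    only inspects the strings it is given.
    """
    return "Googlebot" in user_agent and "googlebot.com" in host_name.lower()


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(rules.can_fetch("Googlebot", "https://example.com/index.html"))
    print(rules.can_fetch("Googlebot", "https://example.com/private/a.html"))
    print(looks_like_googlebot("Mozilla/5.0 (compatible; Googlebot/2.1)",
                               "crawl-1.googlebot.com"))
```

In this sketch the rules admit Googlebot everywhere except under /private/ and exclude every other spider. Because a user-agent string is trivial to forge, the host-name test is the more meaningful of the two checks.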

Currently, Googlebot follows HREF links and SRC links. There is increasing evidence that Googlebot can also execute JavaScript and parse content generated by Ajax calls. Googlebot discovers pages by harvesting all of the links on every page it finds and then following these links to other web pages. To be crawled and indexed, a new web page must either be linked from other known pages on the web or be manually submitted by the webmaster.
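The link-harvesting process described above can be sketched in Python as a breadth-first traversal. This is a minimal illustration of the general technique, not Googlebot's actual implementation. The class `LinkExtractor`, the functions `extract_links` and `crawl`, and the in-memory `pages` dictionary that stands in for the network are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of href and src attributes as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(seed, pages):
    """Breadth-first crawl over `pages`, a dict mapping URL to HTML.

    Returns the URLs that were reached, in visiting order.
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # not retrievable in this toy "web"
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    pages = {
        "http://example.com/": '<a href="/a.html">A</a><img src="logo.png">',
        "http://example.com/a.html": '<a href="b.html">B</a>',
        "http://example.com/b.html": "",
        "http://example.com/orphan.html": "<p>No page links here.</p>",
    }
    print(crawl("http://example.com/", pages))
```

In this toy example the page orphan.html is never visited, because no known page links to it. This mirrors the requirement that a new page be linked from a known page or submitted manually.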

A problem that webmasters have often noted with Googlebot is that it can consume a very large amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites, which host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate.
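To see why the crawl rate matters for a bandwidth limit, a back-of-the-envelope estimate can help. The Python sketch below multiplies a request rate by an average response size. The function name `monthly_transfer_gb` and all of the figures are hypothetical and are not measurements of Googlebot's behaviour.

```python
def monthly_transfer_gb(requests_per_second, avg_response_bytes, days=30):
    """Estimate data served to a crawler over `days`, in gigabytes (10**9 bytes)."""
    seconds = days * 24 * 60 * 60
    return requests_per_second * avg_response_bytes * seconds / 1e9


if __name__ == "__main__":
    # Hypothetical figures: 50 kB average response size.
    for rate in (1.0, 0.1):
        print(rate, "req/s:", round(monthly_transfer_gb(rate, 50_000), 1), "GB")
```

With these assumed figures, one request per second comes to roughly 129.6 GB over 30 days. Because the estimate is linear in the request rate, throttling the crawl rate to a tenth reduces the transfer by the same factor.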

External links

Google's official Googlebot FAQ

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.