Mediabot
Mediabot is the name given to the web crawler that Google uses to crawl webpages and analyse their content, so that Google AdSense can serve contextually relevant advertising on the page.

Mediabot visits those pages running AdSense ads that have not blocked its access via a robots.txt file, and Google recommends that webmasters add a rule to their robots.txt file explicitly granting Mediabot access to the entire site. Here is how to do it:

User-agent: Mediapartners-Google*
Disallow:
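The effect of such a rule can be checked with Python's standard-library robots.txt parser. This is a sketch, not anything Google ships; the rule below follows the recommendation above but is written without the trailing "*", because the stdlib parser matches user-agent tokens literally rather than as glob patterns, and the `/private/` path is an invented example:

```python
import urllib.robotparser

# Hypothetical robots.txt: Mediabot gets the whole site, all other
# crawlers are kept out of /private/.
robots_txt = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# An empty Disallow line means "allow everything" for that agent.
print(parser.can_fetch("Mediapartners-Google/2.1", "/private/page.html"))  # True
print(parser.can_fetch("SomeOtherBot/1.0", "/private/page.html"))          # False
```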


The Mediabot identifies itself with the user agent string "Mediapartners-Google/2.1".
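A server can use that string to recognise Mediabot in incoming requests. The helper below is a minimal illustrative sketch, not part of any particular web framework; it matches on the "Mediapartners-Google" prefix so that future version numbers are still recognised:

```python
def is_mediabot(user_agent: str) -> bool:
    """Return True if a User-Agent header identifies Google's Mediabot."""
    return user_agent.startswith("Mediapartners-Google")

print(is_mediabot("Mediapartners-Google/2.1"))  # True
print(is_mediabot("Mozilla/5.0"))               # False
```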

The Mediabot revisits pages on a regular but unpredictable schedule, so changes made to a page do not immediately change the ads displayed on it. Ads can still be shown on a page even if the Mediabot has not yet visited it, in which case the ads chosen will be based on the overall theme of the other pages on the site. If no ads can be chosen, public service announcements are displayed instead.

You can tell Mediabot to ignore some parts of a page's text when choosing ads by enclosing them in the comments:

<!-- google_ad_section_start(weight=ignore) -->

<!-- google_ad_section_end -->
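A rough sketch of how a crawler could honour those markers (this is an illustration of the comment syntax, not Google's actual implementation): text between the start and end comments is dropped before the page is analysed for targeting.

```python
import re

# Match everything between a weight=ignore section start comment and the
# matching section end comment, across line breaks (re.DOTALL), lazily
# so that multiple marked sections are each removed separately.
IGNORE_SECTION = re.compile(
    r"<!--\s*google_ad_section_start\(weight=ignore\)\s*-->"
    r".*?"
    r"<!--\s*google_ad_section_end\s*-->",
    re.DOTALL,
)

def strip_ignored_sections(html: str) -> str:
    """Remove text marked as ignorable for ad targeting."""
    return IGNORE_SECTION.sub("", html)

page = (
    "<p>Article body about gardening.</p>"
    "<!-- google_ad_section_start(weight=ignore) -->"
    "<p>Unrelated navigation links.</p>"
    "<!-- google_ad_section_end -->"
)
print(strip_ignored_sections(page))  # <p>Article body about gardening.</p>
```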
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 