Archive site
In web archiving, an archive site is a website that stores information about webpages from the past, or copies of those webpages themselves, for anyone to view.

Common techniques

Two common techniques are (1) using a web crawler or (2) relying on user submissions.
  1. By using a web crawler, the service does not depend on an active community for its content, so it can build a larger database faster, which usually helps the community grow as well. However, website developers and system administrators can block these robots from accessing certain web pages by using a robots.txt file (the Robots Exclusion Standard).
  2. Although such services can be difficult to start because of potentially low rates of user submission, this approach can yield some of the best results. A crawler can only obtain information that people have already posted publicly on the Internet. People may not have posted something because they did not think anyone would be interested in it, or because they lacked a suitable medium. However, if they see that someone wants their information, they may be more willing to submit it.
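The robots.txt blocking mentioned in technique 1 can be illustrated with Python's standard-library `urllib.robotparser` module. This is a minimal sketch, not how any particular archive site works: the robots.txt content, the user-agent names `GenericCrawler` and `ExampleArchiveBot`, the example.com URLs, and the helper `allowed_urls` are all hypothetical. It parses an already-downloaded robots.txt and filters candidate URLs down to those the crawler is permitted to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site; a real crawler would
# download this from the site's /robots.txt before fetching any page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleArchiveBot
Disallow: /
"""


def allowed_urls(robots_lines, user_agent, urls):
    """Hypothetical helper: return the urls that robots.txt lets user_agent fetch."""
    parser = RobotFileParser()
    # parse() takes already-fetched lines, so no network access is needed here.
    parser.parse(robots_lines)
    return [url for url in urls if parser.can_fetch(user_agent, url)]


CANDIDATES = [
    "https://example.com/index.html",
    "https://example.com/private/notes.html",
]

if __name__ == "__main__":
    # A generic crawler falls under the "*" rules: /private/ is excluded.
    print(allowed_urls(ROBOTS_TXT.splitlines(), "GenericCrawler", CANDIDATES))
    # The hypothetical ExampleArchiveBot is excluded from the whole site.
    print(allowed_urls(ROBOTS_TXT.splitlines(), "ExampleArchiveBot", CANDIDATES))
```

Run as a script, it should print `['https://example.com/index.html']` for the generic crawler and an empty list for the blocked bot. robots.txt is a voluntary convention: it only keeps out crawlers that choose to honor it.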

Google Groups

On February 12, 2001, Google acquired the Usenet discussion group archives from Deja.com and turned them into its Google Groups service (http://www.google.com/press/pressrel/pressrelease48.html). The service lets users search old discussions with Google's search technology while still allowing them to post to the mailing lists.

Internet Archive

The Internet Archive is building a compendium of websites and digital media. Since 1996, the Archive has used a web crawler to build up its database. It is one of the best-known archive sites.

TEXTFILES.COM

TEXTFILES.COM (http://www.textfiles.com) is a large library of old text files maintained by Jason Scott Sadofsky. Its mission is to archive the old documents that circulated on the bulletin board systems (BBSes) of his youth and to document other people's experiences on BBSes.

PANDORA Archive

PANDORA (Pandora Archive), founded in 1996 by the National Library of Australia, stands for Preserving and Accessing Networked Documentary Resources of Australia, a name that encapsulates its mission. It provides a long-term catalog of selected online publications and websites that are authored by Australians or deal with an Australian topic. It uses PANDAS (PANDORA Digital Archiving System) to build its catalog.

Nextpoint

Nextpoint provides an automated, cloud-based software-as-a-service (SaaS) platform for marketing, compliance, and litigation-related needs, including electronic discovery.

See also

  • Internet Archive
  • Pandora Archive
  • WebCite
  • Web archiving

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 