International Internet Preservation Consortium

Projects

IIPC sponsored a project on "cross-archival search strategies", which included the creation of an archive focused on the 2010 Winter Olympics.

IIPC maintains an electronic mailing list open to anyone interested in web harvesting, archiving, and quality maintenance issues.

Starting in 2006, the National Library of New Zealand and the British Library developed the Web Curator Tool, an open source workflow management application for selective web archiving. Version 1.5.1 was released on December 10, 2010, and is available at SourceForge. The Web Curator Tool is built upon Java technologies such as Apache Tomcat, the Spring Framework and Hibernate, and Internet Archive's technologies such as the Heritrix web archiving crawler, the NutchWAX web archive full-text search engine and the Wayback Machine.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 