A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers. The digital content may be stored locally, or accessed remotely via computer networks. A digital library is a type of information retrieval system.


In the context of DELOS, a Network of Excellence on Digital Libraries, and of a Coordination Action on Digital Library Interoperability, Best Practices and Modelling Foundations, digital library researchers and practitioners produced a Digital Library Reference Model, which defines a digital library as: "A potentially virtual organisation, that comprehensively collects, manages and preserves for the long depth of time rich digital content, and offers to its targeted user communities specialised functionality on that content, of defined quality and according to comprehensive codified policies."

The Reference Model document also contains a Digital Library Manifesto, which introduces three types of relevant ‘systems’: Digital Library, Digital Library System, and Digital Library Management System. It describes the main concepts that characterise these systems (organisation, content, user, functionality, quality, policy and architecture) and introduces the main roles that actors may play within digital libraries (end-user, manager and software developer). Finally, it describes the reference frameworks needed to clarify the DL universe at different levels of abstraction: the Digital Library Reference Model and the Digital Library Reference Architecture.

The first use of the term digital library in print may have been in a 1988 report to the Corporation for National Research Initiatives. The term digital libraries was first popularized by the NSF/DARPA/NASA Digital Libraries Initiative in 1994. These initiatives drew heavily on Vannevar Bush's 1945 essay As We May Think, which set out a vision not in terms of technology but of user experience. The term virtual library was initially used interchangeably with digital library, but is now primarily used for libraries that are virtual in other senses (such as libraries which aggregate distributed content).

A distinction is often made between content that was created in a digital format, known as born-digital, and information that has been converted from a physical medium, e.g., paper, by digitizing. The term hybrid library is sometimes used for libraries that have both physical collections and digital collections. For example, American Memory is a digital library within the Library of Congress. Some important digital libraries also serve as long-term archives, for example, the arXiv e-print archive and the Internet Archive.


Academic repositories

Many academic libraries are actively involved in building institutional repositories of the institution's books, papers, theses, and other works which can be digitized or were 'born digital'. Many of these repositories are made available to the general public with few restrictions, in accordance with the goals of open access, in contrast to the publication of research in commercial journals, where the publishers often limit access rights. Institutional, truly free, and corporate repositories are sometimes referred to as digital libraries.

Digital archives

Physical archives differ from physical libraries in several ways. Traditionally, archives are defined as:
  1. Containing primary sources of information (typically letters and papers directly produced by an individual or organization) rather than the secondary sources found in a library (books, periodicals, etc.).
  2. Having their contents organized in groups rather than individual items.
  3. Having unique contents.

The technology used to create digital libraries has been even more revolutionary for archives, since it breaks down the second and third of these general rules. In other words, "digital archives" or "online archives" will still generally contain primary sources, but these are likely to be described individually rather than (or as well as) in groups or collections. Further, because they are digital, their contents are easily reproducible and may indeed have been reproduced from elsewhere. The Oxford Text Archive is generally considered to be the oldest digital archive of academic physical primary source materials.

The future

Large-scale digitization projects are underway at Google, the Million Book Project, and the Internet Archive. With continued improvements in book handling and presentation technologies such as optical character recognition and e-books, and the development of alternative depositories and business models, digital libraries are rapidly growing in popularity, as demonstrated by the efforts of Google, Yahoo!, and MSN. Just as libraries have ventured into audio and video collections, so have digital libraries such as the Internet Archive.


According to Larry Lannom, Director of Information Management Technology at the nonprofit Corporation for National Research Initiatives, "all the problems associated with digital libraries are wrapped up in archiving." He goes on to state, "If in 100 years people can still read your article, we'll have solved the problem." Daniel Akst, author of The Webster Chronicle, proposes that "the future of libraries—and of information—is digital." Peter Lyman and Hal Varian, information scientists at the University of California, Berkeley, estimate that "the world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage." They therefore believe that "soon it will be technologically possible for an average person to access virtually all recorded information."


Most digital libraries provide a search interface which allows resources to be found. These resources are typically deep web (or invisible web) resources, since they frequently cannot be located by search engine crawlers. Some digital libraries create special pages or sitemaps to allow search engines to find all their resources. Digital libraries frequently use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their metadata to other digital libraries, and search engines like Google Scholar, Yahoo! and Scirus can also use OAI-PMH to find these deep web resources.
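
The following minimal Python sketch shows the harvesting side using only the standard library. It assumes a repository at the hypothetical base URL `https://example.org/oai` that serves unqualified Dublin Core (`oai_dc`). The helper names `list_records_url` and `parse_list_records` are illustrative. The sketch only builds `ListRecords` requests and parses the responses. The network fetch (for example with `urllib.request.urlopen`), error handling, and `from`/`until` date filtering are left out.

```python
"""Minimal OAI-PMH ListRecords helpers (sketch, standard library only)."""
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, token): identifiers and Dublin Core titles, plus the next token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"identifier": identifier, "title": title})
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty or absent resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token
```

A harvester would call `list_records_url`, then fetch and parse the page. It would repeat the request with the returned token until the token comes back as `None`.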

There are two general strategies for searching a federation of digital libraries:
  1. distributed searching, and
  2. searching previously harvested metadata.


Distributed searching typically involves a client sending multiple search requests in parallel to a number of servers in the federation. The results are gathered, duplicates are eliminated or clustered, and the remaining items are sorted and presented back to the client. Protocols like Z39.50 are frequently used in distributed searching. A benefit of this approach is that the resource-intensive tasks of indexing and storage are left to the respective servers in the federation. A drawback is that the search mechanism is limited by the different indexing and ranking capabilities of each database, making it difficult to assemble a combined result consisting of the most relevant items found.
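
The fan-out, de-duplicate and sort steps can be sketched in Python as below. This is an illustration under simplifying assumptions, not a Z39.50 client:

- each server is a hypothetical local callable that returns `(identifier, title, score)` tuples;
- duplicates are detected by an exact shared identifier, whereas real federations often need fuzzier matching or clustering;
- scores from different servers are treated as comparable.

```python
"""Sketch of federated (distributed) search: fan out, merge, de-duplicate, rank."""
from concurrent.futures import ThreadPoolExecutor


def distributed_search(query, servers, max_workers=8):
    """Send `query` to every server in parallel and merge the results.

    Each server is a callable returning a list of (identifier, title, score)
    tuples; in practice it would wrap a network protocol such as Z39.50.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        result_lists = list(pool.map(lambda search: search(query), servers))

    # De-duplicate by identifier, keeping the highest-scoring copy.
    best = {}
    for results in result_lists:
        for identifier, title, score in results:
            if identifier not in best or score > best[identifier][1]:
                best[identifier] = (title, score)

    # Sort by score, highest first. This assumes scores from different
    # servers are comparable, which is the weakness described in the text.
    merged = [(ident, title, score) for ident, (title, score) in best.items()]
    return sorted(merged, key=lambda item: item[2], reverse=True)
```

The last assumption is the weak point. If one server scores on a 0 to 1 scale and another on 0 to 100, the merged ranking is meaningless without normalisation.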

Searching over previously harvested metadata involves searching a locally stored index of information that has previously been collected from the libraries in the federation. When a search is performed, the search mechanism does not need to connect to the digital libraries it is searching; it already has a local representation of the information. This approach requires an indexing and harvesting mechanism that runs regularly, connecting to all the digital libraries and querying their whole collections in order to discover new and updated resources. OAI-PMH is frequently used by digital libraries to allow their metadata to be harvested. A benefit of this approach is that the search mechanism has full control over indexing and ranking algorithms, possibly allowing more consistent results. A drawback is that harvesting and indexing systems are more resource-intensive and therefore expensive.
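
A minimal sketch of the local side of this approach follows. Harvested records are upserted into an in-memory inverted index, and queries are answered without contacting any remote library. The records are assumed to be dictionaries with `identifier` and `title` fields, such as a harvester might produce. The `HarvestedIndex` class is hypothetical. It indexes titles only, uses simple AND semantics, and does no ranking.

```python
"""Sketch of searching previously harvested metadata via a local inverted index."""
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; a real system would also stem and drop stop words."""
    return re.findall(r"\w+", text.lower())


class HarvestedIndex:
    def __init__(self):
        self.records = {}                 # identifier -> harvested metadata dict
        self.postings = defaultdict(set)  # token -> identifiers containing it

    def upsert(self, record):
        """Add a harvested record, replacing any older copy with the same identifier."""
        identifier = record["identifier"]
        old = self.records.get(identifier)
        if old is not None:
            for token in tokenize(old["title"]):
                self.postings[token].discard(identifier)
        self.records[identifier] = record
        for token in tokenize(record["title"]):
            self.postings[token].add(identifier)

    def search(self, query):
        """Return identifiers whose titles contain every query term (no network calls)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

Re-harvesting a record with the same identifier replaces its old index entries. This is how updated resources found in a later harvest would take effect.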


Formal reference models include the DELOS Digital Library Reference Model (Agosti et al., 2006) and the Streams, Structures, Spaces, Scenarios, Societies (5S) formal framework.
The Reference Model for an Open Archival Information System (OAIS) provides a framework for addressing digital preservation.



There are a number of software packages for use in general digital libraries; for notable ones, see Digital library software. Institutional repository software, which focuses primarily on the ingest, preservation and access of locally produced documents, particularly locally produced academic outputs, can be found in Institutional repository software.


In the past few years, procedures for digitizing books at high speed and comparatively low cost have improved considerably, with the result that it is now possible to plan the digitization of millions of books per year.


The advantages of digital libraries as a means of easily and rapidly accessing books, archives and images of various types are now widely recognized by commercial interests and public bodies alike.

Traditional libraries are limited by storage space; digital libraries have the potential to store much more information, simply because digital information requires very little physical space. Consequently, the cost of maintaining a digital library can be much lower than that of a traditional library.

A traditional library must spend large sums of money on staff, book maintenance, rent, and additional books. Digital libraries may reduce or, in some instances, do away with these costs. Both types of library require cataloguing input to allow users to locate and retrieve material. Digital libraries may be more willing to adopt technological innovations, providing users with improvements in electronic and audio book technology as well as presenting new forms of communication such as wikis and blogs; conventional libraries may consider that providing online access to their catalogue through an OPAC is sufficient. An important advantage of digital conversion is increased accessibility to users. Digital libraries also increase availability to individuals who may not be traditional patrons of a library, due to geographic location or organizational affiliation.
  • No physical boundary. Users of a digital library need not go to the library physically; people from all over the world can gain access to the same information, as long as an Internet connection is available.
  • Round-the-clock availability. A major advantage of digital libraries is that people can gain access to the information 24/7.
  • Multiple access. The same resources can be used simultaneously by a number of institutions and patrons. This may not be the case for copyrighted material: a library may have a license for "lending out" only one copy at a time, enforced with a system of digital rights management under which a resource can become inaccessible after the lending period expires or after the borrower chooses to make it inaccessible (equivalent to returning the resource).
  • Information retrieval. The user is able to use any search term (word, phrase, title, name, subject) to search the entire collection. Digital libraries can provide very user-friendly interfaces, giving clickable access to their resources.
  • Preservation and conservation. Digitization is not a long-term preservation solution for physical collections, but it does succeed in providing access copies for materials that would otherwise degrade from repeated use. Digitized collections and born-digital objects pose many preservation and conservation concerns that analog materials do not; see the Digital preservation section below for examples.
  • Space. Whereas traditional libraries are limited by storage space, digital libraries have the potential to store much more information, simply because digital information requires very little physical space and media storage technologies are more affordable than ever before.
  • Added value. Certain characteristics of objects, primarily the quality of images, may be improved. Digitization can enhance legibility and remove visible flaws such as stains and discoloration.
  • Easily accessible.

Digital preservation

Digital preservation aims to ensure that digital media and information systems are still interpretable into the indefinite future. Each necessary component must be migrated, preserved, or emulated. Typically, lower levels of systems (floppy disks, for example) are emulated, bit-streams (the actual files stored on the disks) are preserved, and operating systems are emulated as virtual machines. Migration is possible only where the meaning and content of digital media and information systems are well understood, as is the case for office documents.
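
A common way to confirm that a preserved bit-stream has survived unchanged is a fixity check, which compares cryptographic digests over time. The Python sketch below illustrates that idea; it is not a method prescribed by the text. The in-memory `files` dictionary is a hypothetical stand-in for real storage, and the function names are illustrative.

```python
"""Sketch of a fixity check for preserved bit-streams using SHA-256 digests."""
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 digest of one bit-stream."""
    return hashlib.sha256(data).hexdigest()


def make_manifest(files):
    """Record a digest for each stored bit-stream (files: name -> bytes)."""
    return {name: digest(data) for name, data in files.items()}


def verify(files, manifest):
    """Return the sorted names whose bit-streams are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or digest(data) != expected:
            problems.append(name)
    return sorted(problems)
```

A fixity check only shows that the bits are intact. Whether they remain interpretable still depends on migration or emulation.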

Copyright and licensing

Digital libraries are hampered by copyright law because, unlike traditional libraries, digital libraries do not have access to works from every time period. The republication of material on the Web by libraries may require permission from rights holders, and there is a conflict of interest between libraries and publishers who may wish to create online versions of their acquired content for commercial purposes. In 2010 it was estimated that twenty-three percent of books in existence were created before 1923 and thus out of copyright. Of those printed after this date, only five percent were still in print as of 2010. Thus, approximately seventy-two percent of books were not available to the public.
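
The seventy-two percent figure follows if both estimates are read as shares of all books in existence:

```latex
\[
100\% \;-\; \underbrace{23\%}_{\text{pre-1923, out of copyright}} \;-\; \underbrace{5\%}_{\text{post-1923, still in print}} \;=\; 72\%
\]
```

If the five percent were instead a share of post-1923 books only, the unavailable fraction would be $77\% \times 0.95 \approx 73\%$. The estimate is therefore roughly the same under either reading.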

The distributed nature of digital resources dilutes responsibility. Complex intellectual property matters may become involved, since digital material is not always owned by a library. In many cases, the content is limited to public domain or self-generated material. Some digital libraries, such as Project Gutenberg, work to digitize out-of-copyright works and make them freely available to the public. An estimate has also been made of the number of distinct books still existent in library catalogues from 2000 BC to 1960.

The Fair Use Provisions (17 USC § 107) under the Copyright Act of 1976 provide guidelines on the circumstances under which libraries are allowed to copy digital resources. The four factors that determine fair use are "Purpose of the use, Nature of the work, Amount or substantiality used and Market impact."

Some digital libraries acquire a license to "lend out" their resources. This may involve restricting lending to one copy at a time per license and applying a system of digital rights management for this purpose (see also the Multiple access point above).
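
The lending bookkeeping described here can be sketched as follows: a fixed number of licensed copies, loans that lapse at the end of the lending period, and early returns. This is a hypothetical model with these assumptions:

- `LicensedTitle`, its method names and the 14-day default period are illustrative;
- days are plain integers, to keep the example short;
- the actual DRM enforcement (encryption, and reader software that refuses access) is not shown.

```python
"""Sketch of one-copy-per-license lending with expiring loans (hypothetical model)."""
from dataclasses import dataclass, field


@dataclass
class LicensedTitle:
    licenses: int                               # copies that may be lent simultaneously
    loan_days: int = 14                         # assumed lending period, in days
    loans: dict = field(default_factory=dict)   # patron -> day the loan expires

    def _expire(self, today):
        """Drop loans whose period has ended; the resource becomes inaccessible."""
        self.loans = {p: due for p, due in self.loans.items() if due > today}

    def checkout(self, patron, today):
        """Lend a copy if one is free; refuse if already borrowed or all copies are out."""
        self._expire(today)
        if patron in self.loans or len(self.loans) >= self.licenses:
            return False
        self.loans[patron] = today + self.loan_days
        return True

    def give_back(self, patron, today):
        """Early return: the patron's access ends and the copy is freed."""
        self._expire(today)
        self.loans.pop(patron, None)

    def can_read(self, patron, today):
        """True while the patron holds an unexpired loan."""
        self._expire(today)
        return patron in self.loans
```

With `licenses=1`, a second patron is refused until the first loan expires or is given back. This mirrors a single physical copy.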

The Digital Millennium Copyright Act of 1998 was enacted in the United States in an attempt to deal with the introduction of digital works. The Act implements two treaties from 1996. It criminalizes circumventing measures that control access to copyrighted materials, as well as producing and disseminating technology intended for such circumvention. The Act provides an exemption for nonprofit libraries and archives that allows up to three copies to be made, one of which may be digital. The digital copy may not, however, be made public or distributed on the Web. Further, the Act allows libraries and archives to copy a work if its format becomes obsolete.

Because copyright issues persist, proposals have been put forward suggesting that digital libraries be exempt from copyright law. Although this would benefit the public, it may have a negative economic effect, and authors may be less inclined to create new works.

Metadata creation

In traditional libraries, the ability to find works of interest was directly related to how well they were catalogued. While cataloguing electronic works digitized from a library's existing holdings may be as simple as copying or moving a record from the print to the electronic form, complex and born-digital works require substantially more effort. To handle the growing volume of electronic publications, new tools and technologies have to be designed to allow effective automated semantic classification and searching. While full text search can be used for some searches, there are many common catalogue searches which cannot be performed using full text, including:
  • finding texts which are translations of other texts
  • linking texts published under pseudonyms to the real authors (Samuel Clemens and Mark Twain, for example)
  • differentiating non-fiction from parody (The Onion from The New York Times, for example)
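
The pseudonym case shows why structured metadata matters. A full text search for "Mark Twain" will not match a record whose author field reads "Samuel Clemens". One standard cataloguing remedy is authority control, in which every name form points to a single authority record. The Python sketch below uses hypothetical in-memory `AUTHORITY` and `CATALOGUE` tables with placeholder records. A real catalogue would draw on a shared authority file instead.

```python
"""Sketch of authority control: resolving name variants to one author."""

# Hypothetical authority file: each normalised name form maps to one authority ID.
AUTHORITY = {
    "samuel clemens": "auth:twain",
    "samuel langhorne clemens": "auth:twain",
    "mark twain": "auth:twain",
}

# Hypothetical catalogue records; the author field holds the name as printed.
CATALOGUE = [
    {"id": "b1", "title": "Placeholder work issued under the pen name", "author": "Mark Twain"},
    {"id": "b2", "title": "Placeholder work issued under the legal name", "author": "Samuel Clemens"},
    {"id": "b3", "title": "As We May Think", "author": "Vannevar Bush"},
]


def authority_id(name):
    """Normalise case and spacing, then look the name up in the authority file.

    Names without an authority entry fall back to their normalised form.
    """
    key = " ".join(name.lower().split())
    return AUTHORITY.get(key, key)


def works_by(name):
    """Return IDs of all works by the same person, under any recorded name form."""
    target = authority_id(name)
    return [rec["id"] for rec in CATALOGUE if authority_id(rec["author"]) == target]
```

`works_by("Samuel Clemens")` returns both `b1` and `b2`, although only one of them carries that name.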

