Preservation Metadata: Implementation Strategies (PREMIS)
PREMIS is an international working group concerned with developing metadata for use in digital preservation.
In 2003 the Online Computer Library Center (OCLC) and the Research Libraries Group (RLG) established the PREMIS working group, a multinational roster of more than thirty representatives from the cultural, government, and private sectors. Its task was to define implementable, core preservation metadata, with guidelines and recommendations for management and use. PREMIS was “charged to define a set of semantic units that are implementation independent, practically oriented, and likely to be needed by most preservation repositories”.
In May 2005, PREMIS released Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group. The 237-page report includes the PREMIS Data Dictionary 1.0, a comprehensive, practical resource for implementing preservation metadata in digital archiving systems; an accompanying report providing context, the data model, and assumptions; special topics, a glossary, and usage examples; and a set of XML schemas developed to support use of the Data Dictionary. Version 2.0 of PREMIS was released in March 2008.
Entities
The PREMIS data model consists of five interrelated entities: Intellectual, Object, Event, Agent, and Rights, with each semantic unit mapped to one of these areas.
An intellectual entity is a set of content that constitutes a discrete, coherent intellectual unit, such as a book or a database. Intellectual entities may be compound objects containing other intellectual entities, and may have multiple digital representations. Descriptive metadata is usually applied at this level; given the proliferation of competing schemes, the working group did not define any further descriptive semantic units and instead allowed for interoperability through “extension containers” (a container holds a related group of semantic units) that can be used for external schemes.
Most of the semantic units listed in the data dictionary relate to object and event entities. Objects are further divided into three subtypes: file, bitstream, and representation. A file, the level at which most end users are used to working, is a “named and ordered sequence of bytes that is known by an operating system.” It carries a variety of file system attributes that make it understandable by an operating system, and it encompasses bitstreams, which are “contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes.” A representation is, in a sense, the “highest level” of this model, for it may encompass several files in order to properly render the structure and content of an intellectual entity. Not all repositories will be concerned with preserving representations, depending on their purpose and the curatorial body’s need to preserve what might be considered the entity’s digital “intrinsic value.” Furthermore, an intellectual entity may have multiple representations within a repository. Events relate to objects in that they record actions that affect those objects, or that involve agents (“a person, organization, or software...associated with Events...or with Rights attached to an object”) associated with them.
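The relationships among objects, events, and agents described above can be sketched as plain data structures. The Python below is an illustrative sketch only: the class and field names (`File`, `Bitstream`, `Representation`, `Event`, `Agent`, `events_for`) and the sample values are assumptions made for this example, not the semantic unit names or controlled vocabularies defined in the data dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """Data within a file that shares properties relevant to preservation."""
    identifier: str


@dataclass
class File:
    """A named, ordered sequence of bytes known to an operating system."""
    identifier: str
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Representation:
    """The set of files needed to render an intellectual entity."""
    identifier: str
    files: List[File] = field(default_factory=list)


@dataclass
class Agent:
    """A person, organization, or software associated with events."""
    identifier: str
    kind: str  # hypothetical values: "person", "organization", "software"


@dataclass
class Event:
    """An action that affects one or more objects."""
    identifier: str
    event_type: str  # hypothetical free-text label
    object_ids: List[str]
    agent_ids: List[str] = field(default_factory=list)


def events_for(object_id: str, events: List[Event]) -> List[Event]:
    """Return the events that reference the given object identifier."""
    return [e for e in events if object_id in e.object_ids]


# Hypothetical sample data: one file inside one representation.
page = File("file-1", "page1.tiff", [Bitstream("bits-1")])
book = Representation("rep-1", [page])
scanner = Agent("agent-1", "software")
log = [
    Event("evt-1", "ingestion", ["file-1"], ["agent-1"]),
    Event("evt-2", "migration", ["rep-1"], ["agent-1"]),
]
print([e.event_type for e in events_for("file-1", log)])
```

Linking events to objects by identifier, rather than nesting them inside objects, is one possible design; it lets a single event refer to several objects and agents at once.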
Finally, the inclusion of rights entities responds to an increased awareness of and concern for the legal requirements of copyright and licensing. Rights metadata also records the specific actions permitted; for example, semantic unit 4.1.6.1, act (“the action the preservation repository is allowed to take”), has suggested values such as replicate, migrate, and delete.
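A repository might consult such rights information before acting on an object. The following Python is a minimal sketch under stated assumptions: `RightsStatement`, `permitted_acts`, and `is_permitted` are hypothetical names invented for this example, and only the act values replicate, migrate, and delete come from the text above.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RightsStatement:
    """Hypothetical record of the acts a repository may perform on an object."""
    object_id: str
    permitted_acts: Set[str] = field(default_factory=set)


def is_permitted(rights: RightsStatement, act: str) -> bool:
    """Return True if the requested act is listed as permitted."""
    return act in rights.permitted_acts


# Sample data: this object may be replicated and migrated, but not deleted.
rights = RightsStatement("file-1", {"replicate", "migrate"})
print(is_permitted(rights, "migrate"))
print(is_permitted(rights, "delete"))
```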
Data dictionary
PREMIS data dictionary entries include twelve attribute fields, not all of which are applied to every semantic unit (analogous to an "element" in other metadata schemes). In addition to the name and definition of the unit, the fields record such things as the rationale for including the unit, usage notes, and examples of how the value might be filled in. Four of the attributes (object category, applicability, repeatability, and obligation) are linked, as the last three are defined for each of the object entity levels of file, bitstream, and representation. The dictionary is hierarchical; some semantic units are contained within others. For example, 1.3 preservationLevel includes four semantic components, such as 1.3.1 preservationLevelValue and 1.3.2 preservationLevelRole.
See also
- Digital preservation
- Preservation metadata
- Metadata
- Digital library
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- Metadata Encoding and Transmission Standard (METS), maintained by the Library of Congress
- Dublin Core, an ISO metadata standard