Scientific data archiving
Scientific data archiving refers to the long-term storage of scientific data and methods. Scientific journals differ in how much of their data and methods they require scientists to store in a public archive, and what is actually archived varies widely between disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archival of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become strained, as research in some areas increasingly depends on large datasets which cannot easily be replicated independently.

Data archiving is more important in some fields than others. In a few fields, all of the data necessary to replicate the work is already available in the journal article. In drug development, a great deal of data is generated and must be archived so researchers can verify that the reports the drug companies publish accurately reflect the data.

The requirement of data archiving is a recent development in the history of science. It was made possible by advances in information technology allowing large amounts of data to be stored and accessed from central locations. For example, the American Geophysical Union (AGU) adopted its first policy on data archiving in 1993, about three years after the beginning of the World Wide Web. This policy mandates that datasets cited in AGU papers must be archived by a recognised data center; it permits the creation of "data papers"; and it establishes AGU's role in maintaining data archives. However, it places no requirement on paper authors to archive their data.

Prior to data archiving, researchers who wanted to evaluate or replicate a paper had to request data and methods information from the author. The scientific community expects authors to share supplemental data, but this process was recognized as wasteful of time and energy and produced mixed results. Information could be lost or corrupted over the years, and in some cases authors simply refused to provide it.
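
One common safeguard against silent loss or corruption of stored files is a fixity check: a checksum is recorded for each file at deposit and recomputed later. The following is a minimal Python sketch of that idea, under the assumption that the data are ordinary files in a directory. It is not the procedure of any journal or archive named in this article, and the function names and manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum at deposit time."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

A depositor would call `build_manifest` once and store the result alongside the data; anyone retrieving the files later can call `changed_files` with the stored manifest, and an empty list indicates that every recorded file is present and unchanged.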

The need for data archiving and due diligence is greatly increased when the research deals with health issues or public policy formation.

The American Naturalist

Journal of Heredity

Molecular Ecology

Nature

Such material must be hosted on an accredited independent site (URL and accession numbers to be provided by the author), or sent to the Nature journal at submission, either uploaded via the journal's online submission service, or if the files are too large or in an unsuitable format for this purpose, on CD/DVD (five copies). Such material cannot solely be hosted on an author's personal or institutional web site.

Nature requires the reviewer to determine if all of the supplementary data and methods have been archived. The policy advises reviewers to consider several questions, including: "Should the authors be asked to provide supplementary methods or data to accompany the paper online? (Such data might include source code for modelling studies, detailed experimental protocols or mathematical derivations.)"

Science

Database deposition policy – Science supports the efforts of databases that aggregate published data for the use of the scientific community. Therefore, before publication, large data sets (including microarray data, protein or DNA sequences, and atomic coordinates or electron microscopy maps for macromolecular structures) must be deposited in an approved database and an accession number provided for inclusion in the published paper.

Materials and methods – Science now requests that, in general, authors place the bulk of their description of materials and methods online as supporting material, providing only as much methods description in the print manuscript as is necessary to follow the logic of the text. (Obviously, this restriction will not apply if the paper is fundamentally a study of a new method or technique.)

Policies by funding agencies

In the United States, the National Science Foundation (NSF) is tightening requirements on data archiving. Researchers seeking funding from NSF will be required to file a data management plan as a two-page supplement to the grant application.

The NSF DataNet initiative has resulted in funding of the Data Observation Network for Earth (DataONE) project, which will provide scientific data archiving for ecological and environmental data produced by scientists worldwide. DataONE's stated goal is to preserve and provide access to multi-scale, multi-discipline, and multi-national data. The community of users for DataONE includes scientists, ecosystem managers, policy makers, students, educators, and the public.

In heart research

Dr. Ram Singh, a cardiologist practicing in India, has published research in many prestigious journals, including The Lancet and the American Journal of Cardiology. In 1992, Singh published research on heart attack victims in BMJ, the British Medical Association's flagship journal. The study was cited more than 200 times in scientific journals and in recommendations to doctors. His research was questioned in 1994. Dr. Richard Smith, BMJ's editor, wanted to investigate and consulted a statistician named Stephen Evans. Evans said a full review could only be done if he had the raw (i.e. unprocessed) data. Smith feared that Singh would refuse to provide raw data, but he did ask for the raw data on a study submitted by Singh in 1994. Eight months later a box of papers arrived. Evans's statistical analysis found Singh's work to be full of inconsistencies and errors and indicated that it should be retracted. The journal's investigation lasted 12 years before concluding that the research was probably fraudulent. The Alliance for Human Research Protection looked into the matter and recommended that journal editors must "adopt a PUBLICATION REQUIREMENT for all authors submitting clinical trial reports if they want to protect the integrity of both the journals and the scientific literature. Authors should be REQUIRED to submit ALL RAW DATA along with their research report." (emphasis in the original)

Data archives

  • CISL Research Data Archive
  • Dryad
  • ESO/ST-ECF Science Archive Facility
  • International Tree-Ring Data Bank
  • Knowledge Network for Biocomplexity
  • National Archive of Computerized Data on Aging
  • National Archive of Criminal Justice Data http://www.icpsr.umich.edu/nacjd
  • National Climatic Data Center
  • National Geophysical Data Center
  • National Snow and Ice Data Center
  • National Oceanographic Data Center
  • Oak Ridge National Laboratory Distributed Active Archive Center
  • Pangaea - Data Publisher for Earth & Environmental Science
  • World Data Center
  • DataONE

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 