Data sharing
Data sharing is the practice of making data used for scholarly research available to other investigators. Replication has a long history in science. The motto of the Royal Society is "Nullius in verba", translated "Take no man's word for it." Many funding agencies, institutions, and publication venues have policies regarding data sharing because transparency and openness are considered by many to be part of the scientific method.
A number of funding agencies and science journals require authors of peer-reviewed papers to share any supplemental information (raw data, statistical methods or source code) necessary to understand, develop or reproduce published research. A great deal of scientific research is not subject to data sharing requirements, and many of these policies have liberal exceptions. In the absence of any binding requirement, data sharing is at the discretion of the scientists themselves. In addition, in certain situations agencies and institutions prohibit or severely limit data sharing to protect proprietary interests, national security, and subject/patient/victim confidentiality. Data sharing (especially photographs and graphic descriptions of animal research) may also be restricted to protect institutions and scientists from misuse of data for political purposes by animal rights extremists.
Data and methods may be requested from an author years after publication. In order to encourage data sharing and prevent the loss or corruption of data, a number of funding agencies and journals have established policies on data archiving. Access to publicly archived data is a recent development in the history of science, made possible by technological advances in communications and information technology.
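As an illustration of how an archive might guard against silent loss or corruption, the following minimal Python sketch records a checksum for each deposited file and later reports any file that is missing or altered. It is not drawn from any agency's or journal's policy; the function names (`sha256_of`, `build_manifest`, `changed_files`) and the directory layout are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum at deposit time."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match."""
    problems = []
    for name, expected in manifest.items():
        target = data_dir / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

A depositor could store the manifest alongside the data, so that anyone who later requests the files can check that they received exactly what was archived.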
Despite policies on data sharing and archiving, data withholding still happens. Authors may fail to archive data, or may archive only a portion of it. Failure to archive data is not, by itself, data withholding; withholding occurs when another researcher requests additional information and the author refuses to provide it. Authors who withhold data in this way risk losing the trust of the scientific community.
Federal law
On August 9, 2007, President Bush signed the "America COMPETES Act" (or the "America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Act"), requiring civilian federal agencies to provide guidelines, policies, and procedures to facilitate and optimize the open exchange of data and research between agencies, the public, and policymakers. See Section 1009.
NIH data sharing policy
The NIH Final Statement of Sharing of Research Data says:
NSF Policy from Grant General Conditions
The American Naturalist
Journal of Heredity
Molecular Ecology
Nature
Royal Society Publishing
"As a condition of acceptance authors agree to honour any reasonable request by other researchers for materials, methods, or data necessary to verify the conclusion of the article. Supplementary data up to 10Mb is placed on the Society's website free of charge and is publicly accessible. Large datasets must be deposited in a recognised public domain database by the author prior to submission. The accession number should be provided for inclusion in the published article."
Office of Research Integrity
Allegations of misconduct in medical research carry severe consequences. The United States Department of Health and Human Services established an office to oversee investigations of allegations of misconduct, including data withholding. The website defines the mission:
Ideals in data sharing
Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab has a philosophy about reproducible research and disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered in trying to reproduce their own research after a period of time. In many cases, it was so difficult they gave up the effort. These experiences are what convinced them of the importance of disclosing source code. The philosophy is described:
- The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
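The notion of a "complete set of instructions which generated the figures" can be illustrated with a small Python sketch in which every reported number is produced by a function of explicitly recorded inputs. This is not WaveLab's code (WaveLab is a MATLAB package); the analysis, the function names (`run_analysis`, `run_with_provenance`), and the choice of recorded fields are all hypothetical.

```python
import json
import platform
import random
import statistics


def run_analysis(seed: int, n_samples: int) -> dict:
    """Hypothetical analysis: summarize simulated noisy measurements."""
    # A private generator makes the result depend only on the seed,
    # not on any other code that touches the global random state.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
    }


def run_with_provenance(seed: int, n_samples: int) -> str:
    """Bundle the result with the inputs needed to regenerate it."""
    record = {
        "parameters": {"seed": seed, "n_samples": n_samples},
        "python_version": platform.python_version(),
        "result": run_analysis(seed, n_samples),
    }
    return json.dumps(record, indent=2, sort_keys=True)
```

Publishing the script together with the recorded parameters lets another investigator rerun `run_analysis` with the same seed and compare the output against the reported values.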
The Data Observation Network for Earth (DataONE) and Data Conservancy are projects supported by the National Science Foundation to encourage and facilitate data sharing among research scientists and better support meta-analysis. In environmental sciences, the research community is recognizing that major scientific advances involving integration of knowledge in and across fields will require that researchers overcome not only the technological barriers to data sharing but also the historically entrenched institutional and sociological barriers. Dr. Richard J. Hodes, director of the National Institute on Aging, has stated, "the old model in which researchers jealously guarded their data is no longer applicable".
The Alliance for Taxpayer Access is a group of organizations that support open access to government-sponsored research. The group has published a "Statement of Principles" explaining why it believes open access is important. It also lists a number of international public access policies.
International policies
- Australia
- Austria
- Europe — Commission of European Communities
- Germany
- United Kingdom
- 'Omic Data Sharing: a list of policies of major science funders
- BioSharing.org Catalogue of Data Policies
Academic genetics
Withholding of data has become so commonplace in academic genetics that researchers at Massachusetts General Hospital published a journal article on the subject. The study found that "Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research."
Scientists in training
A study of scientists in training indicated that many had already experienced data withholding. This study has given rise to the fear that the next generation of scientists will not abide by the established practices.
Differing approaches in different fields
Requirements for data sharing are more commonly imposed by institutions, funding agencies, and publication venues in the medical and biological sciences than in the physical sciences. Requirements vary widely regarding whether data must be shared at all, with whom the data must be shared, and who must bear the expense of data sharing.
Funding agencies such as the NIH and NSF tend to require greater sharing of data, but even these requirements tend to acknowledge the concerns of patient confidentiality, costs incurred in sharing data, and the legitimacy of the request. Private interests and public agencies with national security interests (defense and law enforcement) often discourage sharing of data and methods through non-disclosure agreements.
External links
- Data sharing and replication ― Gary King.
- “The Selfish Gene: Data Sharing and Withholding in Academic Genetics” by Eric Campbell and David Blumenthal published May 31, 2002.
- Data sharing and data archiving ― American Psychological Association
- The Public Domain of Digital Research Data
- WaveLab and Reproducible Research by Jonathan B. Buckheit and David L. Donoho of Stanford University
- The Role of Data and Program Code Archives in the Future of Economic Research published by The Federal Reserve Bank of St. Louis
- Ecological Society of America data sharing and archiving initiative
- BioSharing.org A website on data sharing and data policies in biology
- UK Data Archive: Manage and Share data
- Data Management Plan Resources and Examples - Inter-university Consortium for Political and Social Research.
- DataONE