DataONE
Data Observation Network for Earth (DataONE) is a project supported by the National Science Foundation under the DataNet program. DataONE will provide scientific data archiving for ecological and environmental data produced by scientists worldwide. DataONE's stated goal is to preserve and provide access to multi-scale, multi-discipline, and multi-national data. The community of users for DataONE includes scientists, ecosystem managers, policy makers, students, educators, and the public.
DataONE will link together existing cyberinfrastructure to provide a distributed framework, sound management, and robust technologies that enable long-term preservation of diverse multi-scale, multi-discipline, and multi-national observational data. The distributed framework will consist of Coordinating Nodes, currently located at the Oak Ridge Campus, the University of California, Santa Barbara, and the University of New Mexico, together with many Member Nodes located around the world. DataONE will also provide an Investigator Tool Kit that gives the DataONE user community tools for accessing and using DataONE efficiently.
Coordinating Nodes
Coordinating Nodes will provide network-wide services to Member Nodes. They will be geographically replicated, with mirrored content and full copies of science metadata. The three Coordinating Nodes are:
- University of New Mexico
- Oak Ridge Campus (partnership of Oak Ridge National Laboratory (ORNL) and University of Tennessee)
- University of California, Santa Barbara (UCSB)
Member Nodes
Member Nodes will consist of Earth-observing institutions, projects, and networks. They will provide resources for hosting both their own data and data replicated from elsewhere in the network, and will focus on serving their specific constituencies. Member Nodes are geographically distributed and have diverse implementations. Current Member Nodes include:
- Dryad
- ORNL Distributed Active Archive Center
- Knowledge Network for Biocomplexity
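The division of labor between Coordinating Nodes and Member Nodes can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not DataONE's software or API: the classes `MemberNode`, `CoordinatingNode`, and `Network`, their methods, and the identifier and metadata values are all hypothetical. The model assumes that Member Nodes store data bytes (their own and replicas) while every Coordinating Node holds a full, mirrored copy of the metadata catalog.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    # Hypothetical Member Node: stores data objects, both its own and replicas.
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)


@dataclass
class CoordinatingNode:
    # Hypothetical Coordinating Node: holds a full copy of the metadata catalog.
    name: str
    catalog: dict[str, dict] = field(default_factory=dict)


class Network:
    def __init__(self, coordinating_nodes, member_nodes):
        self.cns = list(coordinating_nodes)
        self.mns = {mn.name: mn for mn in member_nodes}

    def publish(self, identifier, data, metadata, origin, replicas=()):
        # Store the data on its origin Member Node and on each replica Member Node.
        locations = [origin, *replicas]
        for name in locations:
            self.mns[name].objects[identifier] = data
        # Mirror the metadata record, including the data's locations, to every Coordinating Node.
        for cn in self.cns:
            cn.catalog[identifier] = {**metadata, "locations": list(locations)}

    def fetch(self, identifier, via=0):
        # Any Coordinating Node can resolve the identifier because each holds the full catalog;
        # the data is then read from the first listed Member Node that still has a copy.
        for name in self.cns[via].catalog[identifier]["locations"]:
            obj = self.mns[name].objects.get(identifier)
            if obj is not None:
                return obj
        raise KeyError(identifier)


cns = [CoordinatingNode("UNM"), CoordinatingNode("Oak Ridge"), CoordinatingNode("UCSB")]
mns = [MemberNode("Dryad"), MemberNode("ORNL DAAC"), MemberNode("KNB")]
net = Network(cns, mns)
net.publish("example-id", b"example bytes", {"title": "Example dataset"},
            origin="KNB", replicas=["Dryad"])

# Simulate loss of the origin copy: the replica on another Member Node still serves the data.
del mns[2].objects["example-id"]
print(net.fetch("example-id", via=2))  # b'example bytes'
```

Because every Coordinating Node receives the same catalog entry, a lookup through any of them returns the same locations, and replication across Member Nodes lets the data survive the loss of a single copy.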
Investigator Tool Kit
The Tool Kit will provide tools for researchers to access DataONE. These will include both general-purpose and discipline-specific tools, and DataONE developers will adapt existing tools where possible. The Tool Kit will include Java and Python libraries, an analysis plug-in for the R programming language, extensions for Microsoft Excel, the VisTrails scientific workflow system, and the Kepler scientific workflow system.
Data Management
DataONE will provide a place for scientists to store data and its associated metadata. The metadata will then make this data searchable and accessible to other scientists. Data management practices include:
- Data management planning
- Data acquisition (techniques, protocols, methods)
- Data protection (backing up)
- Data entry and manipulation (naming files, organization)
- Quality control on data
- Data analysis
- Workflow tools (VisTrails, Kepler scientific workflow system)
- Data documentation (metadata)
- Data sharing, citation, and discovery
- Data preservation & curation
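The role of metadata described above (documenting stored data, making it discoverable, and supporting its preservation) can be sketched in a few lines of Python. This is a minimal illustration, not a DataONE interface: the `MetadataRecord` fields and the `describe`, `search`, and `verify` helpers are hypothetical, and recording a SHA-256 checksum as a fixity check is an assumption made for the example. Real metadata standards (for example, Ecological Metadata Language) carry many more fields.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    # Hypothetical minimal metadata record.
    identifier: str
    title: str
    creator: str
    keywords: list[str]
    checksum: str  # SHA-256 of the data bytes, recorded at deposit time


def describe(identifier, title, creator, keywords, data: bytes) -> MetadataRecord:
    # Documentation: store descriptive fields plus a checksum of the data.
    return MetadataRecord(identifier, title, creator,
                          [k.lower() for k in keywords],
                          hashlib.sha256(data).hexdigest())


def search(records, term):
    # Discovery: case-insensitive substring match on the title,
    # exact match on the (lowercased) keywords.
    term = term.lower()
    return [r.identifier for r in records
            if term in r.title.lower() or term in r.keywords]


def verify(record, data: bytes) -> bool:
    # Preservation: recompute the checksum and compare with the recorded one.
    return hashlib.sha256(data).hexdigest() == record.checksum


data = b"site,temp_c\nA,12.5\nB,13.1\n"
records = [
    describe("example-1", "Stream temperature survey", "A. Researcher",
             ["hydrology", "temperature"], data),
    describe("example-2", "Grassland plant inventory", "B. Researcher",
             ["botany", "biodiversity"], b"placeholder bytes"),
]
print(search(records, "temperature"))   # ['example-1']
print(verify(records[0], data))         # True
print(verify(records[0], data + b"x"))  # False
```

Keeping the checksum in the metadata record, rather than beside the data file, means any copy of the catalog can be used to check whether a stored copy of the data is still intact.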
DataONE Community
The DataONE community includes research networks, professional societies, libraries, academic institutions, data centers, data repositories, environmental observatory networks, educators, scientists, policy makers, administrators, citizen scientists, international organizations, NGOs, ecosystem managers, students, private companies, and the public.
External links
- http://www.unm.edu/~market/cgi-bin/archives/004536.html
- http://www.nature.com/news/specials/datasharing/index.html
- http://www.nsf.gov/pubs/2007/nsf07601/nsf07601.htm