
Open Data
    
    
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "Open" movements such as open source, open content, and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.
Overview
The concept of open data is not new, but although the term is currently in frequent use, there are no commonly agreed definitions (unlike, for example, Open Access, where several formal declarations have been made and signed).
Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.
A typical depiction of the need for open data:
Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery… we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge. (John Wilbanks, Executive Director, Science Commons)
Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use. For example, many scientists do not regard the published data arising from their work as theirs to control, and consider the act of publication in a journal to be an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an Open spirit. Because of this uncertainty, it is also possible for public or private organizations such as IEEE to aggregate such data, protect it with copyright and then resell it.
Under "Toward Open Data" Connolly (2005, v.i.) gives two quotations:
- I want my data back. (Jon Bosak circa 1997)
- I've long believed that customers of any application own the data they enter into it. (This quote refers to Veen's own heart-rate data.)
Major sources of open data
Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.
Open data in science
The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957-1958. The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.
While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of Open science data, since publishing or obtaining data has become much less expensive and time-consuming.
In 2004, the Science Ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most developed countries of the world, signed a declaration which essentially states that all publicly-funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.
Open data in government
Several national governments have created web sites to distribute a portion of the data they collect.
- Data.gov - U.S. government open-data website. Launched in May 2009.
- Data.gov.uk - U.K. government open-data website. Launched in September 2009.
- Data.gov.au - Australian government open-data website. Launched in March 2011.
- Data.gc.ca - Canadian government open-data website. Launched in March 2011.
- opendata.go.ke - Kenyan government open-data website. Launched in July 2011.
- data.norge.no - Norwegian government open-data website. Launched in April 2010.
- data.overheid.nl - Dutch government open-data website.
- data.govt.nz - New Zealand Government initiative to publish Government Data under Creative Commons licences, defined further at NZ GOAL
- data.gov.it - Italian government open-data website. Launched in October 2011.
Additionally, other levels of government have established open data websites, such as the City of Ottawa, Canada (http://ottawa.ca/online_services/opendata/) and the state of California, USA (http://www.data.ca.gov/).
Arguments for and against open data
Arguments made on behalf of Open Data include the following:
- "Data belong to the human race". Typical examples are genomes, data on organisms, medical science, environmental data.
- Public money was used to fund the work and so it should be universally available.
- It was created by or at a government institution (this is common in US National Laboratories and government agencies)
- Facts cannot legally be copyrighted.
- Sponsors of research do not get full value unless the resulting data are freely available
- Restrictions on data re-use create an anticommons
- Data are required for the smooth process of running communal human activities (map data, public institutions)
- In scientific research, the rate of discovery is accelerated by better access to data.
It is generally held that factual data cannot be copyrighted. However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.
While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.
As the term Open Data is relatively new, it is difficult to collect arguments against it. Unlike Open Access, where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments may include the following:
- the revenue earned by publishing data permits non-profit organisations to fund other activities (e.g. learned society publishing supports the society)
- the government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK)
- government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem)
Relation to other open activities
The goals of the Open Data movement are similar to those of other "Open" movements.
- Open Access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
- Open Content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
- Open Notebook Science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
- Open Knowledge. The Open Knowledge Foundation argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise; (b) Content such as music, films, books; (c) Government and other administrative information. Open Data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.
- Open Source (Software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
Funders' mandates
Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):
- to deposit bioinformatics, atomic and molecular coordinate data, experimental data into the appropriate public database immediately upon publication of research results.
- to retain original data sets for a minimum of five years after the grant. This applies to all data, whether published or not.
Note the fundamental requirement to be able to replicate the experiment.
Other bodies active in promoting the deposition of data as well as full text include the Wellcome Trust.
Closed data
Several mechanisms restrict access to or reuse of data. They include:
- compilation in databases or websites to which only registered members or customers can have access.
- use of a proprietary or closed technology or encryption which creates a barrier for access.
- copyright forbidding (or obfuscating) re-use of the data.
- license forbidding (or obfuscating) re-use of the data (such as share-alike or non-commercial)
- patent forbidding re-use of the data (for example the 3-dimensional coordinates of some experimental protein structures have been patented)
- restriction of robots to websites, with preference to certain search engines
- aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. Directive on the legal protection of databases)
- time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
- webstacles, or the provision of single data points as opposed to tabular queries or bulk downloads of data sets.
- political, commercial or legal pressure on the activity of organisations providing Open Data (for example the American Chemical Society lobbied the US Congress to limit funding to the National Institutes of Health for its Open PubChem data).
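The restriction of robots mentioned in the list above can be illustrated with a minimal sketch. It assumes a hypothetical site whose robots.txt admits one crawler (called FavoredBot here, an invented name) everywhere while keeping all other robots out of a /data/ directory. The rules, bot names and URLs are assumptions made for the example and are not taken from any real site; the check itself uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one favoured crawler may fetch everything,
# every other robot is barred from the /data/ directory.
ROBOTS_TXT = """\
User-agent: FavoredBot
Disallow:

User-agent: *
Disallow: /data/
"""


def can_crawl(agent: str, url: str) -> bool:
    """Return True if the assumed rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of downloading them from a site.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    dataset = "https://example.org/data/set.csv"  # illustrative URL only
    for agent in ("FavoredBot", "OtherBot"):
        print(agent, can_crawl(agent, dataset))
```

Under these assumed rules the favoured crawler may retrieve the data set, while any other robot that honours the file may not, even though the same data can be read by a human visitor.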
Organisations promoting open data
- d8taplex
- Scholarly Publishing and Academic Resources Coalition
- "Free our data" (The Guardian technology section)
- The Open Knowledge Foundation
- Talis
- Linking Open Data on the Semantic Web
- Blue Obelisk
- Infochimps.org
- Freebase
- Factual
- Information Retrieval Facility
- Open Data Network - Germany
- OpenSourceApi
- Socrata
- Regards Citoyens - France
- Open Data Day, December 4th, 2010 - International Hackathon
- International Development Research Centre
- Open Municipal Geodata Standard
See also
- Budapest Open Access Initiative
- Creative Commons licenses
- Open access (publishing)
- Open content
- Open research
- Merton Thesis
- Linked Data
External links
- OpenPSI - the OpenPSI project is a community effort to create a UK government linked data service that supports research. It is a collaboration between the University of Southampton and the UK government, led by OPSI at the National Archives, and is supported by JISC funding.
- Talis Community License
- Open Data Commons Database Licence (an update to the Talis Community License)
- Open Data Commons - legal tools for open data
- CKAN - a registry of open data from the Open Knowledge Foundation
- Video of Tim Berners-Lee at TED 2009 calling for "Raw Data Now" (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html)
- Six-minute video of Tim Berners-Lee at TED 2010 showing examples of open data (http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide.html)


