Open Mind Common Sense
Open Mind Common Sense (OMCS) is an artificial intelligence project based at the Massachusetts Institute of Technology (MIT) Media Lab whose goal is to build and utilize a large commonsense knowledge base from the contributions of many thousands of people across the Web.
Since its founding in 1999, it has accumulated more than a million English facts from over 15,000 contributors in addition to knowledge bases in other languages. Much of OMCS's software is built on three interconnected representations: the natural language corpus that people interact with directly, a semantic network built from this corpus called ConceptNet, and a matrix-based representation of ConceptNet called AnalogySpace that can infer new knowledge using dimensionality reduction. The knowledge collected by Open Mind Common Sense has enabled research projects at MIT and elsewhere.
History
The project was the brainchild of Marvin Minsky, Push Singh, Catherine Havasi, and others. Development work began in September 1999, and the project was opened to the Internet a year later. Havasi described it in her dissertation as "an attempt to ... harness some of the distributed human computing power of the Internet, an idea which was then only in its early stages." The original OMCS was influenced by the website Everything2 and its predecessor, and presented a minimalist interface that was inspired by Google.
Push Singh was slated to become a professor at the MIT Media Lab and to lead the Common Sense Computing group in 2007, but he died by suicide on Tuesday, February 28, 2006.
The project is currently run by the Software Agents Group at the MIT Media Lab under Henry Lieberman.
Database and website
There are many different types of knowledge in OMCS. Some statements convey relationships between objects or events, expressed as simple phrases of natural language: examples include "A coat is used for keeping warm", "The sun is very hot", and "The last thing you do when you cook dinner is wash your dishes". The database also contains information on the emotional content of situations, in such statements as "Spending time with friends causes happiness" and "Getting into a car wreck makes one angry". OMCS contains information on people's desires and goals, both large and small, such as "People want to be respected" and "People want good coffee".
Originally, these statements could be entered into the Web site as unconstrained sentences of text, which had to be parsed later. The current version of the Web site collects knowledge only using more structured fill-in-the-blank templates. OMCS also makes use of data collected by the Game With a Purpose "Verbosity".
In its native form, the OMCS database is simply a collection of these short sentences that convey some common knowledge. In order to use this knowledge computationally, it has to be transformed into a more structured representation.
ConceptNet
ConceptNet is a semantic network based on the information in the OMCS database. ConceptNet is expressed as a directed graph whose nodes are concepts, and whose edges are assertions of common sense about these concepts. Concepts represent sets of closely related natural language phrases, which could be noun phrases, verb phrases, adjective phrases, or clauses.
ConceptNet is created from the natural-language assertions in OMCS by matching them against patterns using a shallow parser. Assertions are expressed as relations between two concepts, selected from a limited set of possible relations. The various relations represent common sentence patterns found in the OMCS corpus, and in particular, every "fill-in-the-blanks" template used on the knowledge-collection Web site is associated with a particular relation.
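As a rough illustration of the idea, the sketch below turns a few OMCS-style sentences into relation-labeled edges of a directed graph. It is not the project's parser: the regular expressions stand in for the shallow parser, and the pattern list, the relation names used here (UsedFor, Causes, Desires), and the function names are assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Hypothetical templates: each regex stands in for one fill-in-the-blank
# pattern and is tied to one illustrative relation name.
PATTERNS = [
    (re.compile(r"^(?:an? |the )?(.+?) is used for (.+)$", re.IGNORECASE), "UsedFor"),
    (re.compile(r"^(.+?) causes (.+)$", re.IGNORECASE), "Causes"),
    (re.compile(r"^(.+?) want (.+)$", re.IGNORECASE), "Desires"),
]


def parse_assertion(sentence):
    """Return (concept, relation, concept) for the first matching pattern, else None."""
    text = sentence.strip().rstrip(".")
    for pattern, relation in PATTERNS:
        match = pattern.match(text)
        if match:
            left, right = (part.strip().lower() for part in match.groups())
            return left, relation, right
    return None


def build_graph(sentences):
    """Build a directed graph: concept -> list of (relation, target concept) edges."""
    graph = defaultdict(list)
    for sentence in sentences:
        assertion = parse_assertion(sentence)
        if assertion is not None:
            left, relation, right = assertion
            graph[left].append((relation, right))
    return graph


corpus = [
    "A coat is used for keeping warm",
    "Spending time with friends causes happiness",
    "People want good coffee",
    "The sun is very hot",  # matches none of the templates above, so it is skipped
]
graph = build_graph(corpus)
for concept, edges in graph.items():
    for relation, target in edges:
        print(f"{concept} --{relation}--> {target}")
```

Sentences that match no template are simply dropped in this sketch, which is one reason a site that collects knowledge only through templates is easier to process than free text.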
The data structures that make up ConceptNet were significantly reorganized in 2007, and published as ConceptNet 3. The Software Agents group currently distributes a database and API for the new version 4.0.
Machine learning tools
The information in ConceptNet can be used as a basis for machine learning algorithms. One representation, called AnalogySpace, uses singular value decomposition to generalize and represent patterns in the knowledge in ConceptNet, in a way that can be used in AI applications. Its creators distribute a Python machine learning toolkit called Divisi for performing machine learning based on text corpora, structured knowledge bases such as ConceptNet, and combinations of the two.
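The following minimal sketch shows how a truncated singular value decomposition can suggest knowledge that was never entered. It does not use Divisi or reproduce AnalogySpace: the concept-by-feature matrix is a made-up toy, the feature names are illustrative, and the decomposition is computed with a plain power iteration so that the example needs only the standard library (in practice a library SVD routine would be used).

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]  # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


def low_rank_approximation(matrix, rank):
    """Sum of the top `rank` singular components, found one at a time by deflation."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    residual = [[float(x) for x in row] for row in matrix]
    approx = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(rank):
        sigma, u, v = top_singular_triplet(residual)
        for i in range(n_rows):
            for j in range(n_cols):
                component = sigma * u[i] * v[j]
                approx[i][j] += component
                residual[i][j] -= component
    return approx


# Toy data (assumed for illustration): 1 means the assertion was collected.
concepts = ["dog", "cat", "car", "bicycle"]
features = ["IsA animal", "HasA tail", "Desires food", "UsedFor transport", "HasA engine"]
matrix = [
    [1, 1, 1, 0, 0],  # dog
    [1, 1, 0, 0, 0],  # cat: "Desires food" was never entered
    [0, 0, 0, 1, 1],  # car
    [0, 0, 0, 1, 0],  # bicycle
]

approx = low_rank_approximation(matrix, rank=2)
cat = concepts.index("cat")
cat_food = approx[cat][features.index("Desires food")]
cat_engine = approx[cat][features.index("HasA engine")]
print(f"cat / Desires food: {cat_food:.3f}")
print(f"cat / HasA engine:  {cat_engine:.3f}")
```

Because the cat row resembles the dog row, the rank-2 reconstruction gives the missing "Desires food" entry a clearly positive score, while "HasA engine" stays near zero. This is the sense in which dimensionality reduction generalizes from the patterns already in the data.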
Comparison to other projects
Other similar projects include Never-Ending Language Learning, Mindpixel (discontinued), Cyc, Learner, Freebase, YAGO, DBpedia, and Open Mind 1001 Questions, which have explored alternative approaches to collecting knowledge and providing incentive for participation.
The Open Mind Common Sense project differs from Cyc because it has focused on representing the common sense knowledge it collected as English sentences, rather than using a formal logical structure. ConceptNet is described by one of its creators, Hugo Liu, as being structured more like WordNet than Cyc, due to its "emphasis on informal conceptual-connectedness over formal linguistic-rigor".
There is also a Brazilian initiative, Open Mind Common Sense in Brazil (OMCS-Br), led by the Advanced Interaction Lab at the Federal University of São Carlos (LIA-UFSCar). It started in 2005 in collaboration with the Software Agents Group at the MIT Media Lab. Its main goal is to collect the common sense of Brazilians and to develop culturally sensitive software applications based on cultural profiles extracted from ConceptNet. These profiles are meant to help developers and users fit software content to the end user's cultural context, making the final applications more flexible, adaptive, accessible, and usable. The applications focus mainly on education and healthcare.
See also
- Never-Ending Language Learning
- Mindpixel
- ThoughtTreasure
- Semantic Web
- DBpedia
- Freebase (database)
- YAGO (database)