Concept learning
Concept learning, also known as category learning, concept attainment, and concept formation, is largely based on the works of the cognitive psychologist Jerome Bruner. Bruner, Goodnow, & Austin (1967) defined concept attainment (or concept learning) as "the search for and listing of attributes that can be used to distinguish exemplars from non exemplars of various categories." More simply put, concepts are the mental categories that help us classify objects, events, or ideas, and each object, event, or idea has a set of common relevant features. Thus, concept learning is a strategy which requires a learner to compare and contrast groups or categories that contain concept-relevant features with groups or categories that do not contain concept-relevant features.
Concept learning also refers to a learning task in which a human or machine learner is trained to classify objects by being shown a set of example objects along with their class labels. The learner simplifies what has been observed in the examples, and this simplified version of what has been learned is then applied to future examples. Concept learning ranges in simplicity and complexity because learning takes place over many areas. The more difficult a concept is, the less likely it is that the learner will be able to simplify it, and therefore the less likely they are to learn it. Colloquially, this task is known as learning from examples. Most theories of concept learning are based on the storage of exemplars and avoid summarization or overt abstraction of any kind.
Types of Concepts
- Not a Concept. Learning through reciting something from memory (recall) or discriminating between two things that differ (discrimination) is not the same as concept learning. However, these issues are closely related, since fact memory recall could be considered a "trivial" conceptual process where prior exemplars representing the concept were invariant. Similarly, while discrimination is not the same as initial concept learning, discrimination processes are involved in refinement of concepts with repeated presentation of exemplars.
- Concrete or Perceptual Concepts vs Abstract Concepts
- Defined (or Relational) and Associated Concepts
- Complex Concepts. Constructs such as a schema and a script are examples of complex concepts. A schema is an organization of smaller concepts (or features) and is revised by situational information to assist in comprehension. A script, on the other hand, is a list of actions that a person follows in order to complete a desired goal. An example of a script would be buying a CD: several actions must occur before the actual act of purchasing the CD, and the script provides the necessary actions, in the proper order, to purchase it successfully.
Methods of learning a concept
- Discovery. Every baby must rediscover concepts for itself, such as discovering that each of its fingers can be individually controlled or that caregivers are individuals. Although this is perception-driven, formation of the concept is more than memorizing perceptions.
- Examples. Supervised or unsupervised generalizing from examples may lead to learning a new concept, but concept formation is more than generalizing from examples.
- Words. Hearing or reading new words leads to learning new concepts, but forming a new concept is more than learning a dictionary definition. A person may have previously formed a new concept before encountering the word or phrase for it.
The Theoretical Issues
The theoretical issues underlying concept learning are those underlying induction in general. These issues are addressed in many diverse literatures, including Version Spaces, Statistical Learning Theory, PAC Learning, Information Theory, and Algorithmic Information Theory. Some of the broad theoretical ideas are also discussed by Watanabe (1969, 1985), Solomonoff (1964a, 1964b), and Rendell (1986).
Modern Psychological Theories of Concept Learning
It is difficult to make any general statements about human (or animal) concept learning without already assuming a particular psychological theory of concept learning. Although the classical views of concepts and concept learning in philosophy speak of a process of abstraction, data compression, simplification, and summarization, currently popular psychological theories of concept learning diverge on all these basic points.
Rule-Based Theories of Concept Learning
Rule-based theories of concept learning take classification data and a rule-based theory as input, which are the result of a rule-based learner, with the hope of producing a more accurate model of the data (Hekenaho 1997). The majority of rule-based models that have been developed are heuristic, meaning that rational analyses have not been provided and the models are not related to statistical approaches to induction. A rational analysis for rule-based models could presume that concepts are represented as rules, and would then ask what degree of belief a rational agent should assign to each rule, given some observed examples (Goodman, Griffiths, Feldman, and Tenenbaum). Rule-based theories of concept learning focus more on perceptual learning and less on definition learning. Rules can be used in learning when the stimuli are confusable, as opposed to simple. When rules are used in learning, decisions are made on the basis of properties alone and rely on simple criteria that do not require a lot of memory (Rouder and Ratcliff, 2006).
Example of a rule-based theory:
"A radiologist using rule-based categorization would observe whether specific properties of the X-ray meet certain criteria; for example, is there an extreme difference in brightness in a suspicious region relative to the other regions? A decision is then based on this property alone." (Rouder and Ratcliff, 2006)
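As a rough illustration (not from the sources cited above), a rule-based categorization can be sketched as a decision resting on a single simple criterion, here a hypothetical brightness-contrast threshold echoing the radiologist example:

```python
# A minimal sketch of rule-based categorization: the decision rests on one
# simple criterion (a brightness-contrast threshold), echoing the radiologist
# example above. The threshold and brightness values are hypothetical.

def rule_based_category(region_brightness: float,
                        background_brightness: float,
                        threshold: float = 0.4) -> str:
    """Classify a region as 'suspicious' if its brightness differs
    from the background by more than a fixed threshold."""
    contrast = abs(region_brightness - background_brightness)
    return "suspicious" if contrast > threshold else "normal"

print(rule_based_category(0.9, 0.3))  # -> 'suspicious'
print(rule_based_category(0.5, 0.4))  # -> 'normal'
```

Note how little the rule needs to remember: a single threshold, rather than a store of past cases.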
Prototype Theory of Concept Learning
The prototype view on concept learning holds that people abstract out the central tendency (or prototype) of the experienced examples and use this as a basis for their categorization decisions. On this view, people categorize based on one or more central examples of a given category, followed by a penumbra of decreasingly typical examples. This implies that people do not categorize based on a list of features that all correspond to a definition, but rather on a hierarchical inventory based on semantic similarity to the central example(s).
To illustrate this, imagine the following mental representations of the category: Sports
The first illustration may demonstrate a mental representation if we were to categorize by definition:
Definition of Sports: an athletic activity requiring skill or physical prowess and often of a competitive nature.
[Illustration: the label "Sports" at the center, surrounded equally by category members: Basketball, Football, Bowling, Baseball, Skiing, Track and field, Snowboarding, Lacrosse, Rugby, Soccer, Skateboarding, Golf, Bike-racing, Hockey, Surfing, Weightlifting, and Tennis.]
The second illustration may demonstrate a mental representation that Prototype Theory would predict:
1. Baseball
2. Football
3. Basketball
4. Soccer
5. Hockey
6. Tennis
7. Golf
...
15. Bike-racing
16. Weightlifting
17. Skateboarding
18. Snowboarding
19. Boxing
20. Wrestling
...
32. Fishing
33. Hunting
34. Hiking
35. Sky-diving
36. Bungee-jumping
...
62. Cooking
63. Walking
...
82. Gatorade
83. Water
84. Protein
85. Diet
As these illustrations show, prototype theory hypothesizes a more continuous (less discrete) way of categorizing, in which the list is not limited to things that match the category's definition: items are ranked by their typicality rather than included or excluded outright.
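As a rough sketch (not drawn from the works cited here), a prototype can be modeled as the average of experienced examples over a few numeric features, with typicality falling off with distance from that average; the two features and all their values below are invented for illustration:

```python
# A minimal prototype-theory sketch: the prototype is the feature-wise mean
# of the experienced exemplars over two invented features (say, physical
# exertion and competitiveness, each on a 0-1 scale), and typicality is
# graded by distance to that mean. All feature values are hypothetical.
import math

exemplars = {
    "baseball": (0.8, 0.9),
    "soccer":   (0.9, 0.9),
    "golf":     (0.4, 0.8),
    "fishing":  (0.2, 0.3),
    "cooking":  (0.2, 0.1),
}

n = len(exemplars)
prototype = tuple(sum(v[i] for v in exemplars.values()) / n for i in range(2))

def typicality(features):
    """Graded typicality: closer to the prototype means more typical."""
    return 1.0 / (1.0 + math.dist(features, prototype))

# Prints a graded ranking, like the numbered list above, rather than a
# yes/no membership judgment.
for name, feats in sorted(exemplars.items(), key=lambda kv: -typicality(kv[1])):
    print(f"{name}: typicality {typicality(feats):.2f}")
```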
Exemplar Theories of Concept Learning
Exemplar theory is the storage of specific instances (exemplars), with new objects evaluated only with respect to how closely they resemble specific known members (and nonmembers) of the category. This theory hypothesizes that learners store examples verbatim, and it views concept learning as highly simplistic: only individual properties are represented, and these individual properties are not abstract and do not create rules. For example, exemplar theory does not represent the rule "water is wet"; it simply stores that some (or one, or all) remembered examples of water have the property wet. Exemplar-based theories have become more empirically popular over the years, with some evidence suggesting that human learners use exemplar-based strategies only in early learning, forming prototypes and generalizations later in life. An important result of exemplar models in the psychological literature has been a de-emphasis of complexity in concept learning. One of the best-known exemplar theories of concept learning is the Generalized Context Model (GCM).

Problems with Exemplar Theory
Exemplar models critically depend on two measures:
- Similarity between exemplars
- Rule to determine Group Membership
Sometimes it is difficult to attain or distinguish these measures.
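To make the two measures concrete, here is a minimal sketch in the spirit of the GCM (not a faithful implementation): similarity decays exponentially with distance to each stored exemplar, and the membership rule is the summed similarity to each category. The features, sensitivity parameter, and data are invented for illustration:

```python
# A minimal sketch in the spirit of the generalized context model (GCM):
# similarity to each stored exemplar decays exponentially with distance,
# and a new item joins the category with the larger summed similarity.
# All feature values and the sensitivity parameter are hypothetical.
import math

# Stored exemplars: feature vectors with category labels, kept verbatim.
exemplars = [
    ((0.9, 0.8), "A"), ((0.8, 0.9), "A"),
    ((0.2, 0.1), "B"), ((0.1, 0.3), "B"),
]

def similarity(x, y, sensitivity=2.0):
    """First measure: similarity between exemplars (exponential decay)."""
    return math.exp(-sensitivity * math.dist(x, y))

def classify(item):
    """Second measure: group-membership rule (summed similarity)."""
    totals = {}
    for features, label in exemplars:
        totals[label] = totals.get(label, 0.0) + similarity(item, features)
    return max(totals, key=totals.get)

print(classify((0.7, 0.7)))  # -> 'A'
print(classify((0.2, 0.2)))  # -> 'B'
```

Both measures are free parameters here, which illustrates the difficulty above: different choices of similarity function or membership rule can yield different categorizations of the same item.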
Multiple-Prototype Theories of Concept Learning
More recently, cognitive psychologists have begun to explore the idea that the prototype and exemplar models form two extremes, and that people are able to form multiple-prototype representations lying between them. For example, consider the category spoon. There are two distinct subgroups, or conceptual clusters: spoons tend to be either large and wooden, or small and made of steel. The prototypical spoon would then be a medium-sized object made of a mixture of steel and wood, which is clearly an unrealistic proposal. A more natural representation of the category spoon would instead consist of multiple (at least two) prototypes, one for each cluster. A number of different proposals have been made in this regard (Anderson, 1991; Griffiths, Canini, Sanborn & Navarro, 2007; Love, Medin & Gureckis, 2004; Vanpaemel & Storms, 2008). These models can be regarded as providing a compromise between exemplar and prototype models.
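A toy sketch of the spoon example (with invented feature values): each cluster contributes its own prototype, and a new item is compared against the nearest one rather than against a single grand average:

```python
# A toy multiple-prototype sketch for the spoon example. Features are
# (size, woodenness), both on invented 0-1 scales; each conceptual cluster
# has its own prototype, and classification uses the nearest one.
import math

prototypes = {
    "large wooden spoon": (0.9, 1.0),
    "small steel spoon":  (0.2, 0.0),
}

def nearest_prototype(item):
    return min(prototypes, key=lambda k: math.dist(item, prototypes[k]))

print(nearest_prototype((0.8, 0.9)))  # -> 'large wooden spoon'
print(nearest_prototype((0.3, 0.1)))  # -> 'small steel spoon'
# Note: a single averaged prototype, (0.55, 0.5), would describe a
# medium-size, half-wooden spoon that matches neither cluster well.
```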
Explanation-Based Theories of Concept Learning
The basic idea of explanation-based learning suggests that a new concept is acquired by experiencing examples of it and forming a basic outline. Put simply, by observing or receiving the qualities of a thing, the mind forms a concept that possesses and is identified by those qualities. The original theory, proposed by Mitchell, Keller, and Kedar-Cabelli in 1986 and called explanation-based generalization, holds that learning occurs through progressive generalizing. This theory was first developed to program machines to learn. When applied to human cognition, it translates as follows: the mind actively separates information that applies to more than one thing and enters it into a broader description of a category of things. This is done by identifying sufficient conditions for a thing's fitting a category, similar to schematizing.
The revised model revolves around the integration of four mental processes – generalization, chunking, operationalization, and analogy.
- Generalization is the process by which the characteristics of a concept that are fundamental to it are recognized and labeled. For example, birds have feathers and wings; anything with feathers and wings will be identified as a 'bird' (see the sketch after this list).
- When information is grouped mentally, whether by similarity or relatedness, the group is called a chunk. Chunks can vary in size from a single item with parts or many items with many parts.
- A concept is operationalized when the mind is able to actively recognize examples of it by characteristics and label it appropriately.
- Analogy is the recognition of similarities between potential examples.
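A minimal sketch of what the generalization step produces (invented predicates, not the 1986 formalism): once "has feathers and wings" is identified as a sufficient condition, any instance meeting it is labeled a bird:

```python
# A minimal sketch of explanation-based generalization's end product
# (predicates invented here, not Mitchell, Keller, and Kedar-Cabelli's
# formalism): a sufficient condition extracted from examples becomes a
# reusable rule for the whole category.

def is_bird(entity: dict) -> bool:
    """Sufficient condition found by generalization: feathers + wings -> bird."""
    return entity.get("feathers", False) and entity.get("wings", False)

robin = {"feathers": True, "wings": True, "sings": True}
bat   = {"feathers": False, "wings": True}

print(is_bird(robin))  # True: meets the generalized condition
print(is_bird(bat))    # False: wings alone are not sufficient
```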
This particular theory of concept learning is relatively new and more research is now being conducted to test it.
Bayesian Theories of Concept Learning
Bayes' theorem is important because it provides a powerful tool for understanding, manipulating, and controlling data, taking a larger view that is not limited to data analysis alone. The approach is subjective, requiring the assessment of prior probabilities, which also makes it very complex. However, if Bayesians can show that the accumulated evidence and the application of Bayes' law are sufficient, the work will overcome the subjectivity of the inputs involved. Bayesian inference can be used with any honestly collected data and has a major advantage because of its scientific focus.
One model that incorporates the Bayesian theory of concept learning is the ACT-R model, developed by John R. Anderson. ACT-R is a cognitive architecture, realized as a programming language, that works to define the basic cognitive and perceptual operations that enable the human mind by producing a step-by-step simulation of human behavior. It rests on the idea that each task humans perform consists of a series of discrete operations. The model has been applied to learning and memory, higher-level cognition, natural language, perception and attention, human-computer interaction, education, and computer-generated forces.
In addition to John R. Anderson, Joshua Tenenbaum has been a contributor to the field of concept learning, studying the computational basis of human learning and inference through behavioral testing of adults, children, and machines, drawing on Bayesian statistics and probability theory but also on geometry, graph theory, and linear algebra. Tenenbaum is working to achieve a better understanding of human learning in computational terms and to build computational systems that come closer to the capacities of human learners.
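A toy sketch of Bayesian concept learning, loosely in the spirit of Tenenbaum's "number game" (the hypotheses and priors below are invented): the learner scores each candidate concept by prior times likelihood, where smaller hypotheses assign higher likelihood to consistent examples:

```python
# A toy Bayesian concept-learning sketch, loosely in the spirit of
# Tenenbaum's "number game" (hypotheses and priors invented here):
# posterior(h | data) is proportional to prior(h) * likelihood(data | h),
# where each example under h has likelihood 1/|h| (the "size principle").

hypotheses = {
    "even numbers":     set(range(2, 101, 2)),
    "multiples of ten": set(range(10, 101, 10)),
    "powers of two":    {2, 4, 8, 16, 32, 64},
}
prior = {"even numbers": 0.5, "multiples of ten": 0.3, "powers of two": 0.2}

def posterior(data):
    scores = {}
    for name, h in hypotheses.items():
        if all(x in h for x in data):            # h must cover the data
            scores[name] = prior[name] * (1 / len(h)) ** len(data)
        else:
            scores[name] = 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

# After seeing 16, 8, and 2, the small "powers of two" hypothesis dominates
# even though it started with the lowest prior.
print(posterior([16, 8, 2]))
```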
Component Display Theory
M. D. Merrill's Component Display Theory (CDT) is a cognitive matrix that focuses on the interaction between two dimensions: the level of performance expected from the learner, and the type of content of the material to be learned. Merrill classifies the learner's levels of performance as find, use, and remember, and the material content as facts, concepts, procedures, and principles. The theory also calls upon four primary presentation forms and several secondary presentation forms. The primary presentation forms are rules, examples, recall, and practice; the secondary presentation forms include prerequisites, objectives, helps, mnemonics, and feedback. A complete lesson should include a combination of these primary and secondary presentation forms, but the most effective combination varies from learner to learner and from concept to concept. Another significant aspect of the CDT model is that it allows the learner to control the instructional strategies used and adapt them to meet his or her own learning style and preference. A major goal of this model was to reduce three common errors in concept formation: over-generalization, under-generalization, and misconception.

Machine Learning Approaches to Concept Learning
This is a budding field due to recent progress in algorithms, computational power, and the expansion of information on the internet. Unlike the situation in psychology, the problem of concept learning within machine learning is not one of finding the "right" theory of concept learning, but one of finding the most effective method for a given task. As such, there has been a huge proliferation of concept learning theories. In the machine learning literature, this concept learning is more typically called supervised learning or supervised classification, in contrast to unsupervised learning or unsupervised classification, in which the learner is not provided with class labels. In machine learning, algorithms in the style of exemplar theory are also known as instance-based learners or lazy learners.
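A minimal sketch of an instance-based ("lazy") learner under these definitions: training merely stores the labeled examples, and all work is deferred to classification time. The data points and the choice of k below are invented for illustration:

```python
# A minimal instance-based ("lazy") learner: supervised, because each
# training example carries a class label; lazy, because training only
# stores the examples and classification defers all work to query time.
# The data points and k=3 are invented for illustration.
import math
from collections import Counter

training_data = [
    ((1.0, 1.0), "dog"), ((1.2, 0.9), "dog"), ((0.9, 1.1), "dog"),
    ((5.0, 5.0), "cat"), ((5.2, 4.8), "cat"), ((4.9, 5.1), "cat"),
]

def knn_classify(query, k=3):
    """Label a query by majority vote among its k nearest stored examples."""
    neighbors = sorted(training_data,
                       key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((1.1, 1.0)))  # -> 'dog'
print(knn_classify((4.8, 5.0)))  # -> 'cat'
```

An unsupervised learner would face the same points without the "dog"/"cat" labels and would have to discover the two clusters on its own.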
There are three important roles for machine learning:
- Data mining: using historical data to improve decisions. An example is examining medical records and applying what is learned to medical knowledge when making a diagnosis.
- Software applications that cannot be programmed by hand, such as autonomous driving and speech recognition.
- Self-customizing programs, such as a newsreader that learns a reader's particular interests and highlights them when the reader visits the site.
Machine learning has an exciting future. Some future advantages include learning across full mixed-media data, learning across multiple internal databases (including the internet and news feeds), learning by active experimentation, learning decisions rather than predictions, and the possibility of programming languages with learning embedded.
Minimum Description Length Theories
The minimum description length (MDL) principle is a formalization of Occam's Razor, in which the best hypothesis for a given set of data is the one that leads to the largest compression of the data. In short, data that shows a lot of regularities and/or patterns may be compressed without losing any important information. Applying this to learning, we can conclude that the more regularity and/or patterns we are able to find within data, the more we have learned about the data.
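As a rough numeric illustration (with invented encoding costs, not a formal MDL code): a description of the form "pattern plus exceptions" is preferred whenever it is shorter than spelling out the raw data:

```python
# A rough MDL illustration with invented encoding costs, not a formal code:
# the best "hypothesis" for a binary string is the one whose description
# is shortest.

data = "01" * 40 + "11"   # mostly a repeating pattern, plus a small tail

# Hypothesis 1: no pattern; describe the raw string literally,
# charging (hypothetically) one unit per symbol.
raw_cost = len(data)

# Hypothesis 2: "repeat '01' 40 times, then '11'"; charge (hypothetically)
# one unit per character of the rule's textual description.
rule = "repeat 01 x40 + 11"
rule_cost = len(rule)

print(f"raw description:  {raw_cost} units")   # 82 units
print(f"rule description: {rule_cost} units")  # 18 units
# The patterned description compresses the data, so under MDL the
# repeating-pattern hypothesis is the better account of this data.
```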