Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases. Machine learning is concerned with the development of algorithms that allow a machine to learn via inductive inference: by observing data that represents incomplete information about a statistical phenomenon, the machine generalizes it to rules and makes predictions about missing attributes or future data. An important task of machine learning is classification, also referred to as pattern recognition, in which machines "learn" to automatically recognize complex patterns, to distinguish between exemplars based on their different patterns, and to make intelligent predictions about their class.
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program that would better predict user preferences and improve on the accuracy of its existing movie recommendation system by at least 10%. A joint team built around AT&T researchers, "BellKor's Pragmatic Chaos", won the grand prize of $1 million in 2009, after its members had won earlier, smaller prizes in the competition.
Software suites containing a variety of machine learning algorithms include Weka, ODM, Shogun toolbox, Orange, Apache Mahout and scikit-learn.
Definition
Tom M. Mitchell provided a widely quoted definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
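The roles of E, T and P in this definition can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than part of Mitchell's text: the task T is deciding whether a number lies above a hidden cut-off, the experience E is a short list of labelled examples, and the performance measure P is accuracy on a fixed test set.

```python
# Task T: decide whether a number x is at least some unknown cut-off.
# Experience E: labelled examples (x, label). Performance P: test accuracy.
TRUE_CUTOFF = 5.0  # hypothetical ground truth, hidden from the learner

def learn_cutoff(examples):
    """Estimate the cut-off as the midpoint between the two classes seen so far."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2

def accuracy(cutoff, test_xs):
    """Fraction of test points classified the same way as the true cut-off."""
    correct = sum((x >= cutoff) == (x >= TRUE_CUTOFF) for x in test_xs)
    return correct / len(test_xs)

experience = [(0.0, 0), (6.0, 1), (2.0, 0), (4.5, 0), (5.5, 1)]
test_xs = [0.25 + 0.5 * i for i in range(20)]

# Measure P after the learner has seen 2, 3 and 5 examples of E.
accuracies = [accuracy(learn_cutoff(experience[:n]), test_xs) for n in (2, 3, 5)]
print(accuracies)  # prints [0.8, 0.9, 1.0]
```

In this sketch the program "learns" in Mitchell's sense because its accuracy at the task rises as more labelled examples become available.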
Generalization
The core objective of a learner is to generalize from its experience. The training examples from its experience come from some generally unknown probability distribution and the learner has to extract from them something more general, something about that distribution, that allows it to produce useful answers in new cases.
Machine learning, knowledge discovery in databases (KDD) and data mining
These three terms are commonly confused, as they often employ the same methods and overlap strongly. They can be roughly separated as follows:
- Machine learning focuses on prediction, based on known properties learned from the training data.
- Data mining (which is the analysis step of knowledge discovery in databases) focuses on the discovery of (previously) unknown properties in the data.
However, these two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, the performance is usually evaluated with respect to the ability to reproduce known knowledge, while in KDD the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Human interaction
Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.
Algorithm types
Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm.
- Supervised learning generates a function that maps inputs to desired outputs (also called labels, because they are often provided by human experts labeling the training examples). For example, in a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function.
- Unsupervised learning models a set of inputs, as in clustering. See also data mining and knowledge discovery.
- Semi-supervised learning combines both labeled and unlabeled examples to generate an appropriate function or classifier.
- Reinforcement learning learns how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.
- Transduction tries to predict new outputs based on training inputs, training outputs, and test inputs.
- Learning to learn learns its own inductive bias based on previous experience.
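As an illustration of the supervised case in the list above, the following minimal sketch approximates a function from vectors to classes using nothing but input-output examples. The one-nearest-neighbour rule and the tiny data set are choices made for this example only; the article does not single out any particular method here.

```python
import math

def nearest_neighbor_predict(train, x):
    """Return the label of the training example closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label

# Labelled training examples: (input vector, desired output).
train = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 4.5), "b"),
]

print(nearest_neighbor_predict(train, (1.2, 1.4)))  # prints a
print(nearest_neighbor_predict(train, (5.5, 5.0)))  # prints b
```

Removing the labels from `train` would turn the same data into an unsupervised problem, where only the grouping of the inputs can be modeled.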
Theory
The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common.
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
There are many similarities between machine learning theory and statistics, although they use different terms.
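One common form of probabilistic bound can be sketched numerically. The example below uses the standard textbook sample-size bound for a learner that outputs a hypothesis consistent with its training data, drawn from a finite hypothesis class H: with probability at least 1 - delta the true error is at most epsilon once m >= (ln|H| + ln(1/delta)) / epsilon. This particular bound is brought in for illustration and is not stated in the article; the function name is hypothetical.

```python
import math

def pac_sample_size(num_hypotheses, epsilon, delta):
    """Examples sufficient for error <= epsilon with probability >= 1 - delta.

    Assumes a finite hypothesis class and a learner that returns a
    hypothesis consistent with all training examples.
    """
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)

print(pac_sample_size(1000, 0.1, 0.05))   # prints 100
print(pac_sample_size(1000, 0.01, 0.05))  # a tighter error target needs more data
```

The bound is a statement about probability, not a guarantee: a fraction delta of possible training sets may still lead to a poor hypothesis.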
Decision tree learning
Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
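The smallest possible decision tree has a single split (a "decision stump"). The sketch below learns one by trying every feature and threshold and keeping the split with the fewest training errors. The data, feature meanings and function names are hypothetical, and practical tree learners use more refined split criteria and grow deeper trees.

```python
def train_stump(rows, labels):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send items down both branches
            left_label = max(set(left), key=left.count)     # majority label
            right_label = max(set(right), key=right.count)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return {"feature": f, "threshold": t, "left": left_label, "right": right_label}

def predict(stump, row):
    """Map an observation to the target value stored in the matching leaf."""
    branch = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[branch]

# Observations: (age, owns_car flag); target: whether the item is labelled "yes".
rows = [(25, 0), (32, 1), (47, 0), (51, 1), (62, 0)]
labels = ["no", "no", "yes", "yes", "yes"]

stump = train_stump(rows, labels)
print(stump)
print(predict(stump, (30, 1)), predict(stump, (55, 0)))
```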
Association rule learning
Association rule learning is a method for discovering interesting relations between variables in large databases.
Artificial neural networks
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is inspired by the structure and/or functional aspects of biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.
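A small sketch shows why an interconnected group of neurons can model non-linear relationships. The network below computes XOR, which no single linear threshold unit can represent. Its weights are set by hand for illustration rather than learned, and the step activation is an assumption made to keep the arithmetic easy to follow.

```python
def neuron(inputs, weights, bias):
    """An artificial neuron: weighted sum of inputs followed by a step activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def xor_net(x1, x2):
    """Two hidden neurons feeding one output neuron (weights chosen by hand)."""
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if at least one input is 1
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires only if both inputs are 1
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)  # "or, but not and"

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

A learning algorithm for such a network would adjust the weights and biases from examples instead of fixing them in advance.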
Genetic programming
Genetic programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. It is a specialization of genetic algorithms (GA) where each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task.
Inductive logic programming
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program which entails all the positive and none of the negative examples.
Support vector machines
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
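The two-category setting can be sketched with a linear SVM trained by plain subgradient descent on the regularized hinge loss. This is only one of several ways to train an SVM, chosen here because it fits in a few lines; the data, the learning rate `lr`, the regularization strength `lam` and the number of epochs are arbitrary assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * |w|^2 + hinge loss; labels are +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term always shrinks w slightly.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Example is inside the margin: push the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Report which side of the separating hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (3.0, 3.0)), predict(w, b, (0.0, 0.0)))
```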
Clustering
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
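One widely used way to form such subsets is k-means, sketched below with "similar" taken to mean "close in Euclidean distance". The choice of k-means, the fixed starting centroids and the data are assumptions for this example; the article does not prescribe a particular clustering algorithm.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]  # keep an empty cluster's centroid unchanged
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters[0])
print(clusters[1])
```

No labels are supplied at any point; the grouping comes entirely from the distances between observations.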
Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
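The disease and symptom example can be sketched with the smallest possible network, a single edge from a disease node to a symptom node. All probabilities below are invented for illustration, and inference is done by enumerating the two disease states rather than by one of the efficient algorithms used on larger networks.

```python
# Hypothetical network: Disease -> Symptom
P_DISEASE = 0.01                           # prior P(disease)
P_SYMPTOM_GIVEN = {True: 0.9, False: 0.05}  # P(symptom | disease present or absent)

def posterior_disease(symptom_present):
    """Compute P(disease | symptom observation) by enumeration."""
    joint = {}
    for disease in (True, False):
        prior = P_DISEASE if disease else 1 - P_DISEASE
        p_symptom = P_SYMPTOM_GIVEN[disease]
        likelihood = p_symptom if symptom_present else 1 - p_symptom
        joint[disease] = prior * likelihood
    return joint[True] / (joint[True] + joint[False])

print(round(posterior_disease(True), 4))   # prints 0.1538
print(round(posterior_disease(False), 4))  # prints 0.0011
```

Even with a symptom that is nine times out of ten present in sick patients, the posterior stays modest because the disease itself is rare in this made-up example.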
Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
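A minimal sketch of learning a policy from rewards alone is tabular Q-learning on a five-state corridor. The environment, the constants and the use of exhaustive sweeps over every state-action pair in place of exploratory behaviour are all simplifying assumptions made for this example.

```python
N_STATES = 5         # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)   # move left or move right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    """Hypothetical environment: reward 1 only on arriving at the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Action-value table for every non-terminal state.
q = {(s, a): 0.0 for s in range(N_STATES - 1) for a in ACTIONS}

for _ in range(200):
    for (s, a) in list(q):
        nxt, reward = step(s, a)
        future = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, b)] for b in ACTIONS)
        # Q-learning update: move the estimate toward reward + discounted future value.
        q[(s, a)] += ALPHA * (reward + GAMMA * future - q[(s, a)])

# The learned policy maps each state to its highest-valued action.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The agent is never told that "right" is the correct action in any state; it only observes the reward, and the preference for moving right emerges from the value estimates.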
Representation learning
Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and clustering. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or prediction. The learned representation allows the inputs coming from the unknown data-generating distribution to be reconstructed, while not necessarily being faithful for configurations that are implausible under that distribution. Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.
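Principal components analysis, the first classical example mentioned above, can be sketched for two-dimensional data without any libraries. The closed-form angle used below applies only to the 2x2 covariance case and is a shortcut chosen for this example; the data points are made up.

```python
import math

def first_principal_component(points):
    """Return the direction of greatest variance and each point's 1-D code."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    direction = (math.cos(theta), math.sin(theta))
    # Low-dimensional representation: projection of each centered point.
    codes = [(p[0] - mx) * direction[0] + (p[1] - my) * direction[1] for p in points]
    return direction, codes

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
direction, codes = first_principal_component(points)
print(direction)
print(codes)
```

Each two-dimensional input is replaced by a single number that keeps most of its information, because the points lie close to one line.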
Applications
Applications for machine learning include:
- machine perception
- computer vision
- natural language processing
- syntactic pattern recognition
- search engines
- medical diagnosis
- bioinformatics
- brain-machine interfaces
- cheminformatics
- detecting credit card fraud
- stock market analysis
- classifying DNA sequences
- speech and handwriting recognition
- object recognition in computer vision
- game playing
- software engineering
- adaptive websites
- robot locomotion
- computational finance
- structural health monitoring
- sentiment analysis (or opinion mining)
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program that could better predict user preferences and improve on the accuracy of its existing movie recommendation system by at least 10%. A team built around AT&T Research's BellKor group beat out several other teams; after winning several minor prizes, it won the $1 million grand prize in 2009, competing as "BellKor's Pragmatic Chaos".
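The 10% threshold is easier to picture with numbers. The contest scored entries by root-mean-square error (RMSE) between predicted and actual ratings, so "beating the existing system by 10%" meant lowering that error by a tenth. The sketch below computes such a relative improvement; the ratings are made up for illustration and the function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Made-up example ratings on a 1-5 scale (not real contest data).
ACTUAL = [4.0, 3.0, 5.0, 2.0]
BASELINE_PREDICTIONS = [3.0, 3.0, 4.0, 3.0]
NEW_PREDICTIONS = [3.5, 3.0, 4.5, 2.5]

if __name__ == "__main__":
    baseline = rmse(BASELINE_PREDICTIONS, ACTUAL)
    improved = rmse(NEW_PREDICTIONS, ACTUAL)
    gain = relative_improvement(baseline, improved)
    print(f"baseline RMSE {baseline:.4f}, new RMSE {improved:.4f}, improvement {gain:.0%}")
```

In this toy case the new predictions halve every error, so the improvement is far above the 10% bar; in the real contest the gap between entries was much narrower.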
Software
RapidMiner, KNIME, Weka, ODM (Oracle Data Mining), Shogun toolbox, Orange, Apache Mahout and scikit-learn are software suites containing a variety of machine learning algorithms.
Journals and conferences
- Machine Learning (journal)
- Journal of Machine Learning Research
- Neural Computation (journal)
- Journal of Intelligent Systems (journal)
- International Conference on Machine Learning (ICML) (conference)
- Neural Information Processing Systems (NIPS) (conference)
See also
- Adaptive control
- Cache language model
- Computational intelligence
- Computational neuroscience
- Cognitive science
- Data mining
- Explanation-based learning
- Important publications in machine learning
- Multi-label classification
- Pattern recognition
- Predictive analytics
Further reading
- Sergios Theodoridis, Konstantinos Koutroumbas (2009) "Pattern Recognition", 4th Edition, Academic Press, ISBN 978-1-59749-272-0.
- Ethem Alpaydın (2004) Introduction to Machine Learning (Adaptive Computation and Machine Learning), MIT Press, ISBN 0-262-01211-1
- Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, ISBN 3-540-37881-2
- Toby Segaran, Programming Collective Intelligence, O'Reilly, ISBN 0-596-52932-5
- Ray Solomonoff, "An Inductive Inference Machine", a privately circulated report from the 1956 Dartmouth Summer Research Conference on AI.
- Ray Solomonoff, "An Inductive Inference Machine", IRE Convention Record, Section on Information Theory, Part 2, pp. 56-62, 1957.
- Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1983), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, ISBN 0-935382-05-4.
- Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1986), Machine Learning: An Artificial Intelligence Approach, Volume II, Morgan Kaufmann, ISBN 0-934613-00-1.
- Yves Kodratoff, Ryszard S. Michalski (1990), Machine Learning: An Artificial Intelligence Approach, Volume III, Morgan Kaufmann, ISBN 1-55860-119-8.
- Ryszard S. Michalski, George Tecuci (1994), Machine Learning: A Multistrategy Approach, Volume IV, Morgan Kaufmann, ISBN 1-55860-251-8.
- Bishop, C.M. (1995). Neural Networks for Pattern Recognition, Oxford University Press. ISBN 0-19-853864-2.
- Richard O. Duda, Peter E. Hart, David G. Stork (2001) Pattern classification (2nd edition), Wiley, New York, ISBN 0-471-05669-3.
- Huang T.-M., Kecman V., Kopriva I. (2006), Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semi-supervised, and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp. 96 illus., Hardcover, ISBN 3-540-31681-7.
- Kecman, Vojislav (2001), Learning and Soft Computing, Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Cambridge, MA, 608 pp., 268 illus., ISBN 0-262-11255-8.
- MacKay, D.J.C. (2003). Information Theory, Inference, and Learning Algorithms, Cambridge University Press. ISBN 0-521-64298-1.
- Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, ISBN 0-12-088407-0.
- Sholom Weiss and Casimir Kulikowski (1991). Computer Systems That Learn, Morgan Kaufmann. ISBN 1-55860-065-5.
- Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman (2001). The Elements of Statistical Learning, Springer. ISBN 0-387-95284-5.
- Vladimir Vapnik (1998). Statistical Learning Theory. Wiley-Interscience, ISBN 0-471-03003-1.
External links
- International Machine Learning Society
- There is a popular online course by Andrew Ng, at ml-class.org. It uses GNU Octave. The course is a free version of Stanford University's actual course, whose lectures are also available for free.
- Machine Learning Video Lectures