Softmax activation function
The softmax activation function is a neural transfer function. In neural networks, transfer functions calculate a layer's output from its net input. It is a biologically plausible approximation to the maximum operation. It is used to simulate an invariance operation of complex cells, where its definition (the equation is not preserved in this copy) involves a sigmoid function.

In neural network simulations, the term softmax activation function refers to a similar function defined by

$$p_i = \frac{\exp(q_i)}{\sum_{j=1}^{n} \exp(q_j)}$$

where $p_i$ is the value of output node $i$, $q_i$ is the net input to output node $i$, and $n$ is the number of output nodes. It ensures that all of the output values $p_i$ are between 0 and 1 and that their sum is 1. This is a generalization of the logistic function to multiple variables.
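The definition above can be sketched in a few lines of Python, assuming NumPy is available. The names `softmax` and `logistic` and the max-subtraction step are illustrative choices rather than part of the article; subtracting the largest net input leaves the ratio unchanged and only guards against overflow in the exponential.

```python
import numpy as np

def softmax(q):
    """Map net inputs q_1..q_n to outputs p_1..p_n that are positive and sum to 1."""
    q = np.asarray(q, dtype=float)
    # Shifting by max(q) does not change the ratio but keeps exp() from overflowing.
    z = np.exp(q - np.max(q))
    return z / z.sum()

def logistic(x):
    """Standard logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative net inputs for three output nodes (made-up values).
p = softmax([2.0, 1.0, 0.1])

# With two output nodes, the first output equals the logistic function
# of the difference between the two net inputs.
two_node = softmax([1.5, 0.5])[0]
```

Here `two_node` agrees with `logistic(1.5 - 0.5)` up to rounding, which is the sense in which softmax generalizes the logistic function.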

See multinomial logit for a probability model which uses the softmax activation function.

Reinforcement learning

In the field of reinforcement learning, a softmax function can be used to convert values into action probabilities. The function commonly used is:

$$P_t(a) = \frac{\exp(q_t(a)/\tau)}{\sum_{i=1}^{n} \exp(q_t(i)/\tau)}$$

where the action value $q_t(a)$ corresponds to the expected reward of following action $a$ and $\tau$ is called a temperature parameter (in allusion to chemical kinetics). For high temperatures ($\tau \to \infty$), all actions have nearly the same probability, and the lower the temperature, the more the expected rewards affect the probability. For a low temperature ($\tau \to 0^+$), the probability of the action with the highest expected reward tends to 1.
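The effect of the temperature can be illustrated with a short Python sketch, assuming NumPy. The function names `action_probabilities` and `choose_action`, the reward values, and the chosen temperatures are all hypothetical and serve only to show the two limiting regimes described above.

```python
import numpy as np

def action_probabilities(q_values, tau):
    """Softmax action probabilities for a temperature tau > 0."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = np.asarray(q_values, dtype=float) / tau
    # Shift by the maximum for numerical stability; probabilities are unchanged.
    z = np.exp(scaled - np.max(scaled))
    return z / z.sum()

def choose_action(q_values, tau, rng):
    """Sample an action index according to the softmax probabilities."""
    probs = action_probabilities(q_values, tau)
    return int(rng.choice(len(probs), p=probs))

q_values = [1.0, 2.0, 3.0]  # made-up expected rewards for three actions

hot = action_probabilities(q_values, tau=100.0)   # high temperature: nearly uniform
cold = action_probabilities(q_values, tau=0.01)   # low temperature: nearly greedy

rng = np.random.default_rng(0)
action = choose_action(q_values, tau=1.0, rng=rng)
```

At `tau=100.0` the three probabilities are all close to 1/3, while at `tau=0.01` almost all of the probability mass falls on the action with the highest expected reward.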

Smooth approximation of maximum

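When parameterized by some constant, $\alpha$, the following formulation becomes a smooth, differentiable approximation of the maximum function:

$$S_\alpha(x_1, \dots, x_n) = \frac{\sum_{i=1}^{n} x_i e^{\alpha x_i}}{\sum_{i=1}^{n} e^{\alpha x_i}}$$

$S_\alpha$ has the following properties:
  1. $S_\alpha \to \max_i x_i$ as $\alpha \to \infty$
  2. $S_0$ is the average of its inputs
  3. $S_\alpha \to \min_i x_i$ as $\alpha \to -\infty$

The gradient of softmax is given by:

$$\nabla_{x_i} S_\alpha(x_1, \dots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^{n} e^{\alpha x_j}} \left[ 1 + \alpha \left( x_i - S_\alpha(x_1, \dots, x_n) \right) \right]$$

which makes the softmax function useful for optimization techniques that use gradient descent.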
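
The smooth maximum and its gradient can be checked numerically with a small Python sketch, assuming NumPy. It takes the smooth maximum to be the exponentially weighted average of the inputs, which is the form consistent with the three listed properties; the function names `smooth_max` and `smooth_max_grad` and the sample inputs are illustrative only.

```python
import numpy as np

def smooth_max(x, alpha):
    """Exponentially weighted average of x with weights proportional to exp(alpha * x)."""
    x = np.asarray(x, dtype=float)
    a = alpha * x
    # Shift the exponents so the largest is 0; the normalized weights are unchanged.
    w = np.exp(a - np.max(a))
    return float(np.sum(x * w) / np.sum(w))

def smooth_max_grad(x, alpha):
    """Closed-form gradient: weight_i * (1 + alpha * (x_i - smooth_max))."""
    x = np.asarray(x, dtype=float)
    a = alpha * x
    w = np.exp(a - np.max(a))
    w = w / w.sum()
    s = np.sum(x * w)
    return w * (1.0 + alpha * (x - s))

x = np.array([1.0, 2.0, 4.0])       # made-up inputs
near_max = smooth_max(x, 50.0)      # large positive alpha: close to max(x)
mean_val = smooth_max(x, 0.0)       # alpha = 0: arithmetic mean of x
near_min = smooth_max(x, -50.0)     # large negative alpha: close to min(x)
grad = smooth_max_grad(x, 0.7)      # gradient at a moderate alpha
```

Comparing `grad` against central finite differences of `smooth_max` is a simple way to confirm that the closed-form expression matches the function being differentiated.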
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 