Information Fuzzy Networks
Info Fuzzy Networks (IFN) is a greedy machine learning algorithm for supervised learning. The data structure produced by the learning algorithm is also called an Info Fuzzy Network.
IFN construction is quite similar to decision tree construction.
However, IFN constructs a directed graph and not a tree.
IFN also uses the conditional mutual information metric to choose features during the construction stage, while decision trees usually use other metrics such as entropy or the Gini index.
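For reference, the conditional mutual information used here is the standard information-theoretic quantity; in the notation below (which is ours, not the source's), X is a candidate attribute, Y is the target, and Z stands for the attributes already placed in earlier layers:

```latex
% Conditional mutual information of a candidate attribute X and the target Y,
% given the previously selected attributes Z (standard definition).
I(X; Y \mid Z) \;=\; \sum_{x,\,y,\,z} p(x, y, z)\,
    \log \frac{p(z)\, p(x, y, z)}{p(x, z)\, p(y, z)}
```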

IFN and the stages of the knowledge discovery process

  • Discretization of continuous features (a simple example is sketched after this list)
  • Feature selection
  • Creation of a classification model
  • Evaluation and prioritization of the extracted association rules
  • Anomaly detection

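As a concrete illustration of the first stage, the sketch below shows one common discretization scheme, equal-frequency binning. The function name and binning rule are illustrative assumptions; the IFN literature describes its own information-theoretic discretization procedure, so this is only an example of the kind of preprocessing involved, not the IFN method itself.

```python
def equal_frequency_bins(values, n_bins=4):
    """Map continuous values to bin indices with roughly equal counts per bin."""
    ordered = sorted(values)
    # Cut points taken at evenly spaced sample quantiles.
    cuts = [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]

# Example: ten continuous measurements mapped onto four nominal intervals.
print(equal_frequency_bins([0.1, 5.2, 3.3, 9.8, 2.2, 7.7, 4.4, 6.1, 1.0, 8.5]))
```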

Attributes of IFN

  1. The IFN model partially solves the fragmentation problem that occurs in decision trees (the deeper the node, the fewer records it represents; hence, the number of records might be too low for a statistically significant indication), since the entire set of records is used in every layer.
  2. Every node inside the net is called an inner or hidden node.
  3. In IFN every variable can appear in only one layer, and there cannot be more than one attribute in a layer. Not all attributes must be used.
  4. The increase in the conditional MI of the target variable after building the net equals the sum of the increases in conditional MI across all layers (see the identity after this list).
  5. The arcs from terminal nodes to the target variable nodes are weighted (terminal nodes are nodes directly connected to the target variable nodes). The weight of each arc is the conditional mutual information due to that arc.
  6. IFN was compared on a few common datasets to the C4.5 decision tree algorithm. The IFN model usually used fewer variables and had fewer nodes. The accuracy of the IFN was lower than that of the decision tree. The IFN model is usually more stable, meaning that small changes in the training set affect it less than they affect other models.
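Attribute 4 above is a restatement of the chain rule of mutual information. In symbols (notation is ours, with T the target variable and X_1, ..., X_k the attributes chosen for the successive layers):

```latex
% Chain-rule decomposition behind attribute 4: the information the whole net
% carries about the target T is the sum of the per-layer conditional MI gains.
I(T; X_1, \ldots, X_k) \;=\; \sum_{i=1}^{k} I\!\left(T;\, X_i \mid X_1, \ldots, X_{i-1}\right)
```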

IFN construction algorithm

Input: a list of input variables that can be used, a list of data records (the training set), and a minimal statistical significance level used to decide whether to split a node (default 0.1%).
  1. Create the root node and the layer of the target variable.
  2. Loop until all the attributes have been used or the conditional mutual information cannot be improved any further with statistical significance.
    1. Find the attribute with the maximal conditional mutual information.
    2. Verify that the contribution of the attribute is statistically significant using the likelihood-ratio test.
    3. Split any node in the previous layer if the contribution of the current attribute is statistically significant. Otherwise, create an arc from that node to one of the value nodes of the target variable, according to the majority rule.
  3. Return the list of variables chosen to be used by the net and the net itself (a simplified sketch of this procedure is given below).
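The following is a minimal, simplified sketch of this construction loop in Python. It is an illustration under stated assumptions rather than a reference implementation: the helper names (build_ifn, cond_mi) are invented for this example, the degrees of freedom for the likelihood-ratio test are chosen in the usual contingency-table fashion, and the per-node splitting and majority-rule arcs of step 2.3 are collapsed into a single layer update, so only the selected variables are returned rather than the full network.

```python
from collections import Counter
from math import log, log2
from scipy.stats import chi2   # chi-square survival function for the G-test

def cond_mi(xs, ys, zs):
    """Estimate I(X; Y | Z) in bits from aligned lists of discrete values."""
    n = len(xs)
    xyz, xz, yz, z = (Counter(zip(xs, ys, zs)), Counter(zip(xs, zs)),
                      Counter(zip(ys, zs)), Counter(zs))
    return sum((c / n) * log2(z[zz] * c / (xz[(xx, zz)] * yz[(yy, zz)]))
               for (xx, yy, zz), c in xyz.items())

def build_ifn(records, attributes, target, alpha=0.001):
    """Greedily pick one attribute per layer while the conditional MI gain is significant."""
    n = len(records)
    target_vals = [r[target] for r in records]
    used = []          # attributes chosen so far, one per layer
    paths = [()] * n   # paths[i]: values of the already-chosen attributes for record i
    while True:
        remaining = [a for a in attributes if a not in used]
        if not remaining:
            break
        # Step 2.1: attribute with the maximal conditional MI given the existing layers.
        gain, best = max((cond_mi([r[a] for r in records], target_vals, paths), a)
                         for a in remaining)
        if gain <= 1e-12:
            break
        # Step 2.2: likelihood-ratio (G) test of the gain at significance level alpha.
        g_stat = 2.0 * n * gain * log(2)   # convert bits to nats for the G statistic
        best_vals = [r[best] for r in records]
        df = ((len(set(best_vals)) - 1) * (len(set(target_vals)) - 1)
              * max(len(set(paths)), 1))
        if chi2.sf(g_stat, df) > alpha:
            break                          # no statistically significant improvement left
        # Step 2.3 (simplified): add a new layer for the chosen attribute.
        used.append(best)
        paths = [p + (r[best],) for p, r in zip(paths, records)]
    return used                            # the variables the net would use

# Hypothetical usage with records stored as dictionaries:
# build_ifn(rows, ['age', 'income', 'region'], 'buys', alpha=0.001)
```

A full implementation would also build and return the network itself, with one node per value combination in each layer and weighted arcs from the terminal nodes to the target value nodes, as described in the attributes above.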
