M-tree - AbsoluteAstronomy.com

M-trees are tree data structures that are similar to R-tree

R-tree

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both research and real-world applications...

s and B-tree

B-tree

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...

s. It is constructed using a metric

Metric (mathematics)

In mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set but not all topologies can be generated by a metric...

and relies on the triangle inequality

Triangle inequality

In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side ....

for efficient range and k-NN queries.
While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for distance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used in information retrieval

Information retrieval

Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...

do not satisfy this.

Overview

As in any Tree-based data structure, the M-Tree is composed of Nodes and Leaves. In each node there is a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf has several data objects. For each node there is a radius r that defines a Ball in the desired metric space. Thus, every node

and leaf

residing in a particular node

is at most distance

from

, and every node n and leaf l with node parent N keep the distance from it.

Components

An M-Tree has these components and sub-components:

Root.
1. Routing object O.
2. Radius O_r.
3. A set of Nodes or (exclusive) Leafs.
Node.
1. Routing object O_r.
2. Radius O_r.
3. Distance from this node to its parent O_p.
4. A set of Nodes or (exclusive) Leafs.
Leaf.
1. Routing object O.
2. Radius O_r.
3. Distance from this leaf to its parent O_p.
4. Objects.
Objects.
1. Feature value (usually a d-dimensional vector).

Insert

The main idea is first to find a leaf node

where the new object

belongs. If

is not full then just attach it to

. If

is full then invoke a method to split

. The algorithm is as follows:
Input: Node

of M-Tree

, Entry

Output: A new instance of

containing all entries in original

plus

← Entries of node

is not a leaf then
{
/*Look for entries that the new object fits into*/
let

be entries such that

is not empty then
{
/*If there are one or more entry, then look for an entry such that is closer to the new object*/
let be

such that

is minimum
}
else
{
/*If there are no such entry, then look for an entry with minimal distance from its edge to the new object*/
let be

such that

is minimum
/*Upgrade the new radii of the entry*/

}
/*Continue inserting in the next level*/
return insert(

);
else
{
/*If the node has capacity then just insert the new object*/
if

is not full then
{ store(

) }
/*The node is at full capacity, then it is needed to do a new split in this level*/
else
{ split(

) }
}

Split

If the split method arrives to the root of the tree, then it choose two routing objects from

, and creates two new nodes containing all the objects in original

, and store them into the new root. If split methods arrives to a node

that is not the root of the tree, the method choose two new routing objects from

, re-arrange every routing object in

in two new nodes

and

, and store this new nodes in the parent node

of original

. The split must be repeated if

has not enough capacity to store

. The algorithm is as follow:
Input: Node

of M-Tree

, Entry

Output: A new instance of

containing a new partition.

/*The new routing objects are now all those in the node plus the new routing object*/
let be

entries of

is not the root then
{
/*Get the parent node and the parent routing object*/
let

be the parent routing object of

let

be the parent node of

}
/*This node will contain part of the objects of the node to be split*/
Create a new node

/*Promote two routing objects from the node to be split, to be new routing objects*/
Promote(

)
/*Choose which objects from the node being split will act as new routing objects*/
Partition(

)
/*Store entries in each new routing object*/
Store

's entries in

and

's entries in

is the current root then
{
/*Create a new node and set it as new root and store the new routing objects*/
Create a new root node

Store

and

}
else
{
/*Now use the parent rouing object to store one of the new objects*/
Replace entry

with entry

is no full then
{
/*The second routinb object is stored in the parent only if it has free capacity*/
Store

}
else
{
/*If there is no free capacity then split the level up*/
split(

)
}
}

Overview

Components

Insert

Split

See also