Self-organizing list - AbsoluteAstronomy.com

A self-organizing list is a list that reorders its elements based on some self-organizing heuristic to improve average

Average

In mathematics, an average, or central tendency of a data set is a measure of the "middle" value of the data set. Average is one form of central tendency. Not all central tendencies should be considered definitions of average....

access time

Access time

Access time is the time delay or latency between a request to an electronic system, and the access being completed or the requested data returned....

.
The aim of Self organizing list is to improve efficiency of linear search by moving more frequently accessed items towards the head of the list.

The "Self Organizing List" is a poor man's Hash table

Hash table

In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

By using a probabilistic strategy, it yields nearly constant time in the best case for insert/delete operations, although the worst case remains linear.

Four ways to self organized

Ordering— The list is ordered using certain criteria natural for the information under scrutiny.
Transpose— After the desired element is located, swap it with its predecessor unless it is at the head of the list.
Move to front— After the desired element is located, put it at the beginning of the list.
Count— Order the list by the number of times elements are being accessed.

Basic Implementation of a List

In order to efficiently use memory resources, lists are generally implemented as linked list

Linked list

In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

s allowing dynamic memory allotment at runtime. The linked list consists of a sequence of nodes, each 'linked' to the node in front (for example, in C, the 'link' is in the form of a pointer to the succeeding node), with the last node storing a link to NULL. An array is generally considered to be unsuitable because of the requirement of knowing the array size beforehand (during declaration). Also, inserting elements into an array is an inefficient operation.

Linked List Traversals

As each node is 'linked' only to the node in front of it, the only way to access a particular node in the list is by accessing the node before it, this process being repeated. So, to reach the nth element in the list, we must start at the first node and move to the second, from the second move to the third, from the third to the fourth and so on till the desired element is reached (move through each node successively to reach the nth element). When compared to an array it is immediately obvious that the linked list is extremely inefficient in random access (in an array say a[], the nth element can be immediately accessed as a[n]). In big O notation

Big O notation

In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

, retrieving the nth element in a linked list is a O(n) operation whereas in an array, it is a O(1) operation.
For a linked list, even a binary search routine is inefficient because finding the middle element as the binary search algorithm requires is exceedingly costly (due to having to traverse through all the elements before it) and a linear traversal through the list is generally adopted for random access or searches. (Binary search on an array is O(log n)).

Concept

A self organizing list modifies the order in which the records are stored, based on their actual or expected access pattern. The basis of a self organizing list can be explained in terms of the 80-20 rule which states that 80% of the records in a list or data store are accessed 20% of the time whereas 20% of the records are accessed 80% of the time (note that this is not a strict rule but merely a characterization of typical behavior). The self organizing list aims at keeping these commonly accessed elements at the head for quicker access.

Analysis of Running Times for Access/ Search in a List

Consider a list (self organized or randomly arranged) implemented using a linked list structure containing n elements.
The COST of a search is measured by the number of comparisons that must be performed to find the target record (for now we assume that the target record exists in the list).

We consider the worst and average case running times of accessing a particular element by linear searching in the list as follows:

Average Case

In the average case, suppose that N search operations are performed on the list. Suppose that the number of times the i^th element was searched for is freq(i) (for example, if the third element was searched for 45 times out of N, freq(3) = 45).
then

Now, the average running time to search for (or retrieve) the i^th element is i. thus, the time spent in retrieving only the i^th element is equal to
* (the time required to search for the i^th element) = (freq(i) * i).
hence, the total time for N searches is given by:

The average time per search operation is obtained by dividing above expression by N.

but now consider, for the i^th element, freq(i)/N is the probability of accessing the i^th element (p(i)) and thus

Substituting,

If the access probability of each element is the same (i.e. p(1) = p(2) = p(3) = ... = p(n) = 1/n) then the ordering of the elements is irrelevant and the average time complexity is given by

and T(n) does not depend on the individual access probabilities of the elements in the list in this case.
however in the case of searches on lists with non uniform record access probabilities (i.e. those lists in which the probability of accessing one element is different from another), the average time complexity can be reduced drastically by proper positioning of the elements contained in the list.

This is done by pairing smaller i with larger access probabilities so as to reduce the overall average time complexity.
Consider for example the list shown below:

Given List: A(0.1), B(0.1), C(0.3), D(0.1), E(0.4)

Arrangement 1 (Random order) :
A(0.1), B(0.1), C(0.3), D(0.1), E(0.4)

Without rearranging, average search time required is:

Now suppose the nodes are rearranged so that those nodes with highest probability of access are placed closest to the front so that the rearranged list is now:

Arrangement 2 (elements with highest probability at the front) :

E(0.4), C(0.3), D(0.1), A(0.1), B(0.1)

Here, average search time is:

Thus the average time required for searching in an organized list is (in this case) around 40% less than the time required to search a randomly arranged list.
This is the concept of the self organized list in that the average speed of data retrieval is increased by rearranging the nodes according to access frequency.

Worst Case

In the worst case, the element to be located is at the very end of the list (for both self organized or randomly arranged lists) and thus n comparisons must be made to reach it. Therefore the worst case running time of a linear search on the list is O(n) independent of the type of list used.
Note that the above argument for the average search time is a probabilistic one. Keeping the commonly accessed elements at the head of the list simply reduces the probability of the wost case occurring but does not eliminate it completely. Even in a self organizing list, if a lowest access probability element (obviously located at the end of the list) is to be accessed, the entire list must be traversed completely to retrieve it. This is the worst case search.

Move to Front Method

Any node or element requested is moved to the front of the list.

Pros

This method is easily implemented and does not require any extra memory or storage(for counter variables say)
This method easily adapts to quickly changing access patterns. even if the priorities of the nodes change dynamically at runtime, the list will reorganize itself very quickly in response.

Cons

This method may prioritize infrequently accessed nodes: for example, if a uncommon node is accessed even once, it is moved to the head of the list and given maximum priority even if it is not going to be accessed frequently in the future. these 'over rewarded' nodes clog up the list and lead to slower access times for commonly accessed elements.
This method, though easily adaptable to changing access patterns may change too commonly. basically, this technique leads to very short memories of access patterns whereby even an optimal arrangement of the list can be disturbed immediately by accessing an infrequent node in the list.

Count Method

Each node counts the number of times it was searched for i.e. every node keeps a separate counter variable which is incremented every time it is called. the nodes are then rearranged according to decreasing count.

Pros

Reflects more realistically the actual access pattern

Cons

Must store and maintain a counter for each node, thereby increasing the amount of memory required.
Does not adapt quickly to rapid changes in the access patterns. for example: if the count of the head element is say A is 100 and for any node after it say B is 40. now, even if B becomes the new most commonly accessed element, it must still be accessed at least (100 - 40 = 60) times before it can become the head element.

Transpose Method

Pros

Easy to implement and requires little memory.
More likely to keep frequently accessed nodes at the front.

Cons

More cautious than move to front. i.e. it will take many accesses to move the element to the head of the list.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.