Rope (computer science)
Encyclopedia
In computer programming
a rope, or cord, is a data structure
for efficiently storing and manipulating a very long string
. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.
Seen from another perspective, the binary tree of a rope can be seen as several levels of nodes. The bottom level contains all the nodes that have a short string. Higher levels have fewer and fewer nodes, until finally there is just one root node in the top level. We can build the rope by putting all the nodes with short strings in the bottom level, then randomly picking half those nodes in order to form parent nodes in the next level. Nodes with no parent (for example, the two nodes storing the strings "my_" and "me_i" in the diagram above) become the right child of the node located immediately to their left, thus forming a chain. In this way, we can build higher levels one level at a time. We stop when there is only one node remaining.
Time complexity: O(log N) where N is the length of the rope
To retrieve the i-th character, we begin a recursive
search from the root node:
// Note: Assumes 1-based indexing.
function search(RopeNode node, integer i)
if node.weight < i then
return search(node.right, i - node.weight)
else
if exists(node.left) then
return search(node.left, i)
else
return node.string[i]
endif
endif
end
For example, suppose we are looking for the character at i=10 in the following rope. We start at the root node, find that 22 is greater than 10 and there's a left child, so we go to the left child. Next we find that 9 is less than 10, so we subtract 9 from 10 (leaving i=1) and go to the right child. Then because 7 is greater than 1 and there's a left child, we go to the left child. There we find that 6 is greater than 1 and there's a left child, so we go to the left child again. Finally we see that 2 is greater than 1 but there is no left child, so we take the character at index 1 of the short string "na", returning "n" as the final answer.
There are two cases: in the first, the i-th character is the end of an array like the following picture; in the second, the character is in the middle of an array. We can reduce the second case to the first case as follows: first we split the node at the character into two nodes each with part of the array and make the second node as the right child of first node. We handle the first case as explained in the following example.
For example, we want to split the following rope into two parts. First we query the i-th character to locate the node v at the bottom level. Then we cut down the link between v and the right child of v, which we'll call v’. Then go up to the parent u and subtract the weight of v’ from the weight of u. Since the parent has the right child of u, which we'll call u’, modify u’ to link to v’ and increase the weight of u’ by the weight of its new left child v’. The former left child of u’ become the right child of v’. The result is shown in the second picture below. In this way, we continue up and reach the parent of u, which we'll call w. First subtract the weight of v’ from the weight of w. Then modify the right child of w, which we'll call w’, to link to u’ and the former child of w’ becomes the right child of u’. Then increase the weight of w’ by the weight of v’. Then go up and reach the parent of w, which we'll call x. Since w is already the right child of x, there is no need for further modification. Then go up and reach the parent of x, which we'll call y, and reduce the weight of x by the weight of w’.Clearly, the expected cost is O(log N).
This operation can be considered as the reversion of split. The time complexity is also O(log N).
This operation can be done by a split and a concat. First split the rope at the i-th character, the add a new node v with string(v) = S’ to the right child of the rightmost node of the first rope. Then update the weight of nodes in the path from the new node to root. Finally concatenate the two ropes. Because the split and concat both cost O(logN) time, the time complexity of this operation is also O(logN).
This operation can be done by two split and a concat. First, split the rope into three ropes divided by i-th and j-th character respectively,that is, first split S into S1 and S2 at i-th character, then split S1 into S3 and S4 at (j-i)-th character, then concatenate the S1 and S2. Because the split and concat both cost O(logN) time, the time complexity of this operation is also O(logN).
To report the string Ci, …, Ci+j-1, we first search for the node u that contains ci and weight(u) >=j, and then traverse T starting at node u. we can then output Ci, …, Ci+j-1 in O(j+logN) expected time by doing an in-order traversal of T starting at node u.
The main disadvantages are a little greater overall space usage (mainly to store the nodes that don't have strings) and the consequent increase in time that such extra storage requires.
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...
a rope, or cord, is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
for efficiently storing and manipulating a very long string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....
. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.
Description
A rope is a binary tree in which each node has a weight. Leaf nodes (as well as some single-child internal nodes) also contain a short string. The weight of a node is equal to the length of its string plus the sum of all the weights in its left subtree. Thus a node with two children divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part, and the weight is the length of the first part.Seen from another perspective, the binary tree of a rope can be seen as several levels of nodes. The bottom level contains all the nodes that have a short string. Higher levels have fewer and fewer nodes, until finally there is just one root node in the top level. We can build the rope by putting all the nodes with short strings in the bottom level, then randomly picking half those nodes in order to form parent nodes in the next level. Nodes with no parent (for example, the two nodes storing the strings "my_" and "me_i" in the diagram above) become the right child of the node located immediately to their left, thus forming a chain. In this way, we can build higher levels one level at a time. We stop when there is only one node remaining.
Search
Definition: Search(i): return the character at position iTime complexity: O(log N) where N is the length of the rope
To retrieve the i-th character, we begin a recursive
Recursion
Recursion is the process of repeating items in a self-similar way. For instance, when the surfaces of two mirrors are exactly parallel with each other the nested images that occur are a form of infinite recursion. The term has a variety of meanings specific to a variety of disciplines ranging from...
search from the root node:
// Note: Assumes 1-based indexing.
function search(RopeNode node, integer i)
if node.weight < i then
return search(node.right, i - node.weight)
else
if exists(node.left) then
return search(node.left, i)
else
return node.string[i]
endif
endif
end
For example, suppose we are looking for the character at i=10 in the following rope. We start at the root node, find that 22 is greater than 10 and there's a left child, so we go to the left child. Next we find that 9 is less than 10, so we subtract 9 from 10 (leaving i=1) and go to the right child. Then because 7 is greater than 1 and there's a left child, we go to the left child. There we find that 6 is greater than 1 and there's a left child, so we go to the left child again. Finally we see that 2 is greater than 1 but there is no left child, so we take the character at index 1 of the short string "na", returning "n" as the final answer.
Split
Definition: Split (i, S): split the string S into two new strings S1 and S2, S1 = C1,…Ci and S2 = Ci+1, …, Cm.There are two cases: in the first, the i-th character is the end of an array like the following picture; in the second, the character is in the middle of an array. We can reduce the second case to the first case as follows: first we split the node at the character into two nodes each with part of the array and make the second node as the right child of first node. We handle the first case as explained in the following example.
For example, we want to split the following rope into two parts. First we query the i-th character to locate the node v at the bottom level. Then we cut down the link between v and the right child of v, which we'll call v’. Then go up to the parent u and subtract the weight of v’ from the weight of u. Since the parent has the right child of u, which we'll call u’, modify u’ to link to v’ and increase the weight of u’ by the weight of its new left child v’. The former left child of u’ become the right child of v’. The result is shown in the second picture below. In this way, we continue up and reach the parent of u, which we'll call w. First subtract the weight of v’ from the weight of w. Then modify the right child of w, which we'll call w’, to link to u’ and the former child of w’ becomes the right child of u’. Then increase the weight of w’ by the weight of v’. Then go up and reach the parent of w, which we'll call x. Since w is already the right child of x, there is no need for further modification. Then go up and reach the parent of x, which we'll call y, and reduce the weight of x by the weight of w’.Clearly, the expected cost is O(log N).
Concat
Definition: Concat(S1, S2): concatenate two rope S1, S2 into one single rope.This operation can be considered as the reversion of split. The time complexity is also O(log N).
Insert
Definition: Insert(i, S’): insert the string S’ beginning at position i in the string s, to form a new string C1,….,Ci, S’, Ci+1,…, Cm.This operation can be done by a split and a concat. First split the rope at the i-th character, the add a new node v with string(v) = S’ to the right child of the rightmost node of the first rope. Then update the weight of nodes in the path from the new node to root. Finally concatenate the two ropes. Because the split and concat both cost O(logN) time, the time complexity of this operation is also O(logN).
Delete
Definition: Delete(i, j): delete the substring Ci, …, Ci+j-1, from s to form a new string C1, …, Ci-1, Ci+j, …, Cm.This operation can be done by two split and a concat. First, split the rope into three ropes divided by i-th and j-th character respectively,that is, first split S into S1 and S2 at i-th character, then split S1 into S3 and S4 at (j-i)-th character, then concatenate the S1 and S2. Because the split and concat both cost O(logN) time, the time complexity of this operation is also O(logN).
Report
Definition: Report(i, j): output the string Ci, …, Ci+j-1.To report the string Ci, …, Ci+j-1, we first search for the node u that contains ci and weight(u) >=j, and then traverse T starting at node u. we can then output Ci, …, Ci+j-1 in O(j+logN) expected time by doing an in-order traversal of T starting at node u.
Advantages and Disadvantages
Advantages:- Ropes enable much faster insertion and deletion of text than ordinary strings.
- Ropes don't need the extra O(n) memory that arrays need for copying operations, and ropes don't require a large contiguous memory space to store a string.
The main disadvantages are a little greater overall space usage (mainly to store the nodes that don't have strings) and the consequent increase in time that such extra storage requires.
Comparison to array-based strings
This table compares the algorithmic characteristics of string and rope implementations, not their "raw speed". Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for large sets of data, time complexity and memory usage for insertion and deletion of characters becomes unacceptably large. A rope data structure, on the other hand, has a stable performance regardless of data size. Moreover, the space complexity for ropes and arrays are both O(n). In summary, ropes are better suited when the data is large and frequently modified.>- align="middle" | >- align="middle" | >- align="middle" | >- align="middle" | >- align="middle" | >- align="middle" |
See also
- The Cedar programming environment, which used ropes "almost since its inception"
- The Model T enfiladeEnfilade (Xanadu)Enfilades are a class of tree data structures used in Project Xanadu designs of the 1970s and 1980s. Enfilades allow quick editing, versioning, retrieval and inter-comparison operations in a large, cross-linked hypertext database...
, a similar data structure from the early 1970s. - Gap bufferGap bufferA gap buffer In computer science is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in...
, a data structure commonly used in text editors that allows efficient insertion and deletion operations clustered near the same location
External links
- SGI's implementation of ropes for C++
- libstdc++ support for ropes
- Ropes for Java
- Ropes for Ocaml
- ropes for Common LispCommon LispCommon Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 , . From the ANSI Common Lisp standard the Common Lisp HyperSpec has been derived for use with web browsers...