Nothing Special   »   [go: up one dir, main page]

Fusion Trees Report

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

Implementation and Analysis of Fusion Trees

July 6, 2022
Gopesh Gaba (2020MCB1236) ,
Ishaan Chhabra (2020MCB1238) ,
Rohan Kumar (2020MCB1247)

Instructor:
Dr. Anil Shukla
Summary: In this paper we will discuss the theory behind fusion trees and also
Teaching Assistant: how to implement it, followed by an analysis of it’s time and space complexity
Simran Setia

1. Introduction
In 1990, Michael Fredman and Dan Wilfred introduced a new sub-logarithmic data structure for searching: The
Fusion Tree. The idea behind Fusion trees was mostly geared toward proving that it was possible to surpass
the Ω(logn) lower bound on BST operations by using word-level parallelism and without regard to wall-clock
runtime costs.
This fusion tree stores n w-bits statistically and performs predecessor/successor problems in O(logw n) time,
and requires O(n) space, where n is the number of elements stored. Two essential bit operations utilized by
fusion trees are parallel comparison and sketching.

Fusion Trees

Essentially the structure of Fusion Trees is a B-Tree with branching factor w1/c where w is the word size and c
is some constant greater than 1. This means that the height of the tree will be logw n.

Figure 1: Fusion Tree is essentially a B-Tree

The main problem arises when we try to make searching a node take O(1) time. As each node would have
O(w1/c ) keys and each key has w bits, we would be required to read atleast O(w1+1/c ) bits. But we can only
read w-bits in constant time, so achieving O(1) time seems impossible.
This problem can be solved, using k O(1) preprocessing, where k is the number of keys present in the node. We
have to distinguish the set of keys in one node with less than w-bits. We achieve this with SKETCHING and
PARALLEL COMPARISONS.

1
2. Sketching
We wish to reduce the size of the bits being compared, in order to obtain w-bits for comparison in total. As
there are w bits in a word, and there are w1/c keys at most in a node, there can only be at most w1/c − 1 bits
which matter when trying to compare all the keys. These bits are called IMPORTANT BITS.
Sketching refers to the extraction of these important bits from the specified word, i.e sketch(X) refers to ex-
tracting of the important bits from the word X. Let there be r important bits per key.Then sketch(X) is the
r-bit vector whose i-th bit = bi th bit of word X. We can see that sketch(X0 ) < sketch(X1 ) ..... < sketch(Xk−1 )

Figure 2: Sketch of Node

We tend towards APPROXIMATE SKETCHING instead of exact sketch because exact sketch cannot be com-
puted in O(1). Prefect sketch requires the distinguishing bits to be compressed which a computation expensive
operation. Approximate sketch doesn’t compress the distinguishing bits tightly, rather it adds a fixed pattern
of zeroes between every two distinguishing bits. This is done by multiplication with a predetermined constant.
The result will be of order O(rc−1 ) where r is count of distinguishing bits. Let Mask be the mask that will filter
out distinguishing bits and m be a predetermined constant. x’ = x AND Mask. After bitwise AND, x’ will have
only distinguishing bits. result = x’ * m. Right shift result until it is rc−1 bits long. Result will then be the
approximate sketch

3. Node Searching (Desketchifying)


Let us search query q. We can check where sketch(q) fits in sketch(X0 ) < sketch(X1 )..... < sketch(Xk−1 ). But
that is not necessarily where q fits in X0 < X1 ... < Xk−1 . So we will use parallel comparison for this.

Parallel Comparison

Parallel comparison is used to find position of a value within the set of keys in a node in constant time. A node
will have at most r + 1 keys. The sketches of all those keys can be concatenated to form a word. This word is
used to compare query value q directly with all the keys at once rather than comparing one by one.

1. Let X0 , X1 , ... ,Xr be the keys.


2. Then the sketch of a node is 1sketch(X0 )1sketch(X1 )... 1sketch(Xr ).
3. Convert sketch(a) to sketch(q)’ = 0sketch(q)0sketch(q)... 0sketch(q).
4. After subtracting value at step 3 from value at step 2, we will get a number that will be like : 0________0__
... ___0________1________1________1. Call it diff.
5. There are 0’s where sketch of Xi is less than sketch(q) and 1 when Xi is greater than sketch(q). The dashes
represent garbage bits where sketch values used to be. It will always be a series of 0’s followed by a series of
1’s since sketch operation maintains order of keys.The position where the series of 0 turns into 1 is the position

2
where sketch(q) should be placed. So sketch(Xi ) < sketch(q) < sketch(Xi+1 ), when bit of diff for sketch(Xi ) =
0, and for sketch(Xi+1 ) is 1

Figure 3: Parallel Comparison Demonstration

Note that parallel comparison gives index such that sketch of k is tightly bound with the sketches of keys, not
the keys itself.
Now let us find where q fits among the keys. We will use parallel comparison to find Xi such that sketch(Xi )
<= sketch(q) < sketch(Xi+1 ). The longest common prefix is the lowest common ancestor between q and either
Xi or Xi+1 , whichever is the longest or lowest. Let node y be where q fell off( the keys).
If bit y+1 = 1: e = y011111....
If bit y+1 = 0: e = y100000.... .
Then find sketch(e) among sketch(Xi )s. Finally, the predecessor and successor of sketch(e) among sketch(Xi )s
will be equal to the predecessor and successor of q among Xi s.

4. Time Complexity
The time complexity for searching an element or finding it’s predecessor and successor is O(logw N ), where N is
the number of elements in total and w is the word size being used to store the data .
For inserting an element, time complexity will be O(log2 w) and for deleting an element time complexity will be
O(log2 w) as well.
The space complexity of our constructed tree will be of O(N ).

3
5. Algorithms
Pseudo-algorithms can be written in LATEX with the algorithm and algorithmic packages.

Algorithm 1 Parallel Comparison( root rt, node p, int data)


1: sketch = SKETCHAP( rt, p, data)
2: sketch_long = sketch*(p->mask_q)
3: res = p->node_sketch - sketch_long
4: res = res AND p->mask_sketch
5: i=0
6: while ((1 « i) < res) do
7: i++
8: end while
9: i++
10: sketch_len = p->n*p->n*p->n + 1
11: return (p->n - (1/sketch_len))

Algorithm 2 Approximate Sketching


1: SKETCHAP(self, node, x)
2: xx = x & node.mask_b
3: res = xx * node.m
4: res = res & node.mask_bm
5: return res

Algorithm 3 Initiating Node


1: initiateNode(self, node)
2: if node.key_count! = 0 then
3: node.b_bits = self.getDiffBits(node.keys)
4: node.m_bits, node.m = self.getConst(node.b_bits);
5: node.mask_b = self.getMask(node.b_bits)
6: end if
7: initialize temp
8: for i in range(len(node.b_bits)) do
9: temp.append(node.b_bits[i] + node.m_bits[i])
10: end for
11: node.mask_bm = self.getMask(temp)
12: r3 = int(pow(node.key_count, 3))
13: node.node_sketch = 0
14: sketch_len = r3 + 1
15: node.mask_sketch = 0
16: node.mask_q = 0
17: for i in range(node.key_count) do
18: sketch = self.sketchApprox(node, node.keys[i])
19: temp = 1 « r3
20: temp |= sketch
21: node.node_sketch «= sketch_len
22: node.node_sketch |= temp
23: node.mask_q |= 1 « i * (sketch_len)
24: node.mask_sketch |= (1 « (sketch_len - 1)) « i * (sketch_len)
25: end for
26: return

4
Algorithm 4 Finding Successor
1: successor(self, k, node = None)
2: if node == NULL then
3: node = self.root
4: end if
5: if node.key_count == 0 then
6: if node.isLeaf then
7: return -1
8: else
9: return self.successor(k, node.children[0])
10: end if
11: end if
12: if node.keys[0] >= k then
13: if not node.isLeaf then
14: res = self.successor(k, node.children[0])
15: if res == 1 then
16: return node.keys[0]
17: else
18: return min(node.keys[0], res)
19: else
20: return node.keys[0]
21: end if
22: end if
23: end if
24: if node.keys[node.key_count - 1] < k then
25: if node.isLeaf then
26: return -1
27: else
28: return self.successor(k, node.children[node.key_count])
29: end if
30: end if
31: pos = self.parallelComp(node, k)
32: if pos >= node.key_count then
33: print(node.keys, pos)
34: dump = input()
35: end if
36: if pos == 0 then
37: pos += 1
38: end if
39: x = max(node.keys[pos - 1], node.keys[pos])
40: common_prefix = 0
41: i = self.w
42: while i >= 0 and (x & (1 « i)) == (k & (1 « i)) do
43: common_prefix |= x & (1 « i)
44: i -=1
45: end while
46: if i == -1 then
47: return x
48: end if
49: temp = common_prefix | (1 « i)
50: pos = self.parallelComp(node, temp)
51: if node.isLeaf then
52: return node.keys[pos]
53: else
54: res = self.successor(k, node.children[pos])
55: if res == -1 then 5
56: return node.keys[pos]
6. Conclusion
Fusion Trees are a modification of B-trees where each node contains some special information that helps speed
up the search. Fusion Trees use word-level parallelism to do searches faster by packing multiple numbers into
a single machine word and using individual operations to speed up processing. Sketching is used to reduce
the number of bits in the input numbers to a point where parallel processing with a machine word is possible.
All these operations help Fusion Trees to obtain an exceptional time complexity of O(logw N ) while searching.
Fusion tree are used extensively in database systems. Fusion tree and Van Emde boas tree are used together
such that when word size is large, fusion tree is used and when word size is smaller, Van Emde Boas tree is
used. This is because fusion trees are better at solving predecessor and successor problems when the universe
is large.

7. Bibliography and citations


While doing this project we referred to different researches and following are some sources which we referred to:
[1] Michael L. Fredman and Dan E. Willard. Surpassing the information Theoretic bound with Fusion Trees.
Journal of Computer and System Sciences, pages 47:424–436, 1993
[2] Introduction to Algorithms 3rd edition, by Cormen, Leiserson, Rivest, and Stein
[3] https://iq.opengenus.org/fusion-tree/
[4] https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-851-advanced-data-structures-spring-
2012/calendar-and-notes/MIT6_851S12_L12.pdf

You might also like