Nothing Special   »   [go: up one dir, main page]

Matrix and Graph

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 44

Matrix and Graph

• Matrix
• Binary Matrix
• Sparse Matrix
• Operations for Vectors/Matrices
• Graph and Adjacent Matrix
• Adjacent List
Matrix and Graph

• Matrix is a 2-dimensional structure


• Used in wide areas from physical simulations to customer
management

• Graphs are also used in many areas, to represent the


relations and flows between data

• Some data structures have been considered to handle matrix


and graph; update, preserve, search, and operate
2-Dimensional Structure of Matrix

• An n×m matrix has n×m numbers

  can be stored in an array of size n×m


  [i,j] element corresponds to the i*m+j th cell of the array
  
• A naïve design is done, but there are something more
2-Diemnsaional Array

• There is a way to make 2-dimensional array, instead of


usual 1-dimensional array

• Prepare an array of pointers of size n


• Prepare n arrays of size m, and write the place of the first
cell of i-th array to the i-th cell of the pointer array
• [i,j] element of matrix a is accessed by a[i][j] (in C)
  

Simple structure

O(nm) memory space


Allocate a 2-Dimensional Array
int *MATRIX_alloc ( int n, int m ){
int i, **a, flag =0;
a = malloc ( sizeof(int *)*n );
if ( a=NULL ) return (NULL);
for ( i=0 ; i<n ; i++ ){
a[i] = malloc (sizeof(int)*m);
if ( a[i] = NULL ) flag = 1;
}
if ( flag == 1 ) return (NULL); else return (a);
}

int *MATRIX_free ( int **a ){


int i;
for ( i=0 ; i<n ; i++ ) free ( a[i] );
}
Binary Matrix

• A binary matrix is a matrix all whose cells are either 0 or 1


  + each cell is either ○ or ×
  + adjacency matrix of a graph, shown later

• Space consuming if use one integer for one 01 value (1 bit)


  motivated to compress the matrix

01010
10001
11110
11000
Representation by Bits
• A row composed of 01 values can be considered as a big integer

 by chopping into some integers of 32 bits (or 64 bits), the


integer becomes tractable

 └m/32┘ integers are sufficient to store a row


   (space efficiency also increases, and also cache efficiency)

• [i,j] element can be accessed by looking at the (j%32)th bit of


the j/32 th integer in the i-th row
Handling Bit Access
• [i,j] element can be accessed by looking at the (j%32)th bit of
the j/32 th integer in the i-th row
 … writing a code is bothering

• Prepare an array
BIT_MASK[]= {1,2,4,8,16,…}
BIT_MASK_[]= {0xfffffffe, 0xfffffffd, 0xfffffffb, …}

+ read value: a[i][j/32] & BITMASK[j%32]


+ set to 1 : a[i][j/32] = a[i][j/32] | BITMASK[j%32]
+ set to 0 : a[i][j/32] = a[i][j/32] & BITMASK_[j%32]
Sparse Matrix

• That’s all, for structures for simple matrices


• Space efficiency is in some sense optimal

• But, in application, it is often not sufficient/efficient


  for example, if matrix is sparse, many parts are redundant

• Sparse matrix has the same value in many cells (usually 0)

• Sparse matrix should be stored by memorizing the places with


non-zero values
Storing Sparse Matrix

• Let’s begin from binary matrix, for simplicity


  almost cells are 0, and few 1’s

• A simple idea is to make a list of the places of the cells being 1


• That is, memorize (x1,y1),(x2,y2),(x3,y3),… , store the row ID
and column ID of the cells being 1

• The memory requirement is “twice the number of 1’s”


this is very efficient if there are few 1’s (sparse)

• But, bad accessibility; to read a cell, we have to scan all


(binary tree / hash can be used)
Store Row-wise

• Let’s have a structure to improve the accessibility

• Classify the places of 1’s according to their row ID


  prepare n arrays, and store the column ID of 1’s in i-th
row, in the ith array

• We need to have n pointers to n arrays, but we don’t have to


store the row ID’s, thus memory efficiency increases

• The memory requirement is “# 1’s + #rows×2”


(can be “# 1’s + #rows”)
• Accessibility is good; sorting ID’s in a row array, binary search
works (linear scan is enough, if few column ID’s)
Structure in Each Row

• In sparse cases, the efficiency is increased,


However, the update concerned with insertion/deletions is not
efficient

• They are the same, in the situation of stacks and queues

• So, according to the purpose, we use lists bucket/hash/binary


tree for structures in a row
  (having n arrays is equivalent to having buckets)
Real World Data

• The characteristics of sparse matrices in practice are;

+ Matrix representing mesh network (structural calculation)


few meshes are adjacent to one, in geometrical sense, thus not
so many non-zeros per row

 array is sufficient for structures of rows

+ Road network data (adjacency of cross points + distance)


  almost the same, but update comes sometimes
(would be sufficient if (re-)allocate bit larger memory)
Real World Data (2)
Ex) A matrix representing, row  text, column  word, a cell is
one if the word is included in the text, is sparse, usually
  (POS data, Web links, Web surfing, etc.)

+ on average, #1’s in a row/column is constant, but some have so


many   (texts having many words, words included in many
texts)
+ distribution of 1’s is that so called power (zip) law, scale free;
#of items of size D is proportional to 1 / ΔD
can be often seen in real world data (≠ geometric distribution)

+ Such data needs algorithms designed so that the dense part will
not affect badly; will be the bottle neck of the computation
Non-binary Sparse Matrix

• Usual matrix are of course non-binary, it is not sufficient to


remember the places having non-zero value
 remember (place, value)

• In the case of using array,


(place, value), (place, value), (place, value),…, or
place1, plcae2,…, value1, value2,…

• In the case of lists of binary tree, assign (place, value) to each


cell/node
or, simple prepare two of them
Exercise

• Make data representing the following matrix in a sparse way

0,0,1,4,0
0,1,0,0,5
2,0,0,0,0
1,2,5,0,2
0,0,0,0,0
Column: Memory Saving for Matrix

• Buckets, or a row of a sparse matrix needs two data


(pointer to the first cell, and the size kii)
• We decrease these from two to one

• First, prepare an array of size equal to # non-zero cells. Then,


+ 0th row uses the cells of the array ranging from 0 to k00-1
+ 1stst uses from k00 to k00+k11-1 …
+ i-th row uses from k00+…+ki-1 i-1
to k00+…+kii-1,
and we remember only the start positions of the rows

• The size of i-th row can be obtained by


(start position of i+1) - (start position of i)
Matrix Operation

• Basic matrix operations are addition and multiplication


(inner product of vectors is a special case)
Further, AND and OR for binary matrix

• Algorithms for the operations are trivial if the matrices are in


the form of 2-dimensional array

However, not clear if they are in sparse forms

• Further, there are several structures that have advances for


matrix operations
Addition of Matrix
• For the addition, it is sufficient to have algorithms for additions
of each row
(so, operations of vectors are sufficient)

• First, we see the case of inner product of sparse vectors


Inner Product
• For computing inner product of two sparse vectors, the difficulty
is that we have to find the cell corresponding to each

• Sort the cells in each vector according to their column ID


• Scan two vectors simultaneously, from smaller indices
 “ simultaneously” means that iteratively pick up the smallest
column ID among the two vectors

• When we find a column ID at which both vector have non-zero


values, accumulate the product of the cells

1 5 5 1 7 3
1 1 3 3 5 4
A Code for Sparse Inner Product
int SVECTOR_innerpro (int *va, int ta, int *vb, int tb){
int ia=0, ib=0, c=0;
while ( ia<ta && ib<tb){
if (va[ia*2] < vb[ib*2] ) ia++;
else if (va[ia*2] > vb[ib*2] ) ib++;
else {
c = c + va[ia*2+1]*vb[ib*2+1];
ia++; ib++;
}
}
return ( c );
}

1 5 5 1 7 3
2 1 3 3 5 4
Addition of Two Vectors
• The addition can be done in a similar way

• Sort the cells in each vector according to their column ID


• Scan two vectors simultaneously, from smaller indices

• The positions of non-zero values in the resulted vectors are those


having non-zero values in one of two vectors, thus can be easily
identified by the scan

1 5 5 1 7 3
2 1 3 3 5 4
A Code for Addition
int SVECTOR_add (int *vc, int *va, int ta, int *vb, int tb){
int ia=0, ib=0, ic=0, c, cc;
while ( ia<ta || ib<tb){
if (ia == ta ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
else if ( ib == tb ){c = va[ia*2+1]; cc = va[ia*2]; ia++; }
else if (va[ia*2] > vb[ib*2] ) { c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
else if (va[ia*2] < vb[ib*2] ) { c = va[ia*2+1]; cc = va[ia*2]; ia++; }
else {
c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2];
ia++; ib++;
}
vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
}
return ( ic );
}

1 5 5 1 7 3
2 1 3 3 5 4
Column: Endmarks do a Good Job!

• Compared to inner product, code for addition is relatively long


 we have exceptions at the end of the array

• So, we are motivated to simplify the code by using “endmark”


(endmark is a symbol that represent the end of the array, or
something else representing the end)

• 0, -1 or a very large value is used as an endmark


• We prepare an additional cell next to the end of each array, and
put an endmark at the cell
Column: Endmarks do a Good Job! (2)

int SVECTOR_innerpro (int *vc, *vc, int *va,


*va, int
int ta,
ta, int *vb,
*vb, int
int tb){
int
int ia=0,
ia=0, ib=0,
ib=0, ic=0,
ic=0, c,
c, cc;
cc;
while
while (( va[ia*2]
va[ia*2] !=
!= ENDMARK && vb[ib*2] != ENDMARK){ ENDMARK){
if (va[ia*2]
(va[ia*2] > vb[ib*2] ) { c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
else if (va[ia*2] < vb[ib*2]
vb[ib*2] )) {{ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
else {{
c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2];
ia++; ib++;
}
vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
}}
vc[ic*2]
vc[ic*2] == ENDMARK;
ENDMARK;
return
return (( ic );
);
} 1 5 5 1 7 3 ■
2 1 3 3 5 4 ■
Matrix Multiplication
• For sparse matrix multiplication, compute the inner products of
all the pairs of a row and a column

• However, a sparse matrix has row representations but not


column representations, getting column vectors is hard

• A simple solution is to use transposing algorithm that is


explained in the section of bucket; we will have column
representation

• On the other hand, some data structures are designed to be


enabled to trace also columns
Four-Direction List
• Lists are good at storing sparse vectors, for tracing
• However, collection of lists isn’t good at tracing column vectors,
because the cells are not connected vertically
• …so, let’s have a list connected in both row direction and
column direction

• Each cell has four arms, that point the neighboring cells in
directions of (←, →, ↑, ↓)

2
7
Pointing the Neighbors
• Links to four directions seems to form a mesh network, but not
• …since, the links can cross

• In the other words, this structure can be seen as a superimpose of


two kinds of lists; horizontal direction and vertical direction,
and the identical cells are unified into one

4 4

4 7 2

4
Having Lists of 2-Directions
• If we have lists of row vectors and column vectors both, we can
have the same accessibility, but insertions/deletions are not same

• For example, when we want to delete a cell in a row vector, we


would take long time to find the corresponding cell in column lists
In four-direction lists, they are already unified

4 7
Graph Structure

• A graph is a structure composed of a set of vertices and a set of


edges (an edge is a pair of vertices)

• Formed by sets, so the information such as positions, shapes, and


crossing edges do not matter, when it is drawn as a picture
 
(a graph with shape/position information
is called “graph visualization” or
“embedded graph”)

• When edges have directions (from one


vertex to another), it is called directed

very popular structure


Examples of Graph Data
• Adjacency relation
Hierarchy in an organization
Similarity relation

• Web network, human network, SNS friend network,…


Graph Terminology
• Edge e is said to be incident to u, v, and vice versa, if e = (u,v)
also u and v are said to be adjacent

• The #edges incident to v is the degree of v

• A graph having edges for any two vertices is a complete graph

• When there are two or more edges connecting two vertices, the
edges are called multiple edges

• If there is a partition of vertices so that any edge connects a vertex


in a group and one in the other, the graph is called bipartite graph
Storing a Graph
• n vertices can be seen as numbers 0,…,n-1
• Then, an edge is a pair of numbers

 can be stored by writing the pairs in array, lists, etc.

• Further, we need something for the accessibility

for example, we often visit a vertex, and go to the


neighboring vertex, and so we need to scan
all edges incident to the vertex
Using Matrix
• The set of edges can be represented by a matrix as follows

① j-th row/column corresponds to vertex j, and ij-cell is 1 if there


is edge (i, j) (called adjacency matrix)
  + efficient for dense graph having many edges
+ multiplicity of edges can be represented by the value of a cell

② j-th row corresponds to vertex j, and each column corresponds to


an edge; when edge e is incident to vertex i, ij cell is 1 (called
incidence matrix)
+ multiple edges represented easily

• Sparse matrix representation has advantage for


incidence matrix and sparse graph
In Practice
• 2-dimensional array is sufficient when the matrix size is small
the cost is small, redundancy is small

• Sparse matrix such as 100 by 100 with 10 non-zero elements in a


row, sparse representation will be efficient
(approximately, when density is less than 10%)

+ When we often want to scan non-zero elements, such as tracing


all vertices adjacent to a vertex, sparse representation is useful

+ If we want to check whether there is an edge between two


specified vertices, 2-dimensional array has advantage
Incidence Matrix
• An incidence matrix represents the incidence relation between
vertices and edges

• Put indices from 0,…,n-1 to vertices, and 0,…,m-1 to edges


+ store edges incident to a vertex to the corresponding row
= storing vertices incident to an edge in
0: 0,1
the corresponding column
1: 1,5
2: 0,3
0 0 0: 1,3 0: 0,2 3: 1,4
1 1: 0,2,4,5 1: 0,1,3,4 4: 1,2
5 2
1 2: 1,3,4,5 2: 4,6,7,8 5: 4,5
5
6 3 4 +
3: 0,2 3: 2,7 6: 2,5
4 8 2 4: 1,2,5 4: 3,5,8 7: 2,3
3 7 5: 1,2,4 5: 1,5,6 8: 2,4
Advantage of Incidence Matrix
• In the case of incidence matrix, each edge has ID
 so, easy to handle the attached information to each edge
 just allocate an array of size m, and it is sufficient
• In the case of adjacency matrix, edge doesn’t have ID, thus not
easy to manage correspondence of edge and its data

0: 0,1
• Multiple edges are also easy to handle
1: 1,5
2: 0,3
0 0
0: 1,3 0: 0,2 3: 1,4
1 1: 0,2,4,5 1: 0,1,3,4 4: 1,2
5 2
1 2: 1,3,4,5 2: 4,6,7,8 5: 4,5
5
6 3 4 +
3: 0,2 3: 2,7 6: 2,5
4 8 2 4: 1,2,5 4: 3,5,8 7: 2,3
7 5: 1,2,4 5: 1,5,6
3 8: 2,4
Allocate Memory for Cells
• Incidence matrix can be realized by cells of lists having four links
like sparse matrix
(two for vertices of the edges, and two for the edges in the vertex)
 disadvantages of arrays are eliminated
• Also can be of two array lists
• or, prepare an array and edge i corresponds to
0: 0,1
cells 2i and 2i+1, to represent four links
1: 1,5
2: 0,3
0 0 0: 1,3 0: 0,2 3: 1,4
1 1: 0,2,4,5 1: 0,1,3,4 4: 1,2
5 2
1 2: 1,3,4,5 2: 4,6,7,8 5: 4,5
5
6 3 4 +
3: 0,2 3: 2,7 6: 2,5
4 8 2 4: 1,2,5 4: 3,5,8 7: 2,3
3 7 5: 1,2,4 5: 1,5,6 8: 2,4
Exercise
• Make an adjacency matrix of the following graph, and that in
A sparse incidence matrix
6 0
5 1

4 2
3
Bipartite Graph
• A bipartite graph is often seen as a representation of a (binary)
(sparse) matrix

 associate nodes of one group to rows, and the others to


columns
connect by edges between vertices corresponding a cell with
non-zero value
0 4
0: 4,6
• A representation of different style
1 5 1: 4,5
2: 5,6
3: 5,6
2 6

3
Column: Store Huge Graph
• A graph needs two pointer (or integer) per edge
weight, and etc. need more

• 64 bits are required in 32 bit CPU

• However, Web graphs have billion of vertices, and 20 billions of


edges
 160GB is necessary in this way

• This is too much. Can we reduce the storage size?


Column: Store Huge Graph (2)
① Only few edges have large degrees
Vertices are mainly adjacent to these few vertices

 Put indices so that large degree vertices have small indices,


and represent small indices by small number of bits, and large
indices by many bits

Ex.)
• If the bit sequence representing a number begins with “0”, the
following 7 bits represent [0-127]
• If “10”, the following 14 bits represent 128+[0-16383]
• If “11”, the following 30 bits represent 16384+128,…
Column: Store Huge Graph (3)
② Sort the sites in dictionary order of their URLs
 links are usually to near, thus difference of ID’s becomes small

• They can be recorded in the same way, to reduce the space

• Using these, one edge needs just 10 bits


Further, we can reduce it to 5 bits

 The storage will be 20GB, thus can fit recent computers


Summary

• Data structures for matrix

• Structures for sparse matrix, and four directed lists

• Structures for graphs:


adjacency matrix and incidence matrix
adjacency list

You might also like