1. Introduction
With the rapid development of the Internet, web search engines have become very popular tools for information retrieval. Because a web search engine can usually find an immense set of Web pages matching a search query, it is necessary to rank the most important pages higher to make this tool practical. For this purpose, the PageRank model was developed by the Google team to rank the importance of Web pages based on the frequency of page visits recorded by a random user who keeps browsing the World Wide Web, choosing the hyperlinks on each page with equal probability. Mathematically speaking, it requires the computation of the stationary distribution of this Markov random walk on the linking structure of pages; the values of the distribution represent the frequency of visits to each Web page. The linking structure of the Web is represented by a large directed graph (called the Web link graph) and by its adjacency matrix $G \in \mathbb{R}^{n \times n}$ (here $n$ is the number of Web pages) such that $G_{ij} \neq 0$ (being 1) only when page $j$ has a hyperlink pointing to page $i$. The transition probability matrix $P \in \mathbb{R}^{n \times n}$ of this random walk process is defined as
$$P_{ij} = \begin{cases} \dfrac{G_{ij}}{\sum_{k=1}^{n} G_{kj}}, & \text{if } \sum_{k=1}^{n} G_{kj} \neq 0, \\[4pt] 0, & \text{otherwise.} \end{cases} \qquad (1)$$
However, to avoid that the process stagnates when a dangling page without hyperlinks is visited, in the model $P$ is usually modified as $\bar{P} = P + v d^{T}$, where $v \in \mathbb{R}^{n}$ is a probability distribution vector, $d \in \{0, 1\}^{n}$ is a binary vector, and $d_i = 1$ if page $i$ has no hyperlinks. According to the Perron-Frobenius theorem, the unique existence of the stationary distribution vector is guaranteed when $\bar{P}$ is an irreducible stochastic matrix. To enforce this assumption, $\bar{P}$ is further modified into the matrix
$$A = \alpha \bar{P} + (1 - \alpha)\, v e^{T}, \qquad (2)$$
where $\alpha \in (0, 1)$ is called the damping factor and $e = [1, 1, \ldots, 1]^{T}$. Matrix $A$ is often called the Google matrix [1]. Finally, the PageRank model can be stated mathematically as finding the unit positive eigenvector corresponding to the eigenvalue 1 of $A$, that is, the solution of the eigen-problem
$$A x = x, \qquad \|x\|_{1} = 1, \qquad x > 0. \qquad (3)$$
Because of the assumption that $A$ is an irreducible stochastic matrix, given by Equation (2), it follows that Equation (3) is uniquely solvable. The unique stationary distribution vector $x$ is called the PageRank vector. Note that the PageRank model is now often used as a network centrality measure to identify the most important nodes within large networks arising in several applications, such as chemistry, bioinformatics, neuroscience, bibliometrics, web search engines, social networks, etc. [2]. In some fields, such as bioinformatics, the related networks are usually undirected, thus the corresponding adjacency matrices are symmetric. This symmetry can be exploited to build efficient storage formats and efficient implementations of the matrix-vector multiplications, so that the difficulty of solving the PageRank model (also called the PageRank problem) can be reduced.
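As an illustration (not part of the original formulation), the following Python sketch assembles $P$, $\bar{P}$ and the Google matrix $A$ of Equation (2) for a hypothetical 4-page graph and checks that $A$ is column-stochastic; all variable names are our own.

```python
import numpy as np

# Hypothetical 4-page web graph: G[i, j] = 1 if page j links to page i.
G = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # page 4 has no out-links (dangling)
n = G.shape[0]
alpha = 0.85                                 # damping factor
v = np.full(n, 1.0 / n)                      # teleportation (personalization) vector
e = np.ones(n)

col_sums = G.sum(axis=0)
d = (col_sums == 0).astype(float)            # d_j = 1 if page j has no out-links
P = np.divide(G, col_sums, out=np.zeros_like(G), where=col_sums != 0)

P_bar = P + np.outer(v, d)                   # dangling-page correction
A = alpha * P_bar + (1 - alpha) * np.outer(v, e)   # Google matrix, Eq. (2)

print(np.allclose(A.sum(axis=0), 1.0))       # True: A is column-stochastic
```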
With the rapid development of science and technology, the dimension of PageRank problems from various application fields has grown hugely in the last decades and keeps growing. Accordingly, iterative methods have become the only viable option to solve Equation (3) numerically. Stationary iterative methods such as the Power, Jacobi and Gauss-Seidel methods are effective when the damping parameter $\alpha$ is not too close to 1; e.g., for search engine applications Google initially used $\alpha = 0.85$. However, larger values of $\alpha$ (not too close to 1) such as $\alpha = 0.99$ may sometimes give a better ranking result [3], and stationary methods tend to converge significantly more slowly when $\alpha$ is large. These computational issues make the PageRank problem continuously demand new algorithmic solutions that are more time- and memory-efficient.
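To illustrate the effect of the damping factor on stationary iterations, the short sketch below (a hypothetical test graph and helper function of our own, not taken from the paper) runs the Power method for $\alpha = 0.85$ and $\alpha = 0.99$ and reports the iteration counts. The graph consists of two disconnected components, so the second eigenvalue of $A$ equals $\alpha$ (a fact recalled later in Theorem 2) and the asymptotic convergence factor of the Power method is exactly $\alpha$.

```python
import numpy as np

def power_method(A, x0, tol=1e-8, max_it=100000):
    """Power iteration x <- A x with a 1-norm stopping test."""
    x = x0.copy()
    for it in range(1, max_it + 1):
        x_new = A @ x
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new, it
        x = x_new
    return x, max_it

# Hypothetical test problem: two disjoint directed rings of 100 pages each.
n = 200
P = np.zeros((n, n))
for i in range(100):
    P[(i + 1) % 100, i] = 1.0                # ring over pages 0..99
    P[100 + (i + 1) % 100, 100 + i] = 1.0    # ring over pages 100..199
v = np.full(n, 1.0 / n)
x0 = np.zeros(n)
x0[0] = 1.0                                  # all initial mass on page 1

for alpha in (0.85, 0.99):
    A = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))
    _, its = power_method(A, x0)
    print(f"alpha = {alpha}: {its} power iterations")
```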
The development of more efficient algorithms for solving PageRank problems has been ongoing for the past decade or so. One research direction is to accelerate the convergence of stationary iterative methods; related works include adaptive [
4] and extrapolation methods [
5,
6,
7], multigrid solvers [
8,
9], and inner-outer iterations [
10,
11,
12,
13,
14]. Meanwhile a significant amount of work has been devoted in particular to the analysis of Krylov subspace methods for computing PageRank. Golub and Greif proposed in [
3] a new variant of the Arnoldi algorithm that is particularly suited for solving problems with a large range of damping values, and can outperform many conventional stationary iterative solvers when $\alpha$ is closer to 1. Yin et al. have used weighted inner products instead of the standard inner products in the Arnoldi process, reporting faster convergence due to a more favourable distribution of the harmonic Ritz values [
15]. The Power-Arnoldi method [
16], its weighted version [
17] and the Arnoldi-Inout method [
18] are hybrid variants of the Arnoldi process that periodically combine stationary and Krylov iterations, and can improve upon other established PageRank solvers on many examples. All of these techniques can accelerate the Arnoldi method from [3], but there is still considerable room for further improvement. In particular, preconditioning, which is widely used to accelerate Krylov subspace methods for solving linear systems, has not yet been considered for accelerating Arnoldi-type methods applied to the PageRank eigen-problem. This is an area worth investigating, because successful preconditioning usually yields a significant acceleration of the solution process and can be combined well with other acceleration techniques.
In this work we propose a new theoretical formulation of the PageRank eigenvector problem (3) that is characterized by a better separation between the dominant and the second dominant eigenvalues of the coefficient matrix, so that the Arnoldi process may require significantly fewer iterations to converge, and consequently fewer BLAS-1 (inner product and SAXPY) operations in the Gram-Schmidt orthogonalization. Our strategy of transforming the original problem into a new one that is more amenable to iterative solution can be seen as a form of preconditioning. The optimal parameter setting for this preconditioning technique is also discussed. Our experiments confirm the theoretical findings and demonstrate that Arnoldi-type methods applied to the preconditioned eigen-problem can converge much faster than their standard counterparts. Thus the approach has the potential to solve large-scale PageRank computations more effectively on both sequential and parallel machines. Accordingly, we can expect the proposed technique to accelerate projects from various fields that use the PageRank model. For example, for the GeneRank problem [19] and the ProteinRank problem [20], which use the PageRank model on gene-gene and protein-protein networks, users can more quickly identify the genes and proteins that are most likely to be pathogenic.
The paper is organized as follows. In
Section 2, we outline the refined Arnoldi method proposed in [
3] that is at the basis of our development, along with its main convergence results. In
Section 3, we present a preconditioned variant of the refined Arnoldi method for computing PageRank eigenvectors. Numerical experiments are reported in
Section 4 to support our theoretical findings. Finally, some conclusions from this study are presented in
Section 5. Note that MATLAB notation is used throughout this article.
2. The Refined Arnoldi Method for PageRank
The Arnoldi process proposed in 1951 is an algorithm based on the modified Gram-Schmidt orthogonalization that, after $m$ steps, computes an orthonormal basis of the Krylov subspace $\mathcal{K}_m(A, v_0) = \mathrm{span}\{v_0, A v_0, \ldots, A^{m-1} v_0\}$, where $A \in \mathbb{R}^{n \times n}$ and $v_0 \in \mathbb{R}^{n}$ are a given matrix and an initial vector, respectively. Here we sketch it in Algorithm 1.
In matrix form, Algorithm 1 yields after $m$ steps the Arnoldi decompositions
$$A V_m = V_m H_m + h_{m+1,m}\, v_{m+1} e_m^{T} \qquad (4)$$
and
$$A V_m = V_{m+1} \bar{H}_m, \qquad (5)$$
where $H_m = (h_{ij}) \in \mathbb{R}^{m \times m}$ is an upper Hessenberg matrix, and we denote by $V_m = [v_1, v_2, \ldots, v_m]$ the matrix collecting the orthonormal basis vectors, by $e_m$ the last column of the identity matrix of dimension $m$, and by $\bar{H}_m \in \mathbb{R}^{(m+1) \times m}$ the matrix obtained by appending the row $h_{m+1,m} e_m^{T}$ to $H_m$. In Table 1 we summarize the computational cost of Algorithm 1 for the case of a general matrix $A$, and in Table 2 for the special case of the Google matrix (2). Note that in the latter case, from the expression $Au = \alpha P u + (w^{T}u)\, v$ with $w = \alpha d + (1 - \alpha)e$, it follows that the product $Au$ requires 1 sparse matrix-vector operation $Pu$ (hereafter $u$ represents an arbitrary vector with appropriate dimension), 1 inner product $w^{T}u$, where the vector $w$ is precomputed and stored, 1 vector scaling operation and 1 vector addition.
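The operation count above corresponds to a matrix-free implementation of the product $Au$. A possible Python sketch (with a SciPy sparse $P$ and the stored vector $w = \alpha d + (1-\alpha)e$, a naming choice of ours) is the following.

```python
import numpy as np
import scipy.sparse as sp

def google_matvec(P, w, v, alpha, u):
    """Compute A u = alpha*(P u) + (w^T u) v without assembling A.

    With alpha folded into the stored matrix (i.e., keeping alpha*P), this
    amounts to 1 sparse matrix-vector product, 1 inner product, 1 vector
    scaling and 1 vector addition, matching the count discussed in the text."""
    return alpha * (P @ u) + (w @ u) * v

# Tiny usage example on a hypothetical 3-page graph (page 3 is dangling).
P = sp.csc_matrix(np.array([[0.0, 0.5, 0.0],
                            [1.0, 0.0, 0.0],
                            [0.0, 0.5, 0.0]]))
n, alpha = 3, 0.85
v = np.full(n, 1.0 / n)
d = np.array([0.0, 0.0, 1.0])
w = alpha * d + (1 - alpha) * np.ones(n)   # stored once and reused
u = np.full(n, 1.0 / n)
print(google_matvec(P, w, v, alpha, u))    # = A u for the uniform vector u
```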
Algorithm 1 The Arnoldi process $[V_m, V_{m+1}, H_m, \bar{H}_m, \beta] = \mathrm{Arnoldi}(A, v_0, m)$
- 1: Compute $\beta = \|v_0\|_2$, $v_1 = v_0 / \beta$.
- 2: for $j = 1, 2, \ldots, m$ do
- 3: Compute $w = A v_j$.
- 4: for $i = 1, 2, \ldots, j$ do
- 5: Compute $h_{ij} = v_i^{T} w$.
- 6: Compute $w = w - h_{ij} v_i$.
- 7: end for
- 8: Compute $h_{j+1,j} = \|w\|_2$.
- 9: if $h_{j+1,j} = 0$ then
- 10: Set $m = j$, truncate $V_{m+1}$ and $\bar{H}_m$ accordingly, and break.
- 11: else
- 12: Set $v_{j+1} = w / h_{j+1,j}$.
- 13: end if
- 14: end for
- 15: Generate $V_{m+1} = [v_1, v_2, \ldots, v_{m+1}]$ and $\bar{H}_m = (h_{ij})$.
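For concreteness, a compact NumPy transcription of Algorithm 1 (our own sketch, written for a generic `matvec` callable so that either $Au$ or, later, $A^{k}u$ can be plugged in) could look as follows.

```python
import numpy as np

def arnoldi(matvec, v0, m):
    """Modified Gram-Schmidt Arnoldi process (Algorithm 1).

    Returns the orthonormal basis V (n x (m+1)), the (m+1) x m upper
    Hessenberg matrix H_bar, the number of steps performed and beta = ||v0||_2."""
    n = v0.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v0)
    V[:, 0] = v0 / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):              # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:              # lucky breakdown: the subspace is A-invariant
            m = j + 1                       # (the last returned basis column stays zero)
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m + 1], H[:m + 1, :m], m, beta
```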
The Arnoldi process is the main ingredient of many efficient numerical techniques for solving linear systems and eigen-problems. After $m$ steps, the scalars $\lambda_i$ and vectors $y_i = V_m z_i$ satisfying
$$H_m z_i = \lambda_i z_i, \qquad i = 1, 2, \ldots, m, \qquad (6)$$
are called the Ritz values and the Ritz vectors of $A$ onto the Krylov subspace $\mathcal{K}_m(A, v_0)$, respectively. The Ritz values with largest real parts and their corresponding Ritz vectors are often used to approximate the eigenvalues with largest real parts of $A$ and their associated eigenvectors [21]. However, in-depth convergence analysis of the Arnoldi method shows that the Ritz vectors are not guaranteed to converge to the actual eigenvectors, even when the Ritz values do converge. To overcome this difficulty, Jia proposed in [22] to compute the refined Ritz vector $\tilde{y}_i$ associated to the Ritz value $\lambda_i$, defined as
$$\tilde{y}_i = \arg\min_{u \in \mathcal{K}_m(A, v_0),\; \|u\|_2 = 1} \|(A - \lambda_i I) u\|_2. \qquad (7)$$
The solution of Equation (7) is given by
$$\tilde{y}_i = V_m W(:, m), \qquad (8)$$
where $U \Sigma W^{T}$ is the singular value decomposition of the matrix $\bar{H}_m - \lambda_i \tilde{I}$, with $\tilde{I} \in \mathbb{R}^{(m+1) \times m}$ denoting the identity matrix of dimension $m$ appended with a zero row, the singular values are stored in the diagonal part of $\Sigma$ in decreasing order and the columns of $W$ are the right singular vectors. Besides, the 2-norm of the residual vector $(A - \lambda_i I)\tilde{y}_i$ is equal to the smallest singular value $\sigma_m$.
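In NumPy terms, the refined Ritz vector of Equation (8) can be extracted from the small SVD as in the sketch below (a helper of our own, taking `V` and `H_bar` as returned by the Arnoldi sketch above).

```python
import numpy as np

def refined_ritz_vector(V, H_bar, shift):
    """Refined Ritz vector for a given shift, following Equations (7)-(8).

    The minimiser is V_m times the right singular vector associated with the
    smallest singular value of H_bar - shift*[I; 0]; that singular value also
    equals the residual 2-norm ||(A - shift*I) y||_2."""
    m = H_bar.shape[1]
    E = np.vstack([np.eye(m), np.zeros((1, m))])   # the matrix [I_m; 0]
    _, sigma, Wt = np.linalg.svd(H_bar - shift * E)
    y = V[:, :m] @ Wt[-1, :]                       # singular values are sorted, so the last is smallest
    return y, sigma[-1]
```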
Although in exact arithmetic the refined Ritz vector $\tilde{y}_i$ is guaranteed to converge to the eigenvector associated to the sought eigenvalue when the corresponding Ritz value $\lambda_i$ converges to it, some potential numerical difficulties may still arise in practice. Firstly, the largest Ritz value may be complex, and the use of complex arithmetic may be a memory burden for large-scale computations. Secondly, when $\alpha$ is close to 1, slow or irregular convergence can still happen due to a weak separation of the eigenvalues of $A$. Golub and Greif propose to use the largest eigenvalue of the Google matrix, which is known and equal to 1, as a shift in (8) instead of the largest Ritz value $\lambda_1$, as an attempt to overcome the problems related to slow convergence [3]. We present the Golub and Greif variant of the refined Arnoldi method for PageRank in Algorithm 2, and refer to it shortly as the GG-Arnoldi method hereafter.
Algorithm 2 The Golub and Greif variant of the restarted refined Arnoldi method (shortly referred to as GG-Arnoldi)
Input: the PageRank coefficient matrix $A$, initial guess $x_0$, parameters $m$ and $tol$.
- 1: Run Algorithm 1 as $[V_m, V_{m+1}, H_m, \bar{H}_m, \beta] = \mathrm{Arnoldi}(A, x_0, m)$.
- 2: Compute the SVD factorization $U \Sigma W^{T} = \bar{H}_m - \tilde{I}$.
- 3: Compute the updated approximate solution $x = V_m W(:, m)$.
- 4: Compute the residual 2-norm as the smallest singular value $\sigma_m$.
- 5: if $\sigma_m < tol$ then
- 6: Output $x = x / \|x\|_1$ and end.
- 7: else
- 8: Set $x_0 = x$ and go to step 1.
- 9: end if
- 10: return $x$.
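Putting the pieces together, a minimal driver for Algorithm 2 (again a sketch of ours, reusing the `arnoldi` and `refined_ritz_vector` helpers defined above and any `matvec` implementing $u \mapsto Au$, for instance a wrapper around the `google_matvec` sketch) might read:

```python
import numpy as np

def gg_arnoldi(matvec, x0, m, tol, max_cycles=200):
    """Restarted refined Arnoldi with the known shift 1 (Algorithm 2)."""
    x = x0.copy()
    for cycle in range(max_cycles):
        V, H_bar, _, _ = arnoldi(matvec, x, m)
        x, res = refined_ritz_vector(V, H_bar, shift=1.0)
        if res < tol:
            break
    # Fix the sign and normalise so that ||x||_1 = 1 (x is positive up to sign).
    return x / x.sum(), cycle + 1, res
```

Dividing by the signed sum at the end both fixes the sign of the refined Ritz vector and enforces the unit 1-norm required by Equation (3).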
The overall computational cost of Algorithm 2 is summarized in Table 3. Note that the singular value decomposition computed at line 2 is omitted because its cost is negligible when $m$ is very small. Analogously, the vector scaling $x / \|x\|_1$ at line 6 is not reported in the table, as it is computed only at the last cycle. Finally, the operation $x = V_m W(:, m)$ is equivalent to $m$ vector scaling operations and $m - 1$ vector additions.
The convergence analysis of the GG-Arnoldi method is presented in [20]. Here we recall the main result. We denote as $V_{\perp}$ the standard orthogonal basis of the orthogonal complement of the space $\mathcal{K}_m(A, x_0)$, so that $[V_m, V_{\perp}]$ is orthonormal. Then, we have
$$[V_m, V_{\perp}]^{T} A\, [V_m, V_{\perp}] = \begin{bmatrix} H_m & G_{12} \\ G_{21} & L \end{bmatrix}, \qquad (9)$$
where $L = V_{\perp}^{T} A V_{\perp}$. According to the Perron-Frobenius theorem, the spectral radius of $A$ is equal to 1, $A$ has only one eigenvalue 1 on the unit circle, and the eigenspace of this eigenvalue is 1-dimensional. As 1 is a simple eigenvalue of $A$, we introduce a separation function named "sep" and defined as
$$\mathrm{sep}(1, L) = \sigma_{\min}(L - I) = \frac{1}{\|(L - I)^{-1}\|_2}. \qquad (10)$$
Since $\mathrm{sep}(1, L) \leq |1 - \mu|$ for any eigenvalue $\mu$ of $L$, and the eigenvalues of $L$ tend to approximate the non-dominant eigenvalues of $A$, the largest of which is the second eigenvalue $\lambda_2$ of $A$ [20], we obtain that $\mathrm{sep}(1, L)$ is essentially bounded by the gap $|1 - \lambda_2|$ and is therefore small when $\lambda_2$ is close to 1. Under these assumptions, the theorem below suggests that the larger the second dominant eigenvalue $\lambda_2$, the slower the convergence rate of the GG-Arnoldi method applied to the eigen-problem (3).
Theorem 1 (Theorem 2.2 in [20]). Let $\Pi_m$ be the orthogonal projector onto the subspace $\mathcal{K}_m(A, x_0)$, let $\epsilon = \|(I - \Pi_m) x\|_2$ be the distance between the PageRank vector $x$ and the subspace $\mathcal{K}_m(A, x_0)$, and let $\tilde{x}$ be the solution generated by the GG-Arnoldi method. Then it holds
$$\sin \angle(x, \tilde{x}) \;\leq\; \epsilon \,\sqrt{1 + \frac{\gamma^{2}}{\mathrm{sep}^{2}(1, L)}}, \qquad \gamma = \|\Pi_m A (I - \Pi_m)\|_2.$$
We recall below another property of the eigenvalues of the Google matrix A that is very relevant to our analysis.
Theorem 2 ([23]). If a column-stochastic matrix $\bar{P}$ has at least two irreducible closed subsets (which is the case for the web hyperlink matrix), then the second eigenvalue of $A = \alpha \bar{P} + (1 - \alpha)\, v e^{T}$, where $\alpha \in (0, 1)$ and $v$ is a vector with non-negative elements satisfying $e^{T} v = 1$, is given by $\alpha$.

Therefore, slow convergence of the GG-Arnoldi method for PageRank problems may be expected in particular when $\alpha$ approaches 1. As this situation arises in several applications, some acceleration techniques have been developed in the past years to enhance the robustness of the GG-Arnoldi method. They can be classified in two types: (1) using weighted inner products instead of the standard inner products in Algorithm 2 to ensure a more favourable distribution of Ritz values [
15,
17]; (2) combining the GG-Arnoldi method with a few cycles of stationary iterative solvers, like in the Power-Arnoldi method [
16], the Inout-Arnoldi method [
18], the Arnoldi-PET method [
17] and others. Both approaches attempt to provide better initial guesses for the Arnoldi process. To the best of our knowledge, no research work has investigated efficient ways to precondition the PageRank eigen-problem (
3), in order to make it easier to solve with Arnoldi-type methods. This is the main objective of our study.
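Theorem 2 is easy to check numerically on a toy example. The snippet below (our own construction of a graph made of two disjoint 3-cycles, so that $\bar{P}$ has two irreducible closed subsets) confirms that the second largest eigenvalue modulus of $A$ equals $\alpha$.

```python
import numpy as np

# Two disjoint 3-cycles: the column-stochastic P has two irreducible
# closed subsets, so Theorem 2 applies and lambda_2(A) = alpha exactly.
n, alpha = 6, 0.99
P = np.zeros((n, n))
for i in range(3):
    P[(i + 1) % 3, i] = 1.0            # first cycle over pages 0, 1, 2
    P[3 + (i + 1) % 3, 3 + i] = 1.0    # second cycle over pages 3, 4, 5
v = np.full(n, 1.0 / n)
A = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))   # no dangling pages here

mods = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print(mods[0], mods[1])   # prints approximately 1.0 and 0.99
```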
3. Preconditioning the Refined Arnoldi Method
Preconditioning is an established technique used to accelerate Krylov subspace methods for solving linear systems. A nonsingular linear system $Ax = b$ can be transformed into an equivalent one of either the form $M^{-1}Ax = M^{-1}b$ (left preconditioned system) or the form $AM^{-1}y = b$ with $x = M^{-1}y$ (right preconditioned system), where $M$ is called the preconditioner matrix or simply the preconditioner. The goal of preconditioning is to improve the spectral distribution of the coefficient matrix $A$ so that a Krylov subspace algorithm can solve the transformed preconditioned system much faster than the original one. The development of efficient preconditioners is a very important topic in numerical linear algebra, because it can enable the rapid solution of problems that may initially appear numerically intractable. However, preconditioning is seldom used for solving eigen-problems, because the left preconditioned ($M^{-1}Ax = \lambda x$) and right preconditioned ($AM^{-1}y = \lambda y$, $x = M^{-1}y$) eigen-problems are no longer equivalent to the original one.
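As a generic illustration of this idea for linear systems (unrelated to the PageRank-specific construction developed below; the matrix and preconditioner are hypothetical choices of ours), the following sketch applies a simple Jacobi preconditioner and compares the spread of the eigenvalue moduli before and after left preconditioning.

```python
import numpy as np

# Left preconditioning of B x = b with a Jacobi (diagonal) preconditioner
# M = diag(B): M^{-1} B x = M^{-1} b has the same solution, but its eigenvalues
# cluster around 1 instead of spreading over [1, 1e4], which is the property
# Krylov solvers benefit from.
rng = np.random.default_rng(0)
n = 200
B = np.diag(np.linspace(1.0, 1e4, n)) + rng.standard_normal((n, n)) / (10 * n)
Minv = np.diag(1.0 / np.diag(B))

for name, C in (("original B", B), ("preconditioned M^{-1} B", Minv @ B)):
    ev = np.abs(np.linalg.eigvals(C))
    print(f"{name}: |lambda| in [{ev.min():.3f}, {ev.max():.3f}]")
```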
Here we propose a new approach to precondition the PageRank eigen-problem (3) for the computation of the dominant eigenvector $x$ corresponding to the largest eigenvalue $\lambda_1 = 1$ of the Google matrix $A$. According to the analysis presented in Section 2, the convergence speed of the GG-Arnoldi method is mainly determined by the separation between the largest and second largest eigenvalues of $A$, namely by the quantity $|1 - \lambda_2|$. Therefore, our strategy for preconditioning Equation (3) is to transform it into an equivalent eigen-problem that has a better separation between its two dominant eigenvalues. The main theoretical result underlying our method is presented in the theorem below.
Theorem 3. For the Google matrix $A$ in (2) and its eigenvalues $\lambda_1 = 1, \lambda_2, \ldots, \lambda_n$, if a polynomial $\psi$ satisfies $\psi(1) = 1$ and $\psi(\lambda_i) \neq 1$ for every eigenvalue $\lambda_i \neq 1$, then the PageRank eigen-problem (3) is equivalent to the eigen-problem:
$$\psi(A)\, x = x, \qquad \|x\|_1 = 1, \qquad x > 0. \qquad (11)$$

Proof. It is clear that any solution of $Ax = x$ is also a solution of $\psi(A)x = x$, since $\psi(1) = 1$. Suppose
$$A = S \,\mathrm{diag}(J_1, J_2, \ldots, J_s)\, S^{-1},$$
where each $J_i$ ($i = 1, 2, \ldots, s$) is a Jordan block whose diagonal elements equal an eigenvalue $\lambda_i$ of $A$. Then,
$$\psi(A) = S \,\mathrm{diag}(\psi(J_1), \psi(J_2), \ldots, \psi(J_s))\, S^{-1}.$$
Clearly, each $\psi(J_i)$ ($i = 1, 2, \ldots, s$) is still a lower triangular matrix, and its diagonal elements equal $\psi(\lambda_i)$. Because $\psi(\lambda_i) \neq 1$ whenever $\lambda_i \neq 1$, the algebraic multiplicity of the eigenvalue 1 of $\psi(A)$ equals 1, the same as that of the eigenvalue 1 of $A$. Therefore $\psi(A)x = x$ has the same solution space as $Ax = x$, i.e., problem (3) is equivalent to problem (11). □
The obvious question to address is whether such a polynomial $\psi$, satisfying the condition $\psi(1) = 1$ and $\psi(\lambda_i) \neq 1$ on the eigenvalues $\lambda_i \neq 1$ of $A$, exists, and then how to make $\psi(A)x = x$ easier to solve than $Ax = x$. The simple polynomial $\psi(A) = A^{k}$ (with $k$ a positive integer) clearly satisfies this condition, because $|\lambda_i| \leq \alpha < 1$ for $i \geq 2$. According to Theorem 3, the two eigen-problems $Ax = x$ and $A^{k}x = x$ are therefore equivalent. Besides, in $A^{k}$ the distance between the largest two eigenvalues equals $1 - \alpha^{k}$, which is larger than the gap $1 - \alpha$ of $A$; for instance, with $\alpha = 0.99$ and $k = 4$ the gap grows from $0.01$ to $1 - 0.99^{4} \approx 0.039$. As a result, we can expect that the GG-Arnoldi method will require fewer cycles and operations to converge when it is applied to $A^{k}x = x$ than to the problem $Ax = x$. We sketch the complete preconditioned version of the Golub and Greif variant of the refined Arnoldi method with the polynomial choice $\psi(A) = A^{k}$, hereafter shortly referred to as the PGG-Arnoldi method, in Algorithm 3.
Algorithm 3 The preconditioned version of the Golub and Greif variant of the refined Arnoldi method, with the polynomial choice $\psi(A) = A^{k}$ (shortly, PGG-Arnoldi) for PageRank.
Input: the PageRank coefficient matrix $A$, initial guess $x_0$, parameters $m$, $k$ and $tol$.
- 1: Run Algorithm 1 as $[V_m, V_{m+1}, H_m, \bar{H}_m, \beta] = \mathrm{Arnoldi}(A^{k}, x_0, m)$.
- 2: Compute the SVD factorization $U \Sigma W^{T} = \bar{H}_m - \tilde{I}$.
- 3: Compute the updated approximate solution $x = V_m W(:, m)$.
- 4: Compute the residual 2-norm as the smallest singular value $\sigma_m$.
- 5: if $\sigma_m < tol$ then
- 6: Output $x = x / \|x\|_1$ and end.
- 7: else
- 8: Set $x_0 = x$ and go to step 1.
- 9: end if
- 10: return $x$.
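In code, the only change with respect to the GG-Arnoldi driver sketched in Section 2 is the operator handed to the Arnoldi process: one passes $u \mapsto A^{k}u$ instead of $u \mapsto Au$. Reusing our earlier hypothetical helpers, this amounts to:

```python
def pgg_arnoldi(matvec_A, x0, m, k, tol, max_cycles=200):
    """PGG-Arnoldi (Algorithm 3): the GG-Arnoldi driver applied to u -> A^k u."""
    def matvec_Ak(u):
        for _ in range(k):        # k applications of A per Arnoldi step, cf. Algorithm 4
            u = matvec_A(u)
        return u
    return gg_arnoldi(matvec_Ak, x0, m, tol, max_cycles)
```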
Note that the matrix $A^{k}$ in Algorithm 3 is never formed explicitly. The operations associated with $A^{k}$ in this algorithm are all matrix-vector multiplications; each such operation $y = A^{k}u$ is computed by carrying out $k$ sparse matrix-vector products, without assembling the Google matrix $A$, as shown in Algorithm 4.
Algorithm 4 Implementation of the matrix-vector product $y = A^{k}u$
- 1: for $i = 1\!:\!k$ do
- 2: Compute $u = \alpha P u + (w^{T}u)\, v$, with $w = \alpha d + (1 - \alpha)e$.
- 3: end for
- 4: Set $y = u$.
- 5: return $y$.
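A matrix-free NumPy sketch of Algorithm 4, with the same storage choices assumed in Section 2 (a sparse $P$ and a stored vector $w = \alpha d + (1-\alpha)e$, our notation), is given below.

```python
import numpy as np

def a_power_matvec(P, w, v, alpha, u, k):
    """Algorithm 4: compute y = A^k u with k sparse products, never forming A."""
    y = u.copy()
    for _ in range(k):
        y = alpha * (P @ y) + (w @ y) * v   # one application of the Google matrix
    return y
```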
The overall algorithmic complexity of a cycle of the PGG-Arnoldi method (Algorithm 3) is summarized in
Table 4.
It is clear that one cycle of Algorithm 3 is computationally more expensive than one cycle of the GG-Arnoldi method (Algorithm 2) for the same value of $m$, because the former requires computing the matrix-vector products $A^{k}u$ instead of $Au$. In detail, the PGG-Arnoldi algorithm needs $(k-1)m$ additional sparse matrix-vector products, $(k-1)m$ additional inner products, $(k-1)m$ additional vector scalings and $(k-1)m$ additional vector additions compared with the GG-Arnoldi algorithm per cycle. The computation of an inner product requires $n$ floating-point multiplications and $n-1$ floating-point additions, while scaling a vector only needs $n$ floating-point multiplications. Thus, we will assume that a vector scaling operation has half the cost of an inner product with vectors of the same dimension. For the same reason, the cost of each vector addition, 2-norm and 1-norm of a vector can be counted as 1/2, 1 and 1/2 inner products, respectively. Finally, the algorithmic complexity estimates of one cycle of GG-Arnoldi and PGG-Arnoldi, in terms of sparse matrix-vector multiplications and vector inner products, are presented in Table 5.
We observe from Table 5 that one cycle of the PGG-Arnoldi costs less than $k$ times the cost of one cycle of the GG-Arnoldi. This is because the orthogonalization work of the Arnoldi process does not scale with $k$: only the matrix-vector product part does. Therefore, if the number of cycles required by the PGG-Arnoldi is less than $1/k$ times that of the GG-Arnoldi, the former must be faster. Indeed, it may be faster even when the number of cycles required by the PGG-Arnoldi is larger than $1/k$ times that of the GG-Arnoldi, as will be shown by the numerical experiments. Besides, in real applications the computation of a matrix-vector product is more efficient than that of vector operations in either serial or parallel environments, as the former belongs to the BLAS-2 class while the latter belongs to BLAS-1. Note that it is very desirable to improve the efficiency of parallel computing for solving large problems such as PageRank. Given a good estimate of the convergence rate comparison between these two methods and of the density of the matrix $G$, the results presented in Table 5 can guide the choice between the PGG-Arnoldi and the GG-Arnoldi methods for the problem at hand. However, little is known (in particular, quantitatively) about the convergence rate of the GG-Arnoldi method besides the fact that it improves as the gap $|1 - \lambda_2|$ increases; therefore, we rely on numerical experiments to analyze the performance of the PGG-Arnoldi method.
Finally, we remark that:
- acceleration techniques designed to work with the GG-Arnoldi, such as extrapolation methods [6,7], formulations based on weighted inner products [3] and hybrid solution schemes [16,17,18] that combine stationary iterations with the Arnoldi method, can still be applied to the PGG-Arnoldi;
- the performance of the PGG-Arnoldi can be further improved by using pre-processing algorithms such as the elimination strategy in [24] and the low-rank factorization in [25], which reduce the time and memory cost of computing the sparse matrix-vector product $Pu$, because the sparse matrix-vector products account for a larger proportion of the computation in the PGG-Arnoldi than in the GG-Arnoldi.