Adaptive on-line page importance computation
S Abiteboul, M Preda, G Cobena - Proceedings of the 12th international …, 2003 - dl.acm.org
S Abiteboul, M Preda, G Cobena
Proceedings of the 12th international conference on World Wide Web, 2003•dl.acm.orgThe computation of page importance in a huge dynamic graph has recently attracted a lot of
attention because of the web. Page importance, or page rank is defined as the fixpoint of a
matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra
CPU as well as disk resources (eg to store, maintain and read the link matrix). We introduce
a new algorithm OPIC that works on-line, and uses much less resources. In particular, it
does not require storing the link matrix. It is on-line in that it continuously refines its estimate …
attention because of the web. Page importance, or page rank is defined as the fixpoint of a
matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra
CPU as well as disk resources (eg to store, maintain and read the link matrix). We introduce
a new algorithm OPIC that works on-line, and uses much less resources. In particular, it
does not require storing the link matrix. It is on-line in that it continuously refines its estimate …
The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra CPU as well as disk resources (e.g. to store, maintain and read the link matrix). We introduce a new algorithm OPIC that works on-line, and uses much less resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling to the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC that also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme.We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.
ACM Digital Library