An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment
W Sun, Y Ling, N Rishe, Y Deng - ACM SIGMOD Record, 1993 - dl.acm.org
This paper proposes a novel strategy for estimating the size of the relation that results from an equi-join and selection, using a regression model. An approximating series representing the underlying data distribution and dependency is derived from the actual data. The proposed method then provides an instant and accurate size estimate simply by evaluating this series, with no run-time overhead in page faults or space and with negligible CPU overhead. In contrast, the popular sampling methods incur run-time overhead in page faults (for sampling), CPU time, and space, and these overheads increase the response time of query processing. The results of a comprehensive experimental study are also reported; they show that the estimation accuracy of the proposed method is comparable with that of the sampling methods, which are believed to provide the most accurate estimates. The proposed method therefore seems ideal for retrieval-intensive database and information systems. Since the overhead of deriving the approximating series is fairly moderate, we believe the method remains a strong choice even when moderate or periodic updates are present.
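The abstract does not state what form the approximating series takes or how the regression is set up. As a hedged illustration only, the sketch below assumes a low-degree polynomial fitted by ordinary least squares to the per-value frequencies of an integer attribute; the names `fit_polynomial`, `SeriesEstimator`, `estimate_selection`, and `estimate_join` are hypothetical and are not taken from the paper. The series is fitted once, offline. At estimation time only the stored coefficients are evaluated, so no data pages are read, which is the property the abstract contrasts with sampling. For simplicity the sketch sums the fitted frequency over every value in a range, so its CPU cost grows with the range width. A closed-form sum of the polynomial would avoid that loop.

```python
from collections import Counter
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Ordinary least-squares fit y ~ c0 + c1*x + ... + cd*x^d via the normal equations."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        acc = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / a[i][i]
    return coeffs


class SeriesEstimator:
    """Hypothetical per-attribute summary built offline: a polynomial approximating
    how many tuples carry each value of the integer domain [lo, hi] (assumes hi > lo)."""

    def __init__(self, values: Sequence[int], lo: int, hi: int, degree: int = 3) -> None:
        self.lo, self.hi = lo, hi
        counts = Counter(values)
        domain = range(lo, hi + 1)
        # Needs at least degree + 1 domain values, or the normal equations are singular.
        self.coeffs = fit_polynomial([self._scale(v) for v in domain],
                                     [float(counts[v]) for v in domain], degree)

    def _scale(self, v: int) -> float:
        # Map the domain onto [0, 1] to keep the normal equations well conditioned.
        return (v - self.lo) / (self.hi - self.lo)

    def freq(self, v: int) -> float:
        """Approximate frequency of value v, evaluated with Horner's rule."""
        t = self._scale(v)
        result = 0.0
        for c in reversed(self.coeffs):
            result = result * t + c
        return max(result, 0.0)  # a frequency cannot be negative

    def estimate_selection(self, low: int, high: int) -> float:
        """Approximate |sigma_{low <= A <= high}(R)| by summing fitted frequencies."""
        return sum(self.freq(v) for v in range(max(low, self.lo), min(high, self.hi) + 1))


def estimate_join(r: SeriesEstimator, s: SeriesEstimator) -> float:
    """Approximate |R join_{R.A = S.B} S| as the sum of f_R(v) * f_S(v) over shared values."""
    lo, hi = max(r.lo, s.lo), min(r.hi, s.hi)
    return sum(r.freq(v) * s.freq(v) for v in range(lo, hi + 1))


# Usage: value v occurs v + 1 times in R.A and twice in S.B, for v in 0..9.
r_est = SeriesEstimator([v for v in range(10) for _ in range(v + 1)], 0, 9)
s_est = SeriesEstimator([v for v in range(10) for _ in range(2)], 0, 9)
print(round(r_est.estimate_selection(2, 4), 3))  # 12.0  (3 + 4 + 5)
print(round(estimate_join(r_est, s_est), 3))     # 110.0 (2 * (1 + 2 + ... + 10))
```

In this toy data the frequencies are exactly linear and constant, so the fitted polynomials reproduce them and the estimates are exact. On real data the fit is only approximate, and the degree of the series trades summary size against accuracy.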