An optimal randomized cell probe lower bound for approximate nearest neighbor searching

A Chakrabarti, O Regev - SIAM Journal on Computing, 2010 - SIAM
SIAM Journal on Computing, 2010SIAM
We consider the approximate nearest neighbor search problem on the Hamming cube
{0,1\}^d. We show that a randomized cell probe algorithm that uses polynomial storage and
word size d^O(1) requires a worst case query time of Ω(\rmlog\rmlog\,d/\rmlog\rmlog\rmlog\,d
). The approximation factor may be as loose as 2^\rmlog^1-ηd for any fixed η>0. Our result
fills a major gap in the study of this problem since all earlier lower bounds either did not
allow randomization A. Chakrabarti et al., A lower bound on the complexity of approximate …
We consider the approximate nearest neighbor search problem on the Hamming cube . We show that a randomized cell probe algorithm that uses polynomial storage and word size requires a worst case query time of . The approximation factor may be as loose as for any fixed . Our result fills a major gap in the study of this problem since all earlier lower bounds either did not allow randomization [A. Chakrabarti et al., A lower bound on the complexity of approximate nearest-neighbor searching on the Hamming cube, in Discrete and Computational Geometry, Springer, Berlin, 2003, pp. 313–328; D. Liu, Inform. Process. Lett., 92 (2004), pp. 23–29] or did not allow approximation [A. Borodin, R. Ostrovsky, and Y. Rabani, Proceedings of the 31st Annual ACM Symposium on Theory of Computing, 1999, pp. 312–321; O. Barkol and Y. Rabani, Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, 2000, pp. 388–396; T. S. Jayram et al., J. Comput. System Sci., 69 (2004), pp. 435–447]. We also give a cell probe algorithm that proves that our lower bound is optimal. Our proof uses a lower bound on the round complexity of the related communication problem. We show, additionally, that considerations of bit complexity alone cannot prove any nontrivial cell probe lower bound for the problem. This shows that the “richness technique” [P. B. Miltersen et al., J. Comput. System Sci., 57 (1998), pp. 37–49] used in a lot of recent research around this problem would not have helped here. Our proof is based on information theoretic techniques for communication complexity, a theme that has been prominent in recent research [A. Chakrabarti et al., Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer Science, 2001, pp. 270–278; Z. Bar-Yossef et al., Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002, pp. 209–218; P. Sen, Proceedings of the 18th Annual IEEE Conference on Computational Complexity, 2003, pp. 73–83; R. Jain, J. Radhakrishnan, and P. Sen, Proceedings of the 30th International Colloquium on Automata, Languages and Programming, 2003, pp. 300–315].
Society for Industrial and Applied Mathematics