Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Relaxation in text search using taxonomies

Published: 01 August 2008 Publication History

Abstract

In this paper we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a cost to result quality. This new model may be applicable in many arenas, including multifaceted, product, and local search, where documents are augmented with hierarchical metadata such as topic or location. We present efficient algorithms for indexing and query processing in this new retrieval model. We decompose query processing into two sub-problems: first, an online search problem to determine the correct overall level of relaxation cost that must be incurred to generate the top k results; and second, a budgeted relaxation search problem in which all results at a particular relaxation cost must be produced at minimal cost. We show the latter problem is solvable exactly in two hierarchical dimensions, is NP-hard in three or more dimensions, but admits efficient approximation algorithms with provable guarantees. We present experimental results evaluating our algorithms on both synthetic and real data, showing order of magnitude improvements over the baseline algorithm.

References

[1]
R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proc. 13th ICDE, pages 232--243, 1997.
[2]
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In Proc. 18th ICDE, pages 5--16, 2002.
[3]
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.
[4]
Y. Bartal. On approximating arbitrary metrics by tree metrics. In Proc. 30th ACM STOC, pages 161--168, 1998.
[5]
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. WWW/Computer Networks, 30(1--7):107--117, 1998.
[6]
A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Y. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. 12th ACM CIKM, pages 426--434, 2003.
[7]
H. Bronnimann and M. T. Goodrich. Almost optimal set covers in finite VC-dimension. In Proc. 10th ACM SoCG, pages 293--302, 1994.
[8]
D. Burdick, P. M. Deshpande, T. S. Jayram, R. Ramakrishnan, and S. Vaithyanathan. OLAP over uncertain and imprecise data. In Proc. 31st VLDB, pages 970--981, 2005.
[9]
Y.-Y. Chen, T. Suel, and A. Markowetz. Efficient query processing in geographic web search engines. In Proc. ACM SIGMOD, pages 277--288, New York, NY, USA, 2006. ACM.
[10]
K. L. Clarkson and K. Varadarajan. Improved approximation algorithms for geometric set cover. In Proc. 21st ACM SoCG, pages 135--141, 2005.
[11]
W. F. Cody, J. T. Kreulen, V. Krishna, and W. S. Spangler. The integration of business intelligence and knowledge management. IBM Systems Journal, 41(4):697--713, 2002.
[12]
G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Diamond in the rough: Finding hierarchical heavy hitters in multi-dimensional data. In Proc. ACM SIGMOD, pages 155--166, 2004.
[13]
R. Fagin, R. Guha, R. Kumar, J. Novak, D. Sivakumar, and A. Tomkins. Multi-structural databases. In Proc. 24th PODS, pages 184--195, 2005.
[14]
R. Fagin, P. Kolaitis, R. Kumar, J. Novak, D. Sivakumar, and A. Tomkins. Efficient implementation of large-scale multi-structural databases. In Proc. 31st VLDB, pages 958--969, 2005.
[15]
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. JCSS, 66(4):614--656, 2003.
[16]
D. Florescu, D. Kossmann, and I. Manolescu. Integrating keyword search into XML query processing. Computer Networks, 33(1--6):119--135, 2000.
[17]
M. Fontoura, V. Josifovski, E. Shekita, and B. Yang. Optimizing cursor movement in holistic twig joins. In Proc. 14th ACM CIKM, pages 784--791, 2005.
[18]
M. Fontoura, E. J. Shekita, J. Y. Zien, S. Rajagopalan, and A. Neumann. High performance index build algorithms for intranet search engines. In Proc. 30th VLDB, pages 1158--1169, 2004.
[19]
R. J. Fowler, M. Paterson, and S. L. Tanimoto. Optimal packing and covering in the plane are NP-complete. IPL, 12(3):133--137, 1981.
[20]
H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice Hall, 2000.
[21]
J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. DMKD, 1(1):29--53, 1997.
[22]
D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Y. Zien. How to build a WebFountain: An architecture for large-scale text analytics. IBM Systems Journal, 43(1):64--77, 2004.
[23]
S. Heinz and J. Zobel. Efficient single-pass index construction for text databases. JASIST, 54(8), 2003.
[24]
V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In Proc. 29th VLDB, pages 850--861, 2003.
[25]
H. V. Jagadish, L. S. Lakshmanan, and D. Srivastava. What can hierarchies do for data warehouses? In Proc. 25th VLDB, pages 530--541, 1999.
[26]
V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In Proc. 31st VLDB, pages 505--516, 2005.
[27]
E. Kandogan, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar semantic search: A database approach to information retrieval. In Proc. ACM SIGMOD, pages 790--792, 2006.
[28]
D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. JMLR, 5:361--397, 2004.
[29]
X. Long and T. Suel. Optimized query execution in large search engines with global page ordering. In Proc. 29th VLDB, pages 129--140, 2003.
[30]
S. Melnik, S. Raghavan, B. Yang, and H. Garcia-Molina. Building a distributed full-text index for the web. In Proc. 10th WWW, pages 396--406, 2001.
[31]
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
[32]
H. Turtle and J. Flood. Query evaluation: strategies and optimizations. IPM, 31(6), 1995.
[33]
I. Witten, A. Moffat, and T. Bell. Managing Gigabytes. Morgan Kaufmann, 1999.
[34]
Y. Xu and Y. Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. In Proc. ACM SIGMOD, pages 537--538, 2005.
[35]
K.-P. Yee, K. Swearingen, K. Li, and M. Hearst. Faceted metadata for image search and browsing. In Proc. ACM CHI, pages 401--408, 2003.
[36]
X. Zhou, J. Gaugaz, W.-T. Balke, and W. Nejdl. Query relaxation using malleable schemas. In Proc. ACM SIGMOD, pages 545--556, 2007.

Cited By

View all
  • (2018)Augmented keyword search on spatial entity databasesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-018-0497-627:2(225-244)Online publication date: 1-Apr-2018
  • (2014)Taxonomy-based relaxation of query answering in relational databasesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-013-0350-x23:5(747-769)Online publication date: 1-Oct-2014
  • (2013)Search and Graphical Visualization of Concepts in Document Collections Using TaxonomiesProceedings of the 2013 46th Hawaii International Conference on System Sciences10.1109/HICSS.2013.472(1429-1434)Online publication date: 7-Jan-2013
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 1, Issue 1
August 2008
1216 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2008
Published in PVLDB Volume 1, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2018)Augmented keyword search on spatial entity databasesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-018-0497-627:2(225-244)Online publication date: 1-Apr-2018
  • (2014)Taxonomy-based relaxation of query answering in relational databasesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-013-0350-x23:5(747-769)Online publication date: 1-Oct-2014
  • (2013)Search and Graphical Visualization of Concepts in Document Collections Using TaxonomiesProceedings of the 2013 46th Hawaii International Conference on System Sciences10.1109/HICSS.2013.472(1429-1434)Online publication date: 7-Jan-2013
  • (2012)Rewriting null e-commerce queries to recommend productsProceedings of the 21st International Conference on World Wide Web10.1145/2187980.2187989(73-82)Online publication date: 16-Apr-2012
  • (2011)Efficient query rewrite for structured web queriesProceedings of the 20th ACM international conference on Information and knowledge management10.1145/2063576.2063981(2417-2420)Online publication date: 24-Oct-2011
  • (2011)Evolutionary taxonomy construction from dynamic tag spaceWorld Wide Web10.1007/s11280-011-0150-415:5-6(581-602)Online publication date: 7-Dec-2011
  • (2010)Evolutionary taxonomy construction from dynamic tag spaceProceedings of the 11th international conference on Web information systems engineering10.5555/1991336.1991350(105-119)Online publication date: 12-Dec-2010
  • (2010)Querying databases with taxonomiesProceedings of the 29th international conference on Conceptual modeling10.5555/1929757.1929793(377-390)Online publication date: 1-Nov-2010
  • (2010)Supporting multiple paths to objects in information hierarchiesInformation Processing and Management: an International Journal10.1016/j.ipm.2009.06.00746:1(22-43)Online publication date: 1-Jan-2010
  • (2010)Evolutionary Taxonomy Construction from Dynamic Tag Space11th International Conference on Web Information Systems Engineering --- WISE 2010 - Volume 648810.1007/978-3-642-17616-6_11(105-119)Online publication date: 12-Dec-2010
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media