Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Combining user and database perspective for solving keyword queries over relational databases

Published: 01 January 2016 Publication History

Abstract

Over the last decade, keyword search over relational data has attracted considerable attention. A possible approach to face this issue is to transform keyword queries into one or more SQL queries to be executed by the relational DBMS. Finding these queries is a challenging task since the information they represent may be modeled across different tables and attributes. This means that it is needed to identify not only the schema elements where the data of interest is stored, but also to find out how these elements are interconnected. All the approaches that have been proposed so far provide a monolithic solution. In this work, we, instead, divide the problem into three steps: the first one, driven by the user's point of view, takes into account what the user has in mind when formulating keyword queries, the second one, driven by the database perspective, considers how the data is represented in the database schema. Finally, the third step combines these two processes. We present the theory behind our approach, and its implementation into a system called QUEST (QUEry generator for STructured sources), which has been deeply tested to show the efficiency and effectiveness of our approach. Furthermore, we report on the outcomes of a number of experimental results that we have conducted. HighlightsA three step approach for solving keyword queries in structured databases is proposed.The forward step exploits heuristic rules and machine learning techniques.The backward step models the database with a Steiner Tree.A probabilistic framework based on the Dempster Shafer Theory combines the results.An extensive set of experiments offers a deep understanding of the whole process.

References

[1]
J.X. Yu, L. Qin, L. Chang, Keyword search in databases, in: Synthesis Lectures on Data Management, Morgan & Claypool Pub., San Rafael, California (USA), 2010.
[2]
V. Hristidis, Y. Papakonstantinou, Discover: keyword search in relational databases, in: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, August 20-23, Hong Kong, China, 2002, pp. 670-681.
[3]
S. Agrawal, S. Chaudhuri, G. Das, Dbxplorer: a system for keyword-based search over relational databases, in: Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, February 26-March 1, IEEE Computer Society, Washington, DC 20036-4928, 2002, pp. 5-16.
[4]
B. Aditya, G. Bhalotia, S. Chakrabarti, A. Hulgeri, C. Nakhe, Parag, S. Sudarshan, Banks: browsing and keyword searching in relational databases, in: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, August 20-23, Hong Kong, China, 2002, pp. 1083-1086.
[5]
Y. Luo, X. Lin, W. Wang, X. Zhou, Spark: top-k keyword query in relational databases, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, June 12-14, ACM, New York, New York 10121, 2007, pp. 115-126.
[6]
S. Tata, G.M. Lohman, SQAK: doing more with keywords, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, ACM, New York, New York 10121, 2008, pp. 889-902.
[7]
F. Liu, C.T. Yu, W. Meng, A. Chowdhury, Effective keyword search in relational databases, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006, pp. 563-574.
[8]
L. Qin, J.X. Yu, L. Chang, Keyword search in databases: the power of rdbms, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29-July 2, ACM, New York, New York 10121, 2009, pp. 681-694.
[9]
A. Simitsis, G. Koutrika, Y.E. Ioannidis, Precis: from unstructured keywords as queries to structured databases as answers, VLDB J. 17 (1) (2008) 117-149.
[10]
T. Tran, H. Wang, S. Rudolph, P. Cimiano, Top-k exploration of query candidates for efficient keyword search on graph-shaped (rdf) data, in: Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29-April 2, Shanghai, China, IEEE Computer Society, Washington, DC 20036-4928, 2009, pp. 405-416.
[11]
V.S. Uren, Y. Lei, E. Motta, Semsearch: Refining semantic search, in: The Semantic Web: Research and Applications, Proceedings of the 5th European Semantic Web Conference, ESWC 2008, Tenerife, Canary Islands, Spain, June 1 -5, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2008, pp. 874-878.
[12]
Q. Zhou, C. Wang, M. Xiong, H. Wang, Y. Yu, Spark: adapting keyword query to semantic search, in: 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007, pp. 694-707.
[13]
S. Bergamaschi, E. Domnori, F. Guerra, R. Trillo-Lado, Y. Velegrakis, Keyword search over relational databases: a metadata approach, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, ACM, New York, New York 10121, 2011, pp. 565-576.
[14]
S. Bergamaschi, E. Domnori, F. Guerra, M. Orsini, R. Trillo-Lado, Y. Velegrakis, Keymantic: semantic keyword-based searching in data integration systems, Proc. VLDB Endow. 3 (2) (2010) 1637-1640.
[15]
S. Bergamaschi, F. Guerra, S. Rota, Y. Velegrakis, A hidden Markov model approach to keyword-based search over relational databases, in: Proceedings of the 30th International Conference on Conceptual Modeling, ER 2011, Brussels, Belgium, October 31-November 3, Lecture Notes in Computer Science, vol. 6998, Springer, Berlin, Heidelberg, 2011, pp. 411-420.
[16]
S. Rota, S. Bergamaschi, F. Guerra, The list viterbi training algorithm and its application to keyword search over databases, in: Proceedings of the 20th ACM Conference on Information and Knowledge Management, CIKM 2011, Glasgow, United Kingdom, October 24-28, ACM, New York, New York 10121, 2011, pp. 1601-1606.
[17]
S. Bergamaschi, F. Guerra, M. Interlandi, R. Trillo-Lado, Y. Velegrakis, QUEST: a keyword search system for relational data based on semantic and machine learning techniques, Proc. VLDB Endow. 6 (12) (2013) 1222-1225.
[18]
S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, Boston, USA, 1995.
[19]
L. Popa, Y. Velegrakis, R.J. Miller, M.A. Hernandez, R. Fagin, Translating web data, in: Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, August 20-23, Hong Kong, China, 2002, pp. 598-609.
[20]
R. Fagin, L.M. Haas, M.A. Hernández, R.J. Miller, L. Popa, Y. Velegrakis, Clio: schema mapping creation and data exchange, in: Conceptual Modeling: Foundations and Applications - Essays in Honor of John Mylopoulos, Lecture Notes in Computer Science, vol. 5600, Springer, Berlin, Heidelberg, 2009, pp. 198-236.
[21]
R. Kumar, A. Tomkins, A characterization of online search behavior, IEEE Data Eng. Bull. 32 (2) (2009) 3-11.
[22]
A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.: Ser. B 39 (1977) 1-38.
[23]
J.M. Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM 46 (September (5)) (1999) 604-632.
[24]
B. Ding, J.X. Yu, S. Wang, L. Qin, X. Zhang, X. Lin, Finding top-k min-cost connected trees in databases, in: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, IEEE, Washington, DC 20036-4928, 2007, pp. 836-845.
[25]
X. Yang, C.M. Procopiuc, D. Srivastava, Summary graphs for relational database schemas, Proc. VLDB Endow. 4 (11) (2011) 899-910.
[26]
K. Golenberg, B. Kimelfeld, Y. Sagiv, Keyword proximity search in complex data graphs, in: SIGMOD, New York, NY, USA, ACM, New York, New York 10121, 2008, pp. 927-940.
[27]
R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, E. Vee, Comparing and aggregating rankings with ties, in: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, Paris, France, 2004, pp. 47-58.
[28]
L. Liu, R.R. Yager, Classic works of the Dempster-Shafer theory of belief functions: an introduction, in: Classic Works of the Dempster-Shafer Theory of Belief Functions, Studies in Fuzziness and Soft Computing, Springer, Berlin, Heidelberg, 2008, vol. 219, pp. 1-34.
[29]
D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas, Exemplar queries: give me an example of what you need, Proc. VLDB Endow. 7 (5) (2014) 365-376.
[30]
A. Bonifati, Y. Velegrakis, Schema matching and mapping: from usage to evaluation, in: Proceedings of the 14th International Conference on Extending Database Technology, EDBT 2011, Uppsala, Sweden, March 21-24, 2011, pp. 527-529.
[31]
H. He, H. Wang, J. Yang, P.S. Yu, Blinks: ranked keyword searches on graphs, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, June 12-14, ACM, New York, New York 10121, 2007, pp. 305-316.
[32]
G. Kasneci, M. Ramanath, M. Sozio, F.M. Suchanek, G. Weikum, Star: Steiner-tree approximation in relationship graphs, in: Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Ontario, Canada, October 26-30, IEEE, New York, New York 10121, 2009, pp. 868-879.
[33]
J. Fan, G. Li, L. Zhou, Interactive sql query suggestion: making databases user-friendly, in: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, Hannover, Germany, IEEE Computer Society, Washington, DC 20036-4928, 2011, pp. 351-362.
[34]
G. Li, J. Fan, H. Wu, J. Wang, J. Feng, Dbease: making databases user-friendly and easily accessible, in: Online Proceedings of the 5th Biennial Conference on Innovative Data Systems Research, CIDR 2011, Asilomar, CA, USA, January 9-12, 2011, pp. 45-56.
[35]
L. Blunschi, C. Jossen, D. Kossmann, M. Mori, K. Stockinger, Soda: generating sql for business users, Proc. VLDB Endow. 5 (10) (2012) 932-943.
[36]
D. Braga, A. Campi, S. Ceri, XQBE (XQuery By Example): a visual interface to the standard XML query language, ACM Trans. Database Syst. 30 (2) (2005) 398-443.
[37]
D. Mottin, A. Marascu, S.B. Roy, G. Das, T. Palpanas, Y. Velegrakis, A probabilistic optimization framework for the empty-answer problem, Proc. VLDB Endow. 6 (14) (2013) 1762-1773.
[38]
C. Mishra, N. Koudas, Interactive query refinement, in: M.L. Kersten, B. Novikov, J. Teubner, V. Polutin, S. Manegold (Eds.), Proceedings of the 12th International Conference on Extending Database Technology, EDBT 2009, Saint Petersburg, Russia, March 24-26, ACM International Conference Proceeding Series, vol. 360, ACM, New York, New York 10121, 2009, pp. 862-873.
[39]
J. Coffman, A.C. Weaver, An empirical performance evaluation of relational keyword search techniques, IEEE Trans. Knowl. Data Eng. 26 (1) (2014) 30-42.
[40]
J. Coffman, A.C. Weaver, A framework for evaluating database keyword search strategies, in: Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Ontario, Canada, October 26-30, ACM, New York, New York 10121, 2010, pp. 729-738.
[41]
W. Webber, Evaluating the effectiveness of keyword search, IEEE Data Eng. Bull. 33 (1) (2010) 55-60.
[42]
S. Bergamaschi, N. Ferro, F. Guerra, G. Silvello, Keyword search and evaluation over relational databases: an outlook to the future, in: 7th International Workshop on Ranking in Databases, DBRank 2013, Riva del Garda, Italy, August 30, ACM, New York, New York 10121, 2013, p. 8.

Cited By

View all
  • (2024)A Technical/Scientific Document Management PlatformNatural Scientific Language Processing and Research Knowledge Graphs10.1007/978-3-031-65794-8_9(134-146)Online publication date: 26-May-2024
  • (2023)Towards Natural Language Interfaces for Data Visualization: A SurveyIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2022.314800729:6(3121-3144)Online publication date: 1-Jun-2023
  • (2022)A comparative survey of recent natural language interfaces for databasesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-019-00567-828:5(793-819)Online publication date: 11-Mar-2022
  • Show More Cited By
  1. Combining user and database perspective for solving keyword queries over relational databases

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Information Systems
    Information Systems  Volume 55, Issue C
    January 2016
    73 pages

    Publisher

    Elsevier Science Ltd.

    United Kingdom

    Publication History

    Published: 01 January 2016

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)A Technical/Scientific Document Management PlatformNatural Scientific Language Processing and Research Knowledge Graphs10.1007/978-3-031-65794-8_9(134-146)Online publication date: 26-May-2024
    • (2023)Towards Natural Language Interfaces for Data Visualization: A SurveyIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2022.314800729:6(3121-3144)Online publication date: 1-Jun-2023
    • (2022)A comparative survey of recent natural language interfaces for databasesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-019-00567-828:5(793-819)Online publication date: 11-Mar-2022
    • (2021)Keyword search over schema-less RDF datasets by SPARQL query compilationInformation Systems10.1016/j.is.2021.101814102:COnline publication date: 1-Dec-2021
    • (2020)State of the Art and Open Challenges in Natural Language Interfaces to DataProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3383128(2629-2636)Online publication date: 11-Jun-2020
    • (2018)QUIOW: A Keyword-Based Query Processing Tool for RDF Datasets and Relational DatabasesDatabase and Expert Systems Applications10.1007/978-3-319-98812-2_22(259-269)Online publication date: 3-Sep-2018
    • (2016)Entity-Based Keyword Search in Web DocumentsTransactions on Computational Collective Intelligence XXI - Volume 963010.5555/3090176.3090178(21-49)Online publication date: 1-Jan-2016
    • (2016)QueryGenInformation Sciences: an International Journal10.1016/j.ins.2015.09.013329:C(412-433)Online publication date: 1-Feb-2016

    View Options

    View options

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media