research-article
Open access

Temporal JSON Keyword Search

Published: 30 May 2024

Abstract

JSON keyword search operates on the current versions of the documents in a collection. However, JSON documents change over time as they are edited. Some applications, such as data forensics and auditing, need to search past versions of documents and to search for changes to documents. This paper introduces a system called Temporal JSON Keyword Search (TJKS) for searching a collection of JSON documents that vary over time. TJKS lets users control which temporal slice, or part of the history, is searched and which temporal search semantics is used; we support both of the major temporal semantics, sequenced and nonsequenced search. This paper presents the semantics of temporal JSON keyword search, discusses an efficient implementation, and evaluates that implementation. Our extensions are largely orthogonal to specific keyword search techniques, so this research provides a blueprint for extending keyword search to include time and potentially other kinds of metadata.
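
To make the distinction between the two semantics concrete, the following is a minimal sketch and not the TJKS implementation. It assumes integer time points, half-open version intervals, and a naive token match over keys and scalar values; the names `Version`, `tokens`, `sequenced_search`, and `nonsequenced_search` are hypothetical. Sequenced search is read here as evaluating the keyword query independently on the snapshot at each time point in the slice, and nonsequenced search as treating time as ordinary data, so keywords may match in different versions of the same document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One version of a JSON document (hypothetical model for illustration)."""
    doc_id: str
    start: int  # first time point at which this version is current (inclusive)
    end: int    # first time point at which it is no longer current (exclusive)
    body: dict  # the parsed JSON value of this version


def tokens(node):
    """Collect lowercase tokens from every key and scalar value of a JSON value."""
    if isinstance(node, dict):
        out = set()
        for key, value in node.items():
            out |= {key.lower()} | tokens(value)
        return out
    if isinstance(node, list):
        out = set()
        for item in node:
            out |= tokens(item)
        return out
    return set(str(node).lower().split())


def sequenced_search(versions, keywords, slice_start, slice_end):
    """Search each snapshot in [slice_start, slice_end) on its own.

    Returns {time point: sorted doc ids whose current version has all keywords}.
    """
    wanted = {k.lower() for k in keywords}
    results = {}
    for t in range(slice_start, slice_end):
        hits = sorted(
            v.doc_id
            for v in versions
            if v.start <= t < v.end and wanted <= tokens(v.body)
        )
        if hits:
            results[t] = hits
    return results


def nonsequenced_search(versions, keywords, slice_start, slice_end):
    """Pool all versions that overlap the slice; keywords may match at different times."""
    wanted = {k.lower() for k in keywords}
    pooled = {}
    for v in versions:
        if v.start < slice_end and slice_start < v.end:  # interval overlaps the slice
            pooled.setdefault(v.doc_id, set()).update(tokens(v.body))
    return sorted(doc for doc, seen in pooled.items() if wanted <= seen)


# Hypothetical history: document d1 is edited at time 2, d2 is created at time 1.
history = [
    Version("d1", 0, 2, {"name": "Ada", "city": "Paris"}),
    Version("d1", 2, 5, {"name": "Ada", "city": "Rome"}),
    Version("d2", 1, 5, {"name": "Bob", "city": "Rome"}),
]
```

With this history, a sequenced search for "paris" and "rome" over the slice [0, 5) finds nothing, because no single snapshot of any document contains both words, whereas the nonsequenced search returns d1, whose history mentions both cities at different times. A real system would also return the matching sub-documents and coalesce adjacent time points into intervals; both are omitted here for brevity.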


Information & Contributors

Information

Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 3
SIGMOD
June 2024
1953 pages
EISSN: 2836-6573
DOI: 10.1145/3670010
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2024
Published in PACMMOD Volume 2, Issue 3

Author Tags

  1. json
  2. keyword search
  3. sequenced
  4. temporal

Qualifiers

  • Research-article

