Enumeration, Tagged Unions, Tuples, and Collections: A Novel Approach to Extracting JSON Schema
Abstract
JSON has recently become a popular format for representing datasets. Its success stems from embodying structure and data in the same representation. Moreover, its structure (i.e., its schema) is loose rather than rigid. While the absence of a rigid schema brings several advantages, it also prevents applications from exploiting the benefits of knowing the schema in advance, such as query and storage optimization and improved data curation. In this paper, we propose JFUSE, a tool that addresses the problem of discovering a schema from JSON collections. Besides inferring basic types (e.g., atomic types, arrays, and objects), JFUSE also discovers enumerations, tagged unions, metadata as data, objects as collections, and arrays as tuples. We propose a metamodel that can be easily transformed into any schema language (e.g., JSON Schema). Our experiments show that the proposed approach infers concise and correct schemas from (huge) JSON collections.
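To make two of the constructs named above concrete, the following is a minimal sketch of how enumerations and arrays-as-tuples might be spotted in a small JSON collection. It is not JFUSE's algorithm, which the abstract does not describe. The function names `json_type` and `infer_field_schemas`, the `max_enum_values` threshold, and the two heuristics are all assumptions made for illustration. The output uses the JSON Schema keywords `enum` and `prefixItems`.

```python
from collections import defaultdict


def json_type(value):
    """Map a decoded JSON value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_field_schemas(docs, max_enum_values=3):
    """Summarize the top-level fields of a list of JSON objects.

    Illustrative heuristics only (assumptions, not JFUSE's rules):
    - enumeration: a string field with at most `max_enum_values` distinct
      values, at least one of which repeats;
    - tuple: an array field whose values all share one positional type
      signature that mixes more than one type.
    """
    types = defaultdict(set)
    values = defaultdict(list)
    for doc in docs:
        for key, value in doc.items():
            types[key].add(json_type(value))
            values[key].append(value)

    schema = {}
    for key, kinds in types.items():
        observed = values[key]
        if kinds == {"string"}:
            distinct = sorted(set(observed))
            if len(distinct) <= max_enum_values and len(distinct) < len(observed):
                schema[key] = {"enum": distinct}
                continue
        if kinds == {"array"}:
            signatures = {tuple(json_type(item) for item in arr) for arr in observed}
            if len(signatures) == 1:
                (signature,) = signatures
                if len(set(signature)) > 1:
                    schema[key] = {
                        "type": "array",
                        "prefixItems": [{"type": t} for t in signature],
                    }
                    continue
        # Fallback: report the observed basic type(s) only.
        ordered = sorted(kinds)
        schema[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}
    return schema


if __name__ == "__main__":
    # Hypothetical sample collection, invented for this sketch.
    docs = [
        {"id": 1, "status": "open", "point": ["a", 1.5]},
        {"id": 2, "status": "closed", "point": ["b", 2.0]},
        {"id": 3, "status": "open", "point": ["c", 0.5]},
    ]
    print(infer_field_schemas(docs))
```

On the sample collection, `status` is reported as an enumeration over `closed` and `open`, `point` as a two-position tuple of a string followed by a number, and `id` as a plain integer. A real tool would also need to handle nested objects, optional fields, and tagged unions, which this sketch leaves out.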
Keywords:
JSON, schema discovery, schema extraction, metamodel
Published
14/10/2024
How to Cite
BANHARA, Natália; SCHREINER, Geomar A.; FEITOSA, Samuel da Silva; DUARTE, Denio. Enumeration, Tagged Unions, Tuples, and Collections: A Novel Approach to Extracting JSON Schema. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 39., 2024, Florianópolis/SC. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 234-246. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.240239.