Abstract
Controlled vocabularies have proved critical for data interoperability and accessibility. In the cultural heritage (CH) domain, descriptions of artworks are often given as free text, which makes filtering and searching burdensome (e.g. listing all artworks of a specific type). Although it is multilingual and quite detailed, the Getty’s Art & Architecture Thesaurus, a de facto standard for describing artworks, has low coverage for languages other than English and sometimes does not reach the degree of granularity required to describe specific niche artworks. We build upon the Italian Vocabulary of Artworks, developed by the Italian Ministry of Cultural Heritage (MIC), and a set of free-text descriptions from ArCo, the knowledge graph of the Italian CH, to propose an extension of the Vocabulary of Artworks and align it to the Getty’s thesaurus. Our framework relies on text matching and natural language processing tools to suggest candidate alignments between free text and terms and between cross-vocabulary terms, with a human in the loop for validation and refinement. We produce 1,166 new terms (31% more than the original vocabulary) and 1,330 links to the Getty’s thesaurus, with an estimated coverage of 21%.
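As a rough illustration of the candidate-suggestion step described above, the following minimal sketch scores pairs of terms with a normalized string similarity and keeps only those above a threshold, leaving the final decision to a human reviewer. It is not the authors' implementation: the function names, the threshold, the use of Python's `difflib`, and the sample Italian terms are all assumptions made for the example.

```python
from difflib import SequenceMatcher
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_candidates(source_terms, target_terms, threshold=0.85, top_k=3):
    """Return, for each source term, up to top_k target terms whose
    similarity reaches the threshold, best match first.

    The threshold and top_k values are hypothetical defaults; the output is
    a list of suggestions for manual validation, not a final alignment.
    """
    suggestions = {}
    for src in source_terms:
        scored = [(tgt, similarity(src, tgt)) for tgt in target_terms]
        kept = sorted(
            (pair for pair in scored if pair[1] >= threshold),
            key=lambda pair: pair[1],
            reverse=True,
        )
        suggestions[src] = kept[:top_k]
    return suggestions


if __name__ == "__main__":
    # Illustrative terms only; they are not taken from the actual vocabularies.
    free_text_terms = ["Calice", "dipinto  murale"]
    vocabulary_terms = ["calice", "dipinto", "dipinto murale"]
    for term, candidates in suggest_candidates(free_text_terms, vocabulary_terms).items():
        print(term, "->", candidates)
```

A purely lexical score like this only covers the text matching side of the framework. Cross-lingual alignment to the Getty’s thesaurus would likely need a semantic component as well, which this sketch does not attempt.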
This work was supported by the project POR FESR Lazio 2014–2020: “ReAD - Representation of Architectural Data”.
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Bulla, L. et al. (2022). Developing and Aligning a Detailed Controlled Vocabulary for Artwork. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_48
DOI: https://doi.org/10.1007/978-3-031-15743-1_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science (R0)