Abstract
We are converting the metadata of 13 American art museums and archives into Linked Data so that the resulting data can be integrated and queried. While there are many good sources of artist data, no single source covers all artists. We therefore address the challenge of building a comprehensive knowledge graph of artists that can then be used to link the data from the individual museums. We present a framework for constructing and incrementally extending a knowledge graph, describe techniques for building knowledge graphs efficiently by using the MinHash/LSH algorithm to generate candidate matches, and report an evaluation demonstrating that our approach builds a knowledge graph about artists efficiently and accurately.
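To illustrate how MinHash/LSH can generate candidate matches without comparing every pair of records, the following is a minimal sketch in Python. It is not the implementation evaluated in the paper: the record identifiers, the number of hash functions, the banding parameters, and the use of seeded MD5 as the hash family are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(name, n=2):
    """Character n-grams of a name, padded with '_' on both sides."""
    padded = "_" * (n - 1) + name + "_" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def seeded_hash(seed, token):
    """One member of an illustrative hash family (assumption: seeded MD5)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=20):
    """For each hash function, keep the minimum hash value over the token set."""
    return tuple(min(seeded_hash(seed, t) for t in tokens)
                 for seed in range(num_hashes))


def lsh_candidates(records, num_hashes=20, bands=10):
    """Return pairs of record ids whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for key, name in records.items():
        sig = minhash_signature(shingles(name), num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(key)
    pairs = set()
    for keys in buckets.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


# Hypothetical records; the ids and sources are made up for this example.
records = {
    "museum:1": "Roy Lichtenstein",
    "dbpedia:1": "Roy Lichtenstein",
    "viaf:9": "Georgia O'Keeffe",
}
pairs = lsh_candidates(records)
print(pairs)
```

Records with identical names always fall into the same buckets, so they are guaranteed to appear as a candidate pair. Similar but non-identical names are paired with a probability that grows with their Jaccard similarity, and the candidate pairs would then be checked with an exact similarity measure.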
Notes
- 3. Please note that not all of the people in DBpedia and VIAF are artists.
- 4. The 2-gram of the first name ‘Roy’ consists of {_R, Ro, oy, y_}.
- 5. The Jaccard similarity between sets S and T is defined as \(\frac{\mid S \cap T \mid }{\mid S \cup T \mid }\).
- 10. For example, the series of workshops on Automated Knowledge Base Construction (AKBC), http://www.akbc.ws/.
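The 2-gram and Jaccard definitions given in the notes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `char_ngrams` and `jaccard` are hypothetical, and returning 0.0 for two empty sets is an assumption, since the definition leaves that case open.

```python
def char_ngrams(text, n=2, pad="_"):
    """Set of character n-grams of text, padded with `pad` on both sides."""
    padded = pad * (n - 1) + text + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(s, t):
    """|S ∩ T| / |S ∪ T|; assumed to be 0.0 when both sets are empty."""
    union = s | t
    if not union:
        return 0.0
    return len(s & t) / len(union)


roy = char_ngrams("Roy")  # {'_R', 'Ro', 'oy', 'y_'}
ray = char_ngrams("Ray")  # {'_R', 'Ra', 'ay', 'y_'}
print(jaccard(roy, ray))  # 2 shared 2-grams out of 6 distinct ones
```

Here ‘Roy’ and ‘Ray’ share the 2-grams _R and y_ out of six distinct 2-grams, so their Jaccard similarity is 2/6 = 1/3.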
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gawriljuk, G., Harth, A., Knoblock, C.A., Szekely, P. (2016). A Scalable Approach to Incrementally Building Knowledge Graphs. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science(), vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_15
DOI: https://doi.org/10.1007/978-3-319-43997-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science (R0)