Abstract
Many ontologies have been published on the Semantic Web so that they can be shared to describe resources. Among them, large ontologies covering real-world domains pose a scalability problem for semantic technologies such as ontology matching (OM): existing matching approaches either take too long to run or make strong assumptions about the running environment. To deal with this issue, we propose V-Doc+, a three-stage approach for matching large ontologies based on the MapReduce framework and the virtual document technique. Specifically, two MapReduce processes are performed in the first stage to extract the textual descriptions of named entities (classes, properties, and instances) and blank nodes, respectively. In the second stage, the extracted descriptions are exchanged with neighbors in Resource Description Framework (RDF) graphs to construct virtual documents. This process also benefits from a MapReduce-based implementation. In the third stage, a word-weight-based partitioning method is proposed to conduct parallel similarity calculation using the term frequency-inverse document frequency (TF-IDF) model. Experimental results on two large-scale real datasets and the benchmark testbed from the Ontology Alignment Evaluation Initiative (OAEI) are reported, showing that the proposed approach significantly reduces the run time with minor loss in precision and recall.
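The pipeline described in the abstract can be illustrated with a small single-machine sketch. It is not the V-Doc+ implementation: the function names, the neighbor weight of 0.5, the smoothed IDF formula, and the toy entities are all assumptions made for illustration. The shared-word index only stands in for the idea of grouping work by word and does not reproduce the paper's word-weight-based partitioning or its MapReduce jobs.

```python
import math
from collections import Counter, defaultdict

def virtual_documents(descriptions, edges, neighbor_weight=0.5):
    """Build one bag of words per entity from its own description plus
    down-weighted words from its neighbors (hypothetical weighting)."""
    local = {e: Counter(text.lower().split()) for e, text in descriptions.items()}
    docs = {e: Counter(bag) for e, bag in local.items()}
    for entity, neighbor in edges:
        for word, count in local[neighbor].items():
            docs[entity][word] += neighbor_weight * count
    return docs

def tfidf_vectors(docs):
    """Weight each word by term frequency times a smoothed IDF (an assumption)."""
    n = len(docs)
    df = Counter(word for bag in docs.values() for word in bag)
    return {e: {w: tf * math.log(1 + n / df[w]) for w, tf in bag.items()}
            for e, bag in docs.items()}

def candidate_pairs(vectors_a, vectors_b):
    """Group entities by shared word so only pairs with a common word are compared."""
    index = defaultdict(lambda: (set(), set()))
    for e, vec in vectors_a.items():
        for w in vec:
            index[w][0].add(e)
    for e, vec in vectors_b.items():
        for w in vec:
            index[w][1].add(e)
    return {(a, b) for left, right in index.values() for a in left for b in right}

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy ontologies (hypothetical entities and descriptions).
desc_a = {"a:Author": "author writer", "a:Paper": "paper article"}
edges_a = [("a:Author", "a:Paper")]  # a:Author borrows words from a:Paper
desc_b = {"b:Writer": "writer author", "b:Car": "car vehicle"}

docs_a = virtual_documents(desc_a, edges_a)
docs_b = virtual_documents(desc_b, [])
vectors = tfidf_vectors({**docs_a, **docs_b})
vec_a = {e: vectors[e] for e in docs_a}
vec_b = {e: vectors[e] for e in docs_b}

pairs = candidate_pairs(vec_a, vec_b)
scores = {(a, b): cosine(vec_a[a], vec_b[b]) for a, b in pairs}
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this toy run only entity pairs that share at least one word are ever compared, which is the property that makes a word-keyed split of the similarity computation possible in a distributed setting.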
Additional information
Project supported by the National Natural Science Foundation of China (No. 61003018), the Natural Science Foundation of Jiangsu Province, China (No. BK2011189), and the National Social Science Foundation of China (No. 11AZD121)
Cite this article
Zhang, H., Hu, W. & Qu, Yz. VDoc+: a virtual document based approach for matching large ontologies using MapReduce. J. Zhejiang Univ. - Sci. C 13, 257–267 (2012). https://doi.org/10.1631/jzus.C1101007
DOI: https://doi.org/10.1631/jzus.C1101007