Abstract
Linked Open Data comprises a large number of public data sets, many of them large, which are mostly represented in the RDF triple structure of subject, predicate, and object. However, the heterogeneity of the available open data requires significant integration steps before the data can be used in applications. A promising and novel technique for exploring such data is association rule mining. We introduce “mining configurations”, which allow us to mine RDF data sets in various ways. Different configurations enable us to identify schema and value dependencies that, in combination, give rise to interesting use cases. We present rule-based approaches for predicate suggestion, data enrichment, ontology improvement, and query relaxation. On the one hand, we prevent inconsistencies in the data through predicate suggestion, enrichment with missing facts, and alignment of the corresponding ontology. On the other hand, we help users handle inconsistencies during query formulation through predicate expansion techniques. Based on these approaches, we show that association rule mining benefits the integration and usability of RDF data.
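To make the idea of mining RDF triples with association rules more concrete, the following minimal Python sketch may help. It rests on assumptions that the abstract does not spell out: a configuration is read here as choosing one part of the triple (the context) to group transactions and another part (the target) to supply the items, and rules are restricted to pairs of single items. The toy triples, the thresholds, and the function names `build_transactions`, `mine_pair_rules`, and `suggest_predicates` are hypothetical and only illustrate how predicate suggestion could follow from such rules.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object); not taken from the article.
TRIPLES = [
    ("Berlin", "type", "City"),
    ("Berlin", "population", "3500000"),
    ("Berlin", "country", "Germany"),
    ("Paris", "type", "City"),
    ("Paris", "population", "2100000"),
    ("Paris", "country", "France"),
    ("Lyon", "type", "City"),
    ("Lyon", "country", "France"),
    ("Goethe", "type", "Person"),
    ("Goethe", "birthPlace", "Frankfurt"),
]

POSITION = {"subject": 0, "predicate": 1, "object": 2}


def build_transactions(triples, context, target):
    """Group the target part of each triple by its context part (assumed reading of a configuration)."""
    c, t = POSITION[context], POSITION[target]
    transactions = defaultdict(set)
    for triple in triples:
        transactions[triple[c]].add(triple[t])
    return dict(transactions)


def mine_pair_rules(transactions, min_support=2, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in transactions.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue  # the pair co-occurs too rarely to be considered
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def suggest_predicates(present, rules):
    """Suggest items implied by rules whose left-hand side is already present."""
    return sorted({rhs for lhs, rhs, _, _ in rules if lhs in present and rhs not in present})


# Subjects as context and predicates as target: each entity becomes a set of predicates.
transactions = build_transactions(TRIPLES, context="subject", target="predicate")
rules = mine_pair_rules(transactions)
suggestions = suggest_predicates(transactions["Lyon"], rules)
print(suggestions)
```

In this toy data, the entity `Lyon` has `type` and `country` but no `population`, and the rule from `country` to `population` is strong enough to suggest it. Calling `build_transactions` with a different `context` and `target` would, under the same assumption, correspond to a different configuration; a practical implementation would likely use a full frequent-itemset algorithm such as Apriori or FP-growth rather than pair counting.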
Cite this article
Abedjan, Z., Naumann, F. Improving RDF Data Through Association Rule Mining. Datenbank Spektrum 13, 111–120 (2013). https://doi.org/10.1007/s13222-013-0126-x