column

Text-Mining, Structured Queries, and Knowledge Management on Web Document Corpora

Authors: Maurizio Atzori, Carlo Zaniolo

ACM SIGMOD Record, Volume 43, Issue 3

Pages 48 - 54

https://doi.org/10.1145/2694428.2694437

Published: 04 December 2014

Abstract

Wikipedia's InfoBoxes play a crucial role in advanced applications and provide the main knowledge source for DBpedia and the powerful structured queries it supports. However, InfoBoxes, which were created by crowdsourcing for human rather than computer consumption, suffer from incompleteness, inconsistencies, and inaccuracies. To overcome these problems, we have developed (i) the IBminer system, which extracts InfoBox information by text-mining Wikipedia pages, (ii) the IKBStore system, which integrates the information derived by IBminer with that of DBpedia, YAGO2, WikiData, WordNet, and other sources, and (iii) SWiPE and InfoBox Editor (IBE), which provide user-friendly interfaces for querying and revising the knowledge base. Specifically, IBminer uses a deep NLP-based approach to extract from text a semantic representation structure called TextGraph, from which the system detects patterns and derives subject-attribute-value relations, as well as domain-specific synonyms for the knowledge base. IKBStore and IBE complement the powerful, user-friendly, by-example structured queries of SWiPE by supporting validation and provenance history for the information contained in the knowledge base, along with the ability to upgrade its knowledge when it is found to be incomplete, incorrect, or outdated.
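To make the idea of integrating subject-attribute-value relations with provenance more concrete, the following is a minimal sketch in Python. It is not the IBminer or IKBStore implementation: the `Triple` class, the `ATTRIBUTE_SYNONYMS` table, the source labels, and the functions `integrate` and `needs_review` are all hypothetical names chosen for illustration, and the sample facts are toy data. The sketch only shows how triples from several sources might be merged under a canonical attribute name while remembering where each value came from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A subject-attribute-value relation plus the source it came from."""
    subject: str
    attribute: str
    value: str
    source: str  # provenance label, e.g. "infobox" or "text-mining" (assumed labels)


# Hypothetical attribute synonyms; a real system would learn these from data.
ATTRIBUTE_SYNONYMS = {"birthplace": "birth_place", "born_in": "birth_place"}


def canonical(attribute: str) -> str:
    """Map an attribute name to its canonical form, if a synonym is known."""
    return ATTRIBUTE_SYNONYMS.get(attribute, attribute)


def integrate(triples):
    """Group triples by (subject, canonical attribute).

    Returns a dict mapping each key to {value: sorted list of sources},
    so every value keeps a record of which sources asserted it.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for t in triples:
        merged[(t.subject, canonical(t.attribute))][t.value].add(t.source)
    return {
        key: {value: sorted(sources) for value, sources in values.items()}
        for key, values in merged.items()
    }


def needs_review(kb):
    """Return keys that carry more than one distinct value.

    These may be legitimate multi-valued attributes or inconsistencies;
    the sketch only flags them for a human editor to inspect.
    """
    return sorted(key for key, values in kb.items() if len(values) > 1)


# Toy input: the same fact under two attribute names, plus a multi-valued one.
TRIPLES = [
    Triple("Ada Lovelace", "birth_place", "London", "infobox"),
    Triple("Ada Lovelace", "born_in", "London", "text-mining"),
    Triple("Ada Lovelace", "occupation", "mathematician", "text-mining"),
    Triple("Ada Lovelace", "occupation", "writer", "infobox"),
]
KB = integrate(TRIPLES)
```

In this sketch, `KB[("Ada Lovelace", "birth_place")]` records that both sources agree on one value, while `needs_review(KB)` lists the attributes whose values a human editor might want to validate. Keeping the sources next to each value is one simple way to retain provenance; the actual systems may organize this quite differently.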


Published In

ACM SIGMOD Record, Volume 43, Issue 3

September 2014

70 pages

ISSN: 0163-5808

DOI: 10.1145/2694428

Editors: Yanlei Diao (University of Massachusetts Amherst), Pablo Barceló (Universidad de Chile), Vanessa Braganholo (Universidade Federal Fluminense), Marco Brambilla (Politecnico di Milano), Chee Yong Chan (National University of Singapore), Rada Chirkova (North Carolina State University), Anish Das Sarma (Google Research), Alkis Simitsis (HP Labs), Nesime Tatbul (ETH Zurich), Marianne Winslett (University of Illinois)

Copyright © 2014 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States
