Text-Mining, Structured Queries, and Knowledge Management on Web Document Corpora

Published: 04 December 2014

Abstract

Wikipedia's InfoBoxes play a crucial role in advanced applications and provide the main knowledge source for DBpedia and the powerful structured queries it supports. However, InfoBoxes, which were created by crowdsourcing for human rather than computer consumption, suffer from incompleteness, inconsistencies, and inaccuracies. To overcome these problems, we have developed (i) the IBminer system, which extracts InfoBox information by text-mining Wikipedia pages; (ii) the IKBStore system, which integrates the information derived by IBminer with that of DBpedia, YAGO2, WikiData, WordNet, and other sources; and (iii) SWiPE and InfoBox Editor (IBE), which provide user-friendly interfaces for querying and revising the knowledge base. IBminer uses a deep NLP-based approach to extract from text a semantic representation structure called TextGraph, from which the system detects patterns and derives subject-attribute-value relations, as well as domain-specific synonyms for the knowledge base. IKBStore and IBE complement the powerful, user-friendly, by-example structured queries of SWiPE by supporting validation and provenance history for the information contained in the knowledge base, along with the ability to upgrade its knowledge when it is found to be incomplete, incorrect, or outdated.
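
To make the notions of subject-attribute-value relations, attribute synonyms, and provenance more concrete, the following is a minimal illustrative sketch in Python. It is not the IBminer or IKBStore implementation; the function names, the synonym table, and the sample records are all hypothetical and chosen only for illustration. It shows one simple way triples from several sources could be merged under a canonical attribute name while remembering which source contributed each value.

```python
from collections import defaultdict

# Hypothetical synonym table: attribute variants mapped to one canonical name.
ATTRIBUTE_SYNONYMS = {"inhabitants": "population", "pop": "population"}


def canonical(attribute):
    """Normalize an attribute name and resolve it through the synonym table."""
    key = attribute.strip().lower()
    return ATTRIBUTE_SYNONYMS.get(key, key)


def integrate(records):
    """Merge (subject, attribute, value, source) records.

    Returns a mapping (subject, canonical attribute) -> value -> set of
    sources, so every value keeps its provenance.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for subject, attribute, value, source in records:
        merged[(subject, canonical(attribute))][value].add(source)
    return merged


def conflicts(merged):
    """List the (subject, attribute) keys on which the sources disagree."""
    return sorted(key for key, values in merged.items() if len(values) > 1)


# Made-up sample records, for illustration only.
SAMPLE = [
    ("Entity A", "inhabitants", "1000", "text-mining"),
    ("Entity A", "Population", "1000", "source-1"),
    ("Entity A", "pop", "1200", "source-2"),
    ("Entity B", "country", "X", "source-1"),
]

if __name__ == "__main__":
    kb = integrate(SAMPLE)
    for key in conflicts(kb):
        print(key, {value: sorted(srcs) for value, srcs in kb[key].items()})
```

Keeping the set of sources next to each value is what would let a reviewer see where a disputed value came from before correcting it, which is the kind of validation and revision workflow the abstract attributes to IKBStore and IBE.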


    Published In

    ACM SIGMOD Record  Volume 43, Issue 3
    September 2014
    70 pages
    ISSN:0163-5808
    DOI:10.1145/2694428

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Column


    Cited By

    • (2019) Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1709-1719. DOI: 10.1145/3292500.3330838. Online publication date: 25-Jul-2019
    • (2018) User-friendly temporal queries on historical knowledge bases. Information and Computation, 259, pages 444-459. DOI: 10.1016/j.ic.2017.08.012. Online publication date: Apr-2018
    • (2018) A machine-learning approach to ranking RDF properties. Future Generation Computer Systems, 54:C, pages 366-377. DOI: 10.1016/j.future.2015.04.018. Online publication date: 30-Dec-2018
    • (2018) Neural Article Pair Modeling for Wikipedia Sub-article Matching. Machine Learning and Knowledge Discovery in Databases, pages 3-19. DOI: 10.1007/978-3-030-10997-4_1. Online publication date: 10-Sep-2018
    • (2015) Historical Queries on Wikipedia. Proceedings of the 2015 22nd International Symposium on Temporal Representation and Reasoning (TIME), pages 1-1. DOI: 10.1109/TIME.2015.28. Online publication date: 23-Sep-2015
