DOI: 10.1145/2556195.2556266

Using Linked Data to Mine RDF from Wikipedia's Tables

Published: 24 February 2014

Abstract

The tables embedded in Wikipedia articles contain rich, semi-structured encyclopaedic content. However, the cumulative content of these tables cannot readily be queried. We thus propose methods to recover the semantics of Wikipedia tables and, in particular, to extract facts from them in the form of RDF triples. Our core method uses an existing Linked Data knowledge-base to find pre-existing relations between entities in Wikipedia tables, suggesting the same relations as holding for other entities in analogous columns on different rows. We find that such an approach extracts RDF triples from Wikipedia's tables at a raw precision of 40%. To improve the raw precision, we define a set of features for extracted triples that are tracked during the extraction phase. Using a manually labelled gold standard, we then test a variety of machine learning methods for classifying correct/incorrect triples. One such method extracts 7.9 million unique and novel RDF triples from over one million Wikipedia tables at an estimated precision of 81.5%.
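The core idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes table cells have already been linked to entity identifiers and that the knowledge-base is available as an in-memory set of (subject, predicate, object) triples. The function name `mine_candidate_triples`, the `ex:` identifiers, and the per-relation support count (a stand-in for the kind of feature a classifier might later use) are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def mine_candidate_triples(rows, kb):
    """Suggest novel triples for a table from relations already in the KB.

    rows: list of rows, each a list of entity IDs (None for unlinked cells).
    kb:   set of (subject, predicate, object) triples.
    Returns a dict mapping each candidate triple that is NOT yet in the KB
    to the number of rows in which the KB already holds that predicate
    between the same two columns (a simple, hypothetical support feature).
    """
    # Index the KB by (subject, object) so relations can be looked up quickly.
    by_pair = defaultdict(set)
    for s, p, o in kb:
        by_pair[(s, o)].add(p)

    n_cols = max((len(r) for r in rows), default=0)
    candidates = {}
    for i, j in permutations(range(n_cols), 2):
        # Keep only rows where both cells of this column pair are linked.
        pairs = [
            (row[i], row[j])
            for row in rows
            if i < len(row) and j < len(row) and row[i] and row[j]
        ]
        # Count rows whose entity pair already has predicate p in the KB.
        support = defaultdict(int)
        for s, o in pairs:
            for p in by_pair.get((s, o), ()):
                support[p] += 1
        # Suggest every observed predicate for the other rows of the pair.
        for s, o in pairs:
            for p, count in support.items():
                if (s, p, o) not in kb:
                    candidates[(s, p, o)] = count
    return candidates


# Hypothetical toy data: one existing KB relation, three table rows.
example_kb = {("ex:Messi", "ex:team", "ex:Barcelona")}
example_rows = [
    ["ex:Messi", "ex:Barcelona"],
    ["ex:Ronaldo", "ex:Real_Madrid"],
    ["ex:Neymar", "ex:Barcelona"],
]
example_out = mine_candidate_triples(example_rows, example_kb)
for triple in sorted(example_out):
    print(triple, example_out[triple])
```

In this toy run the single known `ex:team` relation on the first row is proposed for the other two rows. Candidates produced this way are unfiltered, which is consistent with the abstract's point that raw extraction needs a later classification step to raise precision.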

Published In

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN: 9781450323512
DOI: 10.1145/2556195

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. linked data
  3. web tables
  4. wikipedia

Qualifiers

  • Research-article

Conference

WSDM 2014

Acceptance Rates

WSDM '14 Paper Acceptance Rate: 64 of 355 submissions, 18%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
