Towards Efficient and Effective Semantic Table Interpretation

Ziqi Zhang

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8796)

Included in the following conference series:

International Semantic Web Conference

Abstract

This paper describes TableMiner, the first semantic Table Interpretation method that adopts an incremental, mutually recursive and bootstrapping learning approach seeded by automatically selected ‘partial’ data from a table. TableMiner labels columns containing named entity mentions with the semantic concepts that best describe the data in those columns, and disambiguates the entity content cells in these columns. TableMiner can use various types of contextual information outside tables for Table Interpretation, including semantic markups (e.g., RDFa/microdata annotations) that, to the best of our knowledge, have never been used in Natural Language Processing tasks. Evaluation on two datasets shows that TableMiner consistently outperforms two baselines. In the classification task, it achieves significant improvements of between 0.08 and 0.38 F1, depending on the baseline; in the disambiguation task, it outperforms both baselines by between 0.19 and 0.37 in Precision on one dataset, and by between 0.02 and 0.03 F1 on the other. Observation also shows that the bootstrapping learning approach adopted by TableMiner can potentially deliver computational savings of between 24% and 60% against classic methods that ‘exhaustively’ process the entire table content to build features for interpretation.
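The idea of labelling a column from only part of its content can be illustrated with a minimal sketch. This is not TableMiner's algorithm, which the abstract does not specify; the function name `label_column_incrementally`, the `candidate_concepts` lookup, the toy knowledge base, and the stopping rule (stop once the leading concept has stayed the same for `patience` consecutive cells) are all assumptions made here for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple


def label_column_incrementally(
    cells: Iterable[str],
    candidate_concepts: Callable[[str], List[str]],
    patience: int = 3,
) -> Tuple[Optional[str], int]:
    """Label a column by reading cells one at a time.

    Stops early once the top-voted concept has been the leader for
    `patience` consecutive cells (a hypothetical convergence rule).
    Returns (winning concept or None, number of cells processed).
    """
    votes: Counter = Counter()
    winner: Optional[str] = None
    stable = 0  # consecutive cells for which the leader was unchanged
    used = 0    # how many cells were actually processed

    for cell in cells:
        used += 1
        # Each cell votes for every concept its candidate entities belong to.
        votes.update(candidate_concepts(cell))
        if not votes:
            continue  # no evidence yet
        # Highest vote count wins; ties are broken by concept name so the
        # result is deterministic.
        top = max(votes, key=lambda concept: (votes[concept], concept))
        if top == winner:
            stable += 1
        else:
            winner, stable = top, 1
        if stable >= patience:
            break  # leader has converged; skip the remaining cells
    return winner, used


# Toy knowledge base (hypothetical data, for illustration only).
TOY_KB = {
    "Paris": ["City"],
    "London": ["City"],
    "Berlin": ["City"],
    "Rome": ["City"],
    "Madrid": ["City"],
}


def toy_lookup(cell: str) -> List[str]:
    return TOY_KB.get(cell, [])


column = ["Paris", "London", "Berlin", "Rome", "Madrid", "Vienna", "Oslo", "Lisbon"]
label, cells_used = label_column_incrementally(column, toy_lookup)
print(label, cells_used, len(column))
```

In this toy run the loop stops before reaching the end of the column, which is the kind of saving an exhaustive method, one that builds features from every cell before deciding, cannot obtain. The size of the saving here is a property of the toy data and says nothing about the figures reported in the paper.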

Author information

Authors and Affiliations

Department of Computer Science, University of Sheffield, UK
Ziqi Zhang

Editor information

Editors and Affiliations

Yahoo Labs, Diagonal 177, 08018, Barcelona, Spain
Peter Mika
Stanford University, 1265 Welch Road, 94305, Stanford, CA, USA
Tania Tudorache
University of Zurich, DDIS, Zurich, Switzerland
Abraham Bernstein
IBM Research, Yorktown Heights, NY, USA
Chris Welty
Information Sciences Institute and Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Craig Knoblock
Google, USA
Denny Vrandečić & Natasha Noy
VU University Amsterdam, The Netherlands
Paul Groth
University of California, Santa Barbara, CA, USA
Krzysztof Janowicz
School of Computer Science, The University of Manchester, Manchester, UK
Carole Goble

Cite this paper

Zhang, Z. (2014). Towards Efficient and Effective Semantic Table Interpretation. In: Mika, P., et al. The Semantic Web – ISWC 2014. ISWC 2014. Lecture Notes in Computer Science, vol 8796. Springer, Cham. https://doi.org/10.1007/978-3-319-11964-9_31

DOI: https://doi.org/10.1007/978-3-319-11964-9_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11963-2
Online ISBN: 978-3-319-11964-9
eBook Packages: Computer Science (R0)