department
Open access

Knowledge Base Construction in the Machine-learning Era: Three critical design points: Joint-learning, weak supervision, and new representations

Published: 01 June 2018

Abstract

More information is accessible today than at any other time in human history. From a software perspective, however, the vast majority of this data is unusable, as it is locked away in unstructured formats such as text, PDFs, web pages, images, and other hard-to-parse formats. The goal of knowledge base construction is to extract structured information automatically from this "dark data," so that it can be used in downstream applications for search, question-answering, link prediction, visualization, modeling and much more. Today, knowledge bases are the central components of systems that help fight human trafficking, accelerate biomedical discovery, and, increasingly, power web-search and question-answering technologies.

Published In

Queue  Volume 16, Issue 3
Machine Learning
May-June 2018
118 pages
ISSN:1542-7730
EISSN:1542-7749
DOI:10.1145/3236386
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Department
  • Popular
  • Editor picked
