Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3534678.3539077acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

Published: 14 August 2022 Publication History

Abstract

Although conceptualization has been widely studied in semantics and knowledge representation, it is still challenging to find the most accurate concept terms to tag fast-growing social media content. This is partly attributed to the fact that most traditional knowledge bases contain general terms of the world, such as trees and cars, which are not interesting to users, and do not have the defining power for social media content. Another reason is that the intricate use of tense, negation and grammar in social media content may change the logic or emphasis of the content, thus focusing on different main ideas. In this paper, we present TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from open-domain social media content. The concepts we provide are the trending terms on social media and have the right granularity to define user interests, e.g., highly educated actors instead of just actors. In the meantime, TAG offers a concept graph which interconnects these fine-grained concepts and entities to provide contextual information. We evaluate a wide range of neural text matching models as well as pre-trained language models for the concept matching task on TAG, and point out their insufficiency to tag social media content to characterize its main idea. We further propose a novel graph-graph matching framework that demonstrates superior abstraction and generalization performance by better utilizing both the structural information in the concept graph and logic interactions between semantic units in the natural language sentence via syntactic dependency parsing.

Supplemental Material

MP4 File
We released TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from the open-domain social media. Each concept has a local concept graph to provide context information. We propose a novel graph-graph matching method that demonstrates superior abstraction and generalization performance by better utilizing both the structural context in the concept graph and logic interactions between semantic units in the sentence via syntactic dependency parsing.

References

[1]
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. The semantic web (2007), 722--735.
[2]
Wei Bi and James Kwok. 2013. Efficient multi-label classification with many labels. In ICML. 405--413.
[3]
Raghavendra Chalapathy, Ehsan Zare Borzeshi, and Massimo Piccardi. 2016. Bidirectional LSTM-CRF for clinical concept extraction. arXiv preprint arXiv:1611.08373 (2016).
[4]
Sheng Chen, Akshay Soni, Aasish Pappu, and Yashar Mehdad. 2017. Doctag2vec: An embedding based multi-label learning approach for document tagging. arXiv preprint arXiv:1707.04596 (2017).
[5]
Paul-Alexandru Chirita, Stefania Costache, Wolfgang Nejdl, and Siegfried Handschuh. 2007. P-tag: large scale automatic generation of personalized annotation tags for the web. In Proc. WWW. 845--854.
[6]
Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proc. ACM WSDM. 126--134.
[7]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[8]
Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[9]
Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural message passing for quantum chemistry. In Proc. ICML-Volume 70. JMLR. org, 1263--1272.
[10]
Jacopo Gobbi, Evgeny Stepanov, and Giuseppe Riccardi. 2018. Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development. arXiv preprint arXiv:1807.10661 (2018).
[11]
Jiafeng Guo, Yixing Fan, Xiang Ji, and Xueqi Cheng. 2019. MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching. In Proc. ACM SIGIR (Paris, France) (SIGIR'19). ACM, New York, NY, USA, 1297--1300.
[12]
Daniel J Hsu, Sham M Kakade, John Langford, and Tong Zhang. 2009. Multi-label prediction via compressed sensing. In NIPS. 772--780.
[13]
Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS. 2042--2050.
[14]
Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015).
[15]
Lei Ji, Yujing Wang, Botian Shi, Dawei Zhang, Zhongyuan Wang, and Jun Yan. 2019. Microsoft concept graph: Mining semantic concepts for short text understanding. Data Intelligence 1, 3 (2019), 238--270.
[16]
Immanuel Kant. [n.d.]. Logic. Dover Publications.
[17]
Dongwoo Kim, Haixun Wang, and Alice Oh. 2013. Context-dependent conceptualization. In IJCAI.
[18]
Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
[19]
Jiaqing Liang, Yanghua Xiao, Haixun Wang, Yi Zhang, and Wei Wang. 2017. Probase+: Inferring missing links in conceptual taxonomies. IEEE TKDE 29, 6 (2017), 1281--1295.
[20]
Bang Liu, Weidong Guo, Di Niu, Jinwen Luo, Chaoyue Wang, Zhen Wen, and Yu Xu. 2020. GIANT: Scalable Creation of a Web-scale Ontology. In Proc. ACM SIGMOD. 393--409.
[21]
Bang Liu, Weidong Guo, Di Niu, Chaoyue Wang, Shunnan Xu, Jinghong Lin, Kunfeng Lai, and Yu Xu. 2019. A User-Centered Concept Mining System for Query and Document Understanding at Tencent. In Proc. ACM SIGKDD. 1831--1841.
[22]
Bang Liu, Di Niu, Haojie Wei, Jinghong Lin, Yancheng He, Kunfeng Lai, and Yu Xu. 2019. Matching Article Pairs with Graphical Decomposition and Convolutions. In Proc. ACL. 6284--6294.
[23]
Bang Liu, Ting Zhang, Fred X Han, Di Niu, Kunfeng Lai, and Yu Xu. 2018. Matching natural language sentences with hierarchical sentence factorization. In Proc. WWW. IW3C2, 1237--1246.
[24]
Xusheng Luo, Luxin Liu, Yonghua Yang, Le Bo, Yuanpeng Cao, Jinhang Wu, Qiang Li, Keping Yang, and Kenny Q Zhu. 2020. AliCoCo: Alibaba E-commerce Cognitive Concept Net. arXiv preprint arXiv:2003.13230 (2020).
[25]
Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. ACL: system demonstrations. 55--60.
[26]
Jonas Mueller and Aditya Thyagarajan. 2016. Siamese recurrent architectures for learning sentence similarity. In AAAI Conference on Artificial Intelligence.
[27]
Giannis Nikolentzos, Polykarpos Meladianos, François Rousseau, Yannis Stavrakas, and Michalis Vazirgiannis. 2017. Shortest-path graph kernels for document similarity. In Proc. EMNLP. 1890--1900.
[28]
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, and Xueqi Cheng. 2016. Text Matching as Image Recognition. In AAAI. 2793--2799.
[29]
Christian Paul, Achim Rettinger, Aditya Mogadala, Craig A Knoblock, and Pedro Szekely. 2016. Efficient graph-based document similarity. In ESWC. Springer, 334--349.
[30]
Yashoteja Prabhu and Manik Varma. 2014. Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proc. ACM SIGKDD. 263--272.
[31]
Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. 2020. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. arXiv preprint arXiv:2003.07082 (2020).
[32]
Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. 2009. Learning optimal ranking with tensor factorization for tag recommendation. In Proc. ACM SIGKDD. 727--736.
[33]
Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In ESWC. Springer, 593--607.
[34]
Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to rank short text pairs with convolutional deep neural networks. In Proc. ACM SIGIR. ACM, 373--382.
[35]
Yan Song, Shuming Shi, Jing Li, and Haisong Zhang. 2018. Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings. In Proc. NAACL (Short Papers). 175--180.
[36]
Yangqiu Song, Haixun Wang, Zhongyuan Wang, Hongsong Li, and Weizhu Chen. 2011. Short text conceptualization using a probabilistic knowledgebase. In IJCAI.
[37]
Yangqiu Song, Shusen Wang, and Haixun Wang. 2015. Open domain short text conceptualization: A generative+ descriptive modeling approach. In IJCAI.
[38]
Yang Song, Lu Zhang, and C Lee Giles. 2011. Automatic tag recommendation algorithms for social recommender systems. ACM TWEB 5, 1 (2011), 1--31.
[39]
Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proc. WWW. 697--706.
[40]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In NIPS. 3104--3112.
[41]
Shengxian Wan, Yanyan Lan, Jiafeng Guo, Jun Xu, Liang Pang, and Xueqi Cheng. 2016. A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations. In AAAI, Vol. 16. 2835--2841.
[42]
Zhiguo Wang, Wael Hamza, and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. arXiv preprint arXiv:1702.03814 (2017).
[43]
Zhongyuan Wang and Haixun Wang. 2016. Understanding short texts. (2016).
[44]
Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. 2015. Query understanding through knowledge-based conceptualization. In IJCAI.
[45]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R'emi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv abs/1910.03771 (2019).
[46]
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In Proc. ACM SIGMOD. 481--492.
[47]
Bo Xu, Yong Xu, Jiaqing Liang, Chenhao Xie, Bin Liang, Wanyun Cui, and Yanghua Xiao. 2017. CN-DBpedia: A never-ending Chinese knowledge extraction system. In IEA/AIE. Springer, 428--438.
[48]
Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit Dhillon. 2014. Large-scale multi-label learning with missing labels. In ICML. 593--601.

Index Terms

  1. TAG: Toward Accurate Social Media Content Tagging with a Concept Graph

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022
    5033 pages
    ISBN:9781450393850
    DOI:10.1145/3534678
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 August 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. concept-sentence matching
    2. datasets
    3. graph-graph matching
    4. heterogeneous graph

    Qualifiers

    • Research-article

    Funding Sources

    • NSERC Discovery Grant

    Conference

    KDD '22
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Upcoming Conference

    KDD '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 223
      Total Downloads
    • Downloads (Last 12 months)33
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 13 Feb 2025

    Other Metrics

    Citations

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media