Although conceptualization has been widely studied in semantics and knowledge representation, it is still challenging to find the most accurate concept terms to tag fast-growing social media content. This is partly attributed to the fact that most traditional knowledge bases contain general terms of the world, such as trees and cars, which are not interesting to users, and do not have the defining power for social media content. Another reason is that the intricate use of tense, negation and grammar in social media content may change the logic or emphasis of the content, thus focusing on different main ideas. In this paper, we present TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from open-domain social media content. The concepts we provide are the trending terms on social media and have the right granularity to define user interests, e.g., highly educated actors instead of just actors. In the meantime, TAG offers a concept graph which interconnects these fine-grained concepts and entities to provide contextual information. We evaluate a wide range of neural text matching models as well as pre-trained language models for the concept matching task on TAG, and point out their insufficiency to tag social media content to characterize its main idea. We further propose a novel graph-graph matching framework that demonstrates superior abstraction and generalization performance by better utilizing both the structural information in the concept graph and logic interactions between semantic units in the natural language sentence via syntactic dependency parsing.

Supplemental Material

MP4 File

We released TAG, a high-quality concept matching dataset consisting of 10,000 labeled pairs of fine-grained concepts and web-styled natural language sentences, mined from the open-domain social media. Each concept has a local concept graph to provide context information. We propose a novel graph-graph matching method that demonstrates superior abstraction and generalization performance by better utilizing both the structural context in the concept graph and logic interactions between semantic units in the sentence via syntactic dependency parsing.

Download
125.92 MB

References

[1]

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. The semantic web (2007), 722--735.

Abstract

Supplemental Material

References

Index Terms

Recommendations

Tag suggestion using visual content and social tag

Social media user classification: based on social capital expectation, susceptibility, and compulsion loop

Online Social Media Content Delivery: A Data-Driven Approach

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations