DOI: 10.1145/2487788.2488003

FS-NER: a lightweight filter-stream approach to named entity recognition on twitter data

Published: 13 May 2013

Abstract

Microblog platforms such as Twitter are increasingly adopted by Web users, yielding an important source of data for web search and mining applications. Tasks such as Named Entity Recognition are at the core of many of these applications, but the effectiveness of existing tools is seriously compromised when they are applied to Twitter data, since messages are terse, poorly worded, and posted in many different languages. Twitter also follows a streaming paradigm, which requires that entities be recognized in real time. In view of these challenges and the inadequacy of existing tools, we propose a novel approach for Named Entity Recognition on Twitter data called FS-NER (Filter-Stream Named Entity Recognition). FS-NER is characterized by the use of filters that process unlabeled Twitter messages, which makes it much more practical than existing supervised CRF-based approaches. Such filters can be flexibly combined either in sequence or in parallel. Moreover, because these filters are not language dependent, FS-NER can be applied to different languages without laborious adaptation. Through a systematic evaluation using three Twitter collections and considering seven types of entity, we show that FS-NER performs 3% better than a CRF-based baseline, besides being orders of magnitude faster and much more practical.
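
The abstract says that filters can be combined in sequence or in parallel but does not describe how the filters work internally. The sketch below is only an illustration of that general idea and is not the authors' implementation. Everything in it is an assumption made for the example: the `Filter` type, the two toy filters (`capitalized_filter`, `make_gazetteer_filter`), the combinators `in_sequence` and `in_parallel`, and the choice to read "sequence" as "every filter must accept a token" and "parallel" as "filters vote independently".

```python
from typing import Callable, Iterable, Iterator, List, Set

# Hypothetical type: a filter looks at one tokenized message and returns
# one boolean per token (True = "this token may belong to an entity").
Filter = Callable[[List[str]], List[bool]]


def capitalized_filter(tokens: List[str]) -> List[bool]:
    """Illustrative filter: flag tokens that start with an uppercase letter."""
    return [token[:1].isupper() for token in tokens]


def make_gazetteer_filter(known_names: Set[str]) -> Filter:
    """Illustrative filter: flag tokens found in a (made-up) list of names."""
    lowered = {name.lower() for name in known_names}

    def gazetteer_filter(tokens: List[str]) -> List[bool]:
        return [token.lower() in lowered for token in tokens]

    return gazetteer_filter


def in_sequence(*filters: Filter) -> Filter:
    """Assumed semantics: a token survives only if every filter accepts it."""
    if not filters:
        raise ValueError("at least one filter is required")

    def combined(tokens: List[str]) -> List[bool]:
        decisions = [True] * len(tokens)
        for current in filters:
            decisions = [d and v for d, v in zip(decisions, current(tokens))]
        return decisions

    return combined


def in_parallel(*filters: Filter, min_votes: int = 1) -> Filter:
    """Assumed semantics: every filter votes independently on each token."""
    if not filters:
        raise ValueError("at least one filter is required")

    def combined(tokens: List[str]) -> List[bool]:
        outputs = [current(tokens) for current in filters]
        return [sum(votes) >= min_votes for votes in zip(*outputs)]

    return combined


def recognize(messages: Iterable[str], entity_filter: Filter) -> Iterator[List[str]]:
    """Process messages one at a time, as they would arrive from a stream."""
    for message in messages:
        tokens = message.split()
        flags = entity_filter(tokens)
        yield [token for token, keep in zip(tokens, flags) if keep]


# Made-up example data, not taken from the paper's collections.
gazetteer = make_gazetteer_filter({"santos", "rio"})
strict = in_sequence(capitalized_filter, gazetteer)
lenient = in_parallel(capitalized_filter, gazetteer)

stream = ["Great game by santos in Rio"]
strict_hits = list(recognize(stream, strict))
lenient_hits = list(recognize(stream, lenient))
```

Under these assumed semantics, the sequential combination keeps only tokens that pass every filter, while the parallel combination keeps any token that collects at least `min_votes` votes. Because `recognize` is a generator, each message is handled as it arrives, which loosely mirrors the streaming setting the abstract describes.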

    Published In

    WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
    May 2013
    1636 pages
    ISBN:9781450320382
    DOI:10.1145/2487788

    Sponsors

    • NICBR: Núcleo de Informação e Coordenação do Ponto BR
    • CGIBR: Comitê Gestor da Internet no Brasil

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crf
    2. fs-ner
    3. named entity recognition
    4. twitter

    Qualifiers

    • Research-article

    Conference

    WWW '13
    Sponsor:
    • NICBR
    • CGIBR
    WWW '13: 22nd International World Wide Web Conference
    May 13 - 17, 2013
    Rio de Janeiro, Brazil

    Acceptance Rates

    WWW '13 Companion Paper Acceptance Rate 831 of 1,250 submissions, 66%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 6
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 28 Sep 2024

    Cited By

    • (2022) A Multi-Task BERT-BiLSTM-AM-CRF Strategy for Chinese Named Entity Recognition. Neural Processing Letters 55:2 (1209-1229). DOI: 10.1007/s11063-022-10933-3. Online publication date: 12-Jul-2022
    • (2022) Review of Research on Named Entity Recognition. Advances in Artificial Intelligence and Security (256-267). DOI: 10.1007/978-3-031-06761-7_21. Online publication date: 8-Jul-2022
    • (2021) Chinese Named Entity Recognition Method Based on BERT. Artificial Intelligence and Robotics Research 10:03 (215-223). DOI: 10.12677/AIRR.2021.103021. Online publication date: 2021
    • (2021) Named Entity Recognition and Relation Extraction. ACM Computing Surveys 54:1 (1-39). DOI: 10.1145/3445965. Online publication date: 11-Feb-2021
    • (2020) BLAC: A Named Entity Recognition Model Incorporating Part-of-Speech Attention in Irregular Short Text. 2020 IEEE International Conference on Real-time Computing and Robotics (RCAR) (56-61). DOI: 10.1109/RCAR49640.2020.9303256. Online publication date: 28-Sep-2020
    • (2018) Adaptive co-attention network for named entity recognition in tweets. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (5674-5681). DOI: 10.5555/3504035.3504731. Online publication date: 2-Feb-2018
    • (2018) Ego-Centric Analysis of Supportive Networks. Proceedings of the 10th ACM Conference on Web Science (281-285). DOI: 10.1145/3201064.3201099. Online publication date: 15-May-2018
    • (2018) An Attention Factor Graph Model for Tweet Entity Linking. Proceedings of the 2018 World Wide Web Conference (1135-1144). DOI: 10.1145/3178876.3186012. Online publication date: 10-Apr-2018
    • (2018) Learning Transferable Features For Open-Domain Question Answering. 2018 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN.2018.8489057. Online publication date: Jul-2018
    • (2018) A Social Media Platform for Infectious Disease Analytics. Computational Science and Its Applications – ICCSA 2018 (526-540). DOI: 10.1007/978-3-319-95162-1_36. Online publication date: 4-Jul-2018