DOI: 10.1145/1076034.1076095
Article

Using term informativeness for named entity detection

Published: 15 August 2005

Abstract

Informal communication (e-mail, bulletin boards) is a difficult learning environment because traditional grammatical and lexical information is noisy, so tasks such as named entity detection need other sources of information. One valuable source is how topic-centric, or informative, a word is. It is well known that informative words are best modeled by "heavy-tailed" distributions, such as mixture models; however, existing informativeness scores do not take full advantage of this fact. We introduce a new informativeness score that directly uses mixture model likelihood to identify informative words, and we evaluate it on the task of extracting restaurant names from bulletin board posts. We find that our "mixture score" is weakly effective alone and highly effective when combined with Inverse Document Frequency (IDF). Comparing against other informativeness criteria, we find that only Residual IDF is competitive with our combined IDF/Mixture score.
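
The abstract names its scores without giving formulas. As a rough illustration only, the Python sketch below computes them for a single word from its per-post occurrence counts and post lengths. IDF (-log2 of the fraction of posts containing the word) and Residual IDF (observed IDF minus the IDF a Poisson model predicts, as proposed by Church and Gale) are standard definitions. The mixture score here is an assumption about what "directly uses mixture model likelihood" could look like: the log-likelihood gain of a two-component binomial mixture, fit with EM, over a single binomial. The `idf_x_mixture` product is likewise a hypothetical way to combine IDF with the mixture score, and all function names and toy counts are illustrative, not taken from the paper.

```python
import math
from typing import Dict, Sequence

# Sketch only. IDF and Residual IDF follow their standard definitions; the
# mixture score and the IDF x mixture product are ASSUMED forms, not the
# paper's published definitions.

EPS = 1e-9  # keeps rates away from 0 and 1 so every log stays finite


def idf(doc_freq: int, n_docs: int) -> float:
    """Inverse document frequency: -log2(df / N)."""
    return -math.log2(doc_freq / n_docs)


def residual_idf(doc_freq: int, coll_freq: int, n_docs: int) -> float:
    """Observed IDF minus the IDF predicted by a Poisson with rate cf / N."""
    # Under that Poisson, a post contains the word with prob. 1 - exp(-cf / N).
    expected_idf = -math.log2(1.0 - math.exp(-coll_freq / n_docs))
    return idf(doc_freq, n_docs) - expected_idf


def log_binom_pmf(h: int, n: int, theta: float) -> float:
    """log P(h occurrences among n tokens) under a binomial with rate theta."""
    theta = min(max(theta, EPS), 1.0 - EPS)
    log_coef = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
    return log_coef + h * math.log(theta) + (n - h) * math.log1p(-theta)


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def binomial_loglik(hits: Sequence[int], lengths: Sequence[int]) -> float:
    """Log-likelihood under one binomial with the maximum-likelihood rate."""
    theta = sum(hits) / sum(lengths)
    return sum(log_binom_pmf(h, n, theta) for h, n in zip(hits, lengths))


def mixture_loglik(hits: Sequence[int], lengths: Sequence[int],
                   n_iter: int = 200) -> float:
    """Fit a two-component binomial mixture by EM; return its log-likelihood."""
    mean = sum(hits) / sum(lengths)
    lam, t_hi, t_lo = 0.5, min(2.0 * mean, 1.0 - EPS), mean / 2.0

    def comps(h: int, n: int):
        # Reads the current lam, t_hi, t_lo at call time (closure).
        return (math.log(lam) + log_binom_pmf(h, n, t_hi),
                math.log(1.0 - lam) + log_binom_pmf(h, n, t_lo))

    for _ in range(n_iter):
        # E-step: posterior weight of the high-rate ("topical") component.
        resp = [math.exp(a - log_add(a, b))
                for a, b in (comps(h, n) for h, n in zip(hits, lengths))]
        # M-step: re-estimate the mixing weight and both binomial rates.
        lam = min(max(sum(resp) / len(resp), EPS), 1.0 - EPS)
        t_hi = (sum(r * h for r, h in zip(resp, hits))
                / max(sum(r * n for r, n in zip(resp, lengths)), EPS))
        t_lo = (sum((1 - r) * h for r, h in zip(resp, hits))
                / max(sum((1 - r) * n for r, n in zip(resp, lengths)), EPS))
    return sum(log_add(*comps(h, n)) for h, n in zip(hits, lengths))


def mixture_score(hits: Sequence[int], lengths: Sequence[int]) -> float:
    """ASSUMED mixture score: log-likelihood gain of the mixture over one binomial."""
    gain = mixture_loglik(hits, lengths) - binomial_loglik(hits, lengths)
    # Equal rates reduce the mixture to the single binomial, so the true maximum
    # gain is >= 0; EM can stop at a local optimum, hence the floor at 0.
    return max(gain, 0.0)


def scores(hits: Sequence[int], lengths: Sequence[int]) -> Dict[str, float]:
    """All scores for one word, given its count and the length of each post."""
    n_docs, cf = len(hits), sum(hits)
    df = sum(1 for h in hits if h > 0)  # assumes the word occurs at least once
    mix = mixture_score(hits, lengths)
    return {"idf": idf(df, n_docs), "ridf": residual_idf(df, cf, n_docs),
            "mixture": mix,
            "idf_x_mixture": idf(df, n_docs) * mix}  # HYPOTHETICAL combination
```

Compare `scores([8, 8] + [0] * 8, [100] * 10)`, a word concentrated in two of ten 100-token posts, with `scores([2] * 10, [100] * 10)`, a word that appears in every post at a steady rate. The bursty word gets a large mixture gain and a positive Residual IDF, while the evenly spread word gets a mixture gain of about zero, an IDF of zero, and a negative Residual IDF. The binomial coefficients cancel in the likelihood ratio; computing them with `math.lgamma` only keeps each log-likelihood interpretable on its own.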

      Information

      Published In

      ACM Conferences
      SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
      August 2005
      708 pages
      ISBN:1595930345
      DOI:10.1145/1076034
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 15 August 2005

      Author Tags

      1. inverse document frequency
      2. mixture models
      3. named entity extraction
      4. term frequency distribution

      Qualifiers

      • Article

      Conference

      SIGIR '05

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Bibliometrics & Citations

      • Downloads (last 12 months): 3
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 13 Feb 2025

      Cited By

      • (2023) A Comprehensive Python Library for Deep Learning-Based Event Detection in Multivariate Time Series Data. 2023 International Conference on Machine Learning and Applications (ICMLA), pp. 1399-1404. DOI: 10.1109/ICMLA58977.2023.00211. Online publication date: 15-Dec-2023.
      • (2022) Active Learning Strategies Based on Text Informativeness. 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 32-39. DOI: 10.1109/WI-IAT55865.2022.00015. Online publication date: Nov-2022.
      • (2020) Textual Statistics and Named Entity Recognition Applied to Game of Thrones Novels. Web, Artificial Intelligence and Network Applications, pp. 214-222. DOI: 10.1007/978-3-030-44038-1_20. Online publication date: 31-Mar-2020.
      • (2019) SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction. DOI: 10.12794/metadc1538730. Online publication date: Aug-2019.
      • (2018) A Multimodal-Sensor-Enabled Room for Unobtrusive Group Meeting Analysis. Proceedings of the 20th ACM International Conference on Multimodal Interaction, pp. 347-355. DOI: 10.1145/3242969.3243022. Online publication date: 2-Oct-2018.
      • (2018) Comparative Analysis of the Informativeness and Encyclopedic Style of the Popular Web Information Sources. Business Information Systems, pp. 333-344. DOI: 10.1007/978-3-319-93931-5_24. Online publication date: 16-Jun-2018.
      • (2017) IDF for Word N-grams. ACM Transactions on Information Systems 36(1), pp. 1-38. DOI: 10.1145/3052775. Online publication date: 5-Jun-2017.
      • (2017) Deep keyphrase generation with a convolutional sequence to sequence model. 2017 4th International Conference on Systems and Informatics (ICSAI), pp. 1477-1485. DOI: 10.1109/ICSAI.2017.8248519. Online publication date: Nov-2017.
      • (2016) Machine Learning and Decision Support in Critical Care. Proceedings of the IEEE 104(2), pp. 444-466. DOI: 10.1109/JPROC.2015.2501978. Online publication date: Feb-2016.
      • (2015) Discovering expansion entities for keyword-based entity search in linked data. Journal of Information Science 41(2), pp. 209-227. DOI: 10.1177/0165551514562704. Online publication date: 6-Jan-2015.