Abstract
This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Y., Bontcheva, K., Cunningham, H. (2005). SVM Based Learning System for Information Extraction. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_19
DOI: https://doi.org/10.1007/11559887_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29073-5
Online ISBN: 978-3-540-31728-9
eBook Packages: Computer Science, Computer Science (R0)