Sentence Role Identification in Medline Abstracts: Training Classifier with Structured Abstracts

Masashi Shimbo,
Takahiro Yamasaki &
Yuji Matsumoto

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3430)


Abstract

The abstract of a scientific paper typically consists of sentences describing the background of the study, its objective, experimental method and results, and conclusions. We discuss the task of identifying which of these “structural roles” each sentence in an abstract plays, with a particular focus on its application in building a literature retrieval system. By annotating sentences in an abstract collection with role labels, we can build a literature retrieval system in which users can specify the roles of the sentences in which query terms should be sought. We argue that this facility enables more goal-oriented search, and also makes it easier to narrow down search results when adding extra query terms does not work. To build such a system, two issues need to be addressed: (1) how to determine the set of structural roles presented to users, from which they choose the target search area, and (2) how to classify each sentence in an abstract by its structural role without relying too much on human supervision. We view the task of role identification as one of text classification based on supervised machine learning. Our approach is characterized by the use of structured abstracts for building training data. In structured abstracts, a format popular in biomedical domains, sections are explicitly marked with headings indicating their structural roles, and hence they provide an inexpensive way to collect training data for sentence classifiers. Statistics on the structured abstracts contained in Medline also give insight into how to determine the set of sections to be presented to users.
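
The idea of using section headings as free labels, and then restricting a search to sentences with chosen roles, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the heading-to-role mapping, the naive sentence splitter, the function names, and the toy abstract are all assumptions made for illustration. The resulting (sentence, role) pairs are the kind of data a supervised sentence classifier could be trained on.

```python
import re

# Hypothetical mapping from section headings to role labels; both the
# heading inventory and the role set are illustrative assumptions.
HEADING_TO_ROLE = {
    "BACKGROUND": "background",
    "OBJECTIVE": "objective",
    "METHODS": "method",
    "RESULTS": "result",
    "CONCLUSIONS": "conclusion",
}

# A heading is assumed to be an upper-case word followed by a colon.
HEADING_RE = re.compile(r"\b([A-Z]{3,}):")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def label_sentences(structured_abstract):
    """Return (sentence, role) pairs, using section headings as labels."""
    # With a capturing group, re.split yields:
    # [text before first heading, heading1, body1, heading2, body2, ...]
    pieces = HEADING_RE.split(structured_abstract)
    labeled = []
    for heading, body in zip(pieces[1::2], pieces[2::2]):
        role = HEADING_TO_ROLE.get(heading)
        if role is None:
            continue  # skip headings outside the assumed role set
        for sentence in split_sentences(body):
            labeled.append((sentence, role))
    return labeled


def search(labeled, term, roles):
    """Return sentences containing `term` whose role is in `roles`."""
    term = term.lower()
    return [s for s, r in labeled if r in roles and term in s.lower()]


# Toy structured abstract, invented for this example only.
example = (
    "BACKGROUND: Aspirin is widely used. Its long-term effects are debated. "
    "METHODS: We followed 200 patients for two years. "
    "RESULTS: Aspirin reduced the event rate. "
    "CONCLUSIONS: Aspirin appears beneficial in this cohort."
)

labeled = label_sentences(example)
hits = search(labeled, "aspirin", {"result", "conclusion"})
for sentence in hits:
    print(sentence)
```

In this sketch the query term also occurs in a background sentence, but that sentence is excluded because only the result and conclusion roles were requested, which is the kind of narrowing the abstract describes.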

Author information

Authors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192, Japan
Masashi Shimbo, Takahiro Yamasaki & Yuji Matsumoto

Editor information

Editors and Affiliations

Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto
Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi Kohoku-ku, 223-8522, Yokohama, Japan
Takahira Yamaguchi
The Institute of Scientific and Industrial Research, Osaka University, Japan
Masayuki Numao
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, 567-0047, Osaka, Japan
Hiroshi Motoda

Cite this paper

Shimbo, M., Yamasaki, T., Matsumoto, Y. (2005). Sentence Role Identification in Medline Abstracts: Training Classifier with Structured Abstracts. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_13

DOI: https://doi.org/10.1007/11423270_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26157-5
Online ISBN: 978-3-540-31933-7
eBook Packages: Computer Science, Computer Science (R0)
