DOI: 10.1145/2512089.2512101
research-article

Document sublanguage clustering to detect medical specialty in cross-institutional clinical texts

Published: 01 November 2013

Abstract

This paper reports on a set of studies designed to identify sublanguages in documents for domain-specific processing across institutions. Psychological evidence indicates that humans use context-specific linguistic information when they read. Natural Language Processing (NLP) pipelines are successful within specific domains (i.e., contexts). To limit the number of domain-specific NLP systems, a natural focus would be on sublanguages. Sublanguages are identified by shared lexical and semantic features.[1] Patterson and Hurdle[2] developed a sublanguage identification system that functioned well for 12 clinical specialties at the University of Utah. The current work compares sublanguages across institutions. Using a clinical NLP pipeline and a new document corpus from the University of Pittsburgh (UPitt), each new document was assigned to a cluster based on the minimum cosine distance to a Utah cluster centroid. The UPitt documents were divided into a nine-group specialty corpus. Across institutions, five of the specialty groups fell within the expected clusters. We find that clustering encounters difficulty for three reasons: documents with mixed sublanguages, naming convention differences across institutions, and document types used across specialties. The findings indicate that clinical specialty sublanguages can be identified across institutions.
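The assignment step described in the abstract, placing each new document in the cluster whose centroid lies at the minimum cosine distance, can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature vectors, the specialty labels, and the function names `cosine_distance` and `assign_to_nearest_centroid` are hypothetical, and the abstract does not state how documents were vectorized.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; a zero vector is treated as distance 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign_to_nearest_centroid(
    doc: Vector, centroids: Dict[str, Vector]
) -> Tuple[str, float]:
    """Return (label, distance) for the centroid closest to doc."""
    return min(
        ((label, cosine_distance(doc, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical centroids over a toy three-feature space (illustration only).
centroids = {
    "cardiology": [0.9, 0.1, 0.0],
    "radiology": [0.1, 0.8, 0.3],
}

# A hypothetical new document vector from a second institution.
new_doc = [0.7, 0.2, 0.1]
label, distance = assign_to_nearest_centroid(new_doc, centroids)
print(label, round(distance, 3))
```

In this toy setting the new document points in nearly the same direction as the first centroid, so it is assigned there. A document that mixes sublanguages would sit at a similar distance from several centroids, which is one way to picture the difficulty the abstract reports.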

References

[1] Z. S. Harris, A theory of language and information: A mathematical approach. Oxford and New York: Clarendon Press, 1991.
[2] O. Patterson and J. F. Hurdle, "Document clustering of clinical narratives: a systematic study of clinical sublanguages," AMIA Annu Symp Proc, vol. 2011, pp. 1099–1107, 2011.
[3] P. Jindal and D. Roth, "Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives," JAMIA, vol. 20, no. 2, pp. 356–362, Feb. 2013.
[4] N. A. Smith and A. F. T. Martins, "Linguistic structure prediction with the sparseptron," XRDS, vol. 19, no. 3, p. 44, Mar. 2013.
[5] H. Daumé III and D. Marcu, "Domain Adaptation for Statistical Classifiers," J. Artif. Intell. Res. (JAIR), vol. 26, pp. 101–126, 2006.
[6] M. Walenski and M. T. Ullman, "The science of language," The Linguistic Review, vol. 22, no. 2, pp. 327–346, 2005.
[7] T. A. Farmer, A. B. Fine, and T. F. Jaeger, "Implicit context-specific learning leads to rapid shifts in syntactic expectations," Proc of the 33rd Annu Meet of the Cogn Science Socy, pp. 2055–2060, 2011.
[8] M. Traxler, Introduction to Psycholinguistics. Wiley-Blackwell, 2012.
[9] M. Krallinger, A. Valencia, and L. Hirschman, "Linking genes to literature: text mining, information extraction, and retrieval applications for biology," Genome Biol, vol. 9, no. 2, p. S8, 2008.
[10] Z. S. Harris, "The structure of science information," J Biomed Inform, vol. 35, no. 4, pp. 215–221, Aug. 2002.
[11] D. A. Campbell and S. B. Johnson, "Comparing syntactic complexity in medical and non-medical corpora," Proc AMIA Symp, pp. 90–94, 2001.
[12] C. Friedman, P. Kra, and A. Rzhetsky, "Two biomedical sublanguages: a description based on the theories of Zellig Harris," J Biomed Inform, vol. 35, no. 4, pp. 222–235, Aug. 2002.
[13] Q. T. Zeng, D. Redd, G. Divita, S. Jarad, and C. Brandt, "Characterizing Clinical Text and Sublanguage: A Case Study of the VA Clinical Notes," J Health Med Informat S, vol. 3, p. 2, 2011.
[14] A. R. Aronson and F.-M. Lang, "An overview of MetaMap: historical perspective and recent advances," JAMIA, vol. 17, no. 3, pp. 229–236, May 2010.
[15] O. Patterson, S. Igo, and J. F. Hurdle, "Automatic acquisition of sublanguage semantic schema: towards the word sense disambiguation of clinical narratives," AMIA Annu Symp Proc, vol. 2010, p. 612, 2010.
[16] Y. Zhao and G. Karypis, "Data clustering in life sciences," Mol. Biotechnol., vol. 31, no. 1, pp. 55–80, Sep. 2005.
[17] S. B. Johnson, S. Bakken, D. Dine, S. Hyun, E. Mendonca, F. Morrison, T. Bright, T. Van Vleck, J. Wrenn, and P. Stetson, "An Electronic Health Record Based on Structured Narrative," JAMIA, vol. 15, no. 1, pp. 54–64, Oct. 2007.
[18] L. L. Weed, "The Problem Oriented Record as a Basic Tool in Medical Education, Patient Care, and Research," Ann Clin Res, vol. 3, no. 3, Jan. 1971.
[19] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: John Wiley & Sons, 1973.

    Published In

    DTMBIO '13: Proceedings of the 7th international workshop on Data and text mining in biomedical informatics
    November 2013
    38 pages
    ISBN:9781450324199
    DOI:10.1145/2512089
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cognitive science
    2. medical informatics applications
    3. natural language processing

    Qualifiers

    • Research-article

    Conference

    CIKM'13
    Acceptance Rates

    DTMBIO '13 Paper Acceptance Rate 11 of 18 submissions, 61%;
    Overall Acceptance Rate 41 of 247 submissions, 17%

    Cited By

    • (2022) Sublanguage Characteristics of Clinical Documents. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 3280-3286. DOI: 10.1109/BIBM55620.2022.9995620. Online publication date: 6-Dec-2022
    • (2021) Collecting specialty-related medical terms: Development and evaluation of a resource for Spanish. BMC Medical Informatics and Decision Making, 21:1. DOI: 10.1186/s12911-021-01495-w. Online publication date: 4-May-2021
    • (2020) A Clustering Algorithm Based on Document Embedding to Identify Clinical Note Templates. Annals of Data Science. DOI: 10.1007/s40745-020-00296-8. Online publication date: 6-Jun-2020
    • (2019) Discovering Sublanguages in a Large Clinical Corpus through Unsupervised Machine Learning and Information Gain. 2019 IEEE International Conference on Big Data (Big Data), pp. 4889-4898. DOI: 10.1109/BigData47090.2019.9006492. Online publication date: Dec-2019
    • (2019) Ensembles of natural language processing systems for portable phenotyping solutions. Journal of Biomedical Informatics, 100 (103318). DOI: 10.1016/j.jbi.2019.103318. Online publication date: Dec-2019
    • (2019) Detecting Secular Trends in Clinical Treatment through Temporal Analysis. Journal of Medical Systems, 43:3, pp. 1-7. DOI: 10.1007/s10916-019-1173-0. Online publication date: 1-Mar-2019
    • (2018) Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable. Journal of Biomedical Informatics, 85, pp. 106-113. DOI: 10.1016/j.jbi.2018.08.002. Online publication date: Sep-2018
    • (2018) Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy, 38:8, pp. 822-841. DOI: 10.1002/phar.2151. Online publication date: 22-Jul-2018
    • (2017) Detection of disease from radiology. 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1-4. DOI: 10.1109/ICIIECS.2017.8276174. Online publication date: Mar-2017
    • (2015) Clinical Documents Clustering Based on Medication/Symptom Names Using Multi-View Nonnegative Matrix Factorization. IEEE Transactions on NanoBioscience, 14:5, pp. 500-504. DOI: 10.1109/TNB.2015.2422612. Online publication date: Jul-2015
