
Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable

Authors:

Jianlin Shi,

John F. Hurdle

Volume 85, Issue C

Pages 106 - 113

https://doi.org/10.1016/j.jbi.2018.08.002

Published: 01 September 2018


Highlights

• N-trie, a new hash trie IE rule engine designed for clinical NLP, is introduced.

• The engine was designed to increase the efficiency and scalability of rule-based NLP.

• Using the ConText algorithm as a use case, N-trie exhibits superior execution time.

• N-trie gracefully accommodates the addition of new rules to improve accuracy.

• Trie-based hashing has significant potential in other rule-based NLP tasks.

Abstract

Objective

To develop and evaluate an efficient Trie structure for large-scale, rule-based clinical natural language processing (NLP), which we call n-trie.

Background

Despite the popularity of machine learning techniques in natural language processing, rule-based systems offer important advantages: distinctive transparency, ease of incorporating external knowledge, and less demanding annotation requirements. However, processing efficiency remains a major obstacle to adopting standard rule-based NLP solutions in big data analyses.

Methods

We developed n-trie to specifically address the token-based nature of context detection, an important facet of clinical NLP that is known to slow down NLP pipelines. N-trie, a new rule processing engine using a revised Trie structure, allows fast execution of lexicon-based NLP rules. To determine its applicability and evaluate its performance, we applied the n-trie engine in an implementation (called FastContext) of the ConText algorithm and compared its processing speed and accuracy with JavaConText and GeneralConText, two widely used Java ConText implementations, as well as with a standalone machine learning NegEx implementation, NegScope.
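To make the general idea of token-level trie matching concrete, here is a minimal Python sketch. It is not the authors' n-trie engine or FastContext; it only illustrates why a trie keyed on tokens lets lookup cost depend on the length of the rule being matched rather than on the number of rules. The names `build_trie`, `match_rules`, and `END`, and the example rules and labels, are all assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sentinel marking "a rule ends here"; an object() cannot
# collide with any string token used as a key.
END = object()


def build_trie(rules: Dict[str, str]) -> dict:
    """Insert each rule phrase into nested dicts, one level per token."""
    root: dict = {}
    for phrase, label in rules.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[END] = label
    return root


def match_rules(tokens: List[str], trie: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) for the longest rule at each start."""
    matches = []
    for start in range(len(tokens)):
        node = trie
        best: Optional[Tuple[int, int, str]] = None
        for pos in range(start, len(tokens)):
            node = node.get(tokens[pos].lower())
            if node is None:
                break  # no rule continues with this token
            if END in node:
                best = (start, pos + 1, node[END])
        if best is not None:
            matches.append(best)
    return matches


# Illustrative rules only; real ConText rule sets are far larger.
rules = {
    "no": "negated",
    "no evidence of": "negated",
    "history of": "historical",
}
trie = build_trie(rules)
tokens = "There is no evidence of pneumonia".split()
matches = match_rules(tokens, trie)
```

In this sketch each token costs one dictionary lookup per trie level, so adding more rules widens the trie without lengthening the walk for any given sentence position. The source reports that n-trie was far less sensitive to rule set size than the comparison implementations; the sketch shows the general trie property behind that kind of behaviour, not the engine's actual design.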

Results

The n-trie engine ran two orders of magnitude faster and was far less sensitive to rule set size than the comparison implementations, and it proved faster than the best machine learning negation detector. Additionally, the engine's accuracy improved consistently as the rule set grew (the desired outcome of adding new rules), while that of the other implementations did not.

Conclusions

The n-trie engine is an efficient, scalable engine for NLP rule processing and shows potential for application in NLP tasks beyond context detection.

