Abstract
The identification and extraction of technical terms is one of the better understood and most robust natural language processing (NLP) technologies within the current state of the art of language engineering. What is particularly interesting here is the clear understanding of how to derive, from the linguistic properties of terms, computational procedures for their reliable identification and extraction from technical and scientific prose. In generic information management contexts, terms have been associated both with procedures seeking to identify a term set which uniquely distinguishes a document within a nearly homogeneous document collection, and with procedures seeking to extract a representative sample of terms which uniquely characterises a document's content. There is a wide range of uses for terminology, commonly identified with, e.g., text indexing, computational lexicology, and machine-assisted translation; most of these employ the notion of terminology being representative of a given domain. This paper discusses some specific extensions of the terminology identification technology to make it fully capable of domain specification; it also presents extensions of the technology beyond domain specification, for the purpose of document characterisation. These extensions make terminology identification the foundation of an operational environment for document processing and for content characterisation and abstraction; more generally, it becomes an immensely empowering technology in an age of growing information overload.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boguraev, B., Kennedy, C. (1997). Technical terminology for domain specification and content characterisation. In: Pazienza, M.T. (ed.) Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. SCIE 1997. Lecture Notes in Computer Science, vol 1299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63438-X_5
DOI: https://doi.org/10.1007/3-540-63438-X_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63438-6
Online ISBN: 978-3-540-69548-6
eBook Packages: Springer Book Archive