Full text indexing based on lexical relations an application: software libraries

Published: 01 May 1989

Abstract

In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, users are mainly concerned with the functionality of the desired component; implementation details are secondary. Software reuse would be enhanced by large, conceptually organized libraries of software components. In this paper, we present GURU, a tool that automatically builds such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural language documentation. This indexing method is based on word co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler, to extract potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in understanding a document, without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
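The two indexing steps described in the abstract (extracting word co-occurrences, then ranking candidate collocations by resolving power) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual GURU or EXTRACT implementation: the stopword list, the fixed co-occurrence span, and the scoring formula (a pair's frequency in one document multiplied by its self-information, -log2 of its probability across the whole collection) are hypothetical choices made in the spirit of Luhn's notion of resolving power. The helper names `tokenize`, `cooccurrences`, `resolving_power` and `index_documents` are invented for this example.

```python
import math
from collections import Counter

# Hypothetical stopword list; the paper's actual preprocessing is not described in the abstract.
STOPWORDS = {
    "a", "an", "and", "by", "each", "for", "from", "if", "in", "is",
    "it", "of", "on", "or", "the", "this", "to", "with",
}


def tokenize(text):
    """Lowercase the text, strip surrounding punctuation, and drop stopwords."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:()[]\"'")
        if word and word not in STOPWORDS:
            words.append(word)
    return words


def cooccurrences(words, span=5):
    """Count unordered pairs of distinct words at most `span` positions apart."""
    pairs = Counter()
    for i, first in enumerate(words):
        for j in range(i + 1, min(i + span + 1, len(words))):
            second = words[j]
            if second != first:
                pairs[tuple(sorted((first, second)))] += 1
    return pairs


def resolving_power(doc_pairs, corpus_pairs):
    """Assumed score: in-document frequency times corpus self-information, -log2 P(pair)."""
    total = sum(corpus_pairs.values())
    scores = {}
    for pair, freq in doc_pairs.items():
        probability = corpus_pairs[pair] / total
        scores[pair] = freq * -math.log2(probability)
    return scores


def index_documents(docs, span=5, top_k=3):
    """Return, for each document, its top_k word pairs ranked by the assumed score."""
    per_doc = [cooccurrences(tokenize(doc), span) for doc in docs]
    corpus = Counter()
    for pairs in per_doc:
        corpus.update(pairs)
    index = []
    for pairs in per_doc:
        scores = resolving_power(pairs, corpus)
        # Highest score first; ties broken alphabetically for a stable result.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        index.append([pair for pair, _ in ranked[:top_k]])
    return index


if __name__ == "__main__":
    docs = [
        "Copy a file to a new file name; the copy keeps the file permissions.",
        "Remove a file from a directory; removing an open file fails.",
        "List the directory contents and print each file name.",
    ]
    for attributes in index_documents(docs):
        print(attributes)
```

Weighting by self-information lowers the score of pairs that occur throughout the collection, which is one plausible way to reduce the noise from context words mentioned in the abstract: a pair that is frequent in one document but rare elsewhere rises to the top of that document's attribute list.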

Cited By

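  • (2022) Evaluating the Use of Semantics for Identifying Task-relevant Textual Information. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 240-251. DOI: 10.1109/SANER53432.2022.00039. Online publication date: Mar-2022.
  • (2021) Assessing Semantic Frames to Support Program Comprehension Activities. 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC), pp. 13-24. DOI: 10.1109/ICPC52881.2021.00011. Online publication date: May-2021.
  • (2018) Term Proximity. Encyclopedia of Database Systems, p. 4055. DOI: 10.1007/978-1-4614-8265-9_937. Online publication date: 7-Dec-2018.
  • (2016) Term Proximity. Encyclopedia of Database Systems, pp. 1-2. DOI: 10.1007/978-1-4899-7993-3_937-2. Online publication date: 9-Dec-2016.
  • (2009) Term Proximity. Encyclopedia of Database Systems, p. 3036. DOI: 10.1007/978-0-387-39940-9_937. Online publication date: 2009.
  • (2005) An experiment in software retrieval. Software Engineering — ESEC '93, pp. 380-396. DOI: 10.1007/3-540-57209-0_26. Online publication date: 29-May-2005.
  • (2019) Interactive Technologies Designed for Children with Autism. ACM Transactions on Accessible Computing 12:3, pp. 1-37. DOI: 10.1145/3342285. Online publication date: 12-Sep-2019.
  • (2019) Find and Seek. ACM Transactions on Accessible Computing 12:3, pp. 1-23. DOI: 10.1145/3342282. Online publication date: 31-Aug-2019.
  • (2017) Algorithm 972. ACM Transactions on Mathematical Software 43:3, pp. 1-22. DOI: 10.1145/3009968. Online publication date: 9-Jan-2017.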

Information

Published In

ACM SIGIR Forum, Volume 23, Issue SI
Special issue: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, N.J. Belkin and C.J. van Rijsbergen (Eds.), June 25-28, 1989, Cambridge, MA.
June 1989
243 pages
ISSN:0163-5840
DOI:10.1145/75335
  • SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
    May 1989
    257 pages
    ISBN:0897913213
    DOI:10.1145/75334
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1989
Published in SIGIR Forum Volume 23, Issue SI

Qualifiers

  • Article

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 124
  • Downloads (last 6 weeks): 13
Reflects downloads up to 25 Dec 2024

