Feed-forward and recurrent neural networks for source code informal information analysis

Published: 01 July 2003

Abstract

Design recovery, which is part of the reverse engineering of source code, must supply programmers with all the information they need to fully understand a program or a system. This paper proposes a connectionist method for analyzing the informal information (comments and mnemonics) in programs; the method can be used for design recovery in conjunction with more traditional approaches. An approach based on artificial neural networks (ANNs) was chosen because it is robust (capable of tolerating noisy inputs), because of its associative memory ability (capable of retrieving a concept given only the context of the input word that originally fired the concept), and because of its generalization power (ability to learn conceptually relevant micro-features of the domain). The proposed approach combines top-down domain analysis (i.e., the creation of a concept hierarchy by a domain expert, to be used in the construction of the training set) with a bottom-up approach (i.e., the analysis of the informal information using ANNs). A preprocessing system that extracts the relevant comments and identifier names and transforms them into an input for the ANNs has been developed. Both feed-forward neural networks (FNNs) and recurrent neural networks (RNNs) were tried. RNN architectures are capable of learning sequences and can make use of the word ordering of the sentence. The networks were trained on part of the source code of an existing system and tested on a different portion of the same system's code. Test results, consisting of coverage and evaluation figures, are presented. They show remarkably higher accuracy when ANNs in general are used than when simple lexical methods are used. RNNs in particular also show higher coverage and accuracy than FNNs.
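
The abstract does not describe the internals of the preprocessing system, so the following is only a minimal sketch of the idea, written under stated assumptions. It assumes C-like source text, and every name in it (`extract_words`, `split_identifier`, `bag_of_words`, `index_sequence`, the keyword set) is hypothetical rather than taken from the paper. It pulls words from comments and identifier names, then encodes them in two ways: an unordered binary vector, as a feed-forward network might consume, and an ordered index sequence, which keeps the word order a recurrent network can exploit.

```python
import re

# All names below are illustrative assumptions, not the paper's implementation.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# A small, deliberately incomplete set of C keywords to skip.
C_KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while", "struct"}


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for part in name.split("_"):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
    return [w.lower() for w in words]


def extract_words(source):
    """Return (comment_words, identifier_words) from C-like source text.

    Simplification: string literals are not handled specially.
    """
    comments = COMMENT_RE.findall(source)
    code = COMMENT_RE.sub(" ", source)
    comment_words = [w.lower() for c in comments for w in re.findall(r"[A-Za-z]+", c)]
    ident_words = [
        w
        for name in IDENT_RE.findall(code)
        if name not in C_KEYWORDS
        for w in split_identifier(name)
    ]
    return comment_words, ident_words


def bag_of_words(words, vocab):
    """Order-free binary vector (the kind of input an FNN could take)."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab[w]] = 1
    return vec


def index_sequence(words, vocab):
    """Order-preserving list of word indices (the kind of input an RNN could take)."""
    return [vocab[w] for w in words if w in vocab]


source = "/* read customer record */\nint readCustFile(int cust_id) { return 0; }\n"
comment_words, ident_words = extract_words(source)
vocab = {w: i for i, w in enumerate(sorted(set(comment_words + ident_words)))}
fnn_input = bag_of_words(comment_words, vocab)
rnn_input = index_sequence(comment_words, vocab)
```

Here `comment_words` is `['read', 'customer', 'record']` and `ident_words` is `['read', 'cust', 'file', 'cust', 'id']`. The bag-of-words vector discards the order in which the words appeared, while the index sequence retains it, which is the distinction the abstract draws between the two network families.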

Information

Published In

Journal of Software Maintenance: Research and Practice, Volume 15, Issue 4
July 2003
91 pages

Publisher

John Wiley & Sons, Inc.

United States

Author Tags

  1. design recovery
  2. feed-forward networks
  3. informal information analysis
  4. program understanding
  5. recurrent neural networks

Qualifiers

  • Article

Cited By

  • (2016) Linguistic antipatterns. Empirical Software Engineering 21:1 (104-158). DOI: 10.1007/s10664-014-9350-8. Online publication date: 1-Feb-2016
  • (2014) An experimental investigation on the effects of context on source code identifiers splitting and expansion. Empirical Software Engineering 19:6 (1706-1753). DOI: 10.1007/s10664-013-9260-1. Online publication date: 1-Dec-2014
  • (2013) Normalizing source code vocabulary to support program comprehension and software quality. Proceedings of the 2013 International Conference on Software Engineering (1385-1388). DOI: 10.5555/2486788.2487012. Online publication date: 18-May-2013
  • (2009) Automatically capturing source code context of NL-queries for software maintenance and reuse. Proceedings of the 31st International Conference on Software Engineering (232-242). DOI: 10.1109/ICSE.2009.5070524. Online publication date: 16-May-2009
  • (2009) An empirical analysis of information retrieval based concept location techniques in software comprehension. Empirical Software Engineering 14:1 (93-130). DOI: 10.1007/s10664-008-9095-3. Online publication date: 1-Feb-2009
