
Feed-forward and recurrent neural networks for source code informal information analysis

Authors:

Renato De Mori

Journal of Software Maintenance: Research and Practice, Volume 15, Issue 4

Pages 205 - 244

https://doi.org/10.1002/smr.274

Published: 01 July 2003

Abstract

Design recovery, which is part of the reverse engineering of source code, must supply programmers with all the information they need to fully understand a program or a system. In this paper, a connectionist method for analyzing the informal information in programs (comments and mnemonics) is proposed; it can be used for design recovery in conjunction with more traditional approaches. An approach based on artificial neural networks (ANNs) was chosen for three properties: robustness (the ability to tolerate noisy inputs), associative memory (the ability to retrieve a concept given only the context of the input word that originally fired it), and generalization power (the ability to learn conceptually relevant micro-features of the domain). The proposed approach combines a top-down domain analysis (a domain expert creates a concept hierarchy that is used to construct the training set) with a bottom-up analysis of the informal information using ANNs. A preprocessing system has been developed that extracts the relevant comments and identifier names and transforms them into input for the ANNs. Both feed-forward neural networks (FNNs) and recurrent neural networks (RNNs) were tried; RNN architectures can learn sequences and can therefore exploit the word order of a sentence. The networks were trained on part of the source code of an existing system and tested on a different portion of the same system's code. Test results, consisting of coverage and accuracy figures, are presented. They show markedly higher accuracy for ANNs in general than for simple lexical methods, and RNNs in particular achieve higher coverage and accuracy than FNNs.
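The preprocessing step and the contrast between feed-forward and recurrent inputs can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption for illustration, not the authors' system: the stop-word list, the identifier-splitting regular expression, the helper names (`split_identifier`, `preprocess`, `bag_of_words`, `rnn_final_state`, `classify`, `coverage_and_accuracy`), the random untrained weights, the threshold, and the definitions of coverage (share of inputs that receive an answer) and accuracy (share of answered inputs that are correct). It shows that a bag-of-words vector, a fixed-size input of the kind a feed-forward network can take, is identical for two orderings of the same words, whereas an Elman-style recurrent state depends on the order in which the words are fed in.

```python
import math
import random
import re

# Hypothetical stop-word list; the paper's actual filtering rules are not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for"}


def split_identifier(name):
    """Split an identifier such as 'readFileHeader' or 'open_file' into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Break camelCase/PascalCase, keeping all-caps acronyms such as 'XML' together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def preprocess(comment, identifiers):
    """Turn one comment plus its nearby identifiers into an ordered list of words."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", comment)]
    for name in identifiers:
        words.extend(split_identifier(name))
    return [w for w in words if w not in STOP_WORDS]


def bag_of_words(tokens, vocab):
    """Fixed-size, order-insensitive word counts, usable as a feed-forward input."""
    vec = [0.0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1.0
    return vec


def rnn_final_state(tokens, vocab, w_in, w_rec):
    """Elman-style recurrence (no bias terms): one word per step, so order matters."""
    hidden = [0.0] * len(w_rec)
    for t in tokens:
        if t not in vocab:
            continue  # out-of-vocabulary words are skipped in this sketch
        i = vocab[t]
        # One-hot input, so w_in[j][i] is the whole input contribution to hidden unit j.
        hidden = [
            math.tanh(w_in[j][i] + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden))))
            for j in range(len(hidden))
        ]
    return hidden


def classify(scores, concepts, threshold=0.5):
    """Return the best-scoring concept, or None (no answer) if it is below the threshold."""
    best = max(range(len(scores)), key=lambda j: scores[j])
    return concepts[best] if scores[best] >= threshold else None


def coverage_and_accuracy(predictions, labels):
    """Assumed definitions: coverage = answered / total, accuracy = correct / answered."""
    answered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(answered) / len(labels)
    accuracy = sum(p == y for p, y in answered) / len(answered) if answered else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    tokens = preprocess("Read the header", ["openFile"])
    print(tokens)  # ['read', 'header', 'open', 'file']
    vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
    rng = random.Random(0)
    hidden_size = 3
    w_in = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
    reordered = list(reversed(tokens))
    # Same bag of words for both orderings; the RNN states will almost surely differ.
    print(bag_of_words(tokens, vocab) == bag_of_words(reordered, vocab))  # True
    print(rnn_final_state(tokens, vocab, w_in, w_rec)
          == rnn_final_state(reordered, vocab, w_in, w_rec))
    print(classify([0.2, 0.7], ["io", "ui"]))  # ui
    print(coverage_and_accuracy(["io", None, "io", "ui"], ["io", "io", "ui", "ui"]))  # (0.75, 0.6666666666666666)
```

In a trained system the weights would be learned rather than drawn at random, and an output layer would map the final state to concept scores before a thresholding step such as `classify`; the random weights here serve only to show the order sensitivity.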

Information

Published In

Journal of Software Maintenance: Research and Practice, Volume 15, Issue 4

July 2003

91 pages

ISSN: 1040-550X

Publisher

John Wiley & Sons, Inc.

United States

Qualifiers

Article

Bibliometrics

Total Citations: 5

Total Downloads: 0 (last 12 months: 0; last 6 weeks: 0), reflecting downloads up to 28 Sep 2024

Citations

Cited By

Arnaoudova V, Di Penta M, Antoniol G (2016). Linguistic antipatterns. Empirical Software Engineering 21(1):104-158. doi:10.1007/s10664-014-9350-8. Online publication date: 1 Feb 2016.
https://dl.acm.org/doi/10.1007/s10664-014-9350-8

Guerrouj L, Di Penta M, Guéhéneuc Y, Antoniol G (2014). An experimental investigation on the effects of context on source code identifiers splitting and expansion. Empirical Software Engineering 19(6):1706-1753. doi:10.1007/s10664-013-9260-1. Online publication date: 1 Dec 2014.
https://dl.acm.org/doi/10.1007/s10664-013-9260-1

Guerrouj L, Notkin D, Cheng B, Pohl K (2013). Normalizing source code vocabulary to support program comprehension and software quality. Proceedings of the 2013 International Conference on Software Engineering, 1385-1388. doi:10.5555/2486788.2487012. Online publication date: 18 May 2013.
https://dl.acm.org/doi/10.5555/2486788.2487012

Hill E, Pollock L, Vijay-Shanker K (2009). Automatically capturing source code context of NL-queries for software maintenance and reuse. Proceedings of the 31st International Conference on Software Engineering, 232-242. doi:10.1109/ICSE.2009.5070524. Online publication date: 16 May 2009.
https://dl.acm.org/doi/10.1109/ICSE.2009.5070524

Cleary B, Exton C, Buckley J, English M (2009). An empirical analysis of information retrieval based concept location techniques in software comprehension. Empirical Software Engineering 14(1):93-130. doi:10.1007/s10664-008-9095-3. Online publication date: 1 Feb 2009.
https://dl.acm.org/doi/10.1007/s10664-008-9095-3
