Automatic identification of key classes in a software system using webmining techniques

Published: 01 November 2008

Abstract

Software engineers new to a project often have to sort through hundreds of classes to find the few that offer significant insight into the inner workings of the software. To support this process, we propose a technique that identifies the most important classes in a system, that is, its key classes. Software engineers can use these classes to focus their understanding efforts when starting work on a new software project. Key classes are typically characterized by having a lot of 'control' within the application. To find these controlling classes, we present a detection approach based on dynamic coupling and webmining. We demonstrate the potential of our technique on two open-source software systems that have a rich documentation set. In the case studies we use dynamically gathered coupling information, measured with a number of different coupling metrics. The case studies show that we are able to retrieve 90% of the classes deemed important by the original maintainers of the systems, while maintaining a level of precision of around 50%. Copyright © 2008 John Wiley & Sons, Ltd.
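The general idea of combining dynamic coupling with webmining can be illustrated with a small sketch. The abstract does not say which webmining algorithm or coupling metric is used, so everything below is an assumption made for illustration: the code applies HITS, a well-known link-analysis algorithm, to a class graph whose edge weights are runtime call counts, and treats a high hub score as a sign of 'control'. The class names, call counts, and the list of "important" classes are all hypothetical. The helper at the end shows how recall and precision, the two measures reported in the abstract, would be computed against a maintainer-provided list.

```python
from collections import defaultdict
import math


def _normalise(scores):
    """Scale scores to unit L2 norm (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Iterative HITS on a weighted directed graph.

    edges maps (caller, callee) -> weight, here a hypothetical count of
    runtime calls from one class to another (a simple dynamic coupling
    measure chosen for illustration).
    Returns (hub, authority) dictionaries keyed by class name.
    """
    nodes = sorted({n for pair in edges for n in pair})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of the callers.
        new_auth = defaultdict(float)
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        # Hub: weighted sum of the authority scores of the callees.
        new_hub = defaultdict(float)
        for (src, dst), w in edges.items():
            new_hub[src] += w * new_auth[dst]
        auth = _normalise({n: new_auth[n] for n in nodes})
        hub = _normalise({n: new_hub[n] for n in nodes})
    return hub, auth


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a reference set."""
    retrieved, relevant = set(retrieved), set(relevant)
    found = len(retrieved & relevant)
    return found / len(retrieved), found / len(relevant)


# Hypothetical call counts extracted from an execution trace.
calls = {
    ("Controller", "Parser"): 5,
    ("Controller", "Model"): 8,
    ("Controller", "View"): 4,
    ("Parser", "Model"): 2,
    ("View", "Model"): 1,
    ("Model", "Logger"): 3,
}

hub, auth = hits(calls)
# Rank classes by hub score: classes that drive many others come first.
ranked = sorted(hub, key=hub.get, reverse=True)
top = ranked[:2]

# Hypothetical list of classes the maintainers consider important.
important = ["Controller", "Model"]
precision, recall = precision_recall(top, important)
print(top, precision, recall)
```

In this toy graph the class that calls most of the others ends up with the highest hub score. Whether hub scores, authority scores, or another ranking best reflect 'control' in a real system is a design choice that this sketch does not settle.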


Published In

Journal of Software Maintenance and Evolution: Research and Practice, Volume 20, Issue 6
November 2008
77 pages

Publisher

John Wiley & Sons, Inc.

United States

Author Tags

  1. coupling
  2. dynamic analysis
  3. program comprehension
  4. webmining

Qualifiers

  • Article

Cited By

  • (2023) EASE: An Effort-aware Extension of Unsupervised Key Class Identification Approaches. ACM Transactions on Software Engineering and Methodology 33:4 (1-43). DOI: 10.1145/3635714. Online publication date: 2-Dec-2023
  • (2023) Pride: Prioritizing Documentation Effort Based on a PageRank-Like Algorithm and Simple Filtering Rules. IEEE Transactions on Software Engineering 49:3 (1118-1151). DOI: 10.1109/TSE.2022.3171469. Online publication date: 1-Mar-2023
  • (2023) Identifying Key Classes for Initial Software Comprehension: Can We Do it Better? Proceedings of the 45th International Conference on Software Engineering (1878-1889). DOI: 10.1109/ICSE48619.2023.00160. Online publication date: 14-May-2023
  • (2020) Is Static Analysis Able to Identify Unnecessary Source Code? ACM Transactions on Software Engineering and Methodology 29:1 (1-23). DOI: 10.1145/3368267. Online publication date: 30-Jan-2020
  • (2019) Discovering Important Services Based on Weighted K-Core Decomposition. International Journal of Web Services Research 16:1 (22-36). DOI: 10.4018/IJWSR.2019010102. Online publication date: 1-Jan-2019
  • (2019) Service ranking in service networks using parameters in complex networks: a comparative study. Cluster Computing 22:2 (2921-2930). DOI: 10.1007/s10586-017-1694-6. Online publication date: 1-Mar-2019
  • (2018) Identifying key classes in object-oriented software using generalized k-core decomposition. Future Generation Computer Systems 81:C (188-202). DOI: 10.1016/j.future.2017.10.006. Online publication date: 1-Apr-2018
  • (2018) Analyzing the structure of Java software systems by weighted K-core decomposition. Future Generation Computer Systems 83:C (431-444). DOI: 10.1016/j.future.2017.09.039. Online publication date: 1-Jun-2018
  • (2017) On the properties of design-relevant classes for design anomaly assessment. Proceedings of the 25th International Conference on Program Comprehension (332-335). DOI: 10.1109/ICPC.2017.17. Online publication date: 20-May-2017
  • (2017) Visualizing trace of Java collection APIs by dynamic bytecode instrumentation. Journal of Visual Languages and Computing 43:C (14-29). DOI: 10.1016/j.jvlc.2017.04.006. Online publication date: 1-Dec-2017
