DOI: 10.1145/2597008.2597157

Condensing class diagrams by analyzing design and network metrics using optimistic classification

Published: 02 June 2014

Abstract

A class diagram of a software system enhances our ability to understand software design. However, this diagram is often unavailable. Developers usually reconstruct the diagram by reverse engineering it from source code. Unfortunately, the resultant diagram is often very cluttered, making it difficult to learn anything valuable from it. It would therefore be very beneficial to condense the reverse-engineered class diagram so that it contains only the important classes that depict the overall design of a software system. Such a diagram would make program understanding much easier. A class can be important, for example, if its removal would break many connections between classes. In our work, we estimate this kind of importance by using design metrics (e.g., number of attributes, number of dependencies, etc.) and network metrics (e.g., betweenness centrality, closeness centrality, etc.). We use these metrics as features and input their values to our optimistic classifier, which predicts whether a class is important or not. Unlike standard classification, our newly proposed optimistic classification technique deals with the data scarcity problem by optimistically assigning labels to some of the unlabeled data and using them to train a better statistical model. We evaluated our approach by condensing reverse-engineered diagrams of 9 software systems and compared it with the state-of-the-art work of Osman et al. Our experiments show that our approach can achieve an average Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.825, which is a 9.1% improvement compared to the state-of-the-art approach.
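The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical toy dependency graph (`deps`), uses a coupling count as a stand-in for the design metrics and closeness centrality as the network metric, and replaces the paper's classifier with a simple nearest-centroid rule. The function names and the `margin` threshold are illustrative assumptions; the "optimistic" step here is a generic pseudo-labeling round that labels an unlabeled class only when it lies clearly closer to one centroid than the other.

```python
from collections import deque


def closeness(graph, start):
    """Closeness centrality of `start` within its connected component."""
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search yields shortest path lengths
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def class_features(deps):
    """Map each class to (number of coupled classes, closeness centrality)."""
    graph = {c: set() for c in deps}
    for src, targets in deps.items():
        for dst in targets:  # treat each dependency as an undirected edge
            graph[src].add(dst)
            graph.setdefault(dst, set()).add(src)
    return {c: (len(graph[c]), closeness(graph, c)) for c in graph}


def centroid(rows):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def optimistic_labels(features, labeled, margin=0.5):
    """One pseudo-labeling round (assumed stand-in for the paper's technique).

    `labeled` maps class name -> True (important) / False and must contain
    at least one example of each label. Unlabeled classes receive a label
    only if the two centroid distances differ by at least `margin`.
    """
    pos = centroid([features[c] for c, y in labeled.items() if y])
    neg = centroid([features[c] for c, y in labeled.items() if not y])
    extended = dict(labeled)
    for name, feat in features.items():
        if name in labeled:
            continue
        d_pos, d_neg = sq_dist(feat, pos), sq_dist(feat, neg)
        if abs(d_pos - d_neg) >= margin:
            extended[name] = d_pos < d_neg
    return extended


# Hypothetical toy system: class -> classes it depends on.
deps = {
    "Controller": {"Model", "View", "Repo"},
    "View": {"Model"},
    "Repo": {"Model"},
    "Util": {"Model"},
    "Model": set(),
}
features = class_features(deps)
extended = optimistic_labels(features, {"Model": True, "Util": False})
for name in sorted(extended):
    print(name, features[name], extended[name])
```

The enlarged label set in `extended` would then be used to retrain the classifier. In a real setting the features should be scaled before distances are compared, since the raw coupling count otherwise dominates the centrality value.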

References

[1]
MagicDraw. http://www.nomagic.com/products/magicdraw.html.
[2]
eUML2. http://www.soyatec.com/euml2/com.soyatec.uml.doc/.
[3]
Class Visualizer. http://www.class-visualizer.net/.
[4]
Ana M. Fernandez-Saez, Michel R. V. Chaudron, Marcela Genero, and Isabel Ramos. Are Forward Designed or Reverse-Engineered UML Diagrams More Helpful for Code Maintenance?: A Controlled Experiment. In International Conference on Evaluation and Assessment in Software Engineering, pages 60–71, 2013.
[5]
Mohd Hafeez Osman, Michel R. V. Chaudron, and Peter van der Putten. An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams. In International Conference on Software Maintenance, pages 140–149, 2013.
[6]
Shyam R. Chidamber and Chris F. Kemerer. A Metrics Suite for Object Oriented Design. IEEE Trans. Software Eng., 20(6):476–493, 1994.
[7]
Al Lake and Curtis Cook. Use of factor analysis to develop OOP software complexity metrics. In Proc. 6th Annual Oregon Workshop on Software Metrics, Silver Falls, Oregon, 1994.
[8]
Lionel C. Briand, Premkumar T. Devanbu, and Walcélio L. Melo. An Investigation into Coupling Measures for C++. In International Conference on Software Engineering, pages 412–421, 1997.
[9]
Jiliang Tang, Huiji Gao, Xia Hu, and Huan Liu. Exploiting homophily effect for trust prediction. In International Conference on Web Search and Data Mining, pages 53–62, 2013.
[10]
Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. In International Conference on World-Wide Web, pages 107–117, 1998.
[11]
Foster J. Provost, Tom Fawcett, and Ron Kohavi. The Case against Accuracy Estimation for Comparing Induction Algorithms. In International Conference on Machine Learning, pages 445–453, 1998.
[12]
Ahmed Lamkanfi, Serge Demeyer, Emanuel Giger, and Bart Goethals. Predicting the severity of a reported bug. In Mining Software Repositories, pages 1–10, 2010.
[13]
Stefan Lessmann, Bart Baesens, Christophe Mues, and Swantje Pietsch. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Trans. Software Eng., 34(4):485–496, 2008.
[14]
Daniele Romano and Martin Pinzger. Using source code metrics to predict change-prone Java interfaces. In International Conference on Software Maintenance, pages 303–312, 2011.
[15]
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.
[16]
Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu. Discriminative Frequent Pattern Analysis for Effective Classification. In International Conference on Data Engineering, pages 716–725, 2007.
[17]
Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu. Mining significant graph patterns by leap search. In SIGMOD Conference, pages 433–444, 2008.
[18]
J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufmann, 2nd edition, 2006.
[19]
Andy Zaidman and Serge Demeyer. Automatic identification of key classes in a software system using webmining techniques. Journal of Software Maintenance, 20(6):387–417, 2008.
[20]
Fabrizio Perin, Lukas Renggli, and Jorge Ressia. Ranking Software Artifacts. In Workshop on FAMIX and Moose in Reengineering, 2010.
[21]
Daniela Steidl, Benjamin Hummel, and Elmar Jürgens. Using Network Analysis for Recommendation of Central Software Classes. In Working Conference on Reverse Engineering, pages 93–102, 2012.
[22]
Maen Hammad, Michael L. Collard, and Jonathan I. Maletic. Measuring Class Importance in the Context of Design Evolution. In International Conference on Program Comprehension, pages 148–151, 2010.
[23]
James M. Bieman, Anneliese Amschler Andrews, and Helen J. Yang. Understanding Change-Proneness in OO Software through Visualization. In International Workshop on Program Comprehension, pages 44–53, 2003.
[24]
Emanuel Giger, Martin Pinzger, and Harald Gall. Predicting the fix time of bugs. In International Workshop on Recommendation Systems for Software Engineering, 2010.
[25]
T. Menzies and A. Marcus. Automated Severity Assessment of Software Defect Reports. In International Conference on Software Maintenance, 2008.
[26]
A. Lamkanfi, S. Demeyer, Q.D. Soetens, and T. Verdonck. Comparing Mining Algorithms for Predicting the Severity of a Reported Bug. In European Conference on Software Maintenance and Reengineering, 2011.
[27]
Yuan Tian, David Lo, and Chengnian Sun. Drone: Predicting priority of reported bugs by multi-factor analysis. In International Conference on Software Maintenance, 2013.
[28]
Ferdian Thung, David Lo, and Lingxiao Jiang. Automatic defect categorization. In Working Conference on Reverse Engineering, 2012.
[29]
Hongyu Zhang, Liang Gong, and Steve Versteeg. Predicting bug-fixing time: an empirical study of commercial software projects. In International Conference on Software Engineering, 2013.
[30]
Michael Gegick, Pete Rotella, and Tao Xie. Identifying security bug reports via text mining: An industrial case study. In Mining Software Repositories, pages 11–20, 2010.
[31]
Abdou Maiga, Nasir Ali, Neelesh Bhattacharya, Aminata Sabane, Yann-Gaël Guéhéneuc, and Esma Aïmeur. SMURF: A SVM-based Incremental Anti-pattern Detection Approach. In Working Conference on Reverse Engineering, pages 466–475, 2012.
[32]
Daqing Hou and Lingfeng Mo. Content Categorization of API Discussions. In International Conference on Software Maintenance, pages 60–69, 2013.
[33]
Swapna Gottipati, David Lo, and Jing Jiang. Finding relevant answers in software forums. In International Conference on Automated Software Engineering, pages 323–332, 2011.
[34]
Tien-Duy B. Le and David Lo. Will fault localization work for these failures? An automated approach to predict effectiveness of fault localization tools. In International Conference on Software Maintenance, 2013.
[35]
Philips Kokoh Prasetyo, David Lo, Palakorn Achananuparp, Yuan Tian, and Ee-Peng Lim. Automatic classification of software related microblogs. In International Conference on Software Maintenance, 2012.


Information & Contributors

Information

Published In

ICPC 2014: Proceedings of the 22nd International Conference on Program Comprehension
June 2014
325 pages
ISBN:9781450328791
DOI:10.1145/2597008

Sponsors

In-Cooperation

  • TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Design Metrics
  2. Important Classes
  3. Network Metrics
  4. Optimistic Classification

Qualifiers

  • Article

Conference

ICSE '14


Cited By

  • (2024) Improving the Condensing of Reverse Engineered Class Diagrams using Weighted Network Metrics. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pages 374-375. DOI: 10.1145/3639478.3643520. Online publication date: 14-Apr-2024
  • (2023) Logical dependencies: Extraction from the versioning system and usage in key classes detection. Computer Science and Information Systems, 20(3):1015-1035. DOI: 10.2298/CSIS220518025S. Online publication date: 2023
  • (2023) EASE: An Effort-aware Extension of Unsupervised Key Class Identification Approaches. ACM Transactions on Software Engineering and Methodology, 33(4):1-43. DOI: 10.1145/3635714. Online publication date: 2-Dec-2023
  • (2023) Pride: Prioritizing Documentation Effort Based on a PageRank-Like Algorithm and Simple Filtering Rules. IEEE Transactions on Software Engineering, 49(3):1118-1151. DOI: 10.1109/TSE.2022.3171469. Online publication date: 1-Mar-2023
  • (2023) Manual Abstraction in the Wild: A Multiple-Case Study on OSS Systems’ Class Diagrams and Implementations. 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS), pages 36-46. DOI: 10.1109/MODELS58315.2023.00017. Online publication date: 1-Oct-2023
  • (2023) Identifying Key Classes for Initial Software Comprehension: Can We Do it Better? Proceedings of the 45th International Conference on Software Engineering, pages 1878-1889. DOI: 10.1109/ICSE48619.2023.00160. Online publication date: 14-May-2023
  • (2022) KEADA: Identifying Key Classes in Software Systems Using Dynamic Analysis and Entropy-Based Metrics. Entropy, 24(5):652. DOI: 10.3390/e24050652. Online publication date: 6-May-2022
  • (2022) Feature-based critical components identification in multimedia software. Multimedia Tools and Applications, 81(25):35595-35618. DOI: 10.1007/s11042-021-11277-1. Online publication date: 2-Jan-2022
  • (2021) Multi-Scale Software Network Model for Software Safety of the Intended Functionality. 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 250-255. DOI: 10.1109/ISSREW53611.2021.00071. Online publication date: Oct-2021
  • (2021) COSPA: Identifying Key Classes in Object-Oriented Software Using Preference Aggregation. IEEE Access, 9:114767-114780. DOI: 10.1109/ACCESS.2021.3105475. Online publication date: 2021
