Article

Condensing class diagrams by analyzing design and network metrics using optimistic classification

Authors:

Mohd Hafeez Osman,

Michel R. V. Chaudron

ICPC 2014: Proceedings of the 22nd International Conference on Program Comprehension

Pages 110 - 121

https://doi.org/10.1145/2597008.2597157

Published: 02 June 2014

Abstract

A class diagram of a software system enhances our ability to understand the software's design. However, this diagram is often unavailable, so developers usually reconstruct it by reverse engineering it from source code. Unfortunately, the resulting diagram is often very cluttered, making it difficult to learn anything valuable from it. It would therefore be very beneficial to condense the reverse-engineered class diagram so that it contains only the important classes depicting the overall design of a software system. Such a diagram would make program understanding much easier. A class can be important, for example, if its removal would break many connections between classes. In our work, we estimate this kind of importance by using design metrics (e.g., number of attributes, number of dependencies) and network metrics (e.g., betweenness centrality, closeness centrality). We use these metrics as features and input their values to our optimistic classifier, which predicts whether a class is important or not. Unlike standard classification, our newly proposed optimistic classification technique deals with the data scarcity problem by optimistically assigning labels to some of the unlabeled data and using them to train a better statistical model. We have evaluated our approach by condensing reverse-engineered diagrams of 9 software systems and compared it with the state-of-the-art work of Osman et al. Our experiments show that our approach achieves an average Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.825, a 9.1% improvement over the state-of-the-art approach.
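
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of one network metric (closeness centrality) and of the AUC score used for evaluation. It is not the authors' implementation and does not include the optimistic classification step: the class names, the dependency graph, and the importance labels are all hypothetical, and the graph is treated as undirected purely for simplicity.

```python
from collections import deque

# Hypothetical class-dependency graph, treated as undirected for simplicity.
# The class names are invented for illustration and come from no real system.
GRAPH = {
    "Controller": {"Model", "View", "Logger"},
    "Model": {"Controller", "Repository"},
    "View": {"Controller"},
    "Logger": {"Controller"},
    "Repository": {"Model"},
}


def closeness_centrality(graph, source):
    """Return (reachable nodes) / (sum of shortest-path distances) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in unweighted graphs
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical ground truth: 1 marks a class assumed to be important.
    labels = {"Controller": 1, "Model": 1, "View": 0, "Logger": 0, "Repository": 0}
    names = sorted(GRAPH)
    scores = [closeness_centrality(GRAPH, name) for name in names]
    for name, score in zip(names, scores):
        print(f"{name}: closeness = {score:.3f}")
    print("AUC of closeness as a ranking:", auc(scores, [labels[n] for n in names]))
```

Here a single metric is used directly as a ranking score. In the approach described above, several design and network metrics would instead serve as features for a trained classifier, whose predicted scores would then be evaluated with AUC in the same way.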

References

[1]

MagicDraw. http://www.nomagic.com/products/magicdraw.html.

[2]

eUML2. http://www.soyatec.com/euml2/com.soyatec.uml.doc/.

[3]

Class Visualizer. http://www.class-visualizer.net/.

[4]

Ana M. Fernandez-Saez, Michel R. V. Chaudron, Marcela Genero, and Isabel Ramos. Are Forward Designed or Reverse-Engineered UML Diagrams More Helpful for Code Maintenance?: A Controlled Experiment. In International Conference on Evaluation and Assessment in Software Engineering, pages 60–71, 2013.

[5]

Mohd Hafeez Osman, Michel R. V. Chaudron, and Peter van der Putten. An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams. In International Conference on Software Maintenance, pages 140–149, 2013.

[6]

Shyam R. Chidamber and Chris F. Kemerer. A Metrics Suite for Object Oriented Design. IEEE Trans. Software Eng., 20(6):476–493, 1994.

[7]

Al Lake and Curtis Cook. Use of factor analysis to develop OOP software complexity metrics. In Proc. 6th Annual Oregon Workshop on Software Metrics, Silver Falls, Oregon, 1994.

[8]

Lionel C. Briand, Premkumar T. Devanbu, and Walcélio L. Melo. An Investigation into Coupling Measures for C++. In International Conference on Software Engineering, pages 412–421, 1997.

[9]

Jiliang Tang, Huiji Gao, Xia Hu, and Huan Liu. Exploiting homophily effect for trust prediction. In International Conference on Web Search and Data Mining, pages 53–62, 2013.

[10]

Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. In International Conference on World-Wide Web, pages 107–117, 1998.

[11]

Foster J. Provost, Tom Fawcett, and Ron Kohavi. The Case against Accuracy Estimation for Comparing Induction Algorithms. In International Conference on Machine Learning, pages 445–453, 1998.

[12]

Ahmed Lamkanfi, Serge Demeyer, Emanuel Giger, and Bart Goethals. Predicting the severity of a reported bug. In Mining Software Repositories, pages 1–10, 2010.

[13]

Stefan Lessmann, Bart Baesens, Christophe Mues, and Swantje Pietsch. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Trans. Software Eng., 34(4):485–496, 2008.

[14]

Daniele Romano and Martin Pinzger. Using source code metrics to predict change-prone Java interfaces. In International Conference on Software Maintenance, pages 303–312, 2011.

[15]

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The weka data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.

[16]

Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu. Discriminative Frequent Pattern Analysis for Effective Classification. In International Conference on Data Engineering, pages 716–725, 2007.

[17]

Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu. Mining significant graph patterns by leap search. In SIGMOD Conference, pages 433–444, 2008.

[18]

J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufmann, 2nd edition, 2006.

[19]

Andy Zaidman and Serge Demeyer. Automatic identification of key classes in a software system using webmining techniques. Journal of Software Maintenance, 20(6):387–417, 2008.

[20]

Fabrizio Perin, Lukas Renggli, and Jorge Ressia. Ranking Software Artifacts. In Workshop on FAMIX and Moose in Reengineering, 2010.

[21]

Daniela Steidl, Benjamin Hummel, and Elmar Jürgens. Using Network Analysis for Recommendation of Central Software Classes. In Working Conference on Reverse Engineering, pages 93–102, 2012.

[22]

Maen Hammad, Michael L. Collard, and Jonathan I. Maletic. Measuring Class Importance in the Context of Design Evolution. In International Conference on Program Comprehension, pages 148–151, 2010.

[23]

James M. Bieman, Anneliese Amschler Andrews, and Helen J. Yang. Understanding Change-Proneness in OO Software through Visualization. In International Workshop on Program Comprehension, pages 44–53, 2003.

[24]

Emanuel Giger, Martin Pinzger, and Harald Gall. Predicting the fix time of bugs. In International Workshop on Recommendation Systems for Software Engineering, 2010.

[25]

T. Menzies and A. Marcus. Automated Severity Assessment of Software Defect Reports. In International Conference on Software Maintenance, 2008.

[26]

A. Lamkanfi, S. Demeyer, Q.D. Soetens, and T. Verdonck. Comparing Mining Algorithms for Predicting the Severity of a Reported Bug. In European Conference on Software Maintenance and Reengineering, 2011.

[27]

Yuan Tian, David Lo, and Chengnian Sun. Drone: Predicting priority of reported bugs by multi-factor analysis. In International Conference on Software Maintenance, 2013.

[28]

Ferdian Thung, David Lo, and Lingxiao Jiang. Automatic defect categorization. In Working Conference on Reverse Engineering, 2012.

[29]

Hongyu Zhang, Liang Gong, and Steve Versteeg. Predicting bug-fixing time: an empirical study of commercial software projects. In International Conference on Software Engineering, 2013.

[30]

Michael Gegick, Pete Rotella, and Tao Xie. Identifying security bug reports via text mining: An industrial case study. In Mining Software Repositories, pages 11–20, 2010.

[31]

Abdou Maiga, Nasir Ali, Neelesh Bhattacharya, Aminata Sabane, Yann-Gaël Guéhéneuc, and Esma Aïmeur. SMURF: A SVM-based Incremental Anti-pattern Detection Approach. In Working Conference on Reverse Engineering, pages 466–475, 2012.

[32]

Daqing Hou and Lingfeng Mo. Content Categorization of API Discussions. In International Conference on Software Maintenance, pages 60–69, 2013.

[33]

Swapna Gottipati, David Lo, and Jing Jiang. Finding relevant answers in software forums. In International Conference on Automated Software Engineering, pages 323–332, 2011.

[34]

Tien-Duy B. Le and David Lo. Will fault localization work for these failures? An automated approach to predict effectiveness of fault localization tools. In International Conference on Software Maintenance, 2013.

[35]

Philips Kokoh Prasetyo, David Lo, Palakorn Achananuparp, Yuan Tian, and Ee-Peng Lim. Automatic classification of software related microblogs. In International Conference on Software Maintenance, 2012.

Published In

ICPC 2014: Proceedings of the 22nd International Conference on Program Comprehension

June 2014

325 pages

ISBN:9781450328791

DOI:10.1145/2597008

General Chair: Chanchal K. Roy, University of Saskatchewan, Canada
Program Chairs: Andrew Begel, Microsoft Research, USA; Leon Moonen, Simula Research Laboratory, Norway

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

TCSE: IEEE Computer Society's Tech. Council on Software Engin.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICSE '14

Sponsor:

SIGSOFT

ICSE '14: 36th International Conference on Software Engineering

June 2 - 3, 2014

Hyderabad, India

Bibliometrics

Article Metrics

Total Citations: 31
Total Downloads: 233
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 4

Reflects downloads up to 01 Oct 2024

Cited By

Pan W, Wu W, Ming H, Kim D, Yang J, Liu R, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Improving the Condensing of Reverse Engineered Class Diagrams using Weighted Network Metrics. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (374-375). doi:10.1145/3639478.3643520. Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3639478.3643520
Stana A, Şora I (2023). Logical dependencies: Extraction from the versioning system and usage in key classes detection. Computer Science and Information Systems 20:3 (1015-1035). doi:10.2298/CSIS220518025S. Online publication date: 2023.
https://doi.org/10.2298/CSIS220518025S
Pan W, Kessentini M, Ming H, Yang Z (2023). EASE: An Effort-aware Extension of Unsupervised Key Class Identification Approaches. ACM Transactions on Software Engineering and Methodology 33:4 (1-43). doi:10.1145/3635714. Online publication date: 2-Dec-2023.
https://dl.acm.org/doi/10.1145/3635714
Pan W, Ming H, Kim D, Yang Z (2023). Pride: Prioritizing Documentation Effort Based on a PageRank-Like Algorithm and Simple Filtering Rules. IEEE Transactions on Software Engineering 49:3 (1118-1151). doi:10.1109/TSE.2022.3171469. Online publication date: 1-Mar-2023.
https://doi.org/10.1109/TSE.2022.3171469
Zhang W, Zhang W, Strüber D, Hebig R (2023). Manual Abstraction in the Wild: A Multiple-Case Study on OSS Systems' Class Diagrams and Implementations. 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS) (36-46). doi:10.1109/MODELS58315.2023.00017. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/MODELS58315.2023.00017
Pan W, Du X, Ming H, Kim D, Yang Z, Grundy J, Pollock L, Penta M (2023). Identifying Key Classes for Initial Software Comprehension: Can We Do it Better? Proceedings of the 45th International Conference on Software Engineering (1878-1889). doi:10.1109/ICSE48619.2023.00160. Online publication date: 14-May-2023.
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00160
Wang L, Du X, Jiang B, Pan W, Ming H, Liu D (2022). KEADA: Identifying Key Classes in Software Systems Using Dynamic Analysis and Entropy-Based Metrics. Entropy 24:5 (652). doi:10.3390/e24050652. Online publication date: 6-May-2022.
https://doi.org/10.3390/e24050652
Rathee A, Chhabra J (2022). Feature-based critical components identification in multimedia software. Multimedia Tools and Applications 81:25 (35595-35618). doi:10.1007/s11042-021-11277-1. Online publication date: 2-Jan-2022.
https://doi.org/10.1007/s11042-021-11277-1
Wu Z, Yang X, Chen P, Qu Z, Lin J (2021). Multi-Scale Software Network Model for Software Safety of the Intended Functionality. 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) (250-255). doi:10.1109/ISSREW53611.2021.00071. Online publication date: Oct-2021.
https://doi.org/10.1109/ISSREW53611.2021.00071
Du X, Wang T, Pan W, Wang M, Jiang B, Xiang Y, Chai C, Wang J, Yuan C (2021). COSPA: Identifying Key Classes in Object-Oriented Software Using Preference Aggregation. IEEE Access 9 (114767-114780). doi:10.1109/ACCESS.2021.3105475. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3105475
