DOI: 10.1145/3275219.3275222
research-article

Measuring and Predicting the Relevance Ratings between FLOSS Projects using Topic Features

Published: 16 September 2018

Abstract

Understanding the relevance between Free/Libre and Open Source Software (FLOSS) projects is important for developers who want to reuse code and designs, discover and develop new features, and keep their projects up to date. However, rating the relevance between FLOSS projects is challenging for two main reasons: 1) beyond simple code similarity, relevance depends on several complex aspects; and 2) the number of available FLOSS projects is prohibitively large. To address this problem, we propose a method to measure and predict the relevance ratings between FLOSS projects. The method describes each project with topic features extracted by the LDA topic model. These features capture multiple aspects of a project, such as its application domain, the technology it uses, and its programming language. On top of the topic features, the method uses matrix factorization to exploit the partially known relevance ratings between projects and learn a mapping from topic features to relevance ratings. Combining topic modeling and matrix factorization lets the method predict relevance ratings between software projects without human intervention, so it scales to a large number of projects. We evaluate the method by applying its topic extraction and relevance modeling steps to 300 projects from GitHub. In the topic extraction experiment, our LDA-based approach achieves the highest hit rate, 98.3%, and the highest average accuracy, 29.8%. In the relevance modeling experiment, our approach achieves the lowest average prediction error, 0.093, suggesting that the proposed method is effective on real-world data sets.
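The idea of learning a mapping from topic features to relevance ratings from a handful of known ratings can be illustrated with a small sketch. The abstract does not give the exact model, so everything below is an assumption made for illustration: the bilinear form `theta_i^T W theta_j`, the function names `predict` and `fit_mapping`, the hyperparameters, and the toy topic vectors and ratings (which stand in for per-project LDA topic distributions) are all hypothetical.

```python
import random

def predict(theta_i, theta_j, W):
    """Bilinear relevance score theta_i^T W theta_j (an assumed model form)."""
    return sum(theta_i[a] * W[a][b] * theta_j[b]
               for a in range(len(theta_i)) for b in range(len(theta_j)))

def fit_mapping(topics, known, k, epochs=2000, lr=0.5, reg=1e-4, seed=0):
    """Fit the k x k matrix W to the known ratings by SGD on squared error."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(k)] for _ in range(k)]
    pairs = list(known.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (i, j), rating in pairs:
            # Error is computed once, before any entry of W is updated.
            err = predict(topics[i], topics[j], W) - rating
            for a in range(k):
                for b in range(k):
                    grad = err * topics[i][a] * topics[j][b] + reg * W[a][b]
                    W[a][b] -= lr * grad
    return W

# Hypothetical topic distributions for four projects over two topics.
topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
# Partially known relevance ratings in [0, 1]; all other pairs are unrated.
known = {(0, 1): 0.9, (2, 3): 0.9, (0, 2): 0.1}

W = fit_mapping(topics, known, k=2)
same_topic = predict(topics[0], topics[1], W)   # rated pair, similar topics
cross_topic = predict(topics[1], topics[3], W)  # unrated pair, different topics
print(f"rated pair (0, 1): {same_topic:.2f}")
print(f"unrated pair (1, 3): {cross_topic:.2f}")
```

Because the learned matrix acts on topic vectors rather than on project identities, it can score a pair that was never rated, such as projects 1 and 3 here. This is what makes prediction without further human ratings possible in this kind of setup.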



Published In

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware
September 2018
167 pages
ISBN: 9781450365901
DOI: 10.1145/3275219

In-Cooperation

  • Institute of Software, Chinese Academy of Sciences
  • CCF: China Computer Federation

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. FLOSS Projects
  2. Matrix Factorization
  3. Relevance Rating
  4. Topic Modeling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware '18

Acceptance Rates

Internetware '18 Paper Acceptance Rate 20 of 26 submissions, 77%;
Overall Acceptance Rate 55 of 111 submissions, 50%
