research-article

Measuring and Predicting the Relevance Ratings between FLOSS Projects using Topic Features

Authors:

Xianping TaoAuthors Info & Claims

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware

Article No.: 12, Pages 1 - 10

https://doi.org/10.1145/3275219.3275222

Published: 16 September 2018 Publication History

Get Access

Abstract

Understanding the relevance between the Free/Libra Open Source Software projects is important for developers to perform code and design reuse, discover and develop new features, keep their projects up-to-date, and etc. However, it is challenging to perform relevance ratings between the FLOSS projects mainly because: 1) beyond simple code similarity, there are complex aspects considered when measuring the relevance; and 2) the prohibitive large amount of FLOSS projects available. To address the problem, in this paper, we propose a method to measure and further predict the relevance ratings between FLOSS projects. Our method uses topic features extracted by the LDA topic model to describe the characteristics of a project. By using the topic features, multiple aspects of FLOSS projects such as the application domain, technology used, and programming language are extracted and further used to measure and predict their relevance ratings. Based on the topic features, our method uses matrix factorization to leverage the partially known relevance ratings between the projects to learn the mapping between different topic features to the relevance ratings. Finally, our method combines the topic modeling and matrix factorization technologies to predict the relevance ratings between software projects without human intervention, which is scalable to a large amount of projects. We evaluate the performance of the proposed method by applying our topic extraction and relevance modeling methods using 300 projects from GitHub. The result of topic extraction experiment shows that, for topic modeling, our LDA-based approach achieves the highest hit rate of 98.3% and the highest average accuracy of 29.8%. And the relevance modeling experiment shows that our relevance modeling approach achieves the minimum average predict error of 0.093, suggesting the effectiveness of applying the proposed method on real-world data sets.

References

[1]

David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993--1022.

Abstract

References

Index Terms

Recommendations

A hybrid recommendation technique using topic embedding for rating prediction and to handle cold-start problem

Extractive text summarization using clustering-based topic modeling

Group matrix factorization for scalable topic modeling

Comments

Information

Published In

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Get Access

Login options

Full Access

View options

PDF

eReader

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations