Tag recommendation for open source software

Tao Wang¹, Huaimin Wang¹, Gang Yin¹, Charles X. Ling², Xiao Li¹,² & Peng Zou¹,³


Abstract

Open source software has become highly popular and plays an important role in most software engineering activities. To facilitate the organization and retrieval of software, tagging is used extensively in open source communities. However, finding the desired software through tags in communities such as Freecode and Ohloh remains challenging because many projects are insufficiently tagged. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching the tags of open source software. First, we propose a semantic graph that models the semantic correlations between tags and the words in software descriptions. Then, based on this graph, we design an effective algorithm to recommend tags for software. Through comprehensive experiments on large-scale open source software datasets, comparing against several typical related works, we demonstrate the effectiveness and efficiency of our method in recommending proper tags.
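The abstract outlines two steps: build a graph linking description words to tags, then rank candidate tags over that graph. The abstract does not give TRG's actual graph construction or scoring, so the sketch below substitutes a simple, hypothetical stand-in: a bipartite word-to-tag graph whose edge weights estimate P(tag | word) from co-occurrence in already-tagged projects, with each candidate tag scored by summing the weights of edges from the new description's words. The function names, stopword list, and toy projects are illustrative assumptions, not the authors' method or data.

```python
from collections import Counter, defaultdict
import re

# Illustrative assumptions only: this is NOT the TRG algorithm from the paper,
# just a co-occurrence-weighted word->tag graph used as a stand-in.
TOKEN_RE = re.compile(r"[a-z0-9]+")
STOPWORDS = {"a", "an", "the", "for", "and", "of", "to", "in", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop a tiny stopword list."""
    return [w for w in TOKEN_RE.findall(text.lower()) if w not in STOPWORDS]


def build_word_tag_graph(projects):
    """Build a weighted word->tag graph from (description, tags) pairs.

    Edge weight = (# projects where word and tag co-occur)
                  / (# projects whose description contains the word),
    i.e. a simple estimate of P(tag | word).
    """
    word_df = Counter()
    cooc = defaultdict(Counter)
    for description, tags in projects:
        words = set(tokenize(description))
        word_df.update(words)
        for w in words:
            for t in set(tags):
                cooc[w][t] += 1
    return {w: {t: c / word_df[w] for t, c in tag_counts.items()}
            for w, tag_counts in cooc.items()}


def recommend_tags(graph, description, k=3):
    """Score tags by summing edge weights from the description's known words."""
    scores = Counter()
    for w in set(tokenize(description)):
        for t, weight in graph.get(w, {}).items():
            scores[t] += weight
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical toy corpus of tagged projects (not data from the paper).
projects = [
    ("A lightweight web server written in C", ["web", "http", "server"]),
    ("Python library for parsing HTML and XML", ["python", "parser", "xml"]),
    ("Fast HTTP client library for Python", ["python", "http", "web"]),
]
graph = build_word_tag_graph(projects)
print(recommend_tags(graph, "HTTP server in C"))  # ['http', 'web', 'server']
```

Normalizing by the word's document frequency keeps a word that appears in many projects from dominating, but this toy scoring still favors frequent tags; the paper's own semantic graph and recommendation algorithm should be consulted for how TRG handles this.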

Author information

Authors and Affiliations

National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha, 410073, China
Tao Wang, Huaimin Wang, Gang Yin, Xiao Li & Peng Zou
Department of Computer Science, The University of Western Ontario, London, N6A 5B7, Canada
Charles X. Ling & Xiao Li
Academy of Equipment, Beijing, 101400, China
Peng Zou

Corresponding author

Correspondence to Tao Wang.

Additional information

Tao Wang received his BS and MS in Computer Science from the National University of Defense Technology (NUDT) in 2007 and 2010, respectively. He is now a PhD candidate in Computer Science at NUDT. His research interests include open source software engineering, machine learning, data mining, and knowledge discovery in open source software.

Huaimin Wang received his PhD in Computer Science from the National University of Defense Technology (NUDT) in 1992. He is now a professor and chief engineer in the Department of Educational Affairs at NUDT. He has been named a Chang Jiang Scholars Program professor and a Distinguished Young Scholar, among other honors. He has published more than 100 research papers in peer-reviewed international conferences and journals. His current research interests include middleware, software agents, and trustworthy computing.

Gang Yin received his PhD in Computer Science from the National University of Defense Technology (NUDT) in 2006. He is now an associate professor at NUDT. He has worked on several major research projects, including national 973 and 863 projects. He has published more than 60 research papers in international conferences and journals. His current research interests include distributed computing, information security, software engineering, and machine learning.

Charles X. Ling earned both his MS and PhD in Computer and Information Science at the University of Pennsylvania and is now a faculty member in Computer Science at Western University. He has served as an Associate Editor of IEEE TKDE and ACM TIST and as Panel Co-chair of ACM SIGKDD’12, among other roles. He has published over 120 research papers in peer-reviewed conferences and journals such as IJCAI and TKDE. He is a Senior Member of IEEE and a Lifetime Member of AAAI.

Xiao Li received the BS and MS degrees in Computer Science from the National University of Defense Technology in 2006 and 2008, respectively. He is currently a PhD candidate in the Department of Computer Science at The University of Western Ontario. His research interests include data mining, machine learning, and related real-world applications.

Peng Zou is a professor and PhD supervisor at the National University of Defense Technology and now works at the Academy of Equipment. He has directed several major research projects and published many research papers in peer-reviewed international conferences and journals. His research interests include networking, information security, distributed computing, and software engineering.

About this article

Cite this article

Wang, T., Wang, H., Yin, G. et al. Tag recommendation for open source software. Front. Comput. Sci. 8, 69–82 (2014). https://doi.org/10.1007/s11704-013-2394-x

Received: 21 December 2012
Accepted: 30 September 2013
Published: 15 November 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s11704-013-2394-x
