Abstract
Open source software has become highly popular and is of great importance for most software engineering activities. To facilitate software organization and retrieval, tagging is used extensively in open source communities. However, finding the desired software through tags in communities such as Freecode and Ohloh remains challenging because the available tags are insufficient. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching the tags of open source software. First, we propose a semantic graph that models the semantic correlations between tags and the words in software descriptions. Then, based on this graph, we design an effective algorithm to recommend tags for software. Comprehensive experiments on large-scale open source software datasets, in which we compare TRG with several representative related approaches, demonstrate the effectiveness and efficiency of our method in recommending proper tags.
Author information
Tao Wang received his BS and MS in Computer Science from National University of Defense Technology (NUDT) in 2007 and 2010, respectively. He is now a PhD candidate in Computer Science at NUDT. His research interests include open source software engineering, machine learning, data mining, and knowledge discovery in open source software.
Huaimin Wang received his PhD in Computer Science from National University of Defense Technology (NUDT) in 1992. He is now a professor and chief engineer in the Department of Educational Affairs, NUDT. He has been named a “Chang Jiang Scholars Program” professor and a Distinguished Young Scholar, among other honors. He has published more than 100 research papers in peer-reviewed international conferences and journals. His current research interests include middleware, software agents, and trustworthy computing.
Gang Yin received his PhD in Computer Science from National University of Defense Technology (NUDT) in 2006. He is now an associate professor at NUDT. He has worked on several major research projects, including national 973 and 863 projects. He has published more than 60 research papers in international conferences and journals. His current research interests include distributed computing, information security, software engineering, and machine learning.
Charles X. Ling earned both his MS and PhD in Computer and Information Science from the University of Pennsylvania and is now a faculty member in Computer Science at Western University. He has served as an Associate Editor of IEEE TKDE and ACM TIST, and as Panel Co-chair of ACM SIGKDD’12, among other roles. He has published over 120 research papers in peer-reviewed conferences and journals such as IJCAI and TKDE. He is a Senior Member of IEEE and a Lifetime Member of AAAI.
Xiao Li received his BS and MS degrees in Computer Science from the National University of Defense Technology in 2006 and 2008, respectively. He is currently a PhD candidate in the Department of Computer Science at The University of Western Ontario. His research interests include data mining, machine learning, and related real-world applications.
Peng Zou is a professor and PhD supervisor at National University of Defense Technology and now works at the Academy of Equipment. He has directed several major research projects and published many research papers in peer-reviewed international conferences and journals. His research interests include networking, information security, distributed computing, and software engineering.
About this article
Cite this article
Wang, T., Wang, H., Yin, G. et al. Tag recommendation for open source software. Front. Comput. Sci. 8, 69–82 (2014). https://doi.org/10.1007/s11704-013-2394-x
DOI: https://doi.org/10.1007/s11704-013-2394-x