Recommending Who to Follow in the Software Engineering Twitter Space

Published: 22 October 2018

Abstract

With the advent of social media, developers increasingly use it in their software development activities. Twitter is one of the popular social media platforms used by developers. A recent study by Singer et al. found that software developers use Twitter to “keep up with the fast-paced development landscape.” Unfortunately, because Twitter is a general-purpose platform, it is challenging for developers to use it for their development activities. Our survey of 36 developers who use Twitter in their development activities highlights that developers are interested in following specialized software gurus who share relevant technical tweets.
To help developers perform this task, in this work we propose a recommendation system to identify specialized software gurus. Our approach first extracts different kinds of features that characterize a Twitter user and then employs a two-stage classification approach to generate a discriminative model, which can differentiate specialized software gurus in a particular domain from other Twitter users who generate domain-related tweets (i.e., domain-related Twitter users). We have investigated the effectiveness of our approach in finding specialized software gurus for four different domains (JavaScript, Android, Python, and Linux) on a dataset of 86,824 Twitter users who generated 5,517,878 tweets over one month. Our approach can differentiate specialized software experts from other domain-related Twitter users with an F-Measure of up to 0.820. Compared with existing Twitter domain expert recommendation approaches, our proposed approach improves on their F-Measure by at least 7.63%.
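For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-Measure is commonly computed for a binary classifier. It assumes the balanced form (F1, the harmonic mean of precision and recall); the abstract does not state which variant the authors use. The function name `f_measure` and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the balanced variant is used; the abstract does not say which.
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Guard against division by zero when nothing is predicted or nothing is relevant.
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (made up, not from the paper's dataset):
# 80 gurus correctly identified, 20 non-gurus wrongly flagged, 20 gurus missed.
score = f_measure(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 3))
```

Because the harmonic mean is pulled toward the lower of the two values, a classifier cannot reach a high F-Measure by maximizing only precision or only recall, which matters when gurus are a small fraction of domain-related users.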

References

[1]
Palakorn Achananuparp, Ibrahim Nelman Lubis, Yuan Tian, David Lo, and Ee-Peng Lim. 2012. Observatory of trends in software related microblogs. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. ACM, 334--337.
[2]
Maurício Aniche, Christoph Treude, Igor Steinmacher, Igor Wiese, Gustavo Pinto, Margaret-Anne Storey, and Marco Aurélio Gerosa. 2018. How modern news aggregators help development communities shape and share knowledge. In Proceedings of the 40th International Conference on Software Engineering (ICSE’18). ACM, New York, NY, USA, 499--510.
[3]
John Anvik, Lyndon Hiew, and Gail C. Murphy. 2006. Who should fix this bug? In Proceedings of the 28th International Conference on Software Engineering. ACM, 361--370.
[4]
John Anvik and Gail C. Murphy. 2007. Determining implementation expertise from bug reports. In Proceedings of the 4th International Workshop on Mining Software Repositories. IEEE Computer Society, 2.
[5]
Gargi Bougie, Jamie Starke, Margaret-Anne Storey, and Daniel M. German. 2011. Towards understanding Twitter use in software engineering: Preliminary findings, ongoing challenges and future questions. In Proceedings of the 2nd International Workshop on Web 2.0 for Software Engineering. 31--36.
[6]
Ulrik Brandes. 2008. On variants of shortest-path betweenness centrality and their generic computation. Social Networks 30, 2 (2008), 136--145.
[7]
Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. 2013. API design for machine learning software: Experiences from the Scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108--122.
[8]
John L. Campbell, Charles Quincy, Jordan Osserman, and Ove K. Pedersen. 2013. Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociological Methods & Research 42, 3 (2013), 294--320.
[9]
Shuo Chang and Aditya Pal. 2013. Routing questions for collaborative answering in community question answering. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM, 494--501.
[10]
Morakot Choetkiertikul, Daniel Avery, Hoa Khanh Dam, Truyen Tran, and Aditya Ghose. 2015. Who will answer my question on stack overflow? In 2015 24th Australasian Software Engineering Conference (ASWEC’15). IEEE, 155--164.
[11]
Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 1 (1960), 37--46.
[12]
Juliet Corbin and Anselm L. Strauss. 2014. Basics of Qualitative Research. Sage.
[13]
Linton C. Freeman. 1979. Centrality in social networks conceptual clarification. Social Networks 1, 3 (1979), 215--239.
[14]
GitHub. 2015. About GitHub Inc. Retrieved from https://github.com/about/press. Accessed August 27, 2015.
[15]
Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR’13). IEEE Press, Piscataway, NJ, 233--236.
[16]
Emitza Guzman, Rana Alkadhi, and Norbert Seyff. 2016. A needle in a haystack: What do Twitter users say about software? In 2016 IEEE 24th International Requirements Engineering Conference (RE’16). IEEE, 96--105.
[17]
Chaoran Huang, Lina Yao, Xianzhi Wang, Boualem Benatallah, and Quan Z. Sheng. 2017. Expert as a service: Software expert recommendation via knowledge domain embeddings in stack overflow. In 2017 IEEE International Conference on Web Services (ICWS’17). IEEE, 317--324.
[18]
William Hudson. 2013. Card sorting. In The Encyclopedia of Human-Computer Interaction. 2nd ed.
[19]
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422--446.
[20]
Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel. 2016. The emerging role of data scientists on software development teams. In Proceedings of the 38th International Conference on Software Engineering. ACM, 96--107.
[21]
Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel. 2017. Data scientists in software teams: State of the art and challenges. IEEE Transactions on Software Engineering 1 (2017), 1--1.
[22]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue B. Moon. 2010. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW’10). 591--600.
[23]
Alina Lazar, Sarah Ritchey, and Bonita Sharif. 2014. Improving the accuracy of duplicate bug report detection using textual similarity measures. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 308--311.
[24]
Bin Lin, Alexey Zagalsky, Margaret-Anne Storey, and Alexander Serebrenik. 2016. Why developers are slacking off: Understanding how software teams use slack. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. ACM, 333--336.
[25]
Tie-Yan Liu. 2011. Learning to Rank for Information Retrieval. Springer Science & Business Media.
[26]
David Lo, Nachiappan Nagappan, and Thomas Zimmermann. 2015. How practitioners perceive the relevance of software engineering research. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM, 415--425.
[27]
David Ma, David Schuler, Thomas Zimmermann, and Jonathan Sillito. 2009. Expert recommendation with usage expertise. In IEEE International Conference on Software Maintenance, 2009 (ICSM’09). IEEE, 535--538.
[28]
Laura MacLeod, Margaret-Anne Storey, and Andreas Bergen. 2015. Code, camera, action: How software developers document and share program knowledge using YouTube. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension. IEEE Press, 104--114.
[29]
Henry B. Mann and Donald R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics 18 (1947), 50--60.
[30]
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.
[31]
Mary L. McHugh. 2012. Interrater reliability: The kappa statistic. Biochemia Medica 22, 3 (2012), 276--282.
[32]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66. Stanford InfoLab.
[33]
Aditya Pal and Scott Counts. 2011. Identifying topical authorities in microblogs. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining. ACM, 45--54.
[34]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (Oct. 2011), 2825--2830.
[35]
Luca Ponzanelli, Gabriele Bavota, Andrea Mocci, Massimiliano Di Penta, Rocco Oliveto, Mir Hasan, Barbara Russo, Sonia Haiduc, and Michele Lanza. 2016. Too long; didn’t watch!: Extracting relevant fragments from software development video tutorials. In Proceedings of the 38th International Conference on Software Engineering. ACM, 261--272.
[36]
Martin F. Porter. 1980. An algorithm for suffix stripping. Program 14, 3 (1980), 130--137.
[37]
Philips Kokoh Prasetyo, David Lo, Palakorn Achananuparp, Yuan Tian, and Ee-Peng Lim. 2012. Automatic classification of software related microblogs. In Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM’12). IEEE, 596--599.
[38]
Adithya Rao, Nemanja Spasojevic, Zhisheng Li, and Trevor DSouza. 2015. Klout score: Measuring influence across multiple social networks. In 2015 IEEE International Conference on Big Data (Big Data’15). IEEE, 2282--2289.
[39]
Johnny Saldaña. 2015. The Coding Manual for Qualitative Researchers. Sage.
[40]
Gustavo Santos, Klérisson V. R. Paixão, Nicolas Anquetil, Anne Etien, Marcelo de Almeida Maia, and Stéphane Ducasse. 2017. Recommending source code locations for system specific transformations. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER’17). IEEE, 160--170.
[41]
Abhishek Sharma, Ferdian Thung, Pavneet Singh Kochhar, Agus Sulistya, and David Lo. 2017. Cataloging Github repositories. In Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering. ACM, 314--319.
[42]
Abhishek Sharma, Yuan Tian, and David Lo. 2015. NIRMAL: Automatic identification of software relevant tweets leveraging language model. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution and Reengineering (SANER’15). IEEE, 449--458.
[43]
Abhishek Sharma, Yuan Tian, and David Lo. 2015. What’s hot in software engineering Twitter space? In 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME’15). IEEE, 541--545.
[44]
Abhishek Sharma, Yuan Tian, Agus Sulistya, David Lo, and Aiko Fallas Yamashita. 2017. Harnessing Twitter to support serendipitous learning of developers. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER’17). IEEE, 387--391.
[45]
Leif Singer, Fernando Marques Figueira Filho, and Margaret-Anne D. Storey. 2014. Software engineering at the speed of light: How developers stay current using Twitter. In 36th International Conference on Software Engineering (ICSE’14). 211--221.
[46]
Edward K. Smith, Christian Bird, and Thomas Zimmermann. 2015. Build it yourself!: Homegrown tools in a large software company. In Proceedings of the 37th International Conference on Software Engineering - Volume 1. IEEE Press, 369--379.
[47]
Nemanja Spasojevic, Jinyun Yan, Adithya Rao, and Prantik Bhattacharyya. 2014. Lasta: Large scale topic assignment on multiple social networks. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1809--1818.
[48]
Margaret-Anne Storey, Leif Singer, Brendan Cleary, Fernando Figueira Filho, and Alexey Zagalsky. 2014. The (r)evolution of social media in software engineering. In Proceedings of the Future of Software Engineering. ACM, 100--116.
[49]
Margaret-Anne Storey, Alexey Zagalsky, Fernando Figueira Filho, Leif Singer, and Daniel M. German. 2017. How social and communication channels shape and challenge a participatory culture in software development. IEEE Transactions on Software Engineering 43, 2 (2017), 185--204.
[50]
Anselm Strauss and Juliet M. Corbin. 1997. Grounded Theory in Practice. Sage.
[51]
Yuan Tian, Palakorn Achananuparp, Ibrahim Nelman Lubis, David Lo, and Ee-Peng Lim. 2012. What does software engineering community microblog about? In MSR. 247--250.
[52]
Yuan Tian and David Lo. 2014. An exploratory study on software microblogger behaviors. In MUD.
[53]
Yuan Tian, David Lo, and Julia Lawall. 2014. Automated construction of a software-specific word similarity database. In 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE’14). IEEE, 44--53.
[54]
Twitter. 2017. About Twitter Inc. Retrieved from https://about.twitter.com/company. Accessed July 26, 2017.
[55]
Gias Uddin and Foutse Khomh. 2017. Automatic summarization of API reviews. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE’17). IEEE, 159--170.
[56]
Harold Valdivia Garcia and Emad Shihab. 2014. Characterizing and predicting blocking bugs in open source projects. In Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 72--81.
[57]
Bogdan Vasilescu, Kelly Blincoe, Qi Xuan, Casey Casalnuovo, Daniela Damian, Premkumar Devanbu, and Vladimir Filkov. 2016. The sky is not the limit: Multitasking on GitHub projects. In International Conference on Software Engineering (ICSE’16). ACM, 994--1005.
[58]
Bogdan Vasilescu, Vladimir Filkov, and Alexander Serebrenik. 2013. StackOverflow and GitHub: Associations between software development and crowdsourced knowledge. In 2013 International Conference on Social Computing (SocialCom’13). IEEE, 188--195.
[59]
Anthony J. Viera and Joanne M. Garrett. 2005. Understanding interobserver agreement: The kappa statistic. Family Medicine 37, 5 (2005), 360--363.
[60]
Xiaofeng Wang, I. Kuzmickaja, K.-J. Stol, P. Abrahamsson, and B. Fitzgerald. 2014. Microblogging in open source software development: The case of Drupal and Twitter. IEEE Software 31, 4 (2014), 72--80.
[61]
Scott White and Padhraic Smyth. 2003. Algorithms for estimating relative importance in networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 266--275.
[62]
Xin Xia, David Lo, Ying Ding, Jafar M. Al-Kofahi, Tien N. Nguyen, and Xinyu Wang. 2017. Improving automated bug triaging with specialized topic model. IEEE Transactions on Software Engineering 43, 3 (2017), 272--297.
[63]
Xin Xia, David Lo, Emad Shihab, Xinyu Wang, and Xiaohu Yang. 2015. ELBlocker: Predicting blocking bugs with ensemble imbalance learning. Information and Software Technology 61 (2015), 93--106.
[64]
Xin Xia, David Lo, Xinyu Wang, and Bo Zhou. 2015. Dual analysis for recommending developers to resolve bugs. Journal of Software: Evolution and Process 27, 3 (2015), 195--220.
[65]
Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, Vladimir Filkov, and Bogdan Vasilescu. 2017. The impact of continuous integration on other software development practices: A large-scale empirical study. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 60--71.
[66]
Pingyi Zhou, Jin Liu, Zijiang Yang, and Guangyou Zhou. 2017. Scalable tag recommendation for software information sites. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER’17). IEEE, 272--282.
[67]
Yu Zhou, Yanxiang Tong, Ruihang Gu, and Harald Gall. 2014. Combining text mining and data mining for bug report classification. In 2014 IEEE International Conference on Software Maintenance and Evolution (ICSME’14). IEEE, 311--320.

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 27, Issue 4
    October 2018
    159 pages
    ISSN:1049-331X
    EISSN:1557-7392
    DOI:10.1145/3287303
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 October 2018
    Accepted: 01 July 2018
    Revised: 01 March 2018
    Received: 01 June 2016
    Published in TOSEM Volume 27, Issue 4

    Author Tags

    1. Twitter
    2. recommendation systems
    3. software engineering

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Research Foundation, Prime Minister's Office, Singapore under its International Research Centres in Singapore Funding Initiative

    Cited By

    • (2024) An empirical study of software ecosystem related tweets by npm maintainers. PeerJ Computer Science 10 (e1669). DOI: 10.7717/peerj-cs.1669. Online publication date: 17-Jan-2024
    • (2020) Review of Social Media Influence on Software Development. Mehran University Research Journal of Engineering and Technology 39:3 (603-611). DOI: 10.22581/muet1982.2003.15. Online publication date: 1-Jul-2020
    • (2019) A first look at unfollowing behavior on GitHub. Information and Software Technology 105 (150-160). DOI: 10.1016/j.infsof.2018.08.012. Online publication date: Jan-2019
    • (2019) SIEVE: Helping developers sift wheat from chaff via cross-platform analysis. Empirical Software Engineering 25:1 (996-1030). DOI: 10.1007/s10664-019-09775-w. Online publication date: 2-Oct-2019
