Abstract
On Stack Overflow, users reuse 11,926,354 external links to share resources hosted outside the Stack Overflow website. These external links connect to existing programming-related knowledge and extend the crowdsourced knowledge on Stack Overflow. Some external links, which we call repeated external links, are shared multiple times. We observe that 82.5% of the link sharing activities (i.e., sharing links in any question, answer, or comment) on Stack Overflow share external resources, and that 57.0% of the occurrences of external links are occurrences of repeated external links. However, it is still unclear what types of external resources are repeatedly shared. To help users manage their knowledge, we investigate the characteristics of repeated external links in knowledge sharing on Stack Overflow. In this paper, we analyze the repeated external links on Stack Overflow. We observe that external links pointing to text resources (hosted on documentation websites, tutorial websites, etc.) are repeatedly shared the most. We also observe that different users repeatedly share the same knowledge in the form of repeated external links, which increases the effort needed to maintain that knowledge (e.g., updating invalid links in multiple posts). Repeated external links can bring risks to the software engineering process, as 1) the same users can repeatedly share external links for the purpose of promotion, and 2) external links can point to webpages with an overload of information, which makes it difficult for users to retrieve the relevant information. Our findings provide insights for Stack Overflow moderators and researchers. For example, we encourage Stack Overflow to centrally manage commonly occurring knowledge that takes the form of repeated external links, in order to better maintain the crowdsourced knowledge on Stack Overflow.
Notes
For example, we consider docs.oracle.com and www.oracle.com to be different websites because they have different full domains.
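The full-domain comparison described in this note can be illustrated with a minimal sketch. It assumes that the "full domain" corresponds to the host component of the URL; the helper names `full_domain` and `same_website` are hypothetical and are not taken from the paper, whose actual extraction procedure may differ.

```python
from urllib.parse import urlsplit


def full_domain(url: str) -> str:
    """Return the lowercased host of a URL (hypothetical helper).

    The port and any credentials are ignored. URLs without a scheme
    have no parseable host here and yield an empty string.
    """
    host = urlsplit(url).hostname
    return host or ""


def same_website(url_a: str, url_b: str) -> bool:
    """Treat two URLs as the same website only if their full domains match."""
    domain_a = full_domain(url_a)
    domain_b = full_domain(url_b)
    # An empty domain means the URL could not be parsed, so never match on it.
    return bool(domain_a) and domain_a == domain_b


if __name__ == "__main__":
    docs = "https://docs.oracle.com/javase/8/docs/api/"
    www = "https://www.oracle.com/java/"
    print(full_domain(docs), full_domain(www), same_website(docs, www))
```

Under this assumption, the two Oracle URLs are treated as different websites even though they share the registered domain oracle.com.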
Acknowledgments
This research was partially supported by the National Science Foundation of China (No. U20A20173), the Key Research and Development Program of Zhejiang Province (No. 2021C01014), and the National Research Foundation, Singapore under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the author(s) and do not reflect the views of Huawei and the National Research Foundation, Singapore.
Additional information
Communicated by: Emerson Murphy-Hill
About this article
Cite this article
Liu, J., Zhang, H., Xia, X. et al. An exploratory study on the repeatedly shared external links on Stack Overflow. Empir Software Eng 27, 19 (2022). https://doi.org/10.1007/s10664-021-10028-y
DOI: https://doi.org/10.1007/s10664-021-10028-y