DOI: 10.1145/2786805.2786862

Test report prioritization to assist crowdsourced testing

Published: 30 August 2015

Abstract

In crowdsourced testing, users can be incentivized to perform testing tasks and report their results, and because crowdsourced workers are often paid per task, there is a financial incentive to complete tasks quickly rather than well. The reports produced by these crowdsourced testing tasks, called "test reports", consist of simple natural language and screenshots. Back at the software-development organization, developers must manually inspect the test reports to judge their value for revealing faults. Due to the nature of crowdsourced work, however, the test reports are often too numerous to inspect and process comprehensively. To help with this daunting task, we created the first technique of its kind, to the best of our knowledge, to prioritize test reports for manual inspection. Our technique employs two key strategies: (1) a diversity strategy that helps developers inspect a wide variety of test reports and avoid duplicates and wasted effort on falsely classified faulty behavior, and (2) a risk strategy that helps developers identify test reports more likely to be fault-revealing based on past observations. Together, these strategies form our DivRisk strategy for prioritizing test reports in crowdsourced testing. Three industrial projects were used to evaluate the effectiveness of test report prioritization methods. The results of the empirical study show that: (1) DivRisk significantly outperforms random prioritization; and (2) DivRisk can approximate the best theoretical result for a real-world industrial mobile application. In addition, we provide practical guidelines for test report prioritization in crowdsourced testing based on the empirical study and our experience.
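
As a concrete illustration of the two strategies, here is a minimal sketch of how a DivRisk-style greedy ordering might work, assuming each test report has been reduced to a keyword set (for diversity) and a risk score in [0, 1] (for fault likelihood). The Jaccard-based distance, the alpha weight, and all function names are illustrative assumptions, not the paper's actual computation:

    # Minimal sketch (not the paper's implementation): greedy DivRisk-style
    # ordering over (report_id, keyword_set, risk_score) tuples. The Jaccard
    # distance and the alpha weight are hypothetical choices.

    def jaccard_distance(a, b):
        """Dissimilarity of two keyword sets: 1 - |A & B| / |A | B|."""
        if not a and not b:
            return 0.0
        return 1.0 - len(a & b) / len(a | b)

    def divrisk_order(reports, alpha=0.5):
        """Repeatedly pick the report maximizing alpha*risk + (1-alpha)*diversity,
        where diversity is the distance to the closest already-picked report."""
        remaining = list(reports)
        ordered, picked_keywords = [], []
        while remaining:
            def score(report):
                _, keywords, risk = report
                diversity = min(
                    (jaccard_distance(keywords, k) for k in picked_keywords),
                    default=1.0)  # the first pick is driven by risk alone
                return alpha * risk + (1.0 - alpha) * diversity
            best = max(remaining, key=score)
            remaining.remove(best)
            ordered.append(best[0])
            picked_keywords.append(best[1])
        return ordered

    reports = [
        ("r1", {"crash", "login", "screenshot"}, 0.9),
        ("r2", {"crash", "login", "screenshot"}, 0.8),  # near-duplicate of r1
        ("r3", {"layout", "rotation"}, 0.4),
    ]
    print(divrisk_order(reports))  # -> ['r1', 'r3', 'r2']

In this toy run, the near-duplicate r2 sinks below the dissimilar r3 despite its higher risk score, which is exactly the duplicate-avoiding behavior the diversity strategy targets.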

Published In

ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering
August 2015
1068 pages
ISBN:9781450336758
DOI:10.1145/2786805
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Crowdsourcing testing
  2. natural language processing
  3. test diversity
  4. test report prioritization

Qualifiers

  • Research-article

Conference

ESEC/FSE'15

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%

Cited By

  • (2024) "Clustering and Prioritization of Web Crowdsourced Test Reports Based on Text Classification", International Journal of Web Services Research, 21(1):1-19. DOI: 10.4018/IJWSR.357999
  • (2024) "Optimizing Prioritization of Crowdsourced Test Reports of Web Applications through Image-to-Text Conversion", Symmetry, 16(1):80. DOI: 10.3390/sym16010080
  • (2024) "Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning", ACM Transactions on Software Engineering and Methodology, 33(7):1-27. DOI: 10.1145/3674728
  • (2024) "Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules", Proceedings of the ACM on Software Engineering, 1(FSE):1540-1563. DOI: 10.1145/3660776
  • (2024) "Crowdsourced bug report severity prediction based on text and image understanding via heterogeneous graph convolutional networks", Journal of Software: Evolution and Process. DOI: 10.1002/smr.2705
  • (2023) "Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding", IEEE Transactions on Software Engineering, 1-20. DOI: 10.1109/TSE.2023.3285787
  • (2023) "Leveraging Android Automated Testing to Assist Crowdsourced Testing", IEEE Transactions on Software Engineering, 49(4):2318-2336. DOI: 10.1109/TSE.2022.3216879
  • (2023) "MuTCR: Test Case Recommendation via Multi-Level Signature Matching", 2023 IEEE/ACM International Conference on Automation of Software Test (AST), 179-190. DOI: 10.1109/AST58925.2023.00022
  • (2023) "Mobile crowdsourced test report prioritization based on text and image understanding", Journal of Software: Evolution and Process. DOI: 10.1002/smr.2541
  • (2022) "Context- and Fairness-Aware In-Process Crowdworker Recommendation", ACM Transactions on Software Engineering and Methodology, 31(3):1-31. DOI: 10.1145/3487571
