DOI: 10.1145/2786805.2786862

Test report prioritization to assist crowdsourced testing

Published: 30 August 2015

Abstract

In crowdsourced testing, users can be incentivized to perform testing tasks and report their results, and because crowdsourced workers are often paid per task, there is a financial incentive to complete tasks quickly rather than well. The reports produced by these crowdsourced testing tasks, called "test reports", consist of simple natural language and screenshots. Back at the software-development organization, developers must manually inspect the test reports to judge their value for revealing faults. Due to the nature of crowdsourced work, however, the test reports are often too numerous to inspect and process comprehensively. To help with this daunting task, we created the first technique of its kind, to the best of our knowledge, to prioritize test reports for manual inspection. Our technique employs two key strategies: (1) a diversity strategy that helps developers inspect a wide variety of test reports and avoid duplicates and wasted effort on falsely classified faulty behavior, and (2) a risk strategy that helps developers identify test reports more likely to be fault-revealing based on past observations. Together, these strategies form our DivRisk strategy for prioritizing test reports in crowdsourced testing. Three industrial projects were used to evaluate the effectiveness of test report prioritization methods. The results of the empirical study show that: (1) DivRisk significantly outperforms random prioritization; and (2) DivRisk can approximate the best theoretical result for a real-world industrial mobile application. In addition, we provide practical guidelines for test report prioritization in crowdsourced testing based on the empirical study and our experience.
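
As a concrete illustration of the two strategies, here is a minimal sketch of how a DivRisk-style greedy ordering might work, assuming each test report has been reduced to a keyword set (for diversity) and a risk score in [0, 1] (for fault likelihood). The Jaccard-based distance, the alpha weight, and all function names are illustrative assumptions, not the paper's actual computation:

    # Minimal sketch (not the paper's implementation): greedy DivRisk-style
    # ordering over (report_id, keyword_set, risk_score) tuples. The Jaccard
    # distance and the alpha weight are hypothetical choices.

    def jaccard_distance(a, b):
        """Dissimilarity of two keyword sets: 1 - |A & B| / |A | B|."""
        if not a and not b:
            return 0.0
        return 1.0 - len(a & b) / len(a | b)

    def divrisk_order(reports, alpha=0.5):
        """Repeatedly pick the report maximizing alpha*risk + (1-alpha)*diversity,
        where diversity is the distance to the closest already-picked report."""
        remaining = list(reports)
        ordered, picked_keywords = [], []
        while remaining:
            def score(report):
                _, keywords, risk = report
                diversity = min(
                    (jaccard_distance(keywords, k) for k in picked_keywords),
                    default=1.0)  # the first pick is driven by risk alone
                return alpha * risk + (1.0 - alpha) * diversity
            best = max(remaining, key=score)
            remaining.remove(best)
            ordered.append(best[0])
            picked_keywords.append(best[1])
        return ordered

    reports = [
        ("r1", {"crash", "login", "screenshot"}, 0.9),
        ("r2", {"crash", "login", "screenshot"}, 0.8),  # near-duplicate of r1
        ("r3", {"layout", "rotation"}, 0.4),
    ]
    print(divrisk_order(reports))  # -> ['r1', 'r3', 'r2']

In this toy run, the near-duplicate r2 sinks below the dissimilar r3 despite its higher risk score, which is exactly the duplicate-avoiding behavior the diversity strategy targets.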

Published In

ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering
August 2015
1068 pages
ISBN:9781450336758
DOI:10.1145/2786805
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Crowdsourcing testing
  2. natural language processing
  3. test diversity
  4. test report prioritization

Qualifiers

  • Research-article

Conference

ESEC/FSE'15

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%

Cited By

  • (2024) "Clustering and Prioritization of Web Crowdsourced Test Reports Based on Text Classification", International Journal of Web Services Research, 21(1):1-19. DOI: 10.4018/IJWSR.357999
  • (2024) "Optimizing Prioritization of Crowdsourced Test Reports of Web Applications through Image-to-Text Conversion", Symmetry, 16(1):80. DOI: 10.3390/sym16010080
  • (2024) "Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning", ACM Transactions on Software Engineering and Methodology, 33(7):1-27. DOI: 10.1145/3674728
  • (2024) "Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules", Proceedings of the ACM on Software Engineering, 1(FSE):1540-1563. DOI: 10.1145/3660776
  • (2024) "Crowdsourced bug report severity prediction based on text and image understanding via heterogeneous graph convolutional networks", Journal of Software: Evolution and Process. DOI: 10.1002/smr.2705
  • (2023) "Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding", IEEE Transactions on Software Engineering, 1-20. DOI: 10.1109/TSE.2023.3285787
  • (2023) "Leveraging Android Automated Testing to Assist Crowdsourced Testing", IEEE Transactions on Software Engineering, 49(4):2318-2336. DOI: 10.1109/TSE.2022.3216879
  • (2023) "MuTCR: Test Case Recommendation via Multi-Level Signature Matching", 2023 IEEE/ACM International Conference on Automation of Software Test (AST), 179-190. DOI: 10.1109/AST58925.2023.00022
  • (2023) "Mobile crowdsourced test report prioritization based on text and image understanding", Journal of Software: Evolution and Process. DOI: 10.1002/smr.2541
  • (2022) "Context- and Fairness-Aware In-Process Crowdworker Recommendation", ACM Transactions on Software Engineering and Methodology, 31(3):1-31. DOI: 10.1145/3487571
