research-article

On the effectiveness of testing sentiment analysis systems with metamorphic testing

Published: 01 October 2022

Abstract

Context:

Metamorphic testing (MT) has been successfully applied to a wide range of software systems. In these applications, the testing results of MT form the basis for drawing conclusions about the target system’s performance. Therefore, the effectiveness of MT is crucial to the trustworthiness of the derived conclusions.

Objective:

However, due to the nature of MT, its effectiveness can be affected by various factors. Despite MT’s success, it is still important to study its effectiveness under different application contexts.

Method:

To investigate the effectiveness of MT, we focus on an important aspect, namely, false satisfactions (which are satisfactions of metamorphic relations that involve at least one failing execution), and revisit the application of MT to sentiment analysis (SA) systems. An in-depth analysis of the essence of false satisfactions reveals the situations where they would occur, and how they would affect the effectiveness of MT. Furthermore, 20 metamorphic relations (MRs) are identified for supporting a user-oriented evaluation of SA systems.
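The definition of a false satisfaction can be made concrete with a small sketch. Everything in it is an assumption for illustration and not taken from the paper: the toy lexicon, the `toy_sentiment` classifier (which deliberately ignores negation), the synonym-swap MR, and the expected labels used to decide whether an execution fails.

```python
# Toy illustration only: the lexicon, classifier, and MR below are hypothetical
# and are not the systems or relations studied in the paper.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}


def toy_sentiment(text: str) -> str:
    """Naive lexicon classifier that ignores negation (a deliberate flaw)."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mr_satisfied(source_out: str, followup_out: str) -> bool:
    """Assumed MR: swapping a word for a synonym should not change the label."""
    return source_out == followup_out


def classify_outcome(source: str, followup: str,
                     expected_source: str, expected_followup: str) -> str:
    """Label one source/follow-up pair as a violation, true or false satisfaction."""
    src_out = toy_sentiment(source)
    fol_out = toy_sentiment(followup)
    if not mr_satisfied(src_out, fol_out):
        return "violation"
    # The MR holds; check whether at least one execution produced a wrong output.
    any_failure = src_out != expected_source or fol_out != expected_followup
    return "false satisfaction" if any_failure else "true satisfaction"


# Both outputs are wrongly "positive", yet they agree, so the MR is satisfied.
print(classify_outcome("the movie was not good", "the movie was not great",
                       "negative", "negative"))
# Both outputs are correctly "positive" and agree.
print(classify_outcome("the movie was good", "the movie was great",
                       "positive", "positive"))
```

In the first pair the classifier is wrong on both inputs in the same way, so the relation holds even though neither execution is correct. Telling the two cases apart requires knowing the expected labels, which plain MT does not use.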

Results:

The occurrence rates of false satisfactions are reported with respect to four SA systems. For the majority of MRs, false satisfactions account for about 20% to 50% of all MR satisfactions, suggesting that false satisfactions occur quite frequently in the evaluation of SA systems. It is also demonstrated that such high occurrence rates of false satisfactions adversely affect the users’ selection of SA systems.
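As a rough sketch of how such an occurrence rate might be computed, the snippet below takes per-pair outcomes and reports the share of MR satisfactions that involve a failing execution. The `PairResult` type, the function names, and the ten sample records are hypothetical; the numbers are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Outcome of one source/follow-up test pair for a single MR (hypothetical type)."""
    satisfied: bool    # did the two outputs satisfy the MR?
    any_failure: bool  # did at least one of the two executions give a wrong output?


def satisfaction_rate(results: list[PairResult]) -> float:
    """Fraction of pairs that satisfy the MR, regardless of correctness."""
    if not results:
        return 0.0
    return sum(r.satisfied for r in results) / len(results)


def true_satisfaction_rate(results: list[PairResult]) -> float:
    """Fraction of pairs that satisfy the MR with no failing execution."""
    if not results:
        return 0.0
    return sum(r.satisfied and not r.any_failure for r in results) / len(results)


def false_satisfaction_share(results: list[PairResult]) -> float:
    """Share of MR satisfactions that involve at least one failing execution."""
    satisfied = [r for r in results if r.satisfied]
    if not satisfied:
        return 0.0
    return sum(r.any_failure for r in satisfied) / len(satisfied)


# Made-up sample: 5 true satisfactions, 3 false satisfactions, 2 violations.
sample = (
    [PairResult(satisfied=True, any_failure=False)] * 5
    + [PairResult(satisfied=True, any_failure=True)] * 3
    + [PairResult(satisfied=False, any_failure=True)] * 2
)

print(satisfaction_rate(sample))         # 0.8
print(true_satisfaction_rate(sample))    # 0.5
print(false_satisfaction_share(sample))  # 0.375
```

With this made-up sample, the raw satisfaction rate (0.8) is noticeably higher than the rate of satisfactions backed by correct outputs (0.5). That gap is the kind of overestimation the abstract describes.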

Conclusion:

Our analysis reveals that without considering the occurrence of false satisfactions, MT may overestimate the system’s conformance to the relevant MR. Furthermore, our experiments empirically show that conclusions derived from MT can be adversely affected when there are many false satisfactions. Our findings will help the MT community to adopt a fairer and more reliable way of using the test outcomes of MT, and can also inspire the development of solid foundations for MT.

Highlights

An analysis of false satisfactions, revealing their impacts on metamorphic testing.
An application of metamorphic testing to sentiment analysis systems.
A discussion about the causes for high occurrence rates of false satisfactions.

Published In

Information and Software Technology  Volume 150, Issue C
Oct 2022
295 pages

Publisher

Butterworth-Heinemann

United States

Author Tags

  1. Metamorphic testing
  2. Metamorphic relation
  3. Sentiment analysis
  4. False satisfaction

Qualifiers

  • Research-article
