
DOI: 10.1145/3377811.3380924
research-article

Caspar: extracting and synthesizing user stories of problems from app reviews

Published: 01 October 2020

Abstract

A user's review of an app often describes the user's interactions with the app. These interactions, which we interpret as mini stories, are prominent in reviews with negative ratings. In general, a story in an app review contains at least two types of events: user actions and associated app behaviors. Identifying such stories would help an app's developer better maintain and improve the app's functionality and enhance the user experience.
We present Caspar, a method for extracting and synthesizing user-reported mini stories regarding app problems from reviews. By extending and applying natural language processing techniques, Caspar extracts ordered events from app reviews, classifies them as user actions or app problems, and synthesizes action-problem pairs. Our evaluation shows that Caspar is effective in finding action-problem pairs from reviews. First, Caspar classifies the events with an accuracy of 82.0% on manually labeled data. Second, relative to human evaluators, Caspar extracts event pairs with 92.9% precision and 34.2% recall. In addition, we train an inference model on the extracted action-problem pairs that automatically predicts possible app problems for different use cases. Preliminary evaluation shows that our method yields promising results. Caspar illustrates the potential for a deeper understanding of app reviews and possibly other natural language artifacts arising in software engineering.
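
The pipeline summarized above (ordered events, a label per event, then action-problem pairs, scored by precision and recall) can be illustrated with a toy sketch. This is not Caspar's implementation. The `Event` type, the hand-assigned labels, and the "closest preceding action" pairing rule are all assumptions made for illustration, and the sample review is invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """One event phrase from a review (hypothetical structure, not from the paper)."""
    text: str
    label: str  # "action" or "problem"; assumed to come from some upstream classifier


def synthesize_pairs(events: Sequence[Event]) -> List[Tuple[str, str]]:
    """Pair each problem with the closest preceding user action.

    The pairing rule is an illustrative assumption, not the method from the paper.
    """
    pairs: List[Tuple[str, str]] = []
    last_action: Optional[str] = None
    for event in events:  # events are assumed to be in temporal order
        if event.label == "action":
            last_action = event.text
        elif event.label == "problem" and last_action is not None:
            pairs.append((last_action, event.text))
    return pairs


def precision_recall(predicted, gold) -> Tuple[float, float]:
    """Precision and recall of predicted pairs against reference pairs."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Invented example review, already split into ordered, labeled events.
review_events = [
    Event("I tap the login button", "action"),
    Event("the app crashes", "problem"),
    Event("I reopen the app", "action"),
    Event("my draft is gone", "problem"),
]
pairs = synthesize_pairs(review_events)
```

An extractor that returns only the first of these two pairs would score 100% precision and 50% recall against them, the same high-precision, low-recall shape as the figures reported in the abstract.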




Published In

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN: 9781450371216
DOI: 10.1145/3377811
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

  • KIISE: Korean Institute of Information Scientists and Engineers
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2020


Qualifiers

  • Research-article

Funding Sources

  • NSA Science of Security Lablet at NC State University

Conference

ICSE '20

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%


Article Metrics

  • Downloads (Last 12 months): 67
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 21 Nov 2024

Cited By

  • (2025) Better together: Automated app review analysis with deep multi-task learning. Information and Software Technology 177 (107597). DOI: 10.1016/j.infsof.2024.107597. Online publication date: Jan-2025
  • (2024) How to effectively mine app reviews concerning software ecosystem? A survey of review characteristics. Journal of Systems and Software 213 (112040). DOI: 10.1016/j.jss.2024.112040. Online publication date: Jul-2024
  • (2024) What is an app store? The software engineering perspective. Empirical Software Engineering 29:1. DOI: 10.1007/s10664-023-10362-3. Online publication date: 2-Jan-2024
  • (2023) An Easy Data Augmentation Approach for Application Reviews Event Inference. IEEE Transactions on Software Engineering 49:10 (4751-4772). DOI: 10.1109/TSE.2023.3313989. Online publication date: 20-Sep-2023
  • (2023) STRE: An Automated Approach to Suggesting App Developers When to Stop Reading Reviews. IEEE Transactions on Software Engineering 49:8 (4135-4151). DOI: 10.1109/TSE.2023.3285743. Online publication date: 1-Aug-2023
  • (2023) Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews. Empirical Software Engineering 28:4. DOI: 10.1007/s10664-023-10314-x. Online publication date: 23-May-2023
  • (2023) How Much Context Do Users Provide in App Reviews? Implications for Requirements Elicitation. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity (16-25). DOI: 10.1007/978-3-031-28032-0_2. Online publication date: 13-Mar-2023
  • (2022) Unsupervised Summarization of Privacy Concerns in Mobile Application Reviews. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (1-12). DOI: 10.1145/3551349.3561155. Online publication date: 10-Oct-2022
  • (2022) Reflecting on Recurring Failures in IoT Development. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (1-5). DOI: 10.1145/3551349.3559545. Online publication date: 10-Oct-2022
  • (2022) Domain-specific analysis of mobile app reviews using keyword-assisted topic models. Proceedings of the 44th International Conference on Software Engineering (762-773). DOI: 10.1145/3510003.3510201. Online publication date: 21-May-2022
