research-article

Caspar: extracting and synthesizing user stories of problems from app reviews

Authors:

Munindar P. Singh

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering

Pages 628 - 640

https://doi.org/10.1145/3377811.3380924

Published: 01 October 2020

Abstract

A user's review of an app often describes the user's interactions with the app. These interactions, which we interpret as mini stories, are prominent in reviews with negative ratings. In general, a story in an app review contains at least two types of events: user actions and associated app behaviors. Being able to identify such stories would help an app's developer better maintain and improve the app's functionality and enhance the user experience.

We present Caspar, a method for extracting and synthesizing user-reported mini stories regarding app problems from reviews. By extending and applying natural language processing techniques, Caspar extracts ordered events from app reviews, classifies them as user actions or app problems, and synthesizes action-problem pairs. Our evaluation shows that Caspar is effective in finding action-problem pairs from reviews. First, Caspar classifies the events with an accuracy of 82.0% on manually labeled data. Second, relative to human evaluators, Caspar extracts event pairs with 92.9% precision and 34.2% recall. In addition, we train an inference model on the extracted action-problem pairs that automatically predicts possible app problems for different use cases. Preliminary evaluation shows that our method yields promising results. Caspar illustrates the potential for a deeper understanding of app reviews and possibly other natural language artifacts arising in software engineering.
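To make the reported precision and recall concrete, the following is a minimal sketch of how extracted action-problem pairs could be scored against a human-labeled reference set. It is not Caspar's evaluation code: the function name `precision_recall`, the exact-match comparison of pairs, and the toy data are all assumptions made for illustration.

```python
from typing import Set, Tuple

# An action-problem pair: (user action, app problem). Illustrative type only.
Pair = Tuple[str, str]


def precision_recall(extracted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of extracted pairs against reference pairs.

    Assumes a pair counts as correct only on exact match, which is a
    simplification of judgment by human evaluators.
    """
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
reference_pairs = {
    ("open the app", "it crashes"),
    ("tap the login button", "nothing happens"),
    ("upload a photo", "the screen freezes"),
    ("update the app", "settings are lost"),
}
extracted_pairs = {
    ("open the app", "it crashes"),
    ("upload a photo", "the screen freezes"),
}

precision, recall = precision_recall(extracted_pairs, reference_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every extracted pair is correct but half of the reference pairs are missed, which mirrors the high-precision, lower-recall profile the abstract reports.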

Published In

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering

June 2020

1640 pages

ISBN: 9781450371216

DOI: 10.1145/3377811

General Chairs: Gregg Rothermel (North Carolina State University), Doo-Hwan Bae (KAIST, South Korea)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

KIISE: Korean Institute of Information Scientists and Engineers
IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSA Science of Security Lablet at NC State University

Conference

ICSE '20: 42nd International Conference on Software Engineering

Sponsor: SIGSOFT

June 27 - July 19, 2020

Seoul, South Korea

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics

Total Citations: 16

Total Downloads: 425

Downloads (Last 12 months): 67

Downloads (Last 6 weeks): 5

Reflects downloads up to 21 Nov 2024
