research-article

App Miscategorization Detection: A Case Study on Google Play

Authors:

Suranga Seneviratne,

Aruna Seneviratne,

Sanjay Chawla

IEEE Transactions on Knowledge and Data Engineering, Volume 29, Issue 8

Pages 1591 - 1604

https://doi.org/10.1109/TKDE.2017.2686851

Published: 01 August 2017

Abstract

An ongoing challenge in the rapidly evolving app market ecosystem is to maintain the integrity of app categories. At the time of registration, app developers have to select what they believe is the most appropriate category for their apps. Besides the inherent ambiguity of selecting the right category, this approach leaves open the possibility of misuse and potential gaming by the registrant. Periodically, the app store refines the list of available categories and potentially reassigns apps. However, it has been observed that the mismatch between an app’s description and the category it belongs to continues to persist. Although some common mechanisms exist (e.g., complaint-driven or manual checking), they limit how quickly miscategorized apps are detected and leave the categorization challenge open. We introduce FRAC+: (FR)amework for (A)pp (C)ategorization. FRAC+ has the following salient features: (i) it is based on a data-driven topic model and automatically suggests the categories appropriate for the app store, and (ii) it can detect miscategorized apps. Extensive experiments attest to the performance of FRAC+. Experiments on Google Play show that FRAC+’s topics are more aligned with Google’s new categories and that 0.35-1.10 percent of game apps are detected to be miscategorized.

Index Terms

1. Computing methodologies

Index terms have been assigned to the content through auto-classification.

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 29, Issue 8

Aug. 2017

202 pages

ISSN: 1041-4347

1041-4347 © 2016 IEEE.

Publisher

IEEE Educational Activities Department

United States
