research-article

App Miscategorization Detection: A Case Study on Google Play

Published: 01 August 2017

Abstract

An ongoing challenge in the rapidly evolving app market ecosystem is maintaining the integrity of app categories. At registration, app developers must select what they believe is the most appropriate category for their apps. Besides the inherent ambiguity of selecting the right category, this approach leaves open the possibility of misuse and gaming by the registrant. Periodically, the app store refines the list of available categories and may reassign apps. However, mismatches between an app's description and the category it belongs to continue to persist. Although some common mechanisms exist (e.g., complaint-driven or manual checking), they are slow to detect miscategorized apps and leave the categorization problem open. We introduce FRAC+: (FR)amework for (A)pp (C)ategorization. FRAC+ has the following salient features: (i) it is based on a data-driven topic model and automatically suggests the categories appropriate for the app store, and (ii) it can detect miscategorized apps. Extensive experiments attest to the performance of FRAC+. Experiments on Google Play show that FRAC+'s topics are more aligned with Google's new categories and that 0.35-1.10 percent of game apps are detected as miscategorized.
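To make the detection idea concrete, here is a minimal Python sketch. It is not FRAC+'s actual algorithm: the abstract only says the framework is built on a topic model. The sketch assumes each app already has a topic-proportion vector from some topic model, compares each app to the mean direction of every category by cosine similarity, and flags apps that sit closer to a category other than their declared one. The function names, the `margin` parameter, the category labels, and the toy vectors are all hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_direction(vectors):
    """Mean direction of a set of vectors: normalize each, sum, renormalize."""
    units = [unit(v) for v in vectors]
    summed = [sum(col) for col in zip(*units)]
    return unit(summed)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))

def flag_miscategorized(apps, margin=0.0):
    """apps: list of (app_id, declared_category, topic_vector).

    Returns (app_id, declared, suggested) for every app whose topic vector
    is closer to another category's mean direction than to its own, by more
    than `margin` (a hypothetical tuning knob).
    """
    by_cat = {}
    for _, cat, vec in apps:
        by_cat.setdefault(cat, []).append(vec)
    centers = {cat: mean_direction(vecs) for cat, vecs in by_cat.items()}

    flagged = []
    for app_id, cat, vec in apps:
        sims = {c: cosine(vec, center) for c, center in centers.items()}
        best = max(sims, key=sims.get)
        if best != cat and sims[best] - sims[cat] > margin:
            flagged.append((app_id, cat, best))
    return flagged

# Toy data (assumed): topic proportions over (puzzle, racing, finance) topics.
apps = [
    ("puzzle_a", "GAME_PUZZLE", [0.90, 0.05, 0.05]),
    ("puzzle_b", "GAME_PUZZLE", [0.80, 0.10, 0.10]),
    ("puzzle_c", "GAME_PUZZLE", [0.85, 0.10, 0.05]),
    ("racer_a", "GAME_RACING", [0.05, 0.90, 0.05]),
    ("racer_b", "GAME_RACING", [0.10, 0.85, 0.05]),
    ("racer_c", "GAME_RACING", [0.05, 0.80, 0.15]),
    # Racing-like description registered under the puzzle category.
    ("odd_one", "GAME_PUZZLE", [0.05, 0.90, 0.05]),
]
flagged = flag_miscategorized(apps)
print(flagged)
```

On this toy data only `odd_one` is flagged, with the racing category suggested. In practice the category centers would themselves be skewed by miscategorized apps, so a real system would need a more robust estimate than a plain mean.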

    Published In

    IEEE Transactions on Knowledge and Data Engineering, Volume 29, Issue 8
    Aug. 2017
    202 pages

    Publisher

    IEEE Educational Activities Department

    United States

    Cited By

    • (2024) Revisiting Android App Categorization. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-12. DOI: 10.1145/3597503.3639094. Online publication date: 20-May-2024
    • (2024) Improving Logic Bomb Identification in Android Apps via Context-Aware Anomaly Detection. IEEE Transactions on Dependable and Secure Computing, 21:5, pp. 4735-4753. DOI: 10.1109/TDSC.2024.3358979. Online publication date: 1-Sep-2024
    • (2022) The ineffectiveness of domain-specific word embedding models for GUI test reuse. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp. 560-564. DOI: 10.1145/3524610.3527873. Online publication date: 16-May-2022
    • (2022) A Correlation Graph Based Approach for Personalized and Compatible Web APIs Recommendation in Mobile APP Development. IEEE Transactions on Knowledge and Data Engineering, 35:6, pp. 5444-5457. DOI: 10.1109/TKDE.2022.3168611. Online publication date: 26-Apr-2022
    • (2021) Analysis and Classification of Mobile Apps Using Topic Modeling. Complexity, vol. 2021. DOI: 10.1155/2021/6677413. Online publication date: 1-Jan-2021
    • (2021) Caring for Intimate Data in Fertility Technologies. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-11. DOI: 10.1145/3411764.3445132. Online publication date: 6-May-2021
    • (2018) The hidden image of mobile apps. Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 1-12. DOI: 10.1145/3229434.3229474. Online publication date: 3-Sep-2018
