DOI: 10.1145/3691620.3695546

A Longitudinal Analysis Of Replicas in the Wild Wild Android

Published: 27 October 2024

Abstract

In this work, we report and study a phenomenon that contributes to Android API sprawl. We observe that OEM developers introduce private APIs composed by copying, pasting, and editing full or partial code from AOSP and other OEM APIs; we call such APIs Replicas.
To quantify the prevalence of Replicas in the wild, fragmented Android ecosystem, we perform the first large-scale (security) measurement study, aiming to detect and evaluate Replicas across 342 ROMs, manufactured by 10 vendors and spanning 7 versions. Our study is motivated by the intuition that Replicas bloat custom Android codebases, add to the complexity of the Android access control mechanism and update process, and hence may lead to access control vulnerabilities.
Our study is facilitated by RepFinder, a tool that infers the core functionality of an API and detects syntactically and semantically similar APIs using static program paths. RepFinder reveals that Replicas are commonly introduced by OEMs and, more importantly, that they unnecessarily introduce security enforcement anomalies. Specifically, RepFinder reports an average of 141 Replicas per studied ROM, accounting for 9% to 17% of custom APIs; on average, 37% of these Replicas are identified as under-protected. Our study thus points to the urgent need to debloat Replicas.
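
The abstract does not describe how RepFinder compares program paths, so the following is only an illustrative sketch of the general idea, not the tool's algorithm. It assumes each API has already been summarized as a set of static paths (tuples of invoked operations) plus a set of access control checks. The `ApiSummary` type, the Jaccard similarity measure, the 0.6 threshold, and all API and permission names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSummary:
    """Hypothetical summary of one framework API (not RepFinder's data model)."""
    name: str
    paths: frozenset   # each path is a tuple of core operations on that path
    checks: frozenset  # access control checks that guard the API


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_replica(candidate, original, threshold=0.6):
    """Assumed criterion: enough of the two APIs' paths coincide."""
    return jaccard(candidate.paths, original.paths) >= threshold


def is_under_protected(candidate, original):
    """Assumed criterion: the candidate enforces a strict subset of the checks."""
    return candidate.checks < original.checks


# Hypothetical original API and an OEM copy that drops the permission check.
original_api = ApiSummary(
    name="setFeatureEnabled",
    paths=frozenset({("readConfig", "setState"), ("readConfig", "broadcast")}),
    checks=frozenset({"PERMISSION_X"}),
)
oem_api = ApiSummary(
    name="oemSetFeatureEnabled",
    paths=frozenset({
        ("readConfig", "setState"),
        ("readConfig", "broadcast"),
        ("readConfig", "logVendor"),
    }),
    checks=frozenset(),
)

if is_replica(oem_api, original_api) and is_under_protected(oem_api, original_api):
    print(f"{oem_api.name} looks like an under-protected copy of {original_api.name}")
```

Keeping the core-functionality paths separate from the checks is a deliberate choice in this sketch: it lets two APIs match on what they do even when they differ in how they are protected, which is the kind of anomaly the abstract highlights.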

Information

Published In

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering
October 2024
2587 pages
ISBN:9798400712487
DOI:10.1145/3691620
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Android API security
  2. code clone detection

Qualifiers

  • Research-article

Conference

ASE '24
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
