DOI: 10.5555/3692070.3692591
research-article

StackSight: Unveiling WebAssembly through large language models and neurosymbolic chain-of-thought decompilation

Published: 21 July 2024

Abstract

WebAssembly enables near-native execution in web applications and is increasingly adopted for tasks that demand high performance and robust security. However, its assembly-like syntax, implicit stack machine, and low-level data types make it extremely difficult for human developers to understand, spurring the need for effective WebAssembly reverse engineering techniques. In this paper, we propose StackSight, a novel neurosymbolic approach that combines Large Language Models (LLMs) with advanced program analysis to decompile complex WebAssembly code into readable C++ snippets. StackSight visualizes and tracks virtual stack alterations via a static analysis algorithm and then applies chain-of-thought prompting to harness the complex reasoning capabilities of LLMs. Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code snippets generated by StackSight have significantly higher win rates and enable a better grasp of code semantics.
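
To make the idea of tracking virtual stack alterations concrete, the sketch below symbolically executes a short straight-line WebAssembly sequence and annotates each instruction with the stack contents after it runs. It is only an illustration under stated assumptions, not StackSight's algorithm: the function name `track_stack`, the `local_N` naming scheme, the annotation format, and the tiny supported instruction subset are all hypothetical, and control flow, memory, and types are ignored.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny instruction subset (assumption for this
# sketch); real WebAssembly has many more opcodes plus control flow.
BINARY_OPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def track_stack(instructions: List[str]) -> Tuple[List[str], List[str]]:
    """Symbolically execute straight-line instructions.

    Returns (trace, statements): trace holds one annotated line per
    instruction showing the virtual stack after it runs; statements holds
    C-like assignments produced by local.set.
    """
    stack: List[str] = []
    trace: List[str] = []
    statements: List[str] = []
    for instr in instructions:
        parts = instr.split()
        op = parts[0]
        if op == "local.get":
            # Push a symbolic name for the local instead of a concrete value.
            stack.append(f"local_{parts[1]}")
        elif op == "i32.const":
            stack.append(parts[1])
        elif op in BINARY_OPS:
            rhs = stack.pop()  # top of stack is the right operand
            lhs = stack.pop()
            stack.append(f"({lhs} {BINARY_OPS[op]} {rhs})")
        elif op == "local.set":
            # Popping into a local yields a readable assignment.
            statements.append(f"local_{parts[1]} = {stack.pop()};")
        else:
            raise ValueError(f"unsupported instruction: {instr}")
        trace.append(f"{instr:<14} ;; stack: [{', '.join(stack)}]")
    return trace, statements


if __name__ == "__main__":
    program = [
        "local.get 0",
        "i32.const 2",
        "i32.mul",
        "local.get 1",
        "i32.add",
        "local.set 2",
    ]
    trace, statements = track_stack(program)
    print("\n".join(trace))
    print("\n".join(statements))
```

An annotated trace of this kind makes the implicit operand stack explicit, which is the sort of intermediate information a chain-of-thought prompt could present to an LLM before asking for a C++ snippet.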


Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
