DOI: 10.1145/3524842.3528477
short-paper

Lupa: a framework for large scale analysis of the programming language usage

Published: 17 October 2022

Abstract

In this paper, we present Lupa, a platform for large-scale analysis of programming language usage. Lupa is a command-line tool built on the IntelliJ Platform, which gives it access to the powerful static analysis tools used in modern IDEs. The tool supports custom analyzers that process the rich concrete syntax tree of the code and can calculate various features of it: the presence of entities, their dependencies, definition-usage chains, etc. Currently, Lupa supports analyzing Python and Kotlin, but it can be extended to other languages supported by IntelliJ-based IDEs. We explain the internals of the tool, show how it can be extended and customized, and describe an example analysis that we carried out with its help: analyzing the syntax of ranges in Kotlin.
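
To give a rough sense of what a custom analyzer of this kind computes, the sketch below walks a syntax tree and reports the presence of entities and their dependencies. It is only an illustration under stated assumptions: it uses Python's standard `ast` module (an abstract syntax tree, not the IntelliJ Platform's concrete syntax tree), and the names `analyze_source` and `SAMPLE` are hypothetical and not part of Lupa.

```python
import ast
from collections import Counter


def analyze_source(source: str) -> dict:
    """Toy analyzer (hypothetical, not Lupa's API): count declared
    entities and collect the modules that the code imports."""
    tree = ast.parse(source)
    entities = Counter()
    dependencies = set()
    # Visit every node of the tree and record the features of interest.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities["function"] += 1
        elif isinstance(node, ast.ClassDef):
            entities["class"] += 1
        elif isinstance(node, ast.Import):
            dependencies.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            dependencies.add(node.module)
    return {"entities": dict(entities), "dependencies": sorted(dependencies)}


# A small made-up input used only to exercise the analyzer.
SAMPLE = '''
import os
from collections import Counter

class Repo:
    def files(self):
        return os.listdir(".")

def main():
    return Repo().files()
'''

if __name__ == "__main__":
    print(analyze_source(SAMPLE))
```

A large-scale study would run such an analyzer over many repositories and aggregate the per-file results; that orchestration is omitted here.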

Cited By

  • (2023) Identifying Code Changes for Architecture Decay via a Metric Forest Structure. 2023 ACM/IEEE International Conference on Technical Debt (TechDebt), 62-71. DOI: 10.1109/TechDebt59074.2023.00014. Online publication date: May 2023.
  • (2023) Evaluate the Efficiency of Hybrid Model Based on Convolutional Neural Network and Long Short-Term Memory in Information Technology Job Graph Network. Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications, 403-417. DOI: 10.1007/978-981-99-8296-7_29. Online publication date: 17 November 2023.
  • (2022) Using Blockchain and Artificial Intelligence to build a Job Recommendation System for Students in Information Technology. 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), 364-369. DOI: 10.1109/RIVF55975.2022.10013916. Online publication date: 20 December 2022.

Published In

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022
815 pages
ISBN: 9781450393034
DOI: 10.1145/3524842

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper

Conference

MSR '22