Abstract
The Python programming language has been gaining traction in industry over the past few years in virtually all application domains. Python is known for its high-calibre and passionate community of developers. Empirical research on Python systems has the potential to promote a healthy environment in which claims and beliefs held by the community are supported by data. To facilitate such research, a corpus of 132 open source Python projects has been identified, and basic information as well as quality and complexity metrics have been collected and organized into CSV files. Collectively, the corpus consists of 36,635 Python modules, 59,532 classes, 253,954 methods and 84,892 functions. The selected projects span various application domains, including Web/APIs, Scientific Computing, Security and more.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Omari, S., Martinez, G. (2020). Enabling Empirical Research: A Corpus of Large-Scale Python Systems. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1070. Springer, Cham. https://doi.org/10.1007/978-3-030-32523-7_49
DOI: https://doi.org/10.1007/978-3-030-32523-7_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32522-0
Online ISBN: 978-3-030-32523-7
eBook Packages: Intelligent Technologies and Robotics (R0)