Computer Science > Software Engineering
[Submitted on 13 Jun 2024]
Title:Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training
View PDF HTML (experimental)Abstract:The size and complexity of software applications is increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tested, maintained, and up to date. As the discipline now begins to also incorporate automatically generated programs, automation in testing and tuning is required to keep up with the pace - let alone reduce the present level of complexity. While machine learning has been used to understand and generate code in various contexts, machine learning models themselves are trained almost exclusively on static code without inputs, traces, or other execution time information. This lack of training data limits the ability of these models to understand real-world problems in software. In this work we show that inputs, like code, can be generated automatically at scale. Our generated inputs are stateful, and appear to faithfully reproduce the arbitrary data structures and system calls required to rerun a program function. By building our tool within the compiler, it both can be applied to arbitrary programming languages and architectures and can leverage static analysis and transformations for improved performance. Our approach is able to produce valid inputs, including initial memory states, for 90% of the ComPile dataset modules we explored, for a total of 21.4 million executable functions. Further, we find that a single generated input results in an average block coverage of 37%, whereas guided generation of five inputs improves it to 45%.
Current browse context:
cs.SE
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.