Reproduction: Assessing Robustness of ML-Based Program Analysis Tools using Metamorphic Program Transformations

1. TU Delft

# Lampion CodeBERT Code2Text Reproduction

Welcome to the reproduction package for the Lampion CodeBERT Code2Text grid experiment.
It is meant to give easy access to the final experiments shown in the accompanying paper.

The creation process for these files, as well as a bigger overview, can be found [in the repository](https://github.com/ciselab/Lampion/tree/main/Experiments/CodeBert_CodeToText/).

## Requirements

- Linux Operating System
- Docker v20.10.13
- Docker Compose v2.2.2
- [CodeBERT Experiment Docker Image](https://github.com/ciselab/CodeBert-CodeToText-Reproduction) v1.3
- [CodeBERT Python-Preprocessing Image](https://github.com/ciselab/Lampion/tree/main/Experiments/CodeBert_CodeToText/preprocessing-python) v1.1
- [CodeBERT Java-Preprocessing Image](https://github.com/ciselab/Lampion/tree/main/Experiments/CodeBert_CodeToText/preprocessing-java) v1.1

If no GPUs are available or configured, the experiment defaults to using CPUs.
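
The Docker and Docker Compose versions listed above can be compared against a local installation with a small shell check. This is a minimal sketch and not part of the package: the helper names `version_ge`, `first_version` and `check_requirements` are hypothetical, and treating the listed versions as minimums (rather than exact pins) is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: compare installed Docker / Docker Compose versions with the README.
# Assumption: the listed versions are treated as minimums, not exact pins.

# version_ge A B succeeds when version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Extract the first x.y.z version number from text on stdin.
first_version() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

check_requirements() {
  local docker_v compose_v
  docker_v="$(docker --version | first_version)"
  compose_v="$(docker compose version | first_version)"
  version_ge "$docker_v" "20.10.13" || echo "Docker '$docker_v' is older than 20.10.13"
  version_ge "$compose_v" "2.2.2" || echo "Docker Compose '$compose_v' is older than 2.2.2"
}
```

Running `check_requirements` prints nothing when both tools are at least as new as the listed versions.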

## How To

1. Prepare the requirements (namely, download or build the images).
2. Copy the folders to your GPU server.
3. Run `replicator.sh` (this needs a lot of disk space!).
4. Run `runner.sh` in the background with: `nohup ./runner.sh >runner.log &`
5. Wait (estimate roughly 1h or more per experiment).
6. Optional: Run `extractor.sh` to collect only the output files, so that you do not have to copy the model replicas and data replicas to your local computer.
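
Steps 3 and 4 can be sketched as a short shell helper. This is an illustration under assumptions, not the authors' exact procedure: the function names `free_gb` and `run_experiments`, the 50 GB free-space threshold, and the added `2>&1` redirection are hypothetical; only the script names and the `nohup` call come from the steps above.

```bash
#!/usr/bin/env bash
# Free space (whole GB) on the filesystem that holds the given directory.
free_gb() {
  df -Pk "${1:-.}" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Run steps 3 and 4 from inside an unpacked experiment folder.
run_experiments() {
  local min_gb="${1:-50}"   # hypothetical threshold, adjust to your setup
  if [ "$(free_gb .)" -lt "$min_gb" ]; then
    echo "Less than ${min_gb} GB free, replicator.sh may run out of space" >&2
    return 1
  fi
  ./replicator.sh || return 1
  # Same call as in step 4; 2>&1 (an addition) also captures error output.
  nohup ./runner.sh >runner.log 2>&1 &
  echo "runner.sh started with PID $!"
}
```

After calling `run_experiments`, progress can be followed with `tail -f runner.log`, and `./extractor.sh` can be run once the runner has finished.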

## Contents

- Pretrained Models for Java and Python
- Cleaned Test-Datasets
- Docker Compose files to run the experiments
- Helper shell scripts

The pretrained models are those that achieved the best BLEU score during training ("best bleu").
The models were trained with the default configuration described in the [CodeXGLUE README](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text).

Not included:

- Training & Validation Files
- File Cleaning Process
- Code-Files to run experiments
- Dockerfiles to create images

Most of the items that are not included are available as FOSS [in the repository](https://github.com/ciselab/Lampion/).

## Licence(s)

The code and artifacts provided by the authors are released under the MIT Licence.
The NVidia containers come with their own implicit licence.

## Used Environment

We are unfortunately aware that GPU containers are very fragile.
We hope that we have figured out most of the issues, as the setup worked for us on several different machines.
Nevertheless, here are the specs we used to produce our results:

- Linux 5.4.0 Generic Ubuntu
- NVidia A40 (Graphics Card)
- CUDA Version 11.6
- NVidia Driver Version 510.47.03
- NVidia Docker 2.9.1
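
To compare a machine against the specs above, the local values can be printed with a few standard commands. This is a minimal sketch: the function name `report_environment` is hypothetical, and the output is only meant for a manual comparison with the list.

```bash
#!/usr/bin/env bash
# Print the local counterparts of the kernel, GPU and driver entries above.
report_environment() {
  echo "kernel: $(uname -r)"
  if command -v nvidia-smi >/dev/null 2>&1; then
    # One "name, driver_version" line per GPU.
    # The CUDA version is shown in the header of a plain `nvidia-smi` call.
    echo "gpu: $(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader)"
  else
    echo "gpu: nvidia-smi not found"
  fi
}
```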

## Notes

This reproduction package was created using the experiment repository: https://github.com/ciselab/Lampion/

## Files

Total size: 1.3 GB

| Name | MD5 | Size |
| --- | --- | --- |
| java-experiment-setup.zip | 3a6fa18253c2de94b9801230f28c0ad1 | 644.6 MB |
| python-experiment-setup.zip | a41ec29c0094758ad3951f31f68e1a9a | 646.3 MB |
| README.md | 01977755f51d0590470b05ff352b6dcb | 2.6 kB |
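
The MD5 checksums listed above can be used to verify the downloads before unpacking. The following is a minimal sketch using `md5sum`; the function names `verify_md5` and `verify_downloads` are hypothetical, and the files are assumed to sit in the current directory under their original names.

```bash
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED succeeds when FILE's MD5 digest equals EXPECTED.
verify_md5() {
  local actual
  actual="$(md5sum "$1" | awk '{ print $1 }')"
  [ "$actual" = "$2" ]
}

# Check every listed file that is present in the current directory.
verify_downloads() {
  local name sum
  while read -r name sum; do
    [ -f "$name" ] || { echo "skip $name (not downloaded)"; continue; }
    if verify_md5 "$name" "$sum"; then echo "ok   $name"; else echo "FAIL $name"; fi
  done <<'EOF'
java-experiment-setup.zip 3a6fa18253c2de94b9801230f28c0ad1
python-experiment-setup.zip a41ec29c0094758ad3951f31f68e1a9a
README.md 01977755f51d0590470b05ff352b6dcb
EOF
}
```

Calling `verify_downloads` prints one status line per file; a `FAIL` line suggests an incomplete or corrupted download.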


- Resource type: Software
- Publisher: Zenodo
- Language: English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 22, 2022
Modified: March 31, 2022
