

Published March 22, 2022 | Version 1.0
Software | Open Access

Reproduction: Assessing Robustness of ML-Based Program Analysis Tools using Metamorphic Program Transformations

Description

# Lampion CodeBERT Code2Text Reproduction

Welcome to the reproduction package for the Lampion CodeBERT Code2Text Grid-Experiment.
This package is meant to give easy access to the final experiments shown in the accompanying paper.

The creation process for these files, as well as a broader overview, can be found [in the repository](https://github.com/ciselab/Lampion/tree/main/Experiments/CodeBert_CodeToText/).

## Requirements

- Linux Operating System
- Docker v20.10.13
- Docker Compose v2.2.2
- [CodeBERT Experiment Docker Image](https://github.com/ciselab/CodeBert-CodeToText-Reproduction) v1.3
- [CodeBERT Python-Preprocessing Image](https://github.com/ciselab/Lampion/tree/main/Experiments/CodeBert_CodeToText/preprocessing-python) v1.1
- [CodeBERT Java-Preprocessing Image](https://github.com/ciselab/Lampion/tree/main/Experiments/CodeBert_CodeToText/preprocessing-java) v1.1

If no GPUs are available or configured, the experiment will default to using CPUs.
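
The following is a minimal sketch (not part of the package) for checking these prerequisites before starting; only standard Docker and NVIDIA driver commands are used, and the expected versions are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch: check the prerequisites listed above before starting.
set -euo pipefail

docker version --format 'Docker Engine: {{.Server.Version}}'   # expected ~v20.10.13
docker compose version                                          # expected ~v2.2.2

# GPU check: without a working NVIDIA driver the experiments fall back to CPU.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "No NVIDIA GPU/driver found - experiments will run on CPU (much slower)."
fi
```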

## How To

1. Prepare the requirements (namely, download or build the images)
2. Ship the folders to your GPU server
3. Run `replicator.sh` (this needs a lot of disk space!)
4. Run `runner.sh` in the background, e.g. `nohup ./runner.sh >runner.log &`
5. Wait (estimate roughly 1 hour or more per experiment)
6. Optional: Run `extractor.sh` to collect only the output files (so you do not copy the model and data replicas to your local computer); see the sketch after this list
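
The steps above can be wired together roughly as follows. This is a minimal sketch, assuming the folders were already shipped to the GPU server; the directory path is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch of steps 3-6, run on the GPU server after shipping the folders there.
# The directory below is a placeholder for wherever you unpacked the setup.
set -euo pipefail
cd /data/lampion-codebert-code2text

./replicator.sh                         # step 3: create the data/model replicas (needs a lot of disk space)
nohup ./runner.sh > runner.log 2>&1 &   # step 4: run the experiments in the background
wait                                    # step 5: block until runner.sh finishes (~1h+ per experiment)
./extractor.sh                          # step 6 (optional): collect only the output files, not the replicas
```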

## Contents

- Pretrained Models for Java and Python
- Cleaned Test-Datasets
- Docker Compose files to run the experiments
- Helper shell scripts

The pretrained models are the checkpoints that scored best on BLEU during training ("best bleu").
The models were trained with the default configuration from the [CodeXGLUE readme](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text).

Not included:

- Training & Validation Files
- File Cleaning Process
- Code-Files to run experiments
- Dockerfiles to create images

Most of the elements not included here are available as FOSS [in the repository](https://github.com/ciselab/Lampion/).

## Licence(s)

The code and artifacts provided by the authors are under the MIT Licence.
The NVIDIA containers come with their own implicit licence.

## Used Environment

We are aware that GPU containers are unfortunately quite fragile.
We hope we have sorted out most issues, as the setup worked for us on multiple different machines.
Nevertheless, here are the specs we used to produce our results:

- Linux 5.4.0 Generic Ubuntu
- NVidia A40 (Graphics Card)
- CUDA Version 11.6
- NVidia Driver Version 510.47.03
- NVidia Docker 2.9.1
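
If you want to sanity-check a comparable setup before launching, the sketch below verifies that Docker containers can actually see the GPU. The CUDA base image tag is an assumption; pick one matching your installed CUDA version.

```bash
# Sketch: confirm the host GPU stack and GPU visibility inside Docker.
nvidia-smi                                         # host: driver 510.47.03 / CUDA 11.6 visible?
docker run --rm --gpus all \
  nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi   # container: same GPU visible?
```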

Notes

This reproduction package was created using the experiment repository: https://github.com/ciselab/Lampion/

Files (1.3 GB total)

- java-experiment-setup.zip
- md5:3a6fa18253c2de94b9801230f28c0ad1 (644.6 MB)
- md5:a41ec29c0094758ad3951f31f68e1a9a (646.3 MB)
- md5:01977755f51d0590470b05ff352b6dcb (2.6 kB)