
SPLASH 2021
Sun 17 - Fri 22 October 2021 Chicago, Illinois, United States

Help others to build upon the contributions of your paper!

The Artifact Evaluation process is a service provided by the community to help authors of accepted papers provide more substantial supplements to their papers so future researchers can more effectively build on and compare with previous work.

Authors of papers that pass Round 1 of PACMPL (OOPSLA) will be invited to submit an artifact that supports the conclusions of their paper. The AEC will read the paper and explore the artifact to give feedback about how well the artifact supports the paper and how easy it is for future researchers to use the artifact.

This submission is voluntary. Papers that go through the Artifact Evaluation process successfully will receive a seal of approval printed on the first page of the paper. Authors of papers with accepted artifacts are encouraged to make these materials publicly available upon publication of the proceedings, by including them as “source materials” in the ACM Digital Library.

See the Call for Artifacts tab for more information.

In an effort to reach a broader reviewing audience, we are also accepting self-nominations for artifact review. Please see the Call for Self-Nominations tab for more information.

Call for Artifacts


Important Dates

  • July 12: Authors of papers accepted in Phase 1 submit artifacts (one week after Phase 1 notification)
  • July 21-26: Authors may respond to issues found following kick-the-tires instructions
  • September 1: Artifact notifications sent out

New This Year

  • More interaction with reviewers
    • In prior years, OOPSLA, like other AECs, adopted a kick-the-tires phase, and this was previously the only interaction permitted with reviewers. This year, we are retaining the kick-the-tires phase, but following the kick-the-tires responses we are allowing additional rounds of interaction with reviewers, in the hope that artifacts that would previously have fallen just short of Functional get more opportunities to make small corrections. After the kick-the-tires response, reviewers will be able to post author-visible comments with questions for authors at any time, and authors may respond to those reviewer questions and requests. Such interaction is at the reviewers’ initiative; authors are asked not to post unless in response to reviewer requests.
  • Unlike previous years, papers may seek an Available badge without going through the AEC
    • In prior years, OOPSLA awarded Available badges only to those artifacts deemed Functional. Artifacts that were not submitted to the AEC, or that were submitted but did not earn the Functional badge, were previously ineligible for the Available badge. This year, papers may receive an Available badge even if the AEC did not review the artifact, or reviewed the artifact but did not award the Functional badge.

These changes are based on the Chairs’ Report from Artifact Evaluation for OOPSLA 2020.

Selection Criteria

The artifact is evaluated in relation to the expectations set by the paper. For an artifact to be accepted, it must support all the main claims made in the paper. Thus, in addition to just running the artifact, the evaluators will read the paper and may try to tweak provided inputs or otherwise slightly generalize the use of the artifact from the paper in order to test the artifact’s limits.

Artifacts should be:

  • consistent with the paper,
  • as complete as possible,
  • well documented, and
  • easy to reuse, facilitating further research.

The AEC strives to place itself in the shoes of such future researchers and then to ask: how much would this artifact have helped me? Please see details of the outcomes of artifact evaluation (badges) for further guidance on what these mean.

Submission Process

All papers that pass phase 1 of OOPSLA reviewing are eligible to submit artifacts.

Your submission should consist of three pieces:

  1. an overview of your artifact,
  2. a URL pointing to either:
     • a single file containing the artifact (recommended), or
     • the address of a public source control repository, and
  3. a hash certifying the version of the artifact at submission time: either
     • an md5 hash of the single file (use the md5 or md5sum command-line tool to generate the hash, or see the sketch below), or
     • the full commit hash of the submitted version (e.g., from git reflog --no-abbrev).
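
If you prefer a platform-independent way to compute the file hash, a minimal Python sketch such as the following produces the same md5 digest; the filename 1234.zip is a hypothetical placeholder for your own archive, not a required name.

```python
# Minimal sketch (not an official submission script): compute the md5 hash
# of a single-file artifact. "1234.zip" is a hypothetical placeholder
# following the <paper #>.<suffix> naming convention described under
# "Packaging the Artifact".
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    print(md5_of_file("1234.zip"))
```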

The URL must be a Google Drive, Dropbox, GitHub, Bitbucket, or (public) GitLab URL, to help protect the anonymity of the reviewers. You may upload your artifact directly if it is a single file smaller than 15 MB.

Artifacts do not need to be anonymous; reviewers will be aware of author identities.

Overview of the Artifact

Your overview should consist of two parts:

  • a Getting Started Guide and
  • Step-by-Step Instructions for how you propose to evaluate your artifact (with appropriate connections to the relevant sections of your paper);

The Getting Started Guide should contain setup instructions (including, for example, a pointer to the VM player software, its version, passwords if needed, etc.) and basic testing of your artifact that you expect a reviewer to be able to complete in 30 minutes. Reviewers will follow all the steps in the guide during an initial kick-the-tires phase. The Getting Started Guide should be as simple as possible, and yet it should stress the key elements of your artifact. Anyone who has followed the Getting Started Guide should have no technical difficulties with the rest of your artifact.

The Step-by-Step Instructions explain how to reproduce any experiments or other activities that support the conclusions in your paper. Write this for readers who have a deep interest in your work and are studying it to improve it or compare against it. If your artifact runs for more than a few minutes, point this out, note roughly how long it is expected to run, and explain how to run it on smaller inputs. Reviewers may choose to run on smaller or larger inputs depending on available hardware.

Where appropriate, include descriptions of and links to files (included in the archive) that represent expected outputs (e.g., the log files expected to be generated by your tool on the given inputs); if there are warnings that are safe to ignore, explain which ones they are.
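
As an illustration only, a small comparison script along the following lines can make it easy for reviewers to check a freshly generated log against the expected output shipped in the archive; the file paths are hypothetical placeholders.

```python
# Illustrative sketch: compare a freshly generated log against the expected
# log shipped with the artifact. The file paths below are hypothetical
# placeholders, not names mandated by the submission process.
import difflib
import sys

def compare_logs(expected_path, actual_path):
    """Print a unified diff and return a non-zero exit code if the logs differ."""
    with open(expected_path) as f:
        expected = f.readlines()
    with open(actual_path) as f:
        actual = f.readlines()
    diff = list(difflib.unified_diff(expected, actual,
                                     fromfile=expected_path,
                                     tofile=actual_path))
    if diff:
        sys.stdout.writelines(diff)
        return 1
    print("Output matches the expected log.")
    return 0

if __name__ == "__main__":
    sys.exit(compare_logs("expected/benchmark1.log", "output/benchmark1.log"))
```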

The artifact’s documentation should include the following:

  • A list of claims from the paper supported by the artifact, and how/why.
  • A list of claims from the paper not supported by the artifact, and why not.

Examples: performance claims cannot be reproduced in a VM, the authors are not allowed to redistribute specific benchmarks, etc. Artifact reviewers can then center their reviews and evaluation around these specific claims, though the reviewers will still consider whether the provided evidence is adequate to support claims that the artifact works.

Packaging the Artifact

When packaging your artifact, please keep in mind: a) how accessible you are making your artifact to other researchers, and b) the fact that the AEC members will have a limited time in which to make an assessment of each artifact.

Your artifact can contain a bootable virtual machine image with all of the necessary libraries installed. Using a virtual machine provides a way to make an easily reproducible environment — it is less susceptible to bit rot. It also helps the AEC have confidence that errors or other problems cannot cause harm to their machines. This is recommended.

Submitting source code that must be compiled is permissible. A more automated and/or portable build — such as a Docker file or a build tool that manages all compilation and dependencies (e.g., maven, gradle, etc.) — improves the odds the AEC will not be stuck getting different versions of packages working (particularly different releases of programming languages).

Authors submitting machine-checked proof artifacts should consult Marianna Rapoport’s Proof Artifacts: Guidelines for Submission and Reviewing.

You should make your artifact available as a single archive file and use the naming convention <paper #>.<suffix>, where the appropriate suffix is used for the given archive format. Please use a widely available compressed archive format such as ZIP (.zip), tar and gzip (.tgz), or tar and bzip2 (.tbz2). Please use open formats for documents.
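
As a minimal packaging sketch (assuming a hypothetical paper number 1234 and an artifact directory named artifact/), Python's standard library can produce archives in the recommended formats:

```python
# Minimal packaging sketch. The paper number "1234" and the directory
# "artifact/" are hypothetical placeholders; adjust them to your submission.
import shutil

PAPER_NUMBER = "1234"

# Create 1234.zip from the contents of the artifact/ directory.
shutil.make_archive(PAPER_NUMBER, "zip", root_dir="artifact")

# Alternatively, create 1234.tar.gz (tar + gzip); rename it to 1234.tgz
# if you prefer that suffix.
shutil.make_archive(PAPER_NUMBER, "gztar", root_dir="artifact")
```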

Based on the outcome of the OOPSLA 2019 AEC, the strongest recommendation we can give for ensuring quality packaging is to test your own directions on a fresh machine (or VM), following exactly the directions you have prepared.

While publicly available artifacts are often easier to review, and considered to be in the best interest of open science, artifacts are not required to be public and/or open source. Artifact reviewers will be instructed that the artifacts are for use only for artifact evaluation, that submitted versions of artifacts may not be made public by reviewers, and that copies of artifacts must not be kept beyond the review period. There is an additional badge specifically for making artifacts available in reliable locations (see below), and we strongly encourage authors of accepted artifacts to pursue it, but it is a separate process from evaluation of functionality, and it is not required.

Review Process Overview

After authors submit their artifact, there is a short window of time in which the reviewers work through only the kick-the-tires instructions and upload preliminary reviews indicating whether or not they were able to get those 30-or-so minutes of instructions working. At that point the preliminary reviews are shared with authors, who may make modest updates and corrections in order to resolve any issues the reviewers encountered.

This year we allow additional rounds of interaction with reviewers in case new issues are discovered after that window (see ‘New This Year’ above).

Badges

The artifact evaluation committee evaluates each artifact for the awarding of one or two badges:

Functional: This is the basic “accepted” outcome for an artifact. An artifact can be awarded a functional badge if the artifact supports all claims made in the paper, possibly excluding some minor claims if there are very good reasons they cannot be supported. In the ideal case, an artifact with this designation includes all relevant code, dependencies, input data (e.g., benchmarks), and the artifact’s documentation is sufficient for reviewers to reproduce the exact results described in the paper. If the artifact claims to outperform a related system in some way (in time, accuracy, etc.) and the other system was used to generate new numbers for the paper (e.g., an existing tool was run on new benchmarks not considered by the corresponding publication), artifacts should include a version of that related system, and instructions for reproducing the numbers used for comparison as well. If the alternative tool crashes on a subset of the inputs, simply note this expected behavior.

Deviations from this ideal must be for good reason. A non-exclusive list of justifiable deviations includes:

  • Some benchmark code is subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (e.g., licensed benchmark suites like SPEC, or when a tool is applied to private proprietary code). In such cases, all available benchmarks should be included. If all benchmark data from the paper falls into this case, alternative data should be supplied: providing a tool with no meaningful inputs to evaluate on is not sufficient to justify claims that the artifact works.
  • Some of the results are performance data, and therefore exact numbers depend on the particular hardware. In this case, artifacts should explain how to recognize when experiments on other hardware reproduce the high-level results (e.g., that a certain optimization exhibits a particular trend, or that comparing two tools one outperforms the other in a certain class of cases).
  • In some cases repeating the evaluation may take a long time. Reviewers may not reproduce full results in such cases.

In some cases, the artifact may require specialized hardware (e.g., a CPU with a particular new feature, or a specific class of GPU, or a cluster of GPUs). For such cases, authors should contact the Artifact Evaluation Co-Chairs (Anders Møller, Ana Milanova, and Colin Gordon) as soon as possible after round 1 notification to work out how to make these possible to evaluate. In past years one outcome was that an artifact requiring specialized hardware paid for a cloud instance with the hardware, which reviewers could access remotely.

Reusable: This badge may only be awarded to artifacts judged functional. A Reusable badge is given when reviewers feel the artifact is particularly well packaged, documented, designed, etc. to support future research that might build on the artifact. For example, if it seems relatively easy for others to reuse this directly as the basis of a follow-on project, the AEC may award a Reusable badge.

For binary-only artifacts to be considered Reusable, it must be possible for others to directly use the binary in their own research, for example a JAR file with high-quality client documentation that someone else can use as a component of their own project.

Artifacts with source can be considered Reusable:

  • if they can be reused as components,
  • if others can learn from the source and apply the knowledge elsewhere (e.g., learning an implementation or proof/formalization technique for use in a separate codebase), or
  • if others can directly modify and/or extend the system to handle new or expanded use cases.

Artifacts given one or both of the Functional and Reusable badges are generally referred to as accepted.

After decisions on the Functional and Reusable badges have been made, authors of any artifacts (including those not reviewed by the AEC, and those reviewed but not found Functional) can earn an additional badge by making their artifact durably available:

Available: This badge is automatically earned by artifacts that are made available publicly in an archival location. We strongly suggest, but do not require, that artifacts that were evaluated as Functional archive the evaluated version. There are two routes for this:

  1. Authors upload a snapshot of the complete artifact to Zenodo, which provides a DOI specific to the artifact. Note that GitHub, etc. are not adequate for receiving this badge (see the FAQ), and that Zenodo provides a way to make subsequent revisions of the artifact available and linked from the specific version.
  2. Authors can work with Conference Publishing to upload their artifacts directly to the ACM, where the artifact will be hosted alongside the paper.

Common issues

Common issues in the kick-the-tires phase in last year’s artifact evaluation included:

  • Overstating platform support. Several artifacts that claimed to require only a UNIX-like system failed severely under macOS, in particular those requiring 32-bit compilers, which are no longer available in newer macOS versions. We recommend that future artifacts scope their claimed support more narrowly. Generally this could be fixed by the authors providing a Dockerfile.
  • Missing dependencies, or poor documentation of dependencies.
  • As with last year, the single most effective way to avoid these sorts of issues ahead of time is to run the instructions independently on a fresh machine, VM, or Docker container.

Common issues found during last year’s full review phase included:

  • Comparing against existing tools on new benchmarks, but not including ways to reproduce the other tools’ executions. This was explicitly mentioned in the call for artifacts.
  • Not explaining how to interpret results. Several artifacts ran successfully and produced the output that was the basis for the paper, but without any way for reviewers to compare these for consistency with the paper. Examples included generating a list of warnings without documenting which were true vs. false positives, and generating large tables of numbers that were presented graphically in the paper without providing a way to generate analogous visualizations.

COI

Conflicts of interest for AEC members are handled by the chairs. Conflicts of interest involving one of the AEC chairs are handled by the other AEC chairs, or by the PC of the conference if all chairs are conflicted. Artifacts with an AEC chair as an author must be unambiguously accepted (they may not be borderline), and they may not be considered for the distinguished artifact award.

FAQ

This list will be updated with useful questions as time goes on.

My artifact requires hundreds of GB of RAM / hundreds of CPU hours / a specialized GPU / etc., that the AEC members may not have access to. How can we submit an artifact?
If the tool can run on an average modern machine, but may run extremely slowly compared to the hardware used for the paper's evaluation, please document the expected running time on your own hardware, and point to examples the AEC may be able to replicate in less time. If your system simply will not work without hundreds of GB of RAM, or other hardware requirements that most typical graduate student machines will not satisfy, please contact the AEC chairs in advance to make arrangements. In the past this has included options such as the authors paying for a cloud instance with the required hardware, to which reviewers can have anonymous access (the AEC chairs act as a proxy to communicate when the instance may be turned off to save the authors money). Submissions using cloud instances or similar arrangements that are not cleared with the AEC chairs in advance will be summarily rejected.
Can my artifact be accepted if some of the paper’s claims are not supported by the artifact, for example if some benchmarks are omitted or the artifact does not include tools we experimentally compare against in the paper?
In general yes (if good explanations are provided, as explained above), but if such claims are essential to the overall results of the paper, the artifact will be rejected. As an extreme example, an artifact consisting of a working tool submitted with no benchmarks (e.g., if all benchmarks have source that may not be redistributed) would be rejected.
Why do we need to use Zenodo for the Available badge? Why not GitHub?
Commercial repositories are unreliable, in that there is no guarantee the evaluated artifact will remain available indefinitely. Contrary to popular belief, it is possible to rewrite git commit history in a public repository (see docs on git rebase and the "--force" option to git push, and note that git tags are mutable). Users can delete public repositories, or their accounts. And in addition to universities deleting departmental URLs over time, hosting companies also sometimes simply delete data: Bidding farewell to Google Code (2015), Sunsetting Mercurial Support in Bitbucket (2019).
Reviewers identified things to fix in documentation or scripts for our artifact, and we'd prefer to publish the fixed version. Can we submit the improved version for the Available badge?
Yes.
Can I get the Available badge without submitting an artifact? I'm still making the thing available!
Yes.
Can I get the Available badge for an artifact that was not judged to be Functional? I'm still making the thing available!
Yes.

Contact

Please contact Anders Møller, Ana Milanova, and Colin Gordon if you have any questions.

Call for Self-Nominations

This year the OOPSLA 2021 Artifact Evaluation Chairs are seeking (self!) nominations for the Artifact Evaluation Committee (AEC). If you are a senior PhD student or post-doc with expertise relevant to the kinds of artifacts submitted to OOPSLA, please read the rest of this message and apply: https://forms.gle/kZjavNtX46o1sqV17

If you are not, but know someone who might be interested, please let them know about this.

Generally, the bar for “senior” PhD student has been authorship on one paper at a SIGPLAN conference or a related conference (e.g., ICSE, FSE, ASE, ISSTA, ECOOP, ESOP, etc.), though this should be interpreted as a rough guideline rather than a hard requirement on where you have published. Prior experience with artifact evaluation (as a submitter or reviewer) is a plus, but also not required.

The AEC’s work will mainly occur between the phase 1 notifications for OOPSLA (July 2, 2021) and the due date for phase 2 revisions (August 13, 2021).

For more information on artifact reviewing, consult the calls for artifacts.

If you have questions, don’t hesitate to contact the 2021 AEC chairs: Anders Møller, Ana Milanova, and Colin Gordon.

Results Overview

We received 48 artifact submissions. Of these:

  • 4 were deemed non-functional. This is only an indication that the AEC was not able to reproduce all relevant claims to their satisfaction, and not an indictment of the corresponding paper.
  • 44 were accepted in some way (91.6% acceptance), broken down as follows:
    • 16 were awarded the Functional badge only, and
    • 28 were awarded both the Functional and Reusable badges (58% of submitted artifacts were also Reusable).

The total number of artifacts submitted dropped by about 30% from OOPSLA 2020, and is closer to the total for OOPSLA 2019. The proportion of accepted artifacts deemed reusable is similar to previous years.

This year the OOPSLA AEC allowed any paper to seek an Available badge, regardless of evaluation, both to align some of our policies with other SIGPLAN AECs and because, even for papers not evaluated by the AEC, any available source or tool is better than none, so availability is worth encouraging.

The proportion of submitted artifacts deemed at least Functional is an all-time high for OOPSLA Artifact Evaluation. We attribute this in part to the changes made to the process this year (described in detail in the Call for Artifacts), which allowed multiple rounds of author corrections to packaging and clarifications of instructions, compared to the single attempt allowed in prior years. We view this as achieving the goal of the process change: giving artifacts that might previously have been deemed non-functional because of fixable packaging or presentation issues discovered after the kick-the-tires phase an opportunity to make additional corrections. A number of artifacts clearly benefited from this approach. The process is certainly not perfect, however, as the additional rounds of interaction with authors also significantly increased reviewer workload in some cases.

37 senior PhD students and post-docs wrote 144 reviews, plus additional rounds of anonymous discussion with authors via HotCRP.

Distinguished Artifacts

  • Rich Specifications for Ethereum Smart Contract Verification
    • Christian Bräm, Marco Eilers, Peter Müller, Robin Sierra, and Alexander J. Summers
  • Data-Driven Abductive Inference of Library Specifications
    • Zhe Zhou, Robert Dickerson, Benjamin Delaware, and Suresh Jagannathan
  • Solver-based Gradual Type Migration
    • Luna Phipps-Costin, Carolyn Anderson, Michael Greenberg, and Arjun Guha

Distinguished Artifact Reviewers

  • Robert Sison
    • University of Melbourne
  • Kristóf Marussy
    • Budapest University of Technology and Economics