Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation.

Scholarly articles for Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation.

scholar.google.com › citations

… human evaluation: A brief introduction to user studies …
Schuff · Cited by 18

Reliability testing for natural language processing …
Tan · Cited by 34

Menli: Robust evaluation metrics from natural language …
Chen · Cited by 33

Defining and Detecting Vulnerability in Human Evaluation Guidelines - arXiv

arxiv.org › cs

Jun 12, 2024 · We take an initial step towards reliable evaluation guidelines and propose the first human evaluation guideline dataset by collecting annotations of guidelines.

Defining and Detecting Vulnerability in Human Evaluation Guidelines

aclanthology.org › 2024.naacl-long.441

Unreliable evaluation guidelines can yield inaccurate assessment outcomes, potentially impeding the advancement of NLG in the right direction. To address these ...

Defining and Detecting Vulnerability in Human Evaluation Guidelines - arXiv

arxiv.org › html

Jun 12, 2024 · By proposing a taxonomy of guideline vulnerabilities, we constructed the first annotated human evaluation guideline dataset. We then explored ...

Defining and Detecting Vulnerability in Human Evaluation Guidelines

www.aimodels.fyi › papers › arxiv › defi...

Jun 12, 2024 · The researchers conducted a preliminary study to identify areas where these guidelines may be prone to biases or inconsistencies, with the goal ...

How to Build Reliable Human Annotation Guidelines with LLMs

medium.com › tr-labs-ml-engineering-blog

Aug 26, 2024 · The study identifies eight common vulnerabilities in human annotation guidelines and proposes a principled approach using LLMs to create more reliable ...

Evaluation in the context of natural language generation - ScienceDirect

www.sciencedirect.com › article › abs › pii

Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation. 2024, arXiv. Multilinguality in ...

Defining and Detecting Vulnerability in Human Evaluation Guidelines

lingo.iitgn.ac.in › 2024.naacl-long.441

Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation. Jie Ruan | Wenqing Wang | Xiaojun ...

nlg evaluation | Papers With Code

paperswithcode.com › task › nlg-evaluati...

Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation · EnablerRx/GuidelineVulnDetect • 12 ...

One-day class on NLG evaluation - Ehud Reiter's Blog

ehudreiter.com › 2024/09/09 › one-day-...

Sep 9, 2024 · Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation. Proc of NAACL-2024

Jie Ruan - CatalyzeX

www.catalyzex.com › author

Unreliable evaluation guidelines can yield inaccurate assessment outcomes, potentially impeding the advancement of NLG in the right direction. To address these ...