SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Black-Box Machine-Generated Text Detection
Creators
Description
Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across various channels, such as news, social media, question-answering forums, and educational and even academic contexts. Recent LLMs, such as ChatGPT and GPT-4, generate remarkably fluent responses to a wide variety of user queries. The articulate nature of such generated texts makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about their potential misuse, such as spreading misinformation and disrupting the education system. Since humans perform only slightly better than chance when classifying machine-generated vs. human-written text, there is a need to develop automatic systems that identify machine-generated text, with the goal of mitigating its potential misuse.
We offer three subtasks over two paradigms of text generation: (1) full text, where a text is either entirely written by a human or entirely generated by a machine; and (2) mixed text, where a machine-generated text is refined by a human or a human-written text is paraphrased by a machine.
Files
SemEval2024-Task8-code.zip
Total size: 420.7 MB
File (MD5 checksum) | Size |
---|---|
md5:7ea2a43f0b410e1bcdcdf25299934d83 | 1.2 MB |
md5:a126d3369b55931ac43da585aededef6 | 419.5 MB |
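The MD5 checksums listed for the files can be used to confirm that a download arrived intact. The following is a minimal sketch using only the Python standard library (Python 3.9 or later). The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file name is an assumption about where the archive was saved. The listing does not state which checksum belongs to which file, so the sketch checks against both.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a value written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().removeprefix("md5:").lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Assumption: the archive was saved under this name in the working directory.
    local_path = Path("SemEval2024-Task8-code.zip")
    # Checksums as listed for the record's files; the file-to-checksum
    # mapping is not given, so either match is accepted here.
    listed = [
        "md5:7ea2a43f0b410e1bcdcdf25299934d83",
        "md5:a126d3369b55931ac43da585aededef6",
    ]
    if local_path.exists():
        ok = any(matches_checksum(local_path, value) for value in listed)
        print("checksum match" if ok else "checksum mismatch")
    else:
        print(f"{local_path} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.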
Additional details
Software
- Repository URL: https://github.com/mbzuai-nlp/SemEval2024-task8