SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Black-Box Machine-Generated Text Detection

1. Mohamed bin Zayed University of Artificial Intelligence
2. Institute of Information Science and Technology, Italy
3. TU Darmstadt
4. University of Cambridge
5. New York University Abu Dhabi

Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across various channels, such as news, social media, question-answering forums, and educational and even academic contexts. Recent LLMs, such as ChatGPT and GPT-4, generate remarkably fluent responses to a wide variety of user queries. The articulate nature of such generated texts makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about their potential misuse, such as spreading misinformation and disrupting the education system. Since humans perform only slightly better than chance when distinguishing machine-generated from human-written text, there is a need for automatic systems that identify machine-generated text, with the goal of mitigating its potential misuse.

We offer three subtasks over two paradigms of text generation: (1) full text, where a text is either entirely written by a human or entirely generated by a machine; and (2) mixed text, where a machine-generated text is refined by a human or a human-written text is paraphrased by a machine.

Files (420.7 MB)

Name	MD5	Size
SemEval2024-Task8-code.zip	7ea2a43f0b410e1bcdcdf25299934d83	1.2 MB
SemEval2024-Task8-data.zip	a126d3369b55931ac43da585aededef6	419.5 MB

Additional details

Repository URL: https://github.com/mbzuai-nlp/SemEval2024-task8

Resource type

Dataset

Publisher

Association for Computational Linguistics

Conference

Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), Mexico City, Mexico, June 2024

Languages

English, Mandarin Chinese, Indonesian, Bulgarian, Urdu, Arabic, German, Italian, Russian

Technical metadata

Created: May 13, 2024
Modified: May 13, 2024