Computer Science > Machine Learning

arXiv:2306.06135 (cs)

[Submitted on 9 Jun 2023]

Title: Safety and Fairness for Content Moderation in Generative Models

Authors: Susan Hao, Piyush Kumar, Sarah Laszlo, Shivani Poddar, Bhaktipriya Radharapu, Renee Shelby

Abstract: With significant advances in generative AI, new technologies are rapidly being deployed with generative components. Generative models are typically trained on large datasets, resulting in model behaviors that can mimic the worst of the content in the training data. Responsible deployment of generative technologies requires content moderation strategies, such as safety input and output filters. Here, we provide a theoretical framework for conceptualizing responsible content moderation of text-to-image generative technologies, including a demonstration of how to empirically measure the constructs we enumerate. We define and distinguish the concepts of safety, fairness, and metric equity, and enumerate example harms that can come in each domain. We then provide a demonstration of how the defined harms can be quantified. We conclude with a summary of how the style of harms quantification we demonstrate enables data-driven content moderation decisions.

Comments:	CVPR Workshop Paper
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.06135 [cs.LG]
	(or arXiv:2306.06135v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.06135

Submission history

From: Susan Hao
[v1] Fri, 9 Jun 2023 01:37:32 UTC (80 KB)
