Computer Science > Computation and Language

arXiv:2406.00983 (cs)

[Submitted on 3 Jun 2024]

Title:Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Authors:Junyu Lu, Bo Xu, Xiaokun Zhang, Kaiyuan Liu, Dongyu Zhang, Liang Yang, Hongfei Lin

Abstract:Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them indiscriminately, resulting in a degradation in the detection accuracy of the model. To this end, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the "useful impact" of lexical bias and eliminates the "misleading impact". Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied on several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.00983 [cs.CL]
	(or arXiv:2406.00983v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.00983

Submission history

From: Junyu Lu [view email]
[v1] Mon, 3 Jun 2024 04:34:30 UTC (1,007 KB)

Computer Science > Computation and Language

Title:Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators