This project develops a model for binary classification of hate speech sentences. To accomplish this I am using the BERT (Bidirectional Encoder Representations from Transformers) language model. Components of the project:
- A notebook with EDA on the dataset found in notebooks/EDA.ipynb.
- A notebook with BERT modeling approaches (the meat of the project) found in notebooks/bert.ipynb.
- A prediction API built around the final model.
The baseline work is described in the following paper: https://arxiv.org/pdf/1809.04444.pdf. This previous work includes a dataset scraped from a white nationalist forum (https://www.stormfront.org/forum/), some data exploration, and some baseline models. It also includes a description of the annotation process.
Classifying hate speech is not a straightforward task. What is considered hate speech can be very subjective. The researchers describe the issues encountered when annotating/labeling the dataset. They chose to label text at the sentence level because: “Sentence-level annotation allows to work with the minimum unit containing hate speech and reduce noise introduced by other sentences that are clean” (page 3). The authors mention that there is no consensus as to what exactly hate speech is, so they set up guidelines for annotators to accurately label the sentences. According to these guidelines, hate speech must include (page 3):
- a deliberate attack
- directed towards a specific group of people
- motivated by aspects of the group’s identity.
Even with these guidelines there were multiple disagreements among annotators. Some questions that might arise include (some are mentioned in the paper):
- can a factual statement constitute an attack?
- can a single individual constitute an entire group if the offense can be generalized?
- when does a trait become associated with an entire group's identity? Couldn't a trait be mentioned only in reference to a single individual or perhaps a subgroup?
Some of these disagreements were discussed in order to homogenize the results as much as possible.
The dataset can be found here: https://github.com/Vicomtech/hate-speech-dataset.git. It contains the original train/test split: 2392 observations in total, with 1914 training and 478 testing instances. Classes are balanced in both sets. Dataset head (a loading sketch follows the preview):
 | file_id | text | set | label |
---|---|---|---|---|
0 | 14103132_1 | Five is about what I 've found also . | train | noHate |
1 | 14663016_2 | Mexicans have plenty of open space in their co... | train | hate |
2 | 13860974_1 | I didnt know that bar was owned by a negro i w... | train | hate |
3 | 30484029_2 | If I had one it would 've probably made the li... | train | noHate |
4 | 13864997_3 | Most White Western Women have the racial sense... | train | hate |
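The EDA notebook downloads and assembles this dataframe automatically; the snippet below is only a rough sketch of how it can be built. The folder names `sampled_train/` and `sampled_test/` and the metadata columns are assumptions based on the repository layout.

```python
from pathlib import Path
import pandas as pd

repo = Path("hate-speech-dataset")  # cloned from the repository above
meta = pd.read_csv(repo / "annotations_metadata.csv")  # assumed columns: file_id, label, ...

rows = []
for split in ("sampled_train", "sampled_test"):
    for txt in (repo / split).glob("*.txt"):  # one sentence per <file_id>.txt
        rows.append({
            "file_id": txt.stem,
            "text": txt.read_text(encoding="utf-8").strip(),
            "set": "train" if split == "sampled_train" else "test",
        })

df = pd.DataFrame(rows).merge(meta[["file_id", "label"]], on="file_id")
print(df.head())
```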
Some of the top keywords found by the log-likelihood test in our group of interest (hate) include (a sketch of the statistic follows the table):
keyword | log-likelihood |
---|---|
black | 58.513906 |
jews | 54.982564 |
negro | 40.754824 |
ape | 33.393051 |
race | 24.915494 |
scum | 24.367472 |
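The exact keyword extraction lives in the EDA notebook; as an illustration, the standard Dunning log-likelihood (G²) statistic can be computed per token roughly as below. Whitespace tokenization is a simplification here, and `df` refers to the dataframe sketched above.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    # Dunning G2: word seen a times in corpus 1 (c tokens total) and b times in corpus 2 (d tokens total)
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = (a * math.log(a / e1) if a else 0) + (b * math.log(b / e2) if b else 0)
    return 2 * g2

hate = Counter(w for t in df[df.label == "hate"].text for w in t.lower().split())
clean = Counter(w for t in df[df.label == "noHate"].text for w in t.lower().split())
c, d = sum(hate.values()), sum(clean.values())

scores = {w: log_likelihood(n, clean.get(w, 0), c, d) for w, n in hate.items()}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10])
```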
For more data exploration go to notebooks/EDA.ipynb (data is downloaded automatically in the notebook so you don't need to download it yourself).
My main goal in the data modeling phase was to compare two approaches: fine-tuning vs. embeddings. The first part of the notebook involves fine-tuning a BERT model (DistilBERT) using the transformers (https://huggingface.co/transformers/) library. The only hyperparameter changed was the number of epochs. Results (a sketch of the setup follows the table):
model | epoch | eval_train_loss | eval_loss | eval_accHate | eval_accNoHate | eval_accAll |
---|---|---|---|---|---|---|
0 | 2.0 | 0.150157 | 0.438262 | 0.807531 | 0.836820 | 0.822176 |
1 | 3.0 | 0.048003 | 0.556381 | 0.799163 | 0.853556 | 0.826360 |
2 | 4.0 | 0.013872 | 0.701802 | 0.861925 | 0.820084 | 0.841004 |
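These results come from a setup along the lines of the sketch below, which reuses the `df` dataframe from the loading sketch above. It is illustrative only: the dataset wrapper name and label mapping are my own, and the actual code (including the per-class accuracy metrics) is in notebooks/bert.ipynb.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

class SentenceDataset(Dataset):
    """Wraps tokenized sentences and integer labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

label2id = {"noHate": 0, "hate": 1}
train, test = df[df.set == "train"], df[df.set == "test"]

args = TrainingArguments(output_dir="out", num_train_epochs=2)  # epochs was the only tuned hyperparameter
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=SentenceDataset(train.text.tolist(), train.label.map(label2id).tolist()),
    eval_dataset=SentenceDataset(test.text.tolist(), test.label.map(label2id).tolist()),
)
trainer.train()
print(trainer.evaluate())  # reports eval_loss; per-class accuracy needs a compute_metrics function
```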
Notice that the evaluation loss for 2 epochs is the lowest (this is the most relevant metric). For the second part of the notebook I used the pre-trained model (without fine-tuning) to extract embeddings for the entire sentences and passed them as input to the default classifier head (a 2-layer NN). I could have used any other classifier, but I chose the same default head to make a fair comparison with the fine-tuning approach. There are multiple strategies to select sentence embeddings; the combinations I tried (sketched in code after the next paragraph) included:
- CLS: embedding for [CLS] token.
- LAST_MEAN: mean of all word embeddings from last layer.
- LAST_MAX: max pooling of all word embeddings from last layer (max at each dimension).
- LAST2_MEAN: mean of all word embeddings from second to last layer.
- LAST2_MAX: max pooling of all word embeddings from second to last layer.
It should be noted that, although the [CLS] token is meant to provide a general representation of the sentence for classification problems, this representation is not very meaningful without fine-tuning; otherwise the model has no incentive to funnel the meaning of the whole sentence into that single vector. The [CLS] token by itself is, therefore, a weak embedding, and it is better to test other aggregations of the vectors found in the hidden states. You can find more information about this topic in the bert-as-service documentation (https://github.com/hanxiao/bert-as-service#speech_balloon-faq and https://hanxiao.io/2019/01/02/Serving-Google-BERT-in-Production-using-Tensorflow-and-ZeroMQ/).
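As a rough sketch of how these pooling strategies can be computed from the hidden states (the notebook's implementation may additionally mask out padding and special tokens when pooling):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
encoder.eval()

def sentence_embedding(text, strategy="LAST2_MEAN"):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = encoder(**enc, output_hidden_states=True)
    last = out.hidden_states[-1][0]    # (tokens, dim), last layer
    last2 = out.hidden_states[-2][0]   # second-to-last layer
    if strategy == "CLS":
        return last[0]                 # vector for the [CLS] token
    if strategy == "LAST_MEAN":
        return last.mean(dim=0)
    if strategy == "LAST_MAX":
        return last.max(dim=0).values  # max over tokens at each dimension
    if strategy == "LAST2_MEAN":
        return last2.mean(dim=0)
    if strategy == "LAST2_MAX":
        return last2.max(dim=0).values
    raise ValueError(f"unknown strategy: {strategy}")

emb = sentence_embedding("example sentence")  # 768-dimensional vector
```

The resulting vectors are then fed to the same 2-layer classifier head used in the fine-tuning approach.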
Results for different strategies (all run for 2 epochs and same default parameters):
model | emb-strategy | eval_train_loss | eval_loss | eval_accHate | eval_accNoHate | eval_accAll |
---|---|---|---|---|---|---|
0 | CLS | 0.549668 | 0.574593 | 0.786611 | 0.673640 | 0.730126 |
1 | LAST_MEAN | 0.540307 | 0.559487 | 0.836820 | 0.698745 | 0.767782 |
2 | LAST_MAX | 0.601751 | 0.612578 | 0.828452 | 0.682008 | 0.755230 |
3 | LAST2_MEAN | 0.511449 | 0.542598 | 0.811715 | 0.686192 | 0.748954 |
4 | LAST2_MAX | 0.585415 | 0.599830 | 0.824268 | 0.686192 | 0.755230 |
The best strategy by eval_loss is LAST2_MEAN, which is also the default pooling strategy in the bert-as-service library.
The fine-tuned model has a small API for prediction (nlp_bert/nlp-api.py). To get predictions you send a POST request to the endpoint (an illustrative sketch of the endpoint's shape follows the listing):
model/predict
# to get embeddings
model/predict?embeddings=true
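The actual implementation lives in nlp_bert/nlp-api.py; purely as an illustration of the request/response shape, and with placeholder helpers instead of the real model code, the endpoint looks roughly like:

```python
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Sentences(BaseModel):
    text: List[str]

def model_predict(texts):
    # placeholder: the real API runs the fine-tuned DistilBERT model here
    return [[0.5, 0.5] for _ in texts]

def cls_embeddings(texts):
    # placeholder: the real API returns the [CLS] vectors here
    return [[0.0] * 768 for _ in texts]

@app.post("/model/predict")
def predict(body: Sentences, embeddings: bool = False):
    texts = body.text[:5]  # only the first 5 sentences are used
    out = {"predictions": model_predict(texts)}
    if embeddings:
        out["embeddings"] = cls_embeddings(texts)
    return out
```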
You can pass up to 5 sentences at a time (you can send more but the API will take only the first 5). POST request example:
curl -X POST "http://127.0.0.1:8000/model/predict?embeddings=true" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{\"text\":
[\"Don't bother with black culture , it 's not worth the effort .\",
\"Good going libtards , you ruined Rhodesia .\",
\"I would n't marry a asian even after cirurgy .\"]
}"
Predictions at index 0 represent the noHate class; predictions at index 1 represent the hate class. The embeddings come from the [CLS] token. Output example:
{
  "predictions": [
    [
      0.44551581144332886,
      0.5544842481613159
    ],
    [
      0.44643911719322205,
      0.5535609126091003
    ],
    [
      0.47761672735214233,
      0.5223833322525024
    ]
  ],
  "embeddings": [
    [
      -0.0784786194562912,
      0.15456458926200867,
      -0.2955276072025299,
      -0.07512544095516205,
      -0.29915621876716614,
      ...
    ]
  ]
}
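The same request can be sent from Python with the requests library, for example:

```python
import requests

payload = {"text": ["Don't bother with black culture , it 's not worth the effort ."]}
resp = requests.post(
    "http://127.0.0.1:8000/model/predict",
    params={"embeddings": "true"},  # drop this to get predictions only
    json=payload,
)
print(resp.json()["predictions"])
```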
The project was built with Poetry (https://python-poetry.org/), so it is recommended to install Poetry first to manage the dependencies. After installing Poetry you can simply clone this repo, navigate to the root directory, and run:
# install dependencies
poetry install
# running the API
cd nlp_bert
uvicorn nlp-api:app
The fine-tuned model will be downloaded automatically if its directory is not found.