hsdslab/topicality-online

Topicality Boosts Popularity Online

Supplementary data for the paper Topicality Boosts Popularity: A Comparative Analysis of NYT Articles and Reddit Memes (Barnes et al., 2024).

How to cite

@article{barnes2024topicality,
   title={Topicality boosts popularity: a comparative analysis of NYT articles and Reddit memes},
   author={Barnes, Kate and Juh{\'a}sz, P{\'e}ter and Nagy, Marcell and Molontay, Roland},
   journal={Social Network Analysis and Mining},
   volume={14},
   number={1},
   pages={119},
   year={2024},
   publisher={Springer}
}

Source

The Reddit data was collected using the Pushshift API and consists of all posts made to the r/memes subreddit between January 1st, 2018 and November 14th, 2022.

The New York Times data was collected using the NYT Archive API, and it consists of all articles published in the same time frame.

Both data sets are openly available in the data folder.

Summary of data

A total of 255,783 NYT articles were distilled into 2 sets of 120 topics using the BERTopic and LDA algorithms.

After cleaning, the Reddit data set included 899,766 meme posts. We used several metadata features and engineered novel features from the meme images:

| Feature | Type | Description |
| --- | --- | --- |
| score | int | upvotes - downvotes |
| comments | int | number of comments |
| viral | binary | indicates if meme score is in top 5% |
| date | string | date on which meme was posted |
| day-of-week | category | day on which meme was posted |
| all-text | string | title, BLIP image caption, OCR text |
| title length | int | number of characters in title |
| thumbnail size | float | size of image thumbnail |
| over18 | binary | Reddit content warning |
| emoji | binary | identifies emojis in text |
| sentiment | float | text valence score |
| HSV/RGB | floats | mean image values |
| 10 colors | floats | normalized pixels of color in image |
| face | binary | indicates face in image |
| topic | int | highest probability topic assigned to meme |
| probability | float | topic's probability |
| entropy | float | entropy of topic distribution assigned to meme |
| monthly | float | monthly average topicality |
| daily | float | daily average topicality |
| slope | float | slope of topicality distribution |
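
As an illustration of how a few of the derived features above could be computed, here is a minimal sketch using only the Python standard library. It is not the code from this repository: the function names (`viral_flags`, `top_topic`, `topic_entropy`) are hypothetical, and the tie handling at the 5% cutoff and the use of the natural logarithm for entropy are assumptions made for the example.

```python
import math


def viral_flags(scores, top_fraction=0.05):
    """Return 1 for scores in the top `top_fraction` of the sample, else 0.

    Assumption: posts tied with the cutoff score are also flagged.
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    # Number of posts in the top fraction, at least one.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    cutoff = ranked[k - 1]
    return [int(s >= cutoff) for s in scores]


def top_topic(probs):
    """Return (index, probability) of the most probable topic."""
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]


def topic_entropy(probs):
    """Shannon entropy (in nats) of a topic distribution.

    The input is renormalized to sum to 1; zero-probability topics
    contribute nothing to the sum.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("topic probabilities must have a positive sum")
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


if __name__ == "__main__":
    # Made-up values for illustration only, not taken from the data set.
    example_scores = [3, 15, 120, 7, 42000, 88, 1, 560, 9, 23]
    example_topics = [0.70, 0.20, 0.10]
    print(viral_flags(example_scores))
    print(top_topic(example_topics))
    print(topic_entropy(example_topics))
```

A low entropy means the meme is dominated by a single topic, while a high entropy means its probability mass is spread across many topics.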

Code files

Python code files with exploratory analysis of the data sets, feature extraction, topic analysis and popularity prediction are available in the code folder.

Supplementary figures

The 5-year daily distributions of all topics explored in this project can be found in the figures folder or at this link. Daily and monthly distributions for 23 of these topics are displayed and discussed in more detail in the article.
