Bots and Automation over Twitter during the U.S. Election

COMPROP DATA MEMO 2016.4 / 17 NOV 2016

Bence Kollanyi, Corvinus University, kollanyi@gmail.com, @bencekollanyi
Philip N. Howard, Oxford University, philip.howard@oii.ox.ac.uk, @pnhoward
Samuel C. Woolley, University of Washington, samwooll@uw.edu, @samuelwoolley

ABSTRACT
Bots are social media accounts that automate interaction with other users, and political bots have been
particularly active on public policy issues, political crises, and elections. We collected data on bot activity using
the major hashtags related to the U.S. Presidential Election. We find that political bot activity reached an
all-time high for the 2016 campaign. (1) Not only did the pace of highly automated pro-Trump activity increase
over time, but the gap between highly automated pro-Trump and pro-Clinton activity widened from 4:1 during
the first debate to 5:1 by Election Day. (2) The use of automated accounts was deliberate and strategic throughout
the election, most clearly with pro-Trump campaigners and programmers who carefully adjusted the timing of
content production during the debates, strategically colonized pro-Clinton hashtags, and then disabled activities
after Election Day.

WHAT ARE POLITICAL BOTS?
A growing number of political actors and governments worldwide are employing both people and bots to shape political conversation. [1], [2] Bots can perform legitimate tasks like delivering news and information, or undertake malicious activities like spamming, harassment and hate speech. Whatever their uses, bots on social media platforms are able to rapidly deploy messages, replicate themselves, and pass as human users.
Networks of such bots are called “botnets,” a term combining “robot” with “networks” that is generally used to describe a collection of connected computers with programs that communicate across multiple devices to perform some task. There are legitimate botnets, like the Carna botnet, which gave us our first real census of device networks, and there are malicious botnets, like those that are created to launch spam and distributed denial-of-service (DDoS) attacks and to engineer theft of confidential information, click fraud, cyber-sabotage, and cyberwarfare. [3], [4] Over social media, botnets are interconnected automated accounts built to follow and re-message one another. These social botnets, often made up of hundreds of unique accounts, can be controlled by one user operating from a single computer.
Social bots are particularly prevalent on Twitter, but they are found on many different platforms that increasingly form part of the system of political communication in many countries. [5] Highly automated accounts post, tweet, or message of their own accord. The most rudimentary bot profiles lack basic account information such as coherent screen names or profile pictures. Such accounts have become known as “Twitter eggs” because the default profile picture on that social media site is of an egg. While social media users get access from front-end websites, bots get access directly through a code-to-code connection, mainly through the site’s wide-open application programming interface (API) that enables real-time posting and parsing of information. Bots are versatile, cheap to produce, and ever evolving. Unscrupulous Internet users now deploy bots beyond mundane commercial tasks like spamming. Bots are the primary applications used in carrying out DDoS and virus attacks, email harvesting, and content theft. A subset of social bots is given overtly political tasks, and the use of political bots varies from country to country. Political actors and governments worldwide have begun using bots to manipulate public opinion, choke off debate, and muddy political issues. Political bots tend to be developed and deployed in sensitive political moments when public opinion is polarized. How were highly automated accounts used around Election Day in the United States?

SAMPLING AND METHOD
This data set contains approximately 19.4m tweets collected November 1-9, 2016, using a combination of hashtags associated with the primary Presidential candidates. Since our purpose is to discern how bots are being used to amplify political communication, the analysis focuses on the 18.9m tweets that used the sampled hashtags. Twitter provides free access to a sample of the public tweets posted on the platform. The platform’s precise sampling method is not known, but the company itself reports that the data available through the Streaming API is at most one percent of the overall global public communication on Twitter at any given time. [6] To get the most complete and relevant data set, the tweets were collected by following particular hashtags identified by the team as being actively used during the election. A few additional tags were added in the week before the election as they rose to prominence. The programming of the data collection and most of the analysis were done using the statistics package R.

Selecting tweets on the basis of hashtags has the advantage of capturing the content most likely to be about this important political event. The Streaming API yields (1) tweets that contain the keyword or the hashtag; (2) tweets with a link to a web source, such as a news article, where the URL or the title of the web source includes the keyword or hashtag; (3) retweets that contain a message’s original text, wherein the keyword or hashtag is used either in the retweet or in the original tweet; and (4) quote tweets where the original text is not included but Twitter uses a URL to refer to the original tweet.
Our method counted tweets with selected hashtags in a simple manner. Each tweet was coded and counted if it contained one of the specific hashtags that were being followed. If the same hashtag was used multiple times in a tweet, this method still counted that tweet only once. If a tweet contained more than one selected hashtag, it was credited to all the relevant hashtag categories.
Unfortunately, not enough users geotag their profiles to allow analysis of the distribution of this support around the world or within the United States. Furthermore, analyzing sentiment on social media such as Twitter is difficult. [7], [8] Contributions using none of these hashtags were not captured in this data set. It is also possible that users who used one or more of these hashtags, but were not discussing the election, had their tweets captured. Moreover, if people tweeted about the election but did not use one of these hashtags or identify a candidate account, their contributions were not analyzed here. Any comparison with previous data memos should consider that those memos are based on shorter sample periods taken during the presidential debates, on different days of the week, and with a larger number of relevant hashtags.

Table 1: Twitter Activity around Voting Day, 2016
                        All Tweets in Sample
Hashtag Cluster                    N       %
Pro-Trump                 10,426,547    55.1
Pro-Clinton                3,618,778    19.1
Neutral                    2,879,084    15.2
Trump-Neutral                434,897     2.3
Clinton-Neutral              217,509     1.2
Trump-Clinton              1,233,872     6.5
Trump-Clinton-Neutral         99,563     0.5
Total                     18,910,250   100.0
Source: Authors’ calculations from data sampled 1-9/11/16.
Note: Pro-Trump hashtags include #AmericaFirst, #benghazi, #CrookedHillary, #DrainTheSwamp, #lockherup, #maga3x, #MAGA, #MakeAmericaGreatAgain, #NeverHillary, #PodestaEmails, #projectveritas, #riggedelection, #tcot, #Trump2016, #Trump, #TrumpPence16, #TrumpTrain, #VoterFraud, #votetrump, #wakeupamerica; pro-Clinton hashtags include #Clinton, #ClintonKaine16, #democrats, #dems, #dnc, #dumptrump, #factcheck, #hillary2016, #Hillary, #HillaryClinton, #hillarysupporter, #hrc, #ImWithHer, #LastTimeTrumpPaidTaxes, #NeverTrump, #OHHillYes, #p2, #strongertogether, #trumptape, #uniteblue; neutral hashtags include #Election2016, #Elections2016, #uselections, #uselection, #earlyvote, #iVoted, #Potus.

Figure 1: Hourly Twitter Traffic, by Candidate Camp
Source: Authors’ calculations from data sampled 1-9/11/16.
Note: This figure is based on the hashtags used in the tweets.
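The counting rule above, read together with the clusters in Table 1, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors’ R code: the hashtag sets are a hypothetical, abbreviated subset of the lists in the Table 1 note, matching is assumed to be case-insensitive, and a tweet carrying hashtags from several camps is assigned to the matching mixed cluster (for example Trump-Clinton). That is one reading of “credited to all the relevant hashtag categories” that keeps the clusters summing to the total, as they do in Table 1. The helper names `cluster`, `tally` and `shares` are illustrative.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical, abbreviated subsets of the hashtag lists in the Table 1 note,
# lower-cased because matching is assumed to be case-insensitive.
PRO_TRUMP = {"#maga", "#makeamericagreatagain", "#crookedhillary", "#draintheswamp", "#trump2016"}
PRO_CLINTON = {"#imwithher", "#hillary2016", "#strongertogether", "#nevertrump", "#hrc"}
NEUTRAL = {"#election2016", "#elections2016", "#ivoted", "#potus"}

HASHTAG = re.compile(r"#\w+")
# Order fixes the cluster labels: Trump before Clinton before Neutral.
CAMPS = [("Trump", PRO_TRUMP), ("Clinton", PRO_CLINTON), ("Neutral", NEUTRAL)]


def cluster(text: str) -> Optional[str]:
    """Return the Table 1 cluster for a tweet, or None if it uses no followed hashtag."""
    # Collecting into a set means a repeated hashtag still counts only once.
    tags = {t.lower() for t in HASHTAG.findall(text)}
    camps = [name for name, followed in CAMPS if tags & followed]
    if not camps:
        return None
    label = "-".join(camps)  # e.g. "Trump-Clinton" for a mixed tweet
    # Single-camp partisan clusters carry a "Pro-" prefix in Table 1.
    return "Pro-" + label if label in ("Trump", "Clinton") else label


def tally(tweets: Iterable[str]) -> Counter:
    """Count tweets per cluster, skipping tweets outside the sample."""
    return Counter(c for c in map(cluster, tweets) if c is not None)


def shares(counts: dict) -> dict:
    """Percentage share of each cluster, rounded to one decimal as in Table 1."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Cluster counts copied from Table 1.
TABLE_1 = {
    "Pro-Trump": 10_426_547,
    "Pro-Clinton": 3_618_778,
    "Neutral": 2_879_084,
    "Trump-Neutral": 434_897,
    "Clinton-Neutral": 217_509,
    "Trump-Clinton": 1_233_872,
    "Trump-Clinton-Neutral": 99_563,
}

if __name__ == "__main__":
    print(tally(["#MAGA #maga", "#ImWithHer #iVoted", "#CrookedHillary vs #NeverTrump", "no tags"]))
    print(shares(TABLE_1))
```

Applied to the Table 1 counts, `shares` gives back the published percentages (55.1, 19.1, 15.2 and so on). A faithful rerun would need the full hashtag lists, not the sample sets used here.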

FINDINGS AND ANALYSIS
This sample allows us to draw some clear conclusions about the character and process of political conversation over Twitter during the election. Specifically, we are able both to parse out the amount of social media content related to the two major candidates and to investigate how much of this content is driven by highly automated accounts. We can parse the volume of tweets by perspective, assess the level of automation behind the different perspectives, and evaluate the particular contribution of bots to the traffic on this issue.
Comparing the Candidates on Twitter. Table 1 reveals that 18.9m tweets used some combination of these hashtags. The overall volume of pro-Trump Twitter traffic (55.1 percent) was much greater than the volume of tweets containing only hashtags associated with the Clinton camp (19.1 percent). The overall volume of neutral election-related traffic (15.2 percent) was also significantly smaller than the pro-Trump traffic. Much smaller proportions of the tweets were categorized for mixes of hashtags. As human users made up their minds about whom to vote for and began expressing their preferences over Twitter, the proportion of clearly pro-Trump and pro-Clinton content using hashtags from each camp rose to 74.2 percent.
Figure 1 displays the rhythm of this traffic over the sample period. It reveals that, in contrast with the findings from our analysis of the debates, most tweets contained either pro-Trump or pro-Clinton hashtags. The use of neutral hashtags diminished by Election Day. Large dips in traffic coincide with nighttime in the United States. Figure 1 includes a total of 18.9m tweets from 3.7m users who tweeted using the sampled hashtags. It is based on these hashtags rather than on the candidates’ user names, because @ mentions reveal little about the political affinity of the user. During the election itself, the amount of candidate-committed traffic outstripped the volume of neutral traffic.
Automated Political Traffic. A fairly consistent proportion of the traffic on these hashtags was generated by highly automated accounts. These accounts are often bots that are either irregularly curated by people or actively maintained by people who employ scheduling algorithms and other applications for automating social media communication. We define a high level of automation as accounts that post at least 50 times a day using one of these election-related hashtags, meaning 450 or more tweets on at least one of these hashtags during the nine-day data collection period.
Extremely active human users might achieve this pace of social activity, especially if they are simply retweeting the content they find in their social media feed. And some bots may be relatively dormant, waiting to be activated and tweeting only occasionally. But this metric captures accounts generating significant amounts of issue-specific traffic, where high levels of automation are probable.
Finally, self-disclosed bots were identified by searching for the term “bot” in either the account name or the account description. While such accounts are a small proportion of the overall accounts, we expect the actual number of bots to be much higher: many bots, after all, are built to avoid obvious methods of identification. Future research will involve a more detailed analysis of the disclosed and hidden bots and a search for a wider range of terms referring to bots in the account name and description data.
Table 2 reveals the different levels of automation behind the traffic associated with clusters of hashtags. To track the activity of political bots around election time, we have clustered the hashtags by their candidate associations. To evaluate the role of automation, we organize these clusters of opinion based on hashtag use. After this, we create a subcategory of accounts that use high levels of automation. Table 2 indicates the level of traffic, by political camp and associated hashtags. This table distinguishes between the messages that exclusively used a hashtag known to be associated with a perspective and the possible combinations of mixed tagging. When comparing the highly automated accounts tweeting for Trump versus those messaging for Clinton, it appears that the pro-Trump tweets outnumbered pro-Clinton tweets 5:1 during this period.
Table 2 also reveals that automation is used at several different levels by accounts taking different perspectives in the election. The accounts using exclusively neutral hashtags are rarely automated (only about 4 percent of these tweets came from highly automated accounts). However, about a quarter (24.5 percent) of the tweets that mixed pro-Trump and pro-Clinton hashtags were generated by accounts that use high levels of automation.

Table 2: Twitter Content, By Hashtag and Level of Automation
Hashtag Cluster             Low (%)  High (%)      All (N)  All (%)
Exclusive Hashtag Clusters
Pro-Trump                      77.1      22.9   10,426,547      100
Pro-Clinton                    86.4      13.6    3,618,778      100
Neutral                        96.4       3.6    2,879,084      100
Mixed Hashtag Clusters
Trump-Neutral                  83.0      17.0      434,897      100
Clinton-Neutral                92.3       7.7      217,509      100
Trump-Clinton                  75.5      24.5    1,233,872      100
Trump-Clinton-Neutral          86.7      13.3       99,563      100
Sum                            82.1      17.9   18,910,250      100
Source: Authors’ calculations from data sampled 1-9/11/16.
Note: Low volume users are average human users; high volume accounts post more than 50 times per day on average.

Figure 2 reveals the relative flow of traffic overall alongside traffic from accounts with high levels of automation. As with many Twitter-based conversations surrounding political events, the most active accounts here are either obvious bots or users with such high levels of automation that they are essentially bot-driven accounts, most likely making use of software applications to automate their Twitter presence and thus dominate conversation. During waking hours, highly automated accounts were generating between 20 and 25 percent of the traffic about the election during the days leading up to the vote. On Election Day, the server was recording 170K tweets per hour and we reached the cap set by Twitter for capturing data: again, one percent of global traffic captured in real time. The pace of automated political campaigning dropped off after Election Day, a reminder that campaigners and programmers behind bot accounts often disable their purpose-built automation on victory.

Figure 2: Total Hourly Twitter Traffic around Voting Day, 2016, by Level of Automation
Source: Authors’ calculations from data sampled 1-9/11/16.
Note: We define heavily automated accounts as tweeting 50 times or more per day on election topics.

Additional Observations on Automation. To understand the distribution of content production across these users, we then look at segments of the total population of contributors to these hashtags. There is a noticeable difference between the usage patterns of typical human users and accounts that are bots or otherwise highly automated. For example, the top 20 accounts, which were mostly bots and highly automated accounts, averaged over 1,300 tweets a day and generated more than 234,000 tweets during this short period. The top 100 accounts, most of which still used high levels of automation, generated around 450,000 tweets at an average rate of 500 tweets per day. In contrast, the average account in the whole sample generated only one tweet every second day.
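The automation threshold and the activity rates quoted above reduce to simple arithmetic over per-account tweet counts. The sketch below is a hypothetical illustration, not the authors’ R code. It assumes a mapping of accounts to tweet counts over the nine-day window (1-9 November), reads the threshold as 450 or more tweets summed across all sampled hashtags (the text could also be read per hashtag), and implements the self-disclosure check as a plain case-insensitive substring search for “bot”. All names are illustrative.

```python
from __future__ import annotations

from collections import Counter

DAYS = 9                                   # 1-9 November 2016
DAILY_THRESHOLD = 50                       # tweets per day on the sampled hashtags
PERIOD_THRESHOLD = DAILY_THRESHOLD * DAYS  # 450 tweets over the collection period


def highly_automated(counts: Counter) -> set:
    """Accounts at or above the period threshold (assumed: totals across all hashtags)."""
    return {acct for acct, n in counts.items() if n >= PERIOD_THRESHOLD}


def automated_share(counts: Counter) -> float:
    """Fraction of all sampled tweets posted by highly automated accounts."""
    flagged = highly_automated(counts)
    return sum(counts[a] for a in flagged) / sum(counts.values())


def self_disclosed(name: str, description: str) -> bool:
    """Naive check for the term "bot" in the account name or description."""
    return "bot" in name.lower() or "bot" in description.lower()


def mean_daily_rate(total_tweets: float, accounts: int, days: int = DAYS) -> float:
    """Average tweets per account per day over the collection window."""
    return total_tweets / (accounts * days)


if __name__ == "__main__":
    sample = Counter({"acct_a": 500, "acct_b": 449, "acct_c": 51})
    print(highly_automated(sample), automated_share(sample))
    # Figures quoted in the text:
    print(mean_daily_rate(234_000, 20))        # top 20 accounts: 1300.0
    print(mean_daily_rate(450_000, 100))       # top 100 accounts: 500.0
    print(mean_daily_rate(18.9e6, 3_700_000))  # average account: about 0.57
```

The last line shows why the average account tweeted roughly once every second day. The substring test is deliberately crude: it also matches names such as “Abbott” or “robotics”, while, as the memo notes, many bots avoid obvious markers altogether.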
While heavily automated accounts are usually the most active, there is a long tail of human users with only occasional Twitter activity.
Highly automated accounts (the accounts that tweeted 450 or more times with a related hashtag and user mention during the data collection period) generated close to 18 percent of all Twitter traffic about the Presidential election. Interestingly, Figure 2 also shows that automated postings decreased significantly the day after the election, whereas in the days immediately before the election, highly automated accounts generated as much as 25 percent of all the Twitter activity on these political hashtags. That volume is significant, considering that this number of posts was generated by only 4,160 highly automated accounts in a sample of more than 3.7m users. It is very difficult for human users to maintain this rapid pace of social media activity without some level of account automation, though it is likely that not all of these are bot accounts.

Table 3: Summary of Highly Automated Activity
                                              First   Second    Third
                                             Debate   Debate   Debate Election
For each pro-Clinton tweet from a highly
automated account, the number of
pro-Trump tweets                                4.4      4.2      6.9      4.9
Percent of highly automated content that
used pro-Trump hashtags, either alone or
mixed with pro-Clinton or neutral hashtags     67.2     66.6     67.2     81.9
Proportion of hashtag sample generated by
highly automated accounts (%)                  23.3     26.1     27.2     17.9
Source: Authors’ calculations from data sampled during the first debate (26-29/09), second debate (9-12/10), third debate (19-22/10), and election (1-9/11).
Note: We define heavily automated accounts as tweeting 50 times or more per day on election topics.
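The election column of Table 3 can be cross-checked against Table 2 by turning each cluster’s high-automation percentage back into an approximate tweet count. This sketch rests on two assumptions that the memo does not state outright: the pro-Trump to pro-Clinton ratio compares the two exclusive clusters, and “pro-Trump content” means any cluster containing a pro-Trump hashtag. Because Table 2 reports rounded percentages, the reconstructed counts are approximate.

```python
# Table 2: cluster -> (all tweets N, percent from highly automated accounts)
TABLE_2 = {
    "Pro-Trump": (10_426_547, 22.9),
    "Pro-Clinton": (3_618_778, 13.6),
    "Neutral": (2_879_084, 3.6),
    "Trump-Neutral": (434_897, 17.0),
    "Clinton-Neutral": (217_509, 7.7),
    "Trump-Clinton": (1_233_872, 24.5),
    "Trump-Clinton-Neutral": (99_563, 13.3),
}


def automated_tweets(table: dict) -> dict:
    """Approximate number of tweets from highly automated accounts in each cluster."""
    return {k: n * pct / 100 for k, (n, pct) in table.items()}


def election_summary(table: dict) -> dict:
    """Recompute the three Table 3 measures from Table 2 (assumptions as stated above)."""
    auto = automated_tweets(table)
    total_auto = sum(auto.values())
    total_all = sum(n for n, _ in table.values())
    # Any cluster whose name contains "Trump" used at least one pro-Trump hashtag.
    with_trump = sum(v for k, v in auto.items() if "Trump" in k)
    return {
        "trump_per_clinton": round(auto["Pro-Trump"] / auto["Pro-Clinton"], 1),
        "pct_automated_with_trump": round(100 * with_trump / total_auto, 1),
        "pct_sample_automated": round(100 * total_auto / total_all, 1),
    }


if __name__ == "__main__":
    print(election_summary(TABLE_2))
```

The three values come out as 4.9, 81.9 and 17.9, matching the election column of Table 3. That agreement is consistent with, though it does not prove, the reading of the table assumed here; the 4.9 ratio is also the figure the text rounds to 5:1.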

CONCLUSIONS
Across the first three debates and the election (see Data Memos 2016.1, 2016.2, and 2016.3), we find that the proportion of highly automated Twitter activity changed over time, increasing during the debates from 23 to 27 percent and then dropping to 18 percent during the lead-up to the election. The pace of highly automated pro-Trump social media activity grew from the first debate to the election. During the first debate, highly automated accounts generated four pro-Trump tweets for every pro-Clinton tweet. But by Election Day, the highly automated accounts generated five pro-Trump tweets for every pro-Clinton tweet.
Table 3 summarizes the important trends across the major events of the 2016 campaign season. Pro-Trump traffic was many times higher than pro-Clinton traffic. Moreover, pro-Trump hashtags were inserted into more and more combinations of neutral and pro-Clinton hashtags, such that by the time of the election fully 81.9 percent of the highly automated content involved some pro-Trump messaging. In many kinds of Twitter conversations, this means that the pro-Trump accounts were moving into political conversations that had previously involved neutral or pro-Clinton hashtags. The proportion of the overall sample generated by automation increased over the debates. This proportion appears to have diminished during the election, to 17.9 percent, but this reflects the longer sample period and the fact that many of the highly automated accounts were disabled after Election Day.
In the last debate, 30.8 percent of the traffic about the debates was using relatively neutral hashtags, but this proportion was halved by Election Day. In the lead-up to Election Day, only 15.2 percent of the traffic was using neutral hashtags, pro-Trump traffic grew from 46.7 to 55.1 percent, and pro-Clinton traffic grew from 10.4 to 19.1 percent. As voters made up their minds about whom to vote for, their expression of commitment solidified in the use of clear candidate-specific hashtags. In the first debate, 52.7 percent of the content was associated with a defined camp, but by the election 74.2 percent of the content was associated with one candidate or the other.
In the first debate we scooped 9.0m tweets from 2.0m users who contributed to 52 hashtags. For the second we scooped 11.5m tweets from 2.0m users who contributed to 66 hashtags. For the third we scooped 10.0m tweets from 1.6m users who contributed to 72 hashtags. For the election sample, we scooped 19.4m tweets from 3.7m unique users who contributed to 47 hashtags. We distinguish between relatively low-activity users who tweet occasionally and highly automated accounts that generate more than 50 tweets a day using at least one of these hashtags over the sample period.
Automated accounts tweeting with pro-Clinton hashtags increased their activity over the course of the campaign period but never reached the level of automation behind pro-Trump traffic. In this sample, the dominance of highly automated pro-Trump tweets over automated pro-Clinton tweets increased to a level of 5:1.
We find that political bot activity reached an all-time high for the 2016 campaign. Not only did the pace of highly automated pro-Trump activity increase over time, but the gap between highly automated pro-Trump and pro-Clinton activity widened from 4:1 during the first debate to 5:1 by Election Day. The use of automated accounts was deliberate and strategic throughout the election, most clearly with pro-Trump campaigners and programmers who carefully adjusted the timing of content production during the debates, strategically colonized pro-Clinton hashtags, and then disabled automated activities after Election Day.

ABOUT THE PROJECT
The Project on Computational Propaganda (www.politicalbots.org) involves international and interdisciplinary researchers in the investigation of the impact of automated scripts (computational propaganda) on public life. Data Memos are designed to present quick snapshots of analysis on current events in a short format. They reflect methodological experience and considered analysis, but have not been peer-reviewed. Working Papers present deeper analysis and extended arguments that have been collegially reviewed and that engage with public issues. The Project’s articles, book chapters and books are significant manuscripts that have been through peer review and formally published.

ACKNOWLEDGMENTS AND DISCLOSURES
The authors gratefully acknowledge the support of the National Science Foundation, “EAGER CNS: Computational Propaganda and the Production / Detection of Bots,” BIGDATA-1450193, 2014-16, Philip N. Howard, Principal Investigator, and the European Research Council, “Computational Propaganda: Investigating the Impact of Algorithms and Bots on Political Discourse in Europe,” Proposal 648311, 2015-2020, Philip N. Howard, Principal Investigator. Project activities were approved by the University of Washington Human Subjects Committee, approval #48103-EG, and the University of Oxford’s Research Ethics Committee. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the European Research Council.

REFERENCES
[1] M. C. Forelle, P. N. Howard, A. Monroy-Hernandez, and S. Savage, “Political Bots and the Manipulation of Public Opinion in Venezuela,” Project on Computational Propaganda, Oxford, UK, Working Paper 2015.1, Jul. 2015.
[2] P. N. Howard and B. Kollanyi, “Bots, #StrongerIn, and #Brexit: Computational Propaganda during the UK-EU Referendum,” arXiv:1606.06356 [physics], Jun. 2016.
[3] “Carna botnet,” Wikipedia, 24-Nov-2015.
[4] “Denial-of-service attack,” Wikipedia, 15-Oct-2016.
[5] A. Samuel, “How Bots Took Over Twitter,” Harvard Business Review, 19-Jun-2015. [Online]. Available: https://hbr.org/2015/06/how-bots-took-over-twitter. [Accessed: 23-Jun-2016].
[6] F. Morstatter, J. Pfeffer, H. Liu, and K. M. Carley, “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose,” arXiv:1306.5204 [physics], Jun. 2013.
[7] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia, “Who is tweeting on Twitter: human, bot, or cyborg?,” in Proceedings of the 26th Annual Computer Security Applications Conference, 2010, pp. 21–30.
[8] D. Cook, B. Waugh, M. Abdinpanah, O. Hashimi, and S. A. Rahman, “Twitter Deception and Influence: Issues of Identity, Slacktivism, and Puppetry,” Journal of Information Warfare, vol. 13, no. 1.
[9] P. N. Howard, Pax Technica: How the Internet of Things May Set Us Free. New Haven, CT: Yale University Press, 2015.
[10] D. W. Butrymowicz, “Loophole.com: How the FEC’s Failure to Fully Regulate the Internet Undermines Campaign Finance Law,” Columbia Law Review, pp. 1708–1751, 2009.
[11] P. N. Howard, New Media Campaigns and the Managed Citizen. New York, NY: Cambridge University Press, 2006.
