Data Analytics for the Social Sciences
Data Analytics for the Social Sciences is an introductory, graduate-level treatment of data analytics for social
science. It features applications in the R language, arguably the fastest growing and leading statistical tool for
researchers.
The book starts with an ethics chapter on the uses and potential abuses of data analytics. Chapters 2 and 3 show
how to implement a broad range of statistical procedures in R. Chapters 4 and 5 deal with regression and classifica-
tion trees and with random forests. Chapter 6 deals with machine learning models and the “caret” package, which
makes available to the researcher hundreds of models. Chapter 7 deals with neural network analysis, and Chapter
8 deals with network analysis and visualization of network data. A final chapter treats text analysis, including web
scraping, comparative word frequency tables, word clouds, word maps, sentiment analysis, topic analysis, and more.
All empirical chapters have two “Quick Start” exercises designed to allow quick immersion in chapter topics, fol-
lowed by “In Depth” coverage. Data are available for all examples and runnable R code is provided in a “Command
Summary”. An appendix provides an extended tutorial on R and RStudio. Almost 30 online supplements provide additional information, including “books within the book” on a variety of topics, such as agent-based modeling.
Rather than focusing on equations, derivations, and proofs, this book emphasizes hands-on generation of output for various social science models and how to interpret that output. It is suitable for all advanced-level undergraduate and graduate students learning statistical data analysis.
G. David Garson teaches advanced research methodology in the School of Public and International Affairs, North
Carolina State University, USA. Founder and longtime editor emeritus of the Social Science Computer Review, he
is president of Statistical Associates Publishing, which provides free digital texts worldwide. His degrees are from
Princeton University (BA, 1965) and Harvard University (PhD, 1969).
Data Analytics for the Social Sciences
Applications in R
G. David Garson
First published 2022
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
605 Third Avenue, New York, NY 10158
The right of G. David Garson to be identified as author of this work has been asserted by them in accordance with sections 77
and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification
and explanation without intent to infringe.
DOI: 10.4324/9781003109396
Typeset in Times
by KnowledgeWorks Global Ltd.
• All empirical chapters have two “Quick Start” exercises designed to allow students to immerse themselves quickly in R analyses related to the chapter topic and to obtain successful results.
• In the Support Material (www.routledge.com/9780367624293), all chapters have an abstract, which gives an
overview of the contents.
• All chapters have “Review Questions” for students in the student section of the Support Material (www.
routledge.com/9780367624293), with answers and comments in the instructor section.
• All chapters have text boxes highlighting the applicability of the chapter topic, and of R, to recent published
examples of social science research.
In terms of organization of the book, I chose to start with a chapter on the uses and potential abuses of data analyt-
ics, emphasizing issues in ethics. Chapters 2 and 3 show how to implement a broad range of statistical procedures
in R. I thought this to be important in order that students and researchers see data analytics in R as something
having great continuity with what they already know. Chapters 4 and 5 deal with regression and classification trees
and with random forests. In addition to being valuable tools for prediction and classification in their own right,
these particular tools are often found desirable because they imitate the way ordinary people make decisions and
because they can be visualized graphically. Chapter 6 deals with machine learning models such as support vector
machines. A focus is placed on the “caret” package, which makes available to the researcher dozens of types of
models and facilitates comparison of their results on a cross-validated basis. Chapter 7 deals with neural network
analysis, a topic associated in the public eye with “artificial intelligence” and which also is a tool that may generate
superior solutions. Chapter 8 focuses on network analysis. A very broad range of social science data may be treated
as network data, and data relationships may be visualized in network diagrams. A final chapter treats text analysis,
including text acquisition through web scraping and other means; showing text relationships through comparative
word frequency tables, word clouds, and word maps; and use of sentiment analysis and topic analysis. In fact, topics
are so numerous that for space reasons some content is placed in online supplements in the Support Material (www.
routledge.com/9780367624293) to the text. Some supplements, such as agent-based modeling, are “books within the
book” bonuses for the reader of this text.
Data analytics represents a paradigm shift in social science research methodology. When I took my first teaching
position at Tufts University, we ran statistics, often in the Fortran language, on a “mainframe” with only 8 kilobytes
of memory! The “computer lab” at my next teaching position, at North Carolina State University, was initially
centered on sorting machines for IBM punch-card data. The teaching of research methods since then has been a
constant process of learning new tools and procedures. As social scientists we need to ride the wave of the paradigm
shift, not fear the learning curve all new things bring with them. I hope this book can be a small contribution to what
can only be described as a revolution in the teaching of research methods for social science. Happy data surfing!
G. DAVID GARSON
School of Public and International Affairs
North Carolina State University
April, 2021
Chapter 1
Using and abusing data analytics in social science
1.1 Introduction
The use and abuse of data analytics (DA), data science, and artificial intelligence (AI) is of major concern in
business, government, and academia. In late 2019, based on a survey of 350 US and UK executives involved in
AI and machine learning, DataRobot (2019a, 2019b), itself a developer of machine learning automation plat-
forms, issued a news release on its report, headlining “Nearly half of AI professionals are ‘very to extremely’
concerned about AI bias.” Critics think the percentage should be even higher. This chapter has a triple pur-
pose. First, published literature in the social and policy sciences is used to illustrate the promise of big data
and DA, highlighting a variety of specific ways in which DA are useful. However, the other two sections of
this chapter are cautionary. The second section inventories threats to good research design that are common among researchers employing big data and DA. The third section inventories various ethical issues
associated with big data and DA. The question underlying this chapter is whether, in terms of big data and
DA, we are marching toward a better society or toward an Orwellian “1984”. As in all such questions, the
answer is, “Some of both”.
Before beginning, a word about terminology is needed. The terms “data science”, “data analytics”, “machine
learning”, and “artificial intelligence” overlap in scope. In this volume, these “umbrella” terms may be used
interchangeably by the author and by other authors who are cited. However, connotations differ. Data science
suggests work done by graduates of data science programs, which are dominated by computer science depart-
ments. DA connotes the application of data science methods to other disciplines, such as social science. Machine
learning refers to any of a large number of algorithms which may be used for classification and prediction. AI
refers to algorithms that adjust and hopefully improve in effectiveness across iterations, such as neural networks
of various types. (In this book we do not refer to the broader popular meaning of artificial human intelligence
as portrayed in science fiction.) The common denominator of all these admittedly fuzzy terms is what is often
called “algorithmic thinking”, meaning reliance on computer algorithms to arrive at classifications, predictions,
and decisions. All approaches may utilize “big data”, referring to the capacity of these methods to deal with enor-
mous sets of mixed numeric, text, and even video data, such as may be scraped from the internet. Big data may
magnify bias associated with algorithmic thinking but it is not a prerequisite for bias and abuse in the application
of data science methods.
Official policy on ethics for information technology, including DA, is found in the 2012 “Menlo Report” of the
Directorate of Science & Technology of the US Department of Homeland Security. This report was followed up
by a “companion” document containing case studies and further guidance (Dittrich, Kenneally, & Bailey, 2013).
The Menlo Report offers highly generalized guidelines for ethical practice in the domain of DA. In a nutshell, it sets out four principles, as follows:
1. Respect for persons: DA projects should be based on informed consent of those participating in or impacted
by the project.
The problem, of course, is that the whole basis of “big data” approaches is that huge amounts of data are col-
lected without realistic possibility of gathering true informed consent. Even when data are collected directly
from the person, consent takes the form of a button click, giving “consent” to fine print in legalese. This token
consent may even be obtained coercively as failure to click may deny the person the right to make a purchase
or obtain some other online benefits.
2. Beneficence: This is the familiar “do not harm” ethic with roots going back to the Hippocratic Oath for doc-
tors. In practical terms, DA projects are called upon to undertake systematic assessments of risks and harms
as well as benefits.
The problem is that DA projects are mostly commissioned with deliverables set beforehand and with tight
timetables. For the most part, the technocratic staff of DA projects is ill-trained to undertake true cost-benefit studies, even where time constraints and work contracts permit them. The Menlo Report itself provides a giant loophole, noting that research yields long-term social benefits. It is easy to see these benefits as
outweighing diffuse costs which take the form of loss of confidentiality and privacy, violations of data integ-
rity, and individual or group impairment of reputation. The reality is that few, if any, DA projects are halted
due to lack of “beneficence”, though placing a privacy policy on one’s website or obtaining pro forma
“consent” is commonplace. The costs in time and money of challenging shortcomings in “beneficence” fall on the aggrieved person, who often finds that pro-business legislation and courts, not to mention the superior legal staff of corporations and governments, make the chance of success dim.
3. Justice: The principle of information justice means that all persons are treated equally with regard to data
selection without bias. Also, benefits of information technology are to be distributed fairly.
The problem is that on the selection side, profiling is inherent in big data analysis. Profiling, in turn, is famously
subject to bias. On the fair distribution side, the Menlo Report and DA projects generally interpret fairness in
terms of individual need, individual effort, societal contribution, and overall merit. These fairness concepts
are subjective and extremely vague. If information justice is considered at all, it is easy to rationalize existing DA practices as just, without need for revision.
4. Respect for law and the public interest: DA projects should be based on legal “due diligence”, transparency
with regard to DA methods and results, and DA should be subject to accountability.
DA projects lack “due diligence” if there is no evidence that some effort was undertaken to conform to relevant
laws dealing with privacy and data integrity. The corporation or government agency which commissions a DA
project is wise to have such evidence, usually in the form of an official privacy policy, a policy on data sharing,
and so on. These policies are frequently posted on the web, giving evidence of “transparency”. The problem is
that this primarily serves for legal protection of the corporation or government entity and is rarely a constraint
on what the DA project actually does.
It is common in many domains for ethical guidelines to lack impact. An illustration at this writing is the ethical
standards document of the American Society for Public Administration in the era of the Trump presidency and its
many challenges to ethics. Like that document, the usefulness of the Menlo Report is primarily to call attention to
ethical issues, not actually to regulate DA projects.
Ostensibly, every US federal agency has appointed a “data steward” responsible for each database it maintains.
While this is not the same as having a steward for each algorithm-based program, most agencies have a data steward statement of responsibilities that often covers data privacy, transparency, and other values. An example
is in the “Readings and References” section of the student Support Material (www.routledge.com/9780367624293) for
this book.1 There may be a Data Stewardship Executive Policy Committee to oversee data stewardship, as there is
in the US Census Bureau. A literature review by the author was unable to find even a single empirical study of the
effectiveness of governmental data stewards, though prescriptive articles on what makes a data steward effective
abound. “The proof is in the pudding” must be the investigatory rule here. Much of this chapter is devoted to illus-
trations of problems with the pudding.
Petrozzino (2020), addressing the Menlo Report, has argued that formal ethical principles do make a differ-
ence. Petrozzino, a Principal Cybersecurity Engineer within the National Security Engineering Center operated by
MITRE for the US Department of Defense, concluded her analysis by writing, “The enthusiasm of organizations to
use big data should be married with the appropriate analysis of potential impact to individuals, groups, and society.
Without this analysis, the potential issues are numerous and substantively damaging to their mission, organization,
and external stakeholders” (p. 17). As with Biblical principles of morality, it is largely up to the individual to act upon
ethical principles. However, it is thought better for the DA project director to have principles than not to have them!
The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By
a model is meant a mathematical construct which, with the addition of certain verbal interpreta-
tions, describes observed phenomena. The justification of such a mathematical construct is solely
and precisely that it is expected to work – that is correctly to describe phenomena from a reason-
ably wide area.
It is exceptional that one should be able to acquire the understanding of a process without hav-
ing previously acquired a deep familiarity with running it, with using it, before one has assimilated
it in an instinctive and empirical way… Thus any discussion of the nature of intellectual effort in
any field is difficult, unless it presupposes an easy, routine familiarity with that field. In mathemat-
ics this limitation becomes very severe.
Truth is much too complicated to allow anything but approximations.
There’s no sense in being precise when you don’t even know what you’re talking about.
Can we survive technology?
the data on a multivariate basis. When such a model does not exist, as is often the case, analysis is exploratory at
best and is “not ready for prime time”.
This example illustrates how machine learning and AI can maintain and amplify inequity. Most algorithms
exploit crude correlations in data. Yet these correlations are often by-products of more salient social relationships
(in the health-care example, treatment that is inaccessible is, by definition, cheaper), or chance occurrences that will
not replicate.
To identify and mitigate discriminatory relationships embedded in data, we need models that capture or account
for the causal pathways that give rise to them.
Information is factual. Knowledge is interpretive. As soon as the analyst seeks to understand what data mean
inherently, the subjective process of interpretation has begun. Indeed, subjectivity antecedes data collection since
the researcher must selectively decide what information to collect and what to ignore. Even if their topic is the same,
different researchers will make different decisions about the types, sources, variables, dates, and other aspects
of their intended data corpus, whether quantitative or textual, “big” or traditional. Thus David Bollier (2010: 13)
observed, “Big Data is not self-explanatory”. He gives the example of data-cleaning. All data, perhaps especially big
data, require cleaning. Cleaning involves subjective decisions about which data elements matter. Cleaned data are
no longer objective data yet data cleaning is essential. When data come from multiple sources, each with their own
biases and sources of error, the problem is compounded.
privileged position in terms of scholarship. From a scholar’s perspective, such access is valuable and worth protect-
ing and this vested interest can produce bias. Thus boyd and Crawford (2012: 674) observed, “Big Data researchers
with access to proprietary data sets are less likely to choose questions that are contentious to a social media com-
pany if they think it may result in their access being cut. The chilling effects on the kinds of research questions that
can be asked – in public or private – are something we all need to consider.”
1. Interpretation is sounder when the data sample is randomly selected from the universe to which the researcher
wishes to generalize, or at least are representative of the desired sampling frame.
2. Model specification must include the proper variables. For instance, Wykstra (2018) noted how an algorithm assigning scores predicting likelihood of recidivism produced dramatically different results depending on whether the predictor was past arrests or past convictions. If the true causes are not included in the model (true causes are often unknowable, or good indicators of them may be unavailable), the reliability of the model suffers, and the resulting rate of false predictions can pose serious problems. The sketch below illustrates how predictor choice alone can shift predicted risk.
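To make the second point concrete, the following minimal R sketch uses simulated data; the variable names, effect sizes, and the simulation itself are hypothetical illustrations, not figures from the study cited above.

# Minimal sketch with simulated data: risk scores from a recidivism-style model
# can differ noticeably depending on whether past arrests or past convictions
# is chosen as the predictor. All quantities here are hypothetical.
set.seed(42)
n <- 1000
arrests     <- rpois(n, lambda = 2)                    # hypothetical counts of past arrests
convictions <- rbinom(n, size = arrests, prob = 0.4)   # convictions as a subset of arrests
recidivism  <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * convictions))

m_arrests     <- glm(recidivism ~ arrests,     family = binomial)
m_convictions <- glm(recidivism ~ convictions, family = binomial)

# Predicted risk scores for the same individuals under the two specifications
scores <- data.frame(p_arrests     = predict(m_arrests,     type = "response"),
                     p_convictions = predict(m_convictions, type = "response"))
head(scores)
cor(scores$p_arrests, scores$p_convictions)   # correlated, but far from identical

The same individual can thus receive quite different risk scores under the two specifications, which is the essence of the specification problem noted above.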
An example of big data bias based on scraping social media comments is given by Papakyriakopoulos, Carlos, and
Hegelich (2020), who studied German users’ political comments and parties’ posts on social media. “We quanti-
tatively demonstrate”, they wrote, “that hyperactive users have a significant role in the political discourse: They
become opinion leaders, as well as having an agenda-setting effect, thus creating an alternate picture of public opin-
ion.” The authors found hyperactive users participated in discussions differently, liked different content, and that
they became opinion leaders whose comments were more popular than those of ordinary users. Other research has
shown that some hyperactive users are paid political spammers or even “bots”, not random individuals who happen
to be more active. The bias introduced by hyperactive users translates directly into bias in recommender systems,
such as those used by Facebook and all major social networks, leading to “the danger of algorithmic manipulation
of political communication” by these networks.
Based on article counts in Summon for the 2014–2019 period, Facebook and Twitter were the dominant sources
of data for scholarly articles (about 280,000 articles each), followed by YouTube (about 116,000), Instagram (about 75,000), and WhatsApp (about 20,000). This huge number of articles reflects the relative ease with which social scientists may scrape
social media data. In this section, we take Twitter data as an example, but its limitations often are similar to limita-
tions on all social media data.
Three of the many limitations of Twitter data are those listed below.
1. Problems in acquiring unbiased data: Twitter is popular among scholars because it provides some tweets
through its public APIs. A few companies and large institutions have access to theoretically all public tweets
(those not made private by users). The great majority of researchers must be content with access to 10% or 1%
Twitter streams covering a time-limited period. The sampling process is not revealed in detail to research-
ers. Some tweets are eliminated because they come from protected accounts. Others are eliminated because
not-entirely-accurate algorithms determine they contain spam, pornography, or other forbidden content. For
those that are included, there is the problem of overcounting due to some people having multiple accounts and
undercounting because sometimes multiple people use the same account. Then there is the much-publicized problem that a nontrivial amount of use reflects bots, which send content on an automated basis, or the work of banks of human agents working for some entity.
2. Difficulty in defining users: It is difficult to distinguish just what Twitter “use” and “participation” is. A few
years back, Twitter (2011) noted that 40% of active users are passive, listening but not posting. With survey
research it is possible, for example, to analyze the views of both those who voted and also the views of non-
voters. In contrast, in Twitter research it is not possible to compare the sentiments of those who tweeted with
sentiments of those who just listened.
3. Dangers of pooling data: When handling data from multiple sources, pooling issues arise. Serious errors of
interpretation may well arise when different sets of data are combined, as not infrequently happens in “big
data” research on social media sources. These problems are outlined, for instance, in Knapp (2013). Suffice it
to say, combining social media data from multiple sources may be difficult or impossible to do without incur-
ring bias.
In the selection and weighting of variables, biases may be introduced by the analyst creating the algorithm or by
his or her employer. There is even the possibility of a politics of algorithms, in which interested parties lobby to have
their interests represented. For instance, there is possible bias in the credit rating industry, as when groups lobby a
credit bureau to have membership in their organization counted as a plus or when discount stores lobby to have high
rates of credit card spending in their stores not count as a minus. Zarsky (2016: 125) concluded, “Lobbying obvi-
ously increases unfair outcomes of the processes mentioned because it facilitates a biased decision-making process
that systematically benefits stronger and well-organized social segments (and thus is unfair to weaker segments).”
The problem of subjectivity in the development of algorithms is compounded by the tendency of data scientists
and the public alike to anthropomorphize them. David Watson observed, “Algorithms are not ‘just like us’ and the
temptation to pretend they are can have profound ethical consequences when they are deployed in high-risk domains
like finance and clinical medicine. By anthropomorphizing a statistical model, we implicitly grant it a degree of
agency that not only overstates its true abilities, but robs us of our own autonomy” (Watson, 2019: 435). The prob-
lem is that it is not ethically neutral to blindly accept that AI, being rooted in neural sciences of the human mind,
is therefore to be seen, as human beings are seen, as agents having their own set of ethics. Rather than being like
humans, AI applications are tools. Like all tools, they tend to be used in the interest of those who fund them. While
it is common to observe that DA may be used for good or evil, a more accurate generalization is to say that on aver-
age, DA tends to serve powerful interests in society. Ethical vigilance by human beings is of utmost importance.
free and even if the CRAN maintainers have extensive screening in place, the burden will still be on the end users to
test/scan the downloaded packages (whether in source or binary form), according to some a priori defined standard
operating procedures, to achieve a level of confidence, that the packages pass those tests/scans.”3 However, the end
user is typically ill-equipped to evaluate bias and error in the algorithms underlying packages the user intends to
employ.
Of course, proprietary statistical and other software also may contain algorithmic errors. Moreover, unlike
R and Python packages, with commercial packages source code is not available for inspection for the most part.
However, companies do have paid staff to undertake quality control and vetting, and capitalist competition moti-
vates companies to offer products which “work” lest profits suffer. In the community-supported world of R and
Python, in contrast, such quality control work is unpaid, unsystematic, and idiosyncratic. For these reasons, this author recommends that, in the area of statistical methods, researchers cross-check and confirm critical results obtained from R and Python packages against results from major commercial packages. Even when results can be verified by forcing matching settings, the researcher may find that default settings in community-supported software are unconventional, as the sketch below illustrates.
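One hedged illustration of the defaults problem, using only base R and the built-in mtcars data as a stand-in: prcomp() performs principal components analysis on the covariance matrix unless scale. = TRUE is requested, whereas an analyst accustomed to correlation-based output from a commercial package may expect the scaled analysis by default.

# Minimal sketch: default settings can silently change results.
pca_default <- prcomp(mtcars)                  # default: center = TRUE, scale. = FALSE
pca_scaled  <- prcomp(mtcars, scale. = TRUE)   # correlation-based analysis

# Proportion of variance explained by the first component under each setting
summary(pca_default)$importance["Proportion of Variance", 1]
summary(pca_scaled)$importance["Proportion of Variance", 1]

Cross-checking in this spirit means forcing the settings to match across packages before comparing the numbers.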
1. Human welfare: Algorithm-driven decisions on matters ranging from employment to education may lead to de
facto discrimination against and unfair treatment of citizens.
2. Autonomy: DA-driven profiling and consumer targeting can undermine the exercise of free choice and affect
the news, politics, product advertising, and even cultural information to which the individual is exposed.
3. Justice: Algorithmic profiling can flag false positives or false negatives in law enforcement, resulting in sys-
tematic unfairness and injustices.
4. Solidarity: Non-transparent decisions made by complex algorithms based on big data may prioritize some
groups over others without ever affording the opportunity for the mobilization of potential group solidarity in
defense against these decisions.
5. Dignity: Algorithmic profiling can lead to stigmatization and assault on human dignity. Being treated “as a number” is inherent in algorithmic policymaking but is also inherently dehumanizing to the affected individual, who would often favor case-by-case decisions by human beings. Mannes (2020: 61) thus writes that AI can not only produce financial loss or even physical injury, but can also cause “more subtle harms such as instantiating human bias or undermining individual dignity.”
6. Non-maleficence: Non-maleficence refers to the medical principle of doing no harm, such as a doctor’s duty to end a course of treatment found to be harmful. Big data analytics, however, puts non-maleficence as a value
under pressure due to the prevalence of non-transparent data reuse and repurposing.
7. Accountability: Citizens affected by DA algorithms may well be unaware they are affected and even if aware,
may not understand the implications of related decisions affecting them, and even if they do understand, citizens may well not know whom to hold accountable or how to do so.
8. Privacy: Even when “opt-in” or “opt-out” privacy protections are in place, the correlations among variables
in personal data in big data initiatives allow for easy re-identification and consequent intrusion on privacy.
Studying verbatim Twitter quotations found in journal articles, for instance, Ayers, Nebeker, and Dredze
(2018) found that in 84% of cases, re-identification was possible.
9. Environmental welfare: The “digitalization of everything” also has indirect environmental effects, neglect
of which is an ethical issue. An example is neglecting the issue of increased lithium mining to support the
millions of batteries needed in a digital world, knowing that lithium mining is associated with chemical leak-
age and soil and water pollution. Impacts are not equally distributed, raising issues of environmental justice
as well.
10. Trustworthiness: Ethically negative consequences enumerated above may well lead to diminished trust in
institutions associated with these consequences. Diminished trust, in turn, is associated with diminished
social capital and with negative consequences for society as a whole.
impose fees for better access. These authors wrote, “This produces considerable unevenness in the system: Those
with money – or those inside the company – can produce a different type of research than those outside. Those
without access can neither reproduce nor evaluate the methodological claims of those who have privileged access.
It is also important to recognize that the class of the Big Data rich is reinforced through the university system: Top-
tier, well-resourced universities will be able to buy access to data, and students from the top universities are the ones
most likely to be invited to work within large social media companies. Those from the periphery are less likely to
get those invitations and develop their skills.” The result of the academic digital divide is a widening of the gap in
the capacity to do scholarship with big data.
1.4.3 Discrimination
Scholarly studies have routinely found that computer algorithms, the fodder of DA, may promote bias. A 2015
Carnegie Mellon University study of employment websites found that Google’s algorithms displayed high-paying job ads to men at about six times the rate that the same ad was displayed to women. A University of Washington study
found that Google Images searches for “C.E.O.” returned 11% female images whereas the percentage of CEOs who
are women is over twice that (27%). Crawford (2017) gives numerous instances of discriminatory effects, such as AI
applications classifying men as doctors and women as nurses, or not processing darker skin tones. Based on research
in the field, Garcia (2016: 112) observed, “It doesn’t take active prejudice to produce skewed results in web searches,
data-driven home loan decisions, or photo-recognition software. It just takes distorted data that no one notices and
corrects for. Thus, as we begin to create artificial intelligence, we risk inserting racism and other prejudices into the
code that will make decisions for years to come.”
The complexity of fairness/discrimination issues involving data analytics and big data are illustrated in the
debate between ProPublica and the firm “equivant” (formerly Northpointe) over the COMPAS system. COMPAS,
the Correctional Offender Management Profiling for Alternative Sanctions system, is widely used in the correc-
tional community to identify likely recidivists and is advertised by the equivant company as “Software for Justice”.
Presumably COMPAS information is used by law enforcement for closer tracking of former inmates with high recid-
ivism COMPAS scores. A 2016 study by the public interest group ProPublica showed that COMPAS “scored black
offenders more harshly than white offenders who have similar or even more negative backgrounds” (Petrozzino,
2020: 2, referring to Angwin et al., 2016). The equivant company responded by arguing there was no discrimination
since the COMPAS accuracy rate was not significantly different for whites as compared to blacks, and thus was fair.
ProPublica, in turn, defended their charge of discrimination in a later article (Dressel & Farid, 2018) which argued
that fairness should not be gauged by overall accuracy but by the “false positive” rate, since that reflected the area of
potential discriminatory impact. By that criterion, COMPAS had a significantly higher false positive rate for blacks
than for whites. Dressel and Farid concluded, “Black defendants who did not recidivate were incorrectly predicted
to reoffend at a rate of 44.9%, nearly twice as high as their white counterparts at 23.5%; and white defendants who
did recidivate were incorrectly predicted to not reoffend at a rate of 47.7%, nearly twice as high as their black coun-
terparts at 28.0%. In other words, COMPAS scores appeared to favor white defendants over black defendants by
underpredicting recidivism for white and overpredicting recidivism for black defendants.” In this case, fairness or
information justice could be defined in two ways, leading to opposite inferences. It is hardly surprising that those
responsible for and heavily invested in a DA project like COMPAS chose to select a fairness definition favorable to
their interests. It is not so much a case of “lying with statistics” as it is a case of data analysis resting on debatable
assumptions and definitions.
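The arithmetic behind the dispute can be shown in a short R sketch; the counts below are invented for illustration and are not the actual COMPAS figures.

# Minimal sketch with invented counts (not the actual COMPAS data):
# two groups can show identical overall accuracy while their
# false positive rates differ sharply, so the two fairness criteria conflict.
accuracy <- function(predicted, actual) mean(predicted == actual)
fpr <- function(predicted, actual) {           # share of non-recidivists wrongly flagged
  sum(predicted == 1 & actual == 0) / sum(actual == 0)
}

# Hypothetical outcomes (1 = recidivated) and algorithmic flags (1 = predicted to reoffend)
group_a <- data.frame(actual    = rep(c(1, 0), times = c(40, 60)),
                      predicted = rep(c(1, 0, 1, 0), times = c(30, 10, 27, 33)))
group_b <- data.frame(actual    = rep(c(1, 0), times = c(40, 60)),
                      predicted = rep(c(1, 0, 1, 0), times = c(20, 20, 17, 43)))

accuracy(group_a$predicted, group_a$actual)    # 0.63
accuracy(group_b$predicted, group_b$actual)    # 0.63 -- identical overall accuracy
fpr(group_a$predicted, group_a$actual)         # 0.45
fpr(group_b$predicted, group_b$actual)         # 0.28 -- sharply different false positive rates

Which of these numbers one reports as the test of fairness is precisely the choice at issue in the COMPAS debate.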
A 2019 systematic literature review of big data and discrimination by Maddalena Favaretto and her colleagues at
the Institute for Biomedical Ethics, University of Basel, found that most research addressing big data and discrimi-
nation focused on such recommendations as better algorithms, more transparency, and more regulation (Favaretto,
De Clercq, and Elger, 2019). However, these authors found that “our study results identify a considerable number of
barriers to the proposed strategies, such as technical difficulties, conceptual challenges, human bias and shortcom-
ings of legislation, all of which hamper the implementation of such fair data mining practices” (p. 23). Moreover,
the DA literature was found to have rarely discussed “how data mining technologies, if properly implemented, could
also be an effective tool to prevent unfair discrimination and promote equality” (p. 24). That is, existing research
focuses on avoiding discriminatory abuse of big data systems, neglecting the possible use of big data to mitigate
discrimination itself.
Algorithms may enact practices which violate the law. In July, 2020, the Lawyers’ Committee for Civil Rights
under Law filed an amicus brief in a lawsuit against Facebook for redlining, an illegal practice by which minority
groups are effectively obstructed from financing, such as for the purchase of homes in certain areas. Referring
to Facebook financial services advertisements, The Lawyers’ Committee for Civil Rights Under Law (2020)
argued, “Redlining is discriminatory and unjust whether it takes place online or offline and we must not allow
corporations to blame technology for harmful decisions made by CEOs”. The lawsuit contended that digital
advertising on Facebook discriminated based on the race, gender, and age of its users and then provided different
services to these users, excluding them from economic opportunities. This discriminatory practice was based on
profiling of Facebook users. Different users were provided different services based on their algorithm-generated
profiles, resulting in “digital redlining”. (At this writing the case (Opiotennione v. Facebook, Inc.) has not been
adjudicated.)
Likewise, discrimination is inherent in big data systems, which are more effective for some racial groups than
others. The MIT Media Lab, for instance, found that facial recognition software correctly identified white males
99–100% of the time, but the rate for black women was as low as 65% (Campbell, 2019: 54). The higher the rate of
misidentifications, the greater the chance that actions taken on the basis of the algorithms of such software might
be racially discriminatory. Concerns over misidentification using algorithms led San Francisco in May, 2019, to
become the first city to ban facial-recognition software in its police department. The American Civil Liberties
Union (ACLU) has demanded a ban on using facial recognition software by the government and law enforcement
after finding that “Facial recognition technology is known to produce biased and inaccurate results, particularly
when applied to people of color” (Williams, 2020: 11).
In a test, the ACLU ran images of members of Congress against a mug shot database, finding 28 instances where
members of Congress were wrongly identified as possible criminals. Again, people of color were disproportion-
ately represented in the false positive group, including civil rights leader John Lewis (Williams, 2020: 13). A later
ACLU report headlined, “Untold Number of People Implicated in Crimes They Didn’t Commit Because of Face
Recognition” (ACLU, 2020). Inaccuracy, however, has not prevented the widespread and growing use of facial recognition software, or convictions based on its identifications. Likewise, ICE now routinely uses facial recognition
software to sift through ID cards and drivers’ licenses to find and deport undocumented people in a secret system
largely devoid of protections for those fingered by the software (Williams, 2020: 13).
Discriminatory impacts are even more likely when the algorithm in question draws on discriminatory views in
Twitter and other social media. Garcia (2016: 111) gives the example of “Tay”, an AI bot created by Microsoft for use
on Twitter. The intent of the algorithm was to create a self-learning AI conversationalist. The one-day Tay experi-
ment ended in failure when, starting with neutral language, “in a mere 12 hours, Tay went from upbeat conversa-
tionalist to foul-mouthed, racist Holocaust denier who said feminists ‘should all die and burn in hell’ and that the
actor ‘Ricky Gervais learned totalitarianism from Adolf Hitler, the inventor of atheism’”. That is, the Tay algorithm
amplified existing extremist views of a discriminatory nature.
1. The modeling effect: Social science research (e.g., Riccucci, Van Ryzin, and Li, 2016) has shown that when roles are filled representatively by gender, race, or other categories, people from those categories are more likely to seek to play those roles themselves. In the case of DA, lack of representativeness may inhibit both becoming a user of DA tools and becoming a developer of them. Kraicer wrote, “The gap … could limit both who we imagine as a computational social scientist, and even how computational social science should work.”
2. Standpoint theory: Standpoint theory research (e.g., Hekman, 1997) has shown that “where you stand” is cor-
related with the kinds of questions you ask and the kinds of answers you find. In part this is due to differential
access to knowledge, tools, and resources, but “where you stand” also has to do with your role as a woman, a
person of color, or with other life experiences. The body of DA research may be influenced by lack of repre-
sentativeness in the field. Kraicer noted, “Our social position informs what and how we research, and using
tools built from a single perspective may limit what we think to ask and test.”
In line with this, Frey, Patton, and Gaskell (2020) noted that “When analyzing social media data from marginal-
ized communities, algorithms lack the ability to accurately interpret offline context, which may lead to dangerous
assumptions about and implications for marginalized communities” (p. 42). Taking youth gangs as an example of
a marginalized community whose social media communication can be misinterpreted by algorithms, leading to
dire consequences for some and failure to provide services for others, Frey and his associates undertook an experi-
ment in which gang members became involved in the development of algorithms for processing relevant social
media messages. They found “the complexity of social media communication can only be uncovered through the
involvement of people who have knowledge of the localized language, culture, and changing nature of community
climate… If the gap between people who create algorithms and people who experience the direct impacts of them
persists, we will likely continue to reinforce the very social inequities we hope to ameliorate” (pp. 54–55). While implementing the Frey experiment on a mass basis seems unlikely, to say the least, the experiment
did highlight how and why algorithms for processing social media may lead to error and bias.
and, due to the public availability of social media data, there is often confusion between public and private spaces. In
addition, social media participants and researchers may pay little attention to traditional terms of use.” When medi-
cal professionals defer to AI and anthropomorphize its results, professional ethics may risk being compromised.
In their article, these authors presented four case studies involving commercial scraping, de-anonymization of
forum users, fake profile data, and multiple scraper bots. In each case, the authors found serious violations of spe-
cific guidelines set forth by the Council for International Organizations of Medical Sciences (CIOMS). Violations,
which the authors labeled forms of “digital trespass”, involved “unauthorized scraping of social media data, entry of
false information, misrepresentation of researcher identities of participants on forums, lack of ethical approval and
informed consent, use of member quotations, and presentation of findings at conferences and in journals without
verifying accurate potential biases and limitations of the data” (Chiauzzi & Wick, 2019: n.p., abstract).
While attention to ethical issues in data science has been increasing, it is also widely acknowledged that ethical
training in data science has been deficient. In their article, “Data science education: We’re missing the boat, again”,
Howe et al. (2017), for example, called for new efforts in data science classes, focusing on ethics, legal compliance,
scientific reproducibility, data quality, and algorithmic bias.
The undermining of professional standards has consequences for the research result. For instance, in classic mul-
tivariate procedures such as confirmatory factor analysis and multigroup structural equation modeling, or even in
exploratory factor analysis, social scientists have sought to address the common problem that different groups may
attach different meanings to constructs. Chiauzzi and Wick (2019) give the example of differences over the mean-
ing of “treatment” in medical studies, where patients routinely define treatment in broader terms than do doctors.
Patients, for instance, may include not just medications but also “pets” and “handicapped parking stickers” as part
of “treatment”. Women more than men may attach social dimensions to “treatment”. Algorithm-makers may follow
the precepts of computer science without due sensitivity to the need for more subtle and appropriate development of
the measurement model for multivariate analysis. Chiauzzi and Wick conclude that “Faulty data assumptions and
researcher biases may cascade into poorly built algorithms that lead to ultimate inaccurate (and possible harmful)
conclusions.”
The worst impact on professional ethics of DA, data science, AI, and big data may be on the horizon as the auto-
mation of AI itself threatens to institutionalize poor ethical decision-making now common in the field. Dakuo Wang
et al. (2019) of IBM Research USA recently surveyed nearly two dozen corporate data scientists, publishing their
results in an article titled, “Human-AI collaboration in data science: Exploring data scientists’ perceptions of auto-
mated AI.” Though automation of the creation of AI applications is not yet widespread in business or government,
Wang and his colleagues found that “while informants expressed concerns about the trend of automating their jobs,
they also strongly felt it was inevitable” (p. 1). The issue for the future is what “it” is and whether automated AI creation
will rest on underlying assumptions that perpetuate biases and unethical practices of the past.
compelled to be ultra-patriotic, displaying images of President Xi Jinping in their stores and making posts laudatory
of the regime to social media. Over a million people have been rounded up, partly enabled by DA, and sent to “re-
education centers”, where dire conditions prevail.
All tools may be used for good or evil. The CEO of Watrix, one of the suppliers for surveillance systems in
China, stated, “From our perspective, we just provide the technology. As for how it’s used, like all high tech, it may
be a double-edged sword” (Campbell, 2019: 55). This is a prevalent attitude in the big data community. Facebook,
for instance, disavows any responsibility for contributing to the rise of hate groups in America, to allowing Russians
to hack American and other elections via social media, or for racial bias in outcomes.
An example closer to home is Google’s “Project Nightingale”, an effort to digitize and store up to 50 mil-
lion health-care records obtained from Ascension, a leading US health-care provider. As reported by the Wall
St. Journal, the Guardian, and in a medical journal by Schneble, Elger, and Shaw (2020), a project employee
blew the whistle on misconduct in failing to protect the privacy and confidentiality of personal health information.
Specifically, the whistleblower charged and the Wall St. Journal confirmed that patients and doctors were not asked
for informed consent to share data and were not even notified. Also, health data were transmitted without anony-
mization with the result that Google employees had full access to non-anonymous patient health-care records. All
this occurred in spite of Google requiring training in medical data ethics. In their medical journal article, Schneble and colleagues conclude that data science and AI should not be exempt from scrutiny and prior approval by Institutional Review
Boards. The challenge, of course, is assuring IRB independence from employer interests.
Medicine provides other leading examples of privacy issues pertaining to big data and DA. Garattini et al. (2019:
69), for instance, cite four major categories of ethical issues in the medical sector:
1. Automation and algorithmic methods may restrict freedom of choice by the patient over what is done with
the data that individual provides. There is great “difficulty for individuals to be fully aware of what happens
to their data after collection … the initial data often moves through an information value chain: From data
collectors, to data aggregators, to analysts/advisors, to policy makers, to implementers. … with the final actor/
implementer using the data for purposes that can be very different from the initial intention of the individual
that provided the data” (p. 74). The authors suggest that offering the freedom to opt-out of data collection or
at least the option to seek a second, independent decision could be a remedy for patients, but opt-out strategies
have not proved effective consumer protection in other areas and second opinions may be prohibitively costly
for many patients even if possible in principle.
2. Big data analytics complexity may effectively make informed consent impossible. Garattini et al. (2019:
75–76) cite a recent Ebola outbreak in explaining the impossibility of applying informed consent in the con-
text of viral outbreaks, for instance.
3. Data analytics may well serve as a form of profiling individual and group identities, with consequent issues for
fair health access and justice. Garattini et al. (2019: 76) write, “In the case of viral diagnostics for example, the
amount and granularity of information provides not only the knowledge regarding potential drug resistance
parameters by the infecting organism but also the reconstruction of infectious disease outbreaks, transform-
ing the question of ‘who infected whom’ into ‘they infected them’, i.e., from the more general to the defini-
tive form” (cf. Pak & Kasarskis, 2015). To take another example, Lu (2019) was able to use data analytics to
identify trucks engaged in illegal construction site dumping with .84 precision, meaning that 16% of trucks
profiled as such were not illegal.
4. Big data analytics is normalizing surveillance of the population and changing the capabilities for and norms
regarding population-wide interventions of various types. Garattini et al. (2019: 77) note that in the area of
monitoring infectious diseases, big data may include information on social media, search engine search word
trends, and other indirect measures such that “Algorithms can provide automated decision support as part of
clinical workflow at the time and location of the decision-making, without requiring clinician initiative,” as, for example, hospital-level or government-level decisions to mount vaccination programs. The decision about vaccination is elevated from the realm of doctor-patient norms to the realm of norms pertaining to public health policy,
with attendant benefits but also risks. The authors note, “The overall consequences for individuals, groups,
healthcare providers and society as a whole remain poorly understood” (p. 80).
What is legal may not be ethical when it comes to DA. On the one hand there are powerful arguments in favor of
treating data scraped from the web and social media as public:
1. The data are in fact publicly accessible. Moreover, individuals who post do so knowing this. Journalists, law
enforcement authorities, teachers, and others have frequently warned that one should not post unless one is
willing for one’s community, friends, workplaces, and the public to know what is posted. Users frequently use
the public nature of posting to re-tweet or otherwise disseminate posted information themselves.
2. The courts have not prevented large corporations, government, and other entities from collecting web and
social media data on a mass basis. For example, it is now routine for a person’s posts about seeking to buy a
particular automobile or other item to result in email and pop-up web advertisements directed to that person.
Indeed, doing just this has become a giant business in its own right. At this writing it seems extremely unlikely
that there will be a legal sea change in favor of privacy.
3. In social science, the open science movement has emphasized data availability. The ability of other schol-
ars to replicate a researcher’s work is fundamental to the scientific method. If research cannot be repli-
cated, it is suspect. Replication requires access to the researcher’s data. The National Science Foundation
policy states “Investigators are expected to share with other researchers, at no more than incremental
cost and within a reasonable time, the primary data, samples, physical collections and other supporting
materials created or gathered in the course of work under NSF grants. Grantees are expected to encour-
age and facilitate such sharing” (https://www.nsf.gov/bfa/dias/policy/dmp.jsp). It is not uncommon for
other research funding organizations to require the public archiving of research data they have funded.
Following the replicability principle, many journals will not publish papers based on proprietary, classi-
fied, or otherwise unavailable data such that it is impossible to check the validity of the author’s work. The
replicability principle applies to all research data and does not make exception for data scraped from the
web or social media.
On the other hand, there are strong arguments for privacy also. Most of these revolve around the Hippocratic Oath,
which emphasizes the “Do no harm” principle, which is also seen as a professional obligation. What is legal is not
necessarily ethical. Institutional Review Boards have long been established with the charge of promoting ethical
behavior in survey and experimental research. In both of those contexts, unlike the context of social media, it is
possible and expected to obtain informed consent at the individual level. Attempts have been made to apply the
informed consent principle to the digital world, notably the European Union Data Directive. Article 7 of this directive allows subjects to block usage of their personal data without consent, and its Article 12 requires that subjects
receive an account of digitally-based decisions which impact them. A 2018 EU evaluation of the directive revealed
considerable debate about its effectiveness.
Injury to the respondents might be incurred by release of individually identifiable information on sensitive issues
such as health (employers and insurers might otherwise use this), illegal activities (law enforcement might use infor-
mation on drug use), sexual views and activities (making this public could disrupt marriages), and views on race, abortion, and other sensitive issues (release of this could lead to harassment by neighbors and the community). IRBs have generally taken the view that data gathering (e.g., all survey items or interview protocols) requires written consent of the individual. Applying this principle to social media and other big data may lead to a policy of not releasing data
(e.g., not releasing tweets gathered from the public Twitter API) unless anonymized in order to protect individuals
from possible injury.
Given the pro-public and pro-privacy arguments, social scientists are forced to do more than ponder the ethical
issues. At the end of the day, decisions must be made about data access. Compromise policies must be adopted. To
give one example of such compromise, the following are guidelines from the Social Science Computer Review with
regard to their “data availability” requirement for all articles:
• State that data are available for use under controlled conditions by applying to a board/department/committee
whose charge includes making data available for replication, giving contact information.
• State that the data may be purchased at a non-prohibitive price from a third party, whose contact information
is given.
• State that the anonymized data are available from an author at a given email address.
• State that the variance-covariance matrix and variable-level descriptive statistics are available from an author at a given email address. (Many statistical procedures, such as factor analysis or structural equation modeling, may be performed with such input, not requiring individual-level data; see the sketch after this list.)
• In the case of data scraped from social media or the web, it is sufficient if an appendix contains detailed infor-
mation that would enable a reader-researcher to reconstruct the same or a similar dataset.
• In rare cases, dataset availability is not relevant to the particular article. Check with the editor about such an
exception.
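As a small illustration of the parenthetical point in the variance-covariance bullet above (a minimal sketch, with the built-in mtcars data standing in for summary statistics shared by an author), base R's factanal() can fit a factor model directly from a covariance matrix without any individual-level records:

# Minimal sketch: factor analysis from a shared covariance matrix alone.
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
S    <- cov(mtcars[, vars])                    # the kind of matrix an author might share
fit  <- factanal(covmat = S, factors = 2, n.obs = nrow(mtcars))
print(fit$loadings, cutoff = 0.3)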
This particular journal noted that the alternative to the foregoing data availability policy would be not the absence
of a data availability statement, but rather a statement from the journal that the data are unavailable for replication
and that findings based on inference from the data should consequently be viewed as unverifiable.
best, even were those commissioning algorithmic systems inclined to make trouble for themselves by institutional-
izing an independent critical audience as part of their development process.
The meaningfulness of most transparency measures is questionable at present. While transparency is widely
given lip service, in practice very few citizens avail themselves of the ostensible opportunities (Zarsky, 2016:
122). To the extent that people do challenge algorithmic decisions, such as those related to their financial credit,
they raise transactional costs for the credit-giving institution, which is apt to respond not by making the
opportunity to challenge easy but by doing just the opposite. From the citizen’s point of view, taking advantage
of transparency opportunities imposes high costs in time and sometimes even legal fees. The few who do challenge
may well give up after protracted dealings with the institution. The high price of implementing meaningful
transparency is why, by and large, it does not exist in most settings where algorithmically based decision-making
prevails.
Ryan identifies five ways in which ABDA can function as an exercise of power over farmers:
1. Manipulation: ABDA can be used as a form of manipulative power to initiate cheap land grabs in ways farmers
would not have agreed to willingly.
2. Seduction: ABDA can pressure farmers to install monitors on their farms and to limit access to them, constraining
farmers’ freedom and encouraging practices they would not otherwise have chosen.
3. Leadership: Agricultural technology providers get farmers to agree to the use of ABDA without their informed
consent with regard to data ownership, data sharing, and data privacy.
4. Coercion: Agricultural technology providers threaten farmers with the loss of big data analytics if farmers
do not obey their policies, and farmers are coerced into remaining with the provider for fear of legal and
economic reprisal.
5. Force: Agricultural technology providers use ABDA to calculate farmer willingness-to-pay rates and then use
this information to force farmers into vulnerable financial positions.
Ryan, who goes into much more detail on each of these five points in his article, makes the case that, far from
being neutral, data analytics is instrumental to the exercise of power. Data analytics has the proven potential to
give agricultural technology providers the upper hand in the game of power, much as it does in all sectors of the
economy.
In this chapter, we started with a brief account of the promise of DA, data science, and AI. Because this story is
prominent in the media, our account was brief, acknowledging the positives in general and for social science
specifically. Most of this chapter, however, has been devoted to the much-needed but less-told story of the perils and
pitfalls of big data and algorithmic policymaking, both in terms of research design problems and in terms of social
and ethical issues.
In matters of research design, this chapter called attention to the very real problem of “true believership” and the
disinclination of data science as a field to entertain the possibility that there may be multiple paths to the truth,
including traditional statistics on the one hand and qualitative research on the other. Those who use data analytics
must recognize pseudo-objectivity when they see it in research, and they must recognize that progress is made not by
denying that bias exists but by acknowledging it and seeking to counterbalance it. This is an enormous challenge given
the limitations in the way both big data and the tools to analyze it are created.
The following checklist of ten questions, presented in countdown order, summarizes how organizations can promote
ethical data practices:
10. Does the organization restrict data collection to what is necessary? Ethical compromises often arise from
collecting all the data in sight. In contrast, ethical practices are better promoted by a policy of data minimi-
zation, which means collecting only the data necessary to achieve organizational goals.
9. Does the organization repurpose data? If data authorized by the sources for one purpose are then
repurposed toward other goals, the principle of informed consent is violated. This problem is compounded
when the repurposing is done by another entity to which the data are sold or with which they are shared.
8. Does the organization promote data transparency? No matter what other internal and external mecha-
nisms the organization puts into place to assure ethical data practices, they will never be comprehensive.
By making data and systems as transparent as possible, additional feedback will be forthcoming, some-
times from unexpected sources. More feedback promotes better and more ethical decision-making.
7. Does the organization promote a culture of data ethics? The organization must care about broader
values than short-term profits or political advantage. Promoting an organizational culture of data ethics
may involve embedding this culture in job descriptions, hiring processes, orientations, ongoing training,
manuals and reports, and job evaluations.
6. Does the organization reward data ethics entrepreneurs? In every area of successful innovation, imple-
mentation is furthered when an advocate champions the change. If there is such a data ethics entrepreneur,
that person should be rewarded, not only for the person’s sake but also as a statement of the organization’s
values and culture.
5. Does the organization hire data scientists who care about ethics? Rather than force people to change, it
is better to hire the right people at the outset. Newly-hired data scientists should understand that focus-
ing on more modest but more ethical outcomes takes precedence over constructing unbridled systems
which might be technologically feasible.
4. Does the organization seek to counter algorithmic bias? Ethical lapses are often traced to biased and
flawed model assumptions. Short of hiring better analysts to begin with, giving them ethical mandates, and
allowing them time to do their jobs, bias is also minimized if the project team includes not only technical
data science staff but also subject matter experts, research methodologists from outside data science,
representatives of affected groups, and peer reviewers.
3. Are impact studies conducted prior to system deployment? In addition to countering bias by a diverse
development team, requiring a formal, independent data system impact study alerts the organization to
prospective ethical problems.
2. Does the CEO support data ethics? Studies of technology acceptance and diffusion show many success
factors, but prime among them is strong support for the innovation by the chief executive officer. This
applies to introducing data ethics mechanisms into the organization.
1. Is someone responsible for data ethics? While all organizational members share ethical responsibili-
ties, the organization needs (1) a named data steward for each data system deployed; (2) oversight of the
data steward by an in-house Ethics Review Committee or the like; and (3) an annual independent and
external data ethics audit involving a data ethicist.
There are many types of social and ethical issues in data analytics, data science, and AI. Foremost is the fact that,
when all is said and done, these are tools. Tools may be used for good or evil. Tools may be best and most fully
exploited by those with the resources to do so, which is why studies find a bias toward the privileged in society.
Specific ethical issues such as discrimination or the undermining of privacy are becoming better known, but these
issues are the tip
of the iceberg. Submerged beneath the surface but posing a greater and more subtle danger to society are threats to
democracy, professional standards, and the way decisions are made. Algorithmic rigidity, misleading profiling, and
failure to reap the benefits of diversity are true and present dangers. It was said of those who fought despotism from
within in another time, “they did what they could”. It is trite but accurate to say that eternal vigilance is the price of
freedom. This applies to the digital world as well. As social science scholars, we must do what we can, supporting
transparency, diversity, and the public good in a problematic economic and political environment.
Endnotes
1. https://ncvhs.hhs.gov/wp-content/uploads/2014/05/090930lt.pdf
2. http://kbroman.org/pkg_primer/pages/cran.html
3. https://stat.ethz.ch/pipermail/r-help/2016-December/443689.html
4. In this case, the Supreme Court held unanimously that warrantless search and seizure of a cell phone with its digital con-
tents during an arrest is unconstitutional.