Acceptable bias? Using corpus linguistics methods with critical discourse analysis

Critical Discourse Studies
ISSN: 1740-5904 (Print) 1740-5912 (Online) Journal homepage: https://www.tandfonline.com/loi/rcds20
Paul Baker
To cite this article: Paul Baker (2012) Acceptable bias? Using corpus linguistics
methods with critical discourse analysis, Critical Discourse Studies, 9:3, 247-256, DOI:
10.1080/17405904.2012.688297
To link to this article: https://doi.org/10.1080/17405904.2012.688297
Published online: 25 May 2012.
Critical Discourse Studies
Vol. 9, No. 3, August 2012, 247–256
Paul Baker∗
Department of Linguistics and English Language, Lancaster University, Lancaster, UK
This paper considers the proposal that corpus linguistics approaches can improve the
objectivity of critical discourse analysis research, resulting in a more robust and valid set
of findings. Taking a recent project which examined the representation of Islam and
Muslims in the British press, corpus-driven procedures identified that Muslims tended to be
linked to the concept of extreme belief much more than moderate or strong belief. There
were differences across newspapers, with 1 in 8 Muslims being described as extreme in The
People while this figure was 1 in 35 for The Guardian. Such patterns of quantification,
however, still require researchers to carry out their own critical interpretations with regard
to what counts as acceptable frequencies.
Keywords: corpus linguistics; Islam; bias; extremism; CDA
Introduction
I wish to start this paper with a useful and concise definition of critical discourse analysis (CDA)
from one of its most prolific and influential proponents.
Critical discourse analysis (CDA) is a type of discourse analytical research that primarily studies the
way social power abuse, dominance, and inequality are enacted, reproduced, and resisted by text and
talk in the social and political context. With such dissident research, critical discourse analysts
take explicit position, and thus want to understand, expose, and ultimately resist social inequality.
(Van Dijk, 2001, p. 352).
This paper is concerned with how CDA practitioners are able to identify social power abuse,
dominance and inequality, and the extent to which corpus methods can aid such identifications.
A commonly cited potential criticism of CDA methods is that researchers may ‘cherry-pick’ data
which appear to prove a preconceived point (Koller & Mautner, 2004, p. 225; Orpin, 2005, p. 38;
Partington, 2004, p. 13). Widdowson (1998), in reviewing three edited collections of CDA
studies, argues that the biases of the analyst will mean that ‘Your analysis will be the record
of whatever partial interpretation suits your own agenda’ (p. 148) and ‘what is distinctive
about Critical Discourse Analysis is that it is resolutely uncritical of its own discursive practices’
(Widdowson, 1998, p. 151). On the other hand, Widdowson praises two studies taken from these
books, those by Krishnamurthy (1996) and Hoey (1996) who have both used corpus linguistics
(CL) approaches to discourse analysis. Widdowson (1998) argues that their ‘interpretation is
thus grounded in systematic language description’ (p. 148). Based on the outcomes of research
which examined discourses of immigration in the British press, I and my co-authors argued that
‘The combination of methodologies traditionally associated with CDA (DHA) [Discourse
Historical Analysis] and CL in research projects, and their potential theoretical and
methodological cross-pollination seem to benefit both CDA and CL’ (Baker et al., 2008, p. 297). Our
study offered a recursive model whereby analysis was carried out in a number of stages, moving
back and forth between quantitative and qualitative forms of analysis, with each stage informing
the subsequent stage. Methods associated with CL, for example, would be used in order to identify
salient linguistic patterns in a corpus of texts, which could then be interrogated in a more
qualitative way via close reading of individual texts or concordance lines in the next stage.
∗Email: p.baker@lancaster.ac.uk
© 2012 Taylor & Francis
The method described in Baker et al. (2008) was developed during the analysis of a 140
million word corpus of newspaper texts which contained references to immigration. While we
concluded that the method proved to be fruitful, care must be taken not to over-generalise from
one study. A second study (described in Baker, Gabrielatos, & McEnery, forthcoming), which
examined newspaper representation of Islam and Muslims, had the aim of replicating the combined
CDA/CL method. A small number of the findings from that study are discussed in this
paper, in order to highlight some of the methodological issues which arose as a result.
The research context

The main aim of analysing a corpus of newspaper articles which referred to Islam and/or
Muslims was to identify how these concepts were represented in the British press. Other research
(Awass, 1996; Moore, Mason, & Lewis, 2008; Poole, 2002; Richardson, 2004; Said, 1997) had
used a range of different techniques to examine the media representation of Islam, generally
concluding that they had found a picture of negative bias. For example, Richardson’s (2004) study of
the linguistic and social practices of five daily and four Sunday British broadsheets over a
4-month period argued that the broadsheets engage in three processes: separation, differentiation
and negativisation and ‘predominantly reframe Muslim cultural difference as cultural deviance,
and increasingly, it seems, cultural threat’ (p. 232). While earlier studies had incorporated CDA
and/or aspects of quantitative analysis, none of the previous research on the subject had
incorporated corpus-driven methods (such as keyword analysis; see Note 1) on an extremely large
sample of data consisting of over 100 million words. Our research team were interested in
ascertaining whether such a corpus analysis would produce a similar pattern of negative bias or
whether it would result in a more complex picture. Unlike more traditional CDA research, we
approached our corpus in a relatively naïve way. This meant that while we were aware that other research
had found negative bias, we did not intend to specifically look for such bias ourselves.
Instead, we hoped that the identification of frequent and salient linguistic patterns in the
corpus would provide a ‘way in’ to the data – we would thus need to account for whatever
the corpus analytical techniques highlighted, negative or positive.
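As an illustration of the corpus-driven keyword technique mentioned above (and described in Note 1), the following is a minimal sketch rather than the procedure used in the project. It assumes log-likelihood as the keyness statistic, which the text does not specify, and the function names and the 3.84 cut-off (the conventional p < 0.05 threshold) are choices made here for illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word: target corpus vs reference corpus."""
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(target_tokens, ref_tokens, min_score=3.84):
    """Words relatively more frequent in the target corpus, strongest first."""
    target, ref = Counter(target_tokens), Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    scored = []
    for word, f_target in target.items():
        f_ref = ref.get(word, 0)
        # Keep only positive keywords (over-represented in the target corpus).
        if f_target / n_target > f_ref / n_ref:
            score = log_likelihood(f_target, n_target, f_ref, n_ref)
            if score >= min_score:
                scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The analyst does not choose the words in advance: whatever rises to the top of the list has to be accounted for, which is the sense in which the approach is corpus-driven.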
For example, one early strand of our analysis involved focussing on the word Muslim and its
plural form. This word seemed to be a good place to start because it was both highly frequent in
the corpus and it appeared to be used as both an adjectival modifier of nouns (e.g. Muslim
woman) and as a noun itself (e.g. a Muslim). Interestingly, its use as an adjective appeared to
be more frequent (about 70% of the time), so we decided to focus on the most typical contexts
in which Muslim occurred as a pre-modifying adjective. We found that one common pattern in
this context concerned words relating to extreme belief, such as extremist(s), fundamentalist(s)
and militant(s). Such words also were common when Muslim was a noun, and they also occurred
frequently with two other high-frequency words Islam and Islamic.
In order to obtain a better picture of what sort of contexts these extreme-belief words occur in,
we used Sketch Engine’s Word Sketch function (see Note 2; Kilgarriff, Rychly, Smrz, & Tugwell, 2004).
Table 1 shows the 10 most statistically salient (using the logDice metric) adjective and verb
collocates of the extreme belief words (when they occur as nouns). Sketch Engine allows us
to distinguish between verb collocates that position a noun as either a subject (e.g. A fanatic
plotted ...) or an object (e.g. A fanatic was deported).
Table 1. Word Sketch of extreme belief words.
fanatic
  Adjectives: murderous, Islamic, Muslim, religious, home-grown, evil, suicidal, fundamentalist, ruthless, hate-filled
  Verbs (subject): brainwash, plot, behead, plan, abuse, hate, preach, target, hijack, rant
  Verbs (object): brainwash, deport, appease, defeat, fear, curb, isolate, link, brand, prepare
extremist
  Adjectives: Islamic, Muslim, violent, suspected, right-wing, religious, home-grown, far-right, Algerian, animal
  Verbs (subject): target, brainwash, infiltrate, plot, hijack, preach, exploit, murder, plan, pose
  Verbs (object): isolate, deport, tackle, link, suspect, determine, prosecute, appease, defeat, confront
militant
  Adjectives: Islamic, suspected, Palestinian, Hamas, Kashmiri, wanted, loyal, armed, alleged, Pakistani
  Verbs (subject): fire, kidnap, storm, attack, threaten, seize, behead, target, bomb, ambush
  Verbs (object): suspect, link, hole, arrest, arm, assassinate, blame, mask, kill, disarm
fundamentalist
  Adjectives: Islamic, Christian, Muslim, religious, reluctant, Protestant, crazed, fanatical, extreme, Algerian
  Verbs (subject): infiltrate, object, wish, favour, preach, hate, target, threaten, wage, exploit
  Verbs (object): poise, appease, anger, offend, upset, fuel, oppose, link, criticise, suspect
separatist
  Adjectives: Basque, Kashmiri, Kurdish, Tamil, Flemish, Croat, Sikh, Albanian, suspected, Muslim
  Verbs (subject): operate, seize, try, fight, threaten, seek, want, begin, claim, call
  Verbs (object): blame, crush, defeat, suspect, encourage, support, fight, accuse, join, include
radical
  Adjectives: Islamic, British-based, left-wing, non-violent, suspected, home-grown, Islamist, Muslim, anti-Western, so-called
  Verbs (subject): brainwash, plot, preach, exploit, influence, recruit, hijack, kidnap, pose, want
  Verbs (object): deport, inflame, counter, determine, suspect, tackle, jail, confront, investigate, invite
hardliner
  Adjectives: embattled, unelected, incumbent, clerical, Iranian, Croat, Hamas, loyal, furious, outspoken
  Verbs (subject): wield, control, oppose, exploit, dominate, ally, block, attempt, fear, replace
  Verbs (object): appease, embolden, anger, galvanise, infuriate, strengthen, isolate, alienate, enable, oppose
firebrand
  Adjectives: left-wing, one-eyed, Unionist, populist, one-time, bearded, socialist, far-right, clerical, Protestant
  Verbs (subject): endanger, head, launch, support, lead, run, tell
  Verbs (object): N/A
While there is not enough space here to carry out detailed analyses of these collocates, it is
notable how many of these collocates refer
to crime, violence, terrorism, conflict and ideological influence. In these articles then, the
concept of extremism is not strongly linked to more positive representations like piety.
Having determined that these extreme belief words tended to occur in negative contexts, we
decided to examine their dispersion across the corpus.
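For readers unfamiliar with the logDice metric used to rank the collocates in Table 1, the sketch below shows the arithmetic of its standard definition, 14 + log2(2 f(x,y) / (f(x) + f(y))). Sketch Engine computes the score over grammatical relations; the function here only takes three plain counts, and the function and parameter names are illustrative rather than part of any tool's interface.

```python
import math


def log_dice(pair_freq, node_freq, collocate_freq):
    """logDice association score from three raw frequencies.

    pair_freq      -- how often node and collocate occur together
    node_freq      -- total frequency of the node word
    collocate_freq -- total frequency of the collocate
    """
    if pair_freq <= 0:
        raise ValueError("logDice is undefined when the pair never co-occurs")
    return 14 + math.log2(2 * pair_freq / (node_freq + collocate_freq))
```

The score has a theoretical maximum of 14, reached when the two words only ever occur together, and it does not depend on corpus size, which makes scores comparable across corpora of different sizes.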
The corpus consisted of approximately 143 million words (200,000 articles) taken from the
British national press between 1998 and 2009. We used the online database Nexis UK to identify
articles which contained relevant search terms like Muslim and Islam (see Note 3). The newspapers in
our corpus were The Star, The Mirror, The Sun, The Daily Mail, The Daily Express, The Daily
Telegraph, The Times, The Independent and The Guardian (see Note 4) and all of the Sunday versions
of these papers. Two other newspapers, The Business (which appeared weekly but stopped publication
in 2008) and The People (a Sunday newspaper with no daily equivalent), were also
included. It is possible to distinguish British national newspapers along numerous dimensions
such as political leaning (whether the newspaper advocates broadly conservative or liberal
economic and/or social values) or format/style (whether the newspaper takes the form of a
more serious ‘quality’ broadsheet reporting or whether it tends towards a more populist
tabloid format).
Figure 1. Overall frequencies of extreme belief words occurring before or after Muslim(s), Islamic and
Islam for each newspaper.
Figure 1 shows the overall frequencies of occurrences of the words fanatic(al), militant,
extremist, fundamentalist, radical, separatist, hardline(r) and firebrand, and their plurals
(see Note 5), occurring directly next to (either to the right or left of) the words Muslim(s),
Islam or Islamic, for each newspaper in the corpus.
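The counting behind a figure of this kind can be sketched as follows. This is an illustration under simplifying assumptions, not the software used in the project: the tokeniser is a plain regular expression, sentence boundaries are ignored (so a word at the end of one sentence counts as adjacent to the first word of the next), and the word lists are typed out from the description above.

```python
import re
from collections import Counter

TARGETS = {"muslim", "muslims", "islam", "islamic"}
EXTREME = {
    "fanatic", "fanatics", "fanatical", "militant", "militants",
    "extremist", "extremists", "fundamentalist", "fundamentalists",
    "radical", "radicals", "separatist", "separatists",
    "hardline", "hardliner", "hardliners", "firebrand", "firebrands",
}


def tokenize(text):
    """Lower-case word tokens; hyphenated words stay in one piece."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def adjacent_counts(text):
    """Return (totals, linked): per target word, how often it occurs at all
    and how often an extreme belief word sits directly to its left or right."""
    tokens = tokenize(text)
    totals, linked = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token not in TARGETS:
            continue
        totals[token] += 1
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        if left in EXTREME or right in EXTREME:
            linked[token] += 1
    return totals, linked
```

Running the function once per newspaper gives the raw frequencies of the kind plotted in Figure 1, and the `totals` counter is what a proportional comparison needs as its denominator.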
A number of conclusions of a descriptive nature can be drawn from Figure 1. First, the direct
and close association of extreme belief words with Islamic appears to be more frequent than for
Muslim(s) and Islam. Notably, the difference in frequencies appears to be largest between Islam
and Islamic. Second, some newspapers seem to use these extreme belief terms to represent
Muslims and Islam much more frequently (The Times, The Independent) than others (The
Business, The People). However, Figure 1 is based on raw frequency alone. It is indeed interesting
that The Times has a high occurrence of terms like extremist Muslim, but this could be due
to the fact that The Times is a daily broadsheet newspaper which contains more text than some of
the other newspapers. A supplementary way of comparing newspapers would be to take into
account the total number of times that each one uses the terms Muslim(s), Islamic and Islam,
and then calculate the proportion of times that such words co-occur with an extreme belief
word. This is shown in Figure 2.
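The difference between the two ways of comparing newspapers can be shown with a small worked example. The two newspapers and their counts below are invented purely for illustration and do not come from the corpus; the function names are likewise hypothetical.

```python
def proportion_linked(linked, total):
    """Percentage of a term's occurrences that sit next to an extreme belief word."""
    if total == 0:
        return 0.0
    return 100.0 * linked / total


# Hypothetical counts: (occurrences next to an extreme belief word, all occurrences).
counts = {
    "Paper A": (900, 30000),  # large paper: many mentions overall
    "Paper B": (120, 1000),   # small paper: few mentions overall
}


def rank_by_raw(data):
    """Newspapers ordered by raw frequency, highest first."""
    return sorted(data, key=lambda paper: data[paper][0], reverse=True)


def rank_by_proportion(data):
    """Newspapers ordered by proportional frequency, highest first."""
    return sorted(data, key=lambda paper: proportion_linked(*data[paper]),
                  reverse=True)


for paper, (linked, total) in counts.items():
    print(paper, linked, round(proportion_linked(linked, total), 1))
```

With these invented figures, Paper A leads on raw frequency while Paper B leads on proportional frequency, which is the kind of reversal that makes it worth reporting both measures.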
In one way, the picture which emerges when proportional frequencies are taken into account
is not so different to Figure 1. The word Islamic still appears to be a stronger ‘attractor’ of
extreme belief terms. However, when we compare newspapers against each other, a very
different picture emerges. Here, The People (a conservative tabloid) has the highest proportional
frequency of writing about extremist Muslims, whereas The Guardian (a liberal broadsheet) has
the lowest proportional frequency (at least for the words Muslim(s) and Islamic).
It is thus useful to take both raw and proportional information into account. Figure 1
indicates that some newspapers write a lot about extremist Muslims, possibly because they
simply contain much more text than other newspapers, whereas Figure 2 tells us that some
newspapers tend to write about Muslims as extremist because they have relatively more constructions
of Muslims as extremists as opposed to Muslims as something else. Our quantitative analysis has
thus uncovered evidence for a semantic prosody (Louw, 1993) of Muslims and Islam as extreme,
which is more prevalent in some newspapers than others.
Figure 2. Proportion of times (per cent) that extreme belief words occur before or after Muslim(s), Islamic
and Islam for each newspaper.
This is not the end of the analysis, however. We also decided to consider other types of words
which refer to belief. A second set of words were identified through further examination of
collocates and frequency lists – those which appeared to refer to ‘strong belief’ but did not
appear to explicitly carry a connotation of extremism: orthodox, pious, committed (as an
adjective), devout and faithful. Finally, a third set of words referenced moderate belief:
moderate(s), progressive(s), secular and liberal(s). Figure 3 shows the proportion of times that
the words Muslim(s), Islam and Islamic are modified by the three different classes of words.
Figure 3. Percentages of times that Muslim(s), Islam and Islamic are directly modified by different types of
belief words.
For the sake of simplicity, I have not considered newspapers individually, but have instead
grouped all of the newspapers together so that the overall picture can be more easily grasped.
A clear pattern emerges from a glance at Figure 3. Extreme belief words are much more
likely than strong or moderate belief words to modify Muslim(s), Islam and Islamic, especially
Islamic. There are relatively fewer instances of terms like moderate Muslim or pious Muslim in the corpus.
Care must be taken, however, in assuming that words have the meanings and connotations
that we assign to them. Take, for example, the term devout Muslim, which I categorised above as
‘strong but not extreme belief’. A more detailed analysis of this term revealed that it often
occurred near the word described. The example below shows one such case.
Ragab el-Swerkie, 56, who owns a chain of clothes stores and is described by his employees as a
devout Muslim, preyed mainly on beautiful young females, say prosecutors. (Sunday Telegraph,
24 June 2001)
Therefore, the phrase described. . . as a devout Muslim appears to highlight cases where devout
Muslims are actually more problematic. It could be argued then that readers are being primed to
suspect that (some) devout Muslims are actually not devout at all, or, worse, that devout is
merely a euphemism for extremist (or other ‘extreme belief’ words).
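A check of this kind, asking how often a cue word such as described appears near a phrase such as devout Muslim, can be sketched as below. The window of eight tokens on either side is an assumption made for illustration, as are the function names; the example sentence is the one quoted above.

```python
import re


def tokenize(text):
    """Lower-case word tokens; hyphenated words stay in one piece."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def phrase_hits_with_cue(text, phrase=("devout", "muslim"),
                         cue="described", window=8):
    """Return (total, with_cue): occurrences of the phrase, and how many of
    them have the cue word within `window` tokens to the left or right."""
    tokens = tokenize(text)
    size = len(phrase)
    total = with_cue = 0
    for i in range(len(tokens) - size + 1):
        if tuple(tokens[i:i + size]) != phrase:
            continue
        total += 1
        context = tokens[max(0, i - window):i] + tokens[i + size:i + size + window]
        if cue in context:
            with_cue += 1
    return total, with_cue


example = ("Ragab el-Swerkie, 56, who owns a chain of clothes stores and is "
           "described by his employees as a devout Muslim, preyed mainly on "
           "beautiful young females, say prosecutors.")
print(phrase_hits_with_cue(example))
```

A count like this only flags candidate lines. Whether described is being used to cast doubt on the label still has to be judged by reading the concordance lines themselves.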
As noted previously, terms like extremist Muslim are used uncritically in the corpus,
although the analysis of devout Muslim is useful in that it reveals that there are other ways in
which extreme beliefs can be implied, without actually using a word like extremist. The
example of devout Muslim is useful in warning us not to read frequencies at face value.
However, if we simply focus on the figures, we can summarise the patterns regarding extent
of belief with the following statements:
• References to extreme forms of Islam or Muslims are 21 times more common in the corpus
than references to moderate Islam/Muslims.
• There is variation across the terms; almost 1 in 6 cases of Islamic are modified by an
extreme belief word. This is 1 in 20 for Muslim(s) and 1 in 25 for Islam.
• There is variation across newspapers – in The People, 1 in 8 Muslims are described as
extreme. In The Guardian this is 1 in 35.
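The ‘1 in N’ statements above are rounded restatements of percentages, and the conversion is simple enough to sketch. The helper names are hypothetical; the only figures used are those given in this paper (12% and 1 in 8 for The People, 1 in 35 for The Guardian).

```python
def one_in_n(percentage):
    """Express a percentage as the N in '1 in N', rounded to a whole number."""
    if percentage <= 0:
        raise ValueError("percentage must be positive")
    return round(100 / percentage)


def percentage_from_one_in(n):
    """Inverse conversion: '1 in N' back to a percentage."""
    return 100 / n


def relative_rate(n_high, n_low):
    """How many times more frequent a '1 in n_high' rate is than '1 in n_low'."""
    return n_low / n_high


print(one_in_n(12))                # The People: 12% of cases
print(percentage_from_one_in(35))  # The Guardian: 1 in 35
print(relative_rate(8, 35))        # The People relative to The Guardian
```

On these figures the rate in The People is a little over four times the rate in The Guardian, although the rounding involved in ‘1 in N’ statements means such ratios should be treated as approximate.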
Critically considering bias

It is at this stage that the contentious issue which I wish to highlight in this paper comes to bear.
So far, the analysis had been fairly descriptive, although a very basic form of interpretation was
required to make sense of the patterns in Figures 1 – 3. In writing the concluding chapter to our
book on media representations of Islam (Baker et al., forthcoming), when I came to the section
about extent of belief, I initially wrote the following sentence:
The British press are biased against Muslims because they tend to over-focus on Muslims who are
extreme, or they often associate Islam with extremism.
I felt that I could be fairly confident in making this statement because I had considered the whole
corpus. I had not ‘cherry-picked’ a few cases which proved the point, but had looked at linguistic
patterns across millions of words. I experienced some concern that perhaps I had not considered
every possible word which referenced extremism or different types of belief, and during later
analyses of the corpus data, I found a couple of words that I had missed, such as zealot.
However, such words were relatively rare (Muslim zealot(s) only occurs 44 times in the
corpus as opposed to Muslim extremist(s) which occurs 2060 times), so I was fairly sure that
I had covered the majority of cases in the data. For the sake of completeness, I could have
also looked at cases where Muslims are referred to but not described directly as Muslims,
such as ‘they are fanatics’ or incidences which refer to specific, named Muslims. However, I
felt that an analysis of explicit cases like Muslim extremist was the strongest evidence I could use
to indicate bias.
Yet, casting a critical eye back over my concluding sentence about the British press being
biased, I wondered to what extent others would agree with me. I carried out a critical reading
of that statement, imagining a range of different responses to it and the information in
Figures 1 – 3.
First, it could be argued that the majority pattern is that the words Muslim(s), Islam and
Islamic are not normally referred to by extremist belief words. Take one of the ‘most biased’
newspapers, The People, which directly links the words Muslim(s) to an extremist word 12%
of the time (or 1 in 8 cases). However, there are 7 out of 8 cases where The People does not
directly link the words Muslim(s) to an extremist word. Someone could argue then that a
negative representation is acceptable as long as it is not the majority representation.
Perhaps if the representation had been 5 in 8 cases, then a stronger case for negative bias
(at least with regard to extreme belief) could be made. I should point out that I do not wish to
claim this or any of the following positions as my own, but demonstrate them as possible
positions.
A second position could take a very different view – that Muslims should never (or only in
particular circumstances, for example, if they self-identify) be described as extreme (particularly
because when we looked at representations of ‘extreme Muslims’, they tended to be associated
with very negative contexts such as terrorism and conflict). Even 1 in 100 cases would therefore
be a problem. Therefore, all of the British press could be seen as unacceptably biased.
A third view might try to compare the newspapers against each other. It might accept that
completely avoiding negative representation of all of the members of a large and diverse
social group is impossible, and so instead we should judge the newspapers in relation to each
other. For example, in Figure 2, The Guardian has the smallest proportion of cases where the
words Muslim(s) and Islamic are linked to extremist belief. We might say that this newspaper
could act as a benchmark, giving us an idea of a ‘responsible’ way of reporting on Islam and
that others should be judged according to how far they deviate from The Guardian. Or we
could look at the figures for the averages in Figures 1 and 2. That would at least allow us to
identify which newspapers have extreme belief representations which are more frequent than
the average.
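The two benchmarking options in this third view, comparing against the lowest-scoring newspaper or against the average, can be sketched as below. The three newspapers and their percentages are invented for illustration only, and the function names are hypothetical.

```python
from statistics import mean


def above_average(proportions):
    """Newspapers whose proportion exceeds the mean across all newspapers."""
    average = mean(proportions.values())
    return sorted(paper for paper, value in proportions.items() if value > average)


def deviation_from_benchmark(proportions, benchmark=None):
    """Percentage-point difference from a benchmark newspaper.

    If no benchmark is named, the newspaper with the lowest proportion is used.
    """
    if benchmark is None:
        benchmark = min(proportions, key=proportions.get)
    base = proportions[benchmark]
    return {paper: value - base for paper, value in proportions.items()}


# Hypothetical percentages of cases linked to an extreme belief word.
example = {"Paper A": 12.0, "Paper B": 3.0, "Paper C": 6.0}
print(above_average(example))
print(deviation_from_benchmark(example))
```

Either calculation is mechanical. The choice of benchmark, and of how large a deviation counts as a problem, remains a judgement made by the analyst.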
Fourth, a different perspective might argue that the British press does not write about
extremist Muslims enough. They could accuse the press (and particularly The Guardian) of
downplaying the importance of such news stories, sacrificing news-values for ‘political
correctness’.
Finally, faced with such a wealth of (hypothetical) disagreement, we may decide to be cautious
in making any definitive conclusions about the extent to which bias exists and whether or not it
is acceptable, and instead simply present the descriptive information in the form of the figures,
allowing readers to draw their own conclusions.
All of the above strategies carry their own problems, at least from the perspective of someone
who wishes to claim that a corpus approach enables an objective analysis. The first four betray
the analyst’s own biases, while the last one attempts to ‘sit on the fence’ and fails to fully engage
with the remit of CDA which is to go beyond a mere descriptive account of linguistic patterns
in texts. If we want to fully carry out CDA then, it appears that the political biases of the
analyst must come into play. Traditional CDA practitioners might see no problem with this,
although, as someone who has argued that corpus approaches help to reduce researcher bias, I
find it ironic that bias has crept back in again at the end. Putting aside the question of
whether a text producer is negatively biased or not, a further set of issues relates to how we
decide whether the amount of bias is cause for concern. There appear to be a number of different
factors that need to be taken into account when considering whether a particular representation
of a group can be labelled as problematic.
First, there is the overall frequency of the negative representation (see Figure 1). So, a group
may be negatively represented only 5% of the time it gets referred to. But it may get talked or
written about so much that this 5% will still amount to recipients of texts being exposed to a
great deal of negative representation.
Second, there is the proportional frequency of the negative representation (Figure 2). So, if a
social group tends to be represented negatively much more than it is represented positively, then
this also could be seen as cause for concern. It is, of course, unlikely that analysts will be able to
agree on the ‘acceptable’ or ‘problematic’ frequencies for these two factors.
A third factor relates to the social group under discussion. One way of deciding whether a
particular representation is a cause for worry could be to compare it against another (similar)
social group. For example, if we know how often Muslims are represented as extreme in the
press, then how does this compare to Christians? Similar to comparing frequencies of
representations across different newspapers, this could be another way of reaching benchmarks about
what is typical in certain discourse communities. However, it may not be a good idea to base
standards of acceptable and unacceptable representation frequency on comparisons between groups who
may be similar in one dimension (e.g. both Christians and Muslims are religious groups) but
could also be different in many other ways. It could be argued that CDA practitioners should
be especially concerned if vulnerable groups like Muslims, immigrants, gay people, disabled
people, women, working class people, older people, children or ethnic minority groups are
represented negatively. Traditionally, these groups are either numerically in a minority (in the UK
at least), so that even in a democracy they may not have ‘power in numbers’, or they may not be able
to protect their own interests, or they may already suffer from societal and personal prejudice and
discrimination. Also, what if the group under examination is one which is or has been relatively
powerful, like bankers, who have been held responsible for the global recession of 2008? If a
corpus analysis of the media representation of bankers found that they were negatively
represented as greedy, irresponsible, etc. then should this be raised as a point of concern with
recommendations for curbing such representations? It is also unlikely that a ‘hierarchy of
vulnerability’, along with cut-off points as to who deserves their media representation to be
closely monitored, can be agreed upon. On the other hand, some people may view any form
of negative representation of any social group to be problematic, no matter what certain
members of that social group may or may not have done in the past.
A fourth factor is the strength of negativity of the negative representation. So, in the illustra-
tive example discussed in this paper, I only considered constructions of Muslims or Islam in
relationship to extremist belief. I have argued that this is a negative representation and therefore
a problem if it happens (too) often. However, this was not the only negative representation that
I discovered in the research on Islam. Elsewhere in the corpus, Muslims were represented as
terrorists, ‘scroungers’ or as easily offended and difficult. At other points, they were viewed
as susceptible to radicalisation or as victims of prejudice. So, as with a hierarchy of vulnerability,
we could also conceive of a hierarchy of negativity, with some representations being viewed as
more problematic than others. Again, analysts would need to make decisions with regard to the
point at which the frequency of certain types of representation crosses a line. Potentially, each type of
representation could collectively contribute towards an overall negative or positive stance.
Fifth is a factor relating to the context of the representation (particularly concerning issues of
text production and reception). Is it more of a problem if representations of Muslims as extreme
tend to be very frequent in political speeches made by government ministers, or if they occur in
people’s personal blogs? Negative representations of social groups are problematic whatever
context they occur in, but certain contexts, such as those which are made by powerful or
influential text producers or are received by powerful people and/or reach very large numbers of
people, may result in more immediate and damaging consequences.
These issues do not relate merely to corpus approaches to CDA but all forms of CDA.
However, once we are in a position to start quantifying bias, based on analysis of very large
representative (or even totally inclusive) samples of data, these questions regarding interpretation
and evaluation of linguistic patterns become much more pertinent.
Conclusion
One of the advantages of CDA is its lack of set ways of conducting analysis. With a number of
different ‘schools’ available, approaches can be combined together or different techniques can
be selected from a wide range. CDA appears to have a broad remit to highlight problematic
inequalities of power. Additionally, CDA practitioners view their political commitment (the
‘explicit position’) as a strength rather than a problem. Taking a post-structuralist stance, it
could be argued that bias is unavoidable when conducting social research, and the aim for
neutral objectivity is in itself a ‘stance’.
Yet, if CDA is to be viewed as credible and convincing, it needs to separate itself from
polemic. This is why I would argue that any analytical tools and methods that are rigorous
and grounded in scientific principles such as representativeness, falsification, data-driven
approaches, using statistical approaches to test hypotheses and a desire to provide a full
picture of representation (not just the negative cases) can only serve to help to improve
CDA’s standing, ultimately making its findings more influential.
While corpus approaches can enhance CDA by allowing analysts to consider a much larger
amount of data, enabling them to make more confident claims based on the appearance of quantitative
patterns, I hope that this paper has demonstrated that the interpretation and evaluation of
quantitative patterns are still very much likely to be subject to human bias. That does not mean
we should throw out the baby with the bath water and abandon CL processes within CDA
research, but we ought to be careful not to overstate the ability of CL to reduce researcher bias.
One way forward would be to aim for an increased commitment to researcher reflexivity
(Watt, 2007), in other words, a greater consideration of how the researcher impacts on the
researched. This would also involve CDA practitioners engaging in further dialogue with
regard to how social inequality is created and maintained in media representations.
Notes on contributor
Paul Baker is a Reader at the Department of Linguistics and English Language, Lancaster University. His
books include Using Corpora in Discourse Analysis (2006), Sexed Texts: Language, Gender and Sexuality
(2008) and Sociolinguistics and Corpus Linguistics (2010). He is the commissioning editor of the journal
Corpora.
Notes
1. A keyword analysis identifies words which are statistically more frequent in a particular corpus or text,
when compared against another corpus. It is therefore useful in revealing words which are salient to a
corpus and may not have been identified as important prior to analysis. A keywords approach is therefore
corpus-driven as the technique drives the analyst to account for patterns that he/she had not considered,
rather than the analyst deciding in advance which hypotheses to investigate.
2. http://www.sketchengine.co.uk/.
3. We are fairly confident that the set of articles we collected is a largely representative although not
exhaustive repository of news items about Muslims and Islam in the British press during the time period
considered. We did not consider names of individual Muslims in our search algorithm, but instead
focussed on words for identity groups or religions like Muslim and Sunni.
4. Nexis UK included The News of The World as the ‘Sunday’ version of The Sun, while The Observer
was viewed separately from The Guardian. We have retained these categorisation decisions to aid
replicability of data collection.
5. These words were identified as frequent collocates of Muslim(s), and concordance analyses showed that
they were almost always used uncritically to represent one (or usually more) Muslims as possessing
extreme belief, or as representing Islam (or parts of it) as being an extreme religion.
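As a minimal sketch of the keyword analysis described in Note 1, the following Python code compares word frequencies in a target corpus against a reference corpus. The log-likelihood statistic, the 3.84 cut-off (p < 0.05 at one degree of freedom) and all function names are assumptions made for illustration; the note does not specify which test or software settings were used.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) score for one word across two corpora.

    Assumption: log-likelihood is used as the keyness statistic;
    the paper does not name the test it relied on.
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(target_tokens, ref_tokens, threshold=3.84):
    """Return (word, score) pairs that are key in the target corpus.

    The threshold of 3.84 is a hypothetical default, not a value
    taken from the paper.
    """
    target_counts = Counter(target_tokens)
    ref_counts = Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    results = []
    for word, freq in target_counts.items():
        ref_freq = ref_counts.get(word, 0)
        # Keep only words that are relatively MORE frequent in the target.
        if freq / n_target <= ref_freq / n_ref:
            continue
        score = log_likelihood(freq, n_target, ref_freq, n_ref)
        if score >= threshold:
            results.append((word, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy corpora (invented data, for demonstration only).
target = ["extremist"] * 10 + ["the"] * 90
reference = ["extremist"] * 1 + ["the"] * 99
print(keywords(target, reference))
```

The procedure ranks words without the analyst choosing them in advance, which is the sense in which the technique is corpus-driven. The choice of reference corpus and of cut-off nevertheless remains a decision made by the researcher.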
References
Awass, O. (1996). The representation of Islam in the American media. Hamdard Islamicus, 19(3), 87–102.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyzanowski, M., McEnery, T., & Wodak, R. (2008). A
useful methodological synergy? Combining critical discourse analysis and corpus linguistics to
examine discourses of refugees and asylum seekers in the UK press. Discourse and Society,
19(3), 273–306.
Baker, P., Gabrielatos, C., & McEnery, A. (forthcoming). Discourse analysis and media bias: The rep-
resentation of Islam in the British press. Cambridge: Cambridge University Press.
Hoey, M. (1996). A clause-relational analysis of selected dictionary entries. Contrast and compatibility in
the definitions of ‘man’ and ‘woman’. In C.R. Caldas-Coulthard & M. Coulthard (Eds.), Texts and
practices: Readings in critical discourse analysis (pp. 150–165). London: Routledge.
Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D. (2004). The Sketch Engine. Proceedings of EURALEX
2004 (pp. 105–116). Lorient, France.
Koller, V., & Mautner, G. (2004). Computer applications in critical discourse analysis. In C. Coffin, A.
Hewings, & K. O’Halloran (Eds.), Applying English grammar (pp. 216–228). London: Arnold.
Krishnamurthy, R. (1996). Ethnic, racial and tribal: The language of racism? In C.R. Caldas-Coulthard &
M. Coulthard (Eds.), Texts and practices: Readings in critical discourse analysis (pp. 129–149).
London: Routledge.
Louw, B. (1993). Irony in the text or insincerity in the writer? – The diagnostic potential of semantic pro-
sodies. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology: In honour of John
Sinclair (pp. 157–176). Amsterdam: John Benjamins.
Moore, K., Mason, P., & Lewis, J. (2008). Images of Islam in the UK: The representation of British Muslims
in the national print news media 2000–2008. Cardiff: Cardiff School of Journalism, Media and
Cultural Studies.
Orpin, D. (2005). Corpus linguistics and critical discourse analysis. International Journal of Corpus
Linguistics, 10(1), 37–61.
Partington, A. (2004). Corpora and discourse, a most congruous beast. In A. Partington, J. Morley, &
L. Haarman (Eds.), Corpora and discourse (pp. 11–20). Bern: Peter Lang.
Poole, E. (2002). Reporting Islam: Media representations of British Muslims. London: I.B. Tauris.
Richardson, J.E. (2004). (Mis)Representing Islam: The racism and rhetoric of British broadsheet newspa-
pers. Amsterdam: John Benjamins.
Said, E.W. (1997). Covering Islam: How the media and the experts determine how we should see the rest of
the world. London: Vintage.
Van Dijk, T. (2001). Critical discourse analysis. In D. Tannen, D. Schiffrin, & H. Hamilton (Eds.),
Handbook of discourse analysis (pp. 352–371). Oxford: Blackwell.
Watt, D. (2007). On becoming a reflexive researcher: The value of reflexivity. The Qualitative Report,
12(1), 82–101.
Widdowson, H.G. (1998). The theory and practice of critical discourse analysis. Applied Linguistics, 19(1),
136–151.
