
Debates and Perspectives Paper

Journal of Information Technology
2022, Vol. 37(2) 209–226
© Association for Information Technology Trust 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/02683962211048201
Journals.sagepub.com/jinf

Artificial intelligence and the conduct of literature reviews

Gerit Wagner, Roman Lukyanenko and Guy Paré
Department of Information Technologies, HEC Montréal, Montréal, Québec, Canada

Corresponding author:
Guy Paré, Research Chair in Digital Health, HEC Montréal, 3000, chemin de la Côte-Sainte-Catherine, Montréal, Québec H3T 2A7, Canada.
Email: guy.pare@hec.ca

Abstract
Artificial intelligence (AI) is beginning to transform traditional research practices in many areas. In this context, literature
reviews stand out because they operate on large and rapidly growing volumes of documents, that is, partially structured
(meta)data, and pervade almost every type of paper published in information systems research or related social science
disciplines. To familiarize researchers with some of the recent trends in this area, we outline how AI can expedite individual
steps of the literature review process. Considering that the use of AI in this context is in an early stage of development, we
propose a comprehensive research agenda for AI-based literature reviews (AILRs) in our field. With this agenda, we would
like to encourage design science research and a broader constructive discourse on shaping the future of AILRs in research.

Keywords
Artificial intelligence, machine learning, natural language processing, research data management, data infrastructure,
automation, literature review

Introduction

The potential of artificial intelligence (AI) to augment and partially automate research has sparked vivid debates in many scientific disciplines, including the health sciences (Adams et al., 2013; Tsafnat et al., 2014), biology (King et al., 2009), and management (Johnson et al., 2019). In particular, the concept of automated science is raising intriguing questions related to the future of research in disciplines that require "high-level abstract thinking, intricate knowledge of methodologies and epistemology, and persuasive writing capabilities" (Johnson et al., 2019: 292). These debates resonate with scholars in Information Systems (IS), who ponder which role AI and automation can play in theory development (Tremblay et al., 2018) and in combining data-driven and theory-driven research (Maass et al., 2018). With this commentary, we join the discussion which has been resumed recently by Johnson et al. (2019) in the business disciplines. The authors observe that across this multi-disciplinary discourse, two dominant narratives have emerged. The first narrative adopts a provocative and visionary perspective to present its audience with a choice between accepting or rejecting future research practices in which AI plays a dominant role. The second narrative acknowledges that a gradual adoption of AI-based research tools has already begun and aims at engaging its readers in a constructive debate on how to leverage AI-based tools for the benefit of the research field and its stakeholders. In this paper, our position resonates more with the latter perspective, which is focused on the mid-term instead of the long-term, and well-positioned to advance the discourse with less speculative and more actionable discussions of the specific research processes that are more amenable to applications of AI and those processes that rely more on the human ingenuity of researchers.

In this essay, we focus on the use of AI-based tools in the conduct of literature reviews. Advancing knowledge in this area is particularly promising since (1) standalone review projects require substantial efforts over months and years (Larsen et al., 2019), (2) the volume of reviews published in IS journals has been rising steadily (Schryen et al., 2020), and (3) literature reviews involve tasks that fall on a spectrum between the mechanical and the creative. At the same time, the process of reviewing literature is mostly conducted manually, with sample sizes threatening to exceed the cognitive limits of human processing capacities. This
has been illustrated recently by Larsen et al. (2019), who estimated that in the IS field, the number of relevant papers in many research areas easily exceeds 10,000. As a consequence, some review articles, problematically, no longer aim for comprehensive coverage, often restricting their scope to a few top journals. Overall, we anticipate that these trends will be reinforced in the future, further emphasizing the need to envision fruitful collaboration between human researchers and machines, such as AI-based tools (cf. Seeber et al., 2020).

In light of these challenges, we focus on the contributions of AI, which refers to the capability of performing cognitive tasks and exhibiting intelligent behavior commonly associated with human intelligence (Russell and Norvig, 2016; Taulli and Oni, 2019). Specifically, we are interested in approaches that are commonly referred to as "weak AI" and combine process automation (execution engines) with capabilities like machine learning (ML) or natural language processing (NLP). Machine learning refers to tools, methods, and techniques for learning and improving task performance with experience (Goodfellow et al., 2016; Mitchell, 1997), while NLP refers to computational tools, methods, and techniques for analyzing, interpreting, and increasingly generating natural language (Manning and Schütze, 1999). Although we are particularly interested in tools powered by advanced AI, we do not discard predecessors of AI per se.

AI offers two capabilities that are particularly salient for conducting literature reviews. First, AI-based tools operate on potentially fuzzy, weakly structured, and unstructured data that are provided in the form of bibliographical meta-data or full-text documents. Techniques of NLP can go beyond purely syntactic processing of text by abstracting and analyzing its semantic meaning, thereby promising to offer valuable support in the searching and screening tasks. For example, papers including the word "review" may be hard to distinguish on a syntactic level, but using semantic techniques, NLP performs much better in dissociating whether "review" refers to a literature review or a customer review. An example applying such techniques to IS research is offered by Sidorova et al. (2008), who illustrate the topics prevalent in top-tier IS journals based on latent Dirichlet allocation (LDA) models. This paper clearly shows the advantages of LDA models, which allow unobserved (latent) topics to emerge from the analysis of bags of words. The application of NLP techniques has further been considered useful for generating semantic topics from samples of papers, thereby allowing researchers to explore the literature from a more abstract perspective (Mortenson and Vidgen, 2016).
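To make the semantic-versus-syntactic distinction concrete, the following sketch ranks two "review" snippets against a query using sentence embeddings. This is an illustration added for this discussion, not a technique from the paper: the sentence-transformers library and the model name are assumed, commonly used choices.

```python
# Illustrative sketch (not from the original paper): dissociating
# "literature review" from "customer review" contexts with embeddings
# rather than keyword matching. Assumes the open-source
# sentence-transformers package; the model name is an assumed default.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "a literature review synthesizing prior academic research"
candidates = [
    "This review synthesizes two decades of published studies.",
    "We analyze customer reviews posted on an e-commerce platform.",
]

# Encode texts as dense vectors and rank candidates by cosine similarity.
query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_embs)[0]

for score, text in zip(scores, candidates):
    print(f"{float(score):.2f}  {text}")
# Both candidates contain "review(s)"; the first should score higher
# because its meaning, not just its vocabulary, matches the query.
```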
Second, advanced supervised ML techniques, such as deep learning, can be trained to replicate the decisions of researchers. This relieves researchers of the task of explicating and codifying myriads of rules, and even more significantly, it can automate decisions for which exact rules are hard to specify. The work of Larsen et al. (2019) is exemplary in this regard, developing classifiers that can automatically screen and include papers relevant to research on TAM (Technology Acceptance Model). Considering these capabilities, we expect AI to be most useful in the mechanical tasks of reviews compared to more creative ones. At the same time, an informed discourse and methodological guidelines are necessary to identify the appropriate areas of application and to address the challenges associated with AI, such as model overfitting, biases, black box predictions, and the acceptance by the research community.

The objective of this essay is to frame the broader discourse on how AI is and can be applied in the individual steps of the literature review process, providing illustrative exemplars for prospective authors and outlining opportunities for further advancing such methods. To clearly frame this objective, we coin the term AI-based literature reviews (AILRs), which refers to literature reviews undertaken with the aid of AI-based tools for one or multiple steps of the review process, that is, problem formulation, literature search, screening for inclusion, quality assessment, data extraction, or data analysis and interpretation. Without necessarily being driven by academic researchers, functionality for literature searches is already supported by AI, as implemented by academic literature databases and indexing routines. We focus on how AI-based tools can evolve to play an even more powerful role and further automate and augment steps in different types of literature reviews. An important question for researchers is how such tools can best be leveraged in all stages of the review process and how they can be adapted to particular types of reviews. In doing so, it can be expected that different types of reviews, such as descriptive or interpretive reviews, will be more or less amenable to the use of AI. The remainder of this paper is structured as follows. In the next section, we outline the process of conducting a literature review, explaining the steps and tasks that may benefit from AI-based tools. Next, we outline a comprehensive research agenda for AILRs. We close with some concluding remarks.

Artificial intelligence–based support for the literature review process

The literature review process involves both creative and mechanical tasks, which creates exciting opportunities for advanced AI-based tools¹ to reduce prospective authors' efforts for time-consuming and repetitive tasks and to dedicate more time to the creative tasks that require human interpretation, intuition, and expertise (Tsafnat et al., 2014). To familiarize the reader with state-of-the-art knowledge in this area, we consider each step of the review process in turn, outlining current AI-based tools as well as the potential for AI-based tool support. Corresponding opportunities for
further tool development and improvement are outlined in the following agenda. The overview is in line with the steps of the review process that Templier and Paré (2018) have synthesized from the methodological literature. Table 1 provides a brief summary, explaining whether the step is amenable to AI-support and pointing the reader to corresponding tools.

We collected the evidence by surveying previous literature reviews of AI-based tools (e.g., Al-Zubidy et al., 2017; Harrison et al., 2020; Jonnalagadda et al., 2015; Kohl et al., 2018; Marshall and Wallace, 2019; Tsafnat et al., 2014; Van Dinter et al., 2021) and online registries (i.e., www.systematicreviewtools.com). Since AI-based tools are constantly evolving, with some not applicable to IS research and some no longer maintained, we briefly tested those deemed relevant for our main purpose. This overview is by no means comprehensive and aims at illustrating promising examples for IS researchers.

Table 1. AI-based tools for steps of the review process.

Step 1: Problem formulation
AI-based tools:
• Programming libraries supporting thematic analyses based on LDA models (example paper: Antons and Breidbach, 2017)
• GUI applications and programming libraries supporting scientometric analyses (Swanson and Smalheiser, 1997)
Potential for AI-support:
• Moderate potential, with AI potentially pointing researchers to promising areas and questions or verifying research gaps

Step 2: Literature search
AI-based tools:
• TheoryOn (Li et al., 2020) enables ontology-based searches for constructs and construct relationships in behavioral theories
• Litbaskets (Boell and Wang, 2019) supports researchers in setting a manageable scope in terms of journals covered
• LitSonar (Sturm and Sunyaev, 2018) offers syntactic translation of search queries for different databases; it also provides (journal) coverage reports
Potential for AI-support:
• Very high potential since the most important search methods consist of steps that are repetitive and time-consuming, that is, amenable to automation

Step 3: Screening for inclusion
AI-based tools:
• ASReview (Van de Schoot et al., 2021) offers screening prioritization
• ADIT approach of Larsen et al. (2019) for researchers capable of designing and programming ML classifiers
Potential for AI-support:
• High potential for semi-automated support in the first screen, which requires many repetitive decisions
• Moderate potential for the second screen, which requires considerable expert judgment (especially for borderline cases)

Step 4: Quality assessment (a)
AI-based tools:
• Statistical software packages (e.g., RevMan)
• RobotReviewer (Marshall et al., 2015) for experimental research
Potential for AI-support:
• Low-to-moderate potential for semi-automated quality assessment

Step 5: Data extraction
AI-based tools:
• Software for data extraction and qualitative content analysis (e.g., NVivo and ATLAS.ti) offers AI-based functionality for qualitative coding, named entity recognition, and sentiment analysis
• WebPlotDigitizer and Graph2Data for extracting data from statistical plots
Potential for AI-support:
• Moderate potential for reviews requiring a formal data extraction (descriptive reviews, scoping reviews, meta-analyses, and qualitative systematic reviews)
• High for objective and atomic data items (e.g., sample sizes); low for complex data which has ambiguities and lends itself to different interpretations (e.g., theoretical arguments and main conclusions)

Step 6: Data analysis and interpretation
AI-based tools:
• Descriptive synthesis: tools for text-mining (Kobayashi et al., 2017), scientometric techniques and topic models (Nakagawa et al., 2019; Schmiedel et al., 2019), and computational reviews aimed at stimulating conceptual contributions (Antons et al., 2021)
• Theory building: examples of inductive (computationally intensive) theory development (e.g., Berente et al., 2019; Lindberg, 2020; Nelson, 2020)
• Theory testing: tools for meta-analyses (e.g., RevMan and dmetar)
Potential for AI-support:
• Very high potential for descriptive syntheses
• Moderate potential for (inductive) theory development and theory testing
• Low-to-non-existent potential for reviews adopting traditional and interpretive approaches

(a) Applicable to meta-analyses and qualitative systematic reviews.

In the following paragraphs, we adopt a granular perspective, highlighting AI-support for individual tasks that authors can ultimately orchestrate in an overarching data processing and tool chain.²

Step 1: Problem formulation

The first step of a literature review requires authors to identify and clarify the research questions and central concepts or theories (Templier and Paré, 2018). In addition, authors can be advised to complete an initial verification of the research gap (Müller-Bloch and Kranz, 2015), which may involve assessing whether the gap has already been addressed, whether the research question allows for a substantial contribution that exceeds previous work, and whether it is indeed important to address the gap (Rivard, 2014).

We expect moderate potential for AI-support in the problem formulation step, in which we focus on the identification and verification of research gaps. Given that the natural sciences have witnessed some exciting advances with regard to detecting new research gaps and promising starting points for hypotheses, it can be hoped that some of this work will eventually be applied and adapted to the social sciences. For instance, there are path-breaking advances with regard to automated generation of hypotheses (and experimental testing in automated laboratories) in biochemistry (King et al., 2009) and machine-learning approaches for scientometric, literature-based discovery in computer science (Thilakaratne et al., 2019). These areas are certainly more predestined for the initial application of AI because they do not necessarily raise complex ethical decisions of research involving human participants, or the fuzziness of behavioral theories and constructs prevalent in IS research (Li et al., 2020). Nevertheless, research in the social sciences may eventually be inspired by these approaches. Overall, they could stimulate researchers in identifying areas in need of review papers or further research more broadly. Still, we expect that human judgment will remain a necessary ingredient for this step in the near future, especially for research generating questions through problematization (Alvesson and Sandberg, 2011). Beyond the discovery of research gaps and areas in need of a review paper, there is some potential for supporting researchers in verifying whether the gaps are still open by identifying identical or similar knowledge contributions and previous review papers. In the discovery and verification, the use of AI is likely to involve uncertainty, requiring researchers to make final decisions regarding the treatment of research gaps.

In sum, current tool support for this initial step is still in the early stages of development, with published approaches implementing their code on an individual basis as opposed to drawing on mature GUI-based tools. Researchers with programming skills can find inspiration in previous works exploring and detecting research gaps and opportunities for inter-disciplinary research at the intersection of literatures or research topics. For instance, this can be achieved by applying and advancing scientometric methods (e.g., Evans and Foster, 2011; Swanson and Smalheiser, 1997) and LDA topic models (e.g., Antons and Breidbach, 2017).
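As a minimal, hedged illustration of the LDA-based analyses cited above (added here for the reader; it is not from the original paper), the following sketch fits a topic model to a toy set of abstracts with scikit-learn. The corpus and parameter values are invented and would require tuning on a real sample.

```python
# Illustrative sketch: surfacing latent topics from paper abstracts with LDA.
# The toy corpus and hyperparameters are assumptions for demonstration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "Technology acceptance and user adoption of information systems",
    "Adoption intentions for mobile technology among users",
    "Security risks and privacy concerns in cloud computing",
    "Privacy, trust, and security in online platforms",
]

# LDA operates on bag-of-words counts, not raw text.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(counts)

# Print the top words per latent topic.
terms = vectorizer.get_feature_names_out()
for idx, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {idx}: {', '.join(top)}")
```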
Step 2: Literature search

In this step, authors construct the literature sample by applying different search methods, including database searches, table-of-content scans, citation searches, and complementary searches (cf. Templier and Paré, 2018). Depending on the goal of the review, authors can aim at a coverage that is comprehensive, representative, or selective (Cooper, 1988). Corresponding search methods can be assembled in complex search strategies, involving several iterations, collaborative work of research teams, and AI-based tools. Due to the heterogeneity of the information retrieval process, the variety of data sources (e.g., journals, conference proceedings, books, and different forms of grey literature), and the plethora of data quality problems (e.g., incomplete or incorrect meta-data), appropriate data management strategies are vital. Ultimately, they should enable transparent reporting (Paré et al., 2016; Templier and Paré, 2018), as well as repeatability and reproducibility (Cram et al., 2020).

Recent work in IS provides compelling advances, especially with regard to the prevalent database searches. We highlight three tools that have been published recently. First, the TheoryOn search engine (Li et al., 2020) offers a complementary option to traditional databases by allowing researchers to execute ontology-based searches for individual constructs and construct relationships across behavioral theories. Second, Litbaskets (Boell and Wang, 2019) can inform the design of search strategies. This web-based tool allows researchers to assess the potential volume of database search results based on pre-specified keywords and different sets of journals, which can be adjusted in a flexible way. Third, LitSonar (Sturm and Sunyaev, 2018) greatly facilitates the execution of database searches by automatically translating search queries for a range of literature databases relevant for IS reviews (e.g., EBSCO, AIS eLibrary, and ProQuest). This tool is particularly promising because it provides a coverage report, potentially identifying possible database embargos (periods in which journals are not indexed), and thereby alleviating insufficiencies of academic databases.
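The following sketch illustrates the kind of syntactic query translation that LitSonar automates. It is a simplified illustration added here, not LitSonar's code; the field codes merely follow common database conventions (e.g., TS= for Web of Science topic searches, TITLE-ABS-KEY for Scopus).

```python
# Illustrative sketch (not LitSonar's implementation): translating one
# topic query into database-specific syntax. Field codes follow common
# conventions and may need adjustment for a given database version.
terms = ["technology acceptance", "UTAUT", "user adoption"]
boolean_core = " OR ".join(f'"{t}"' for t in terms)

translations = {
    # Web of Science: topic search over title, abstract, and keywords.
    "Web of Science": f"TS=({boolean_core})",
    # Scopus: combined title/abstract/keyword field.
    "Scopus": f"TITLE-ABS-KEY({boolean_core})",
    # Generic EBSCO-style abstract field search.
    "EBSCO": f"AB ({boolean_core})",
}

for database, query in translations.items():
    print(f"{database}: {query}")
```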
Overall, this step tends to be time-consuming, with many mechanical tasks potentially lending themselves to automation (Carver et al., 2013; Johnson et al., 2019). The need for automation and AI-support is particularly salient when considering the rapid growth of research output (Larsen et al., 2019) and the inefficiency of investing valuable time of academic experts to complete repetitive and mechanical tasks.

Step 3: Screening for inclusion

In the screening step, authors work with the search results to dissociate the relevant papers from those that must be excluded from the review. This step is typically divided into a first (more inclusive) screening based on titles and abstracts and a second (more restrictive) screening based on full-texts (Templier and Paré, 2018). In manual screening processes, researchers execute the tedious task of checking hundreds or thousands of papers. In this process, they are likely to experience fatigue, which may interfere with their ability to accurately dissociate cognitively demanding borderline cases. This explains why methodologists have recommended conducting a first screen in which researchers exclude papers that are clearly irrelevant (based on titles and abstracts) and deliberately retain challenging papers for a second round of screening. To ensure screening performance in the second round, researchers typically work with smaller samples (after excluding a bulk of papers in the first screen), consult the full-text documents, apply more specific (pre-defined) exclusion criteria, and execute parallel independent assessments with team decisions on the final borderline cases. The most rigorous screening procedures have been suggested for reviews aimed at theory testing (Templier and Paré, 2018), in which erroneous inclusion decisions may have more significant and measurable effects on the conclusions of the review.

AI-based tool support for screening has been evolving over the years (Harrison et al., 2020). While many tools suffer from severe restrictions for screening IS research (e.g., operating primarily on health sciences databases and requiring PubMed IDs), the recently published ASReview (Van de Schoot et al., 2021) offers an option for researchers in IS. This tool is a particularly promising exemplar since it combines inspectability of the code (published under the Apache-2.0 License), extensibility (availability of code and documentation, implementation using popular Python ML libraries), ongoing validation efforts, and interoperability (offering import and export of common Research Information System Format and comma-separated values files). Implementing a range of ML classifiers (including Naive Bayes, Support Vector Machine, logistic regression, and random forest classifiers), it learns from initial inclusion decisions and leverages these insights to present researchers with a prioritized list of papers (i.e., the titles and abstracts), proceeding from those most likely to be included to those least likely. This allows researchers to efficiently work through a prioritized list of papers in the first inclusion screen and even rely on automated exclusion by stopping the screening once a certain number of consecutive exclusion decisions has been reached (e.g., n = 100). Borderline cases can be retained for a second screen in which decisions are based on full-text documents.
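The following sketch illustrates this screening-prioritization logic in a few lines of scikit-learn. It is a simplified stand-in for tools like ASReview, added here for illustration with an invented toy sample; it is not the tool's actual code.

```python
# Illustrative sketch of screening prioritization (simplified ASReview-style
# logic, not its actual code): learn from a few labeled decisions and rank
# the remaining records by predicted inclusion probability.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = [  # (title/abstract text, 1 = include, 0 = exclude) -- toy data
    ("literature review of technology acceptance research", 1),
    ("systematic review of user adoption studies", 1),
    ("customer reviews on e-commerce platforms", 0),
    ("hardware benchmark of database engines", 0),
]
unlabeled = [
    "a review synthesizing two decades of adoption research",
    "performance tuning of storage systems",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([text for text, _ in labeled])
y = [label for _, label in labeled]

clf = LogisticRegression().fit(X, y)

# Rank unscreened records from most to least likely to be included;
# a reviewer would screen from the top and might stop after a long run
# of consecutive exclusions (e.g., n = 100, as noted in the text above).
probs = clf.predict_proba(vectorizer.transform(unlabeled))[:, 1]
for p, text in sorted(zip(probs, unlabeled), reverse=True):
    print(f"{p:.2f}  {text}")
```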
Furthermore, researchers with programming skills can adapt the discourse approach proposed by Larsen et al. (2019). The authors point out that reviews of popular theories may be infeasible, especially when thousands of relevant papers have been published. In such cases, they propose the thought-provoking possibility of sampling papers randomly from the set of relevant (called theory-contributing) papers, as identified by machine-learning algorithms. In line with random sampling in empirical studies, Larsen et al. (2019) thereby suggest that random selection may be useful in literature reviews to obtain a view of the literature that is representative of the scientific discourse. As a result, the Automated Detection of Implicit Theory (ADIT) approach is particularly promising for developing reviews that use a theory as the unit of analysis (theory and review articles) and encounter excessive amounts of research following up on the selected theory.

We expect the potential for AI-support to be high for the first screen and moderate for the second screen. The first screen seems particularly amenable to partial automation and AI-support because it is more inclusive and does not require final exclusion decisions for borderline cases. This arguably requires machines to have adequate capabilities of reading and "understanding" abstracts and titles. In contrast, the second screen is dedicated to disentangling the remaining cases, which can be particularly challenging since IS research is not standardized as strictly as other disciplines. In contrast to the health sciences and biology, for instance, the lack of widely used taxonomies for IS constructs, standard vocabulary for keywords (e.g., MeSH terms), and descriptive paper titles makes it difficult to achieve the required classification performance in the second screen (cf. O'Mara-Eves et al., 2015). This challenge applies to humans and machines alike. Taken together, the screen and search (considered as an information retrieval task) should primarily be evaluated in terms of recall, that is, the proportion of relevant papers that are successfully retrieved. Authors of literature reviews traditionally target a high recall by executing comprehensive searches and thereby accept very low precision and the corresponding screening burden (Li et al., 2020). One implication of Li et al.'s (2020) work is that AI-supported, ontology-based searches may effectively prevent some of the screening burden through better precision. Overall, the two screening steps are among the most time-consuming activities of the literature review process (Carver et al., 2013). When considering potential AI-support of this step, the reliability of manual screening processes should not be overestimated, even if the screen is conducted by academic experts. In fact, recent evidence in the health sciences suggests a base rate of 10% disagreement between inclusion screens conducted independently (Wang et al., 2020). This indicates that it may even be possible to augment and improve the screening activities of researchers by having AI-based tools identify inconsistent and potentially erroneous screening decisions.

Step 4: Quality assessment

The quality assessment involves checking primary empirical studies for methodological flaws and other sources of bias (Higgins and Green, 2008; Kitchenham and Charters, 2007; Templier and Paré, 2018). This step is intended to assess the degree to which the conclusions of reviews aimed at theory testing may be affected by different types of biases (e.g., selection, attrition, and reporting bias). It is recommended to conduct these procedures in a parallel and independent way to ensure high reliability (Templier and Paré, 2018).

We believe the potential for AI-based tools supporting these procedures is low to moderate for two reasons. First, assessing (methodological) quality is a challenging task which requires expert judgment, making it difficult to achieve high inter-coder agreement (Hartling et al., 2009). Second, sample sizes in IS reviews (meta-analyses and qualitative systematic reviews) are not excessively large, that is, manual assessments are still manageable. Following methodological guidelines for quality appraisal and risk of bias assessment, IS researchers conducting meta-analyses and systematic literature reviews can leverage traditional tools like RevMan (cf. Bax et al., 2007) or corresponding packages of statistical software environments like R and SPSS. Further AI-based tools like RobotReviewer (Marshall et al., 2015) may also be applicable to meta-analyses in IS. While focusing on risk of bias assessment of randomized controlled trials in the life sciences, RobotReviewer is an excellent exemplar of explainable AI, allowing researchers to interactively trace ratings in each domain of bias to their origin in the full-text document.

Step 5: Data extraction

Data extraction requires researchers to identify relevant fragments of qualitative and quantitative data and to transfer them to a (semi-)structured coding sheet (Templier and Paré, 2018). It is more salient in descriptive reviews, scoping reviews, and reviews aimed at theory testing compared to reviews that are more selective and interpretive, such as narrative reviews and theory development reviews.

Current tool support in the IS field reflects the moderate potential of AI in this area, with authors relying on general tools for qualitative data analysis, such as ATLAS.ti and NVivo (which are starting to implement NLP and machine learning algorithms for tasks such as automated qualitative coding, named entity recognition, and sentiment analysis), or specialized tools for extracting data from tables or statistical plots, such as WebPlotDigitizer or Graph2Data.

We expect the potential for supporting this step with AI to be moderate. Learning from ongoing data extraction decisions, prospective tools could progressively improve, highlight the more promising fragments in a given paper, and facilitate the transfer and organization of data into corresponding repositories. We do not expect full automation of more significant data items in the near future. Even in the health sciences, which have established relatively consistent reporting practices, corresponding tools designed to extract study characteristics like the PICO (population, intervention, context, and outcome) elements are still in the early stages of development (Jonnalagadda et al., 2015).

Step 6: Data analysis and interpretation

The final step of the review process can take various forms, depending on the type of review (Templier and Paré, 2018). Some reviews put more emphasis on elegant narratives which convey insightful and deeply hermeneutic interpretations, while others are designed to eliminate any subjectivity which may interfere with the accuracy of aggregated evidence or descriptive overviews.

Depending on the main knowledge building activities (Schryen et al., 2020), IS researchers can use different tools. For descriptive syntheses, there is a range of established tools for text-mining (Kobayashi et al., 2017), as well as tools for analyzing and visualizing topics, theories, and research communities based on scientometric techniques, computational techniques, or LDA models (Balducci and Marinova, 2018; Nakagawa et al., 2019; Thilakaratne et al., 2019), for instance. Further promising tools originally applied to unstructured data in contexts such as technology adoption (Laurell et al., 2019), corporate communication (van Zoonen and van der Meer, 2016), or corporate social responsibility (Tate et al., 2010) could be adapted to the needs of IS review papers. For inductive work, there is an increasing amount of research on computationally intensive techniques that leverage data for theory generation (e.g., Berente et al., 2019; Lindberg, 2020; Nelson, 2020). Finally, for theory testing, there is a range of applications and libraries for meta-analyses, such as RevMan (cf. Bax et al., 2007) or the R package dmetar.
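To illustrate the aggregation logic behind such meta-analysis tools, the sketch below computes a fixed-effect, inverse-variance pooled effect estimate in plain Python. The study effect sizes and standard errors are invented, and real analyses would rely on dedicated packages like RevMan or dmetar rather than hand-rolled code.

```python
# Illustrative sketch: fixed-effect inverse-variance pooling, the core
# computation behind many meta-analysis tools. Input values are invented.
import math

# (effect size, standard error) from three hypothetical primary studies
studies = [(0.42, 0.10), (0.35, 0.15), (0.50, 0.12)]

# Each study is weighted by the inverse of its variance: w_i = 1 / se_i^2.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval under a normal approximation.
lower, upper = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```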
In assessing the potential for future AI-based tools to support data analysis, we need to take into account that this step can take various forms. In pre-theoretical reviews, AI-based tools offer capabilities to generate descriptive insights, for example, based on topic modeling (Kunc et al., 2018; Mortenson and Vidgen, 2016; Schmiedel et al., 2019). As a promising bridge toward conceptual contributions, Antons et al. (2021) advance the notion of computational reviews. Specifically, they suggest leveraging descriptive and scientometric analyses to stimulate researchers in pursuing the conceptual goals of explicating, envisioning, relating, and debating. Theory development in IS appears to be most amenable to AI-support when it follows an inductive approach (Berente et al., 2019). We have yet to witness exemplars of AI-driven theory development in a
behavioral research domain which comes close to the ingenuity and creativity displayed in some of the strongest theory and review papers. After all, it is important to remember that new assemblages of constructs and relationships are not a sufficient condition for a strong theoretical contribution. As pointed out by Johnson et al. (2019), it is the "why" associated with the relationships, the underlying theoretical rationale (Whetten, 1989), which is critical and one of the open challenges for AI-based theory development in the near future. In contrast to the creative and unstructured endeavor of theory development, the process of aggregating evidence from primary studies in order to test hypotheses and theories (most notably in a meta-analysis) can largely be supported by AI-based tools using the extant methodological literature as guidance.

A research agenda

In this section, we outline an agenda suggesting how IS researchers can focus and coordinate their efforts in advancing AILRs. Nurturing a vibrant AILR tradition is a task for the entire scholarly community, including design science researchers, behavioral scientists, methodologists, reviewers, and journal editors, as well as authors of primary research papers. The AILR-centric agenda for research, design, and action, as displayed in Figure 1, covers three levels: (I) supporting infrastructure, (II) methods and tools, and (III) research practice.

Figure 1. A research agenda for AILR-centric research, design, and action.

The agenda proceeds from technical questions of how research is stored and made accessible (Level I) to specific questions of how methods and tools can support the process of conducting AILRs (Level II), to overarching community-centric questions of how IS research could facilitate the conduct of AILRs (Level III). For each level, we suggest fruitful opportunities, focusing on research and design (primarily on Levels I and II) as well as actionable advice targeting individuals and the IS community as a whole (primarily on Level III). With this broad agenda, we emphasize that AI and AILRs raise interesting questions on how we conduct and synthesize research.

Level I: Supporting infrastructure

Technical infrastructure can greatly facilitate or constrain AILRs. The diversity of infrastructure needed to support a vibrant AILR tradition within IS points to a wide range of related opportunities for research and design. We cover quality assurance, smart search technologies, and enhanced databases.

Quality assurance of AILR inputs. One of the prominent characteristics of AILRs is scalability. Expanding computational resources allow ever greater numbers of research papers to be extracted and analyzed, with some projects reporting tens of thousands (Larsen et al., 2019) and others hundreds of thousands (Dang et al., 2009) of papers processed. In such cases, the technical quality of paper documents would vary (e.g., regarding optical character recognition and the inclusion of additional, non-content-related text), and the scale of work necessitates novel methods and metrics for preprocessing and establishing the quality of inputs into AILRs (Antons et al., 2021). Considering the scale, an important consideration in undertaking this work is the need to establish and, if necessary and possible, improve the quality of inputs automatically, with little human intervention. Furthermore, methods and tools should attempt to cover a diverse range of AILR inputs, including traditional research papers, grey literature, and objects of research, such as IT artifacts. These challenges create fertile opportunities for collaboration with researchers working on information
quality of big heterogenous data within IS, computer science, and beyond, many of whom also utilize AI methods (Batini et al., 2015; Kenett and Shmueli, 2016; Lukyanenko et al., 2019; Wahyudi et al., 2018).

Smart search technologies. To facilitate information retrieval from databases and subsequent analyses, more research is needed on smart search technologies. This involves going beyond mere word matching and seeking to understand the intent and meaning behind a search query. First, work is needed on better syntactic interpretation of search words, such as research on query parsing and validation (Russell-Rose and Shokraneh, 2019). Indeed, machine learning has become a major driver in NLP improvements. One notable opportunity here is the development of IS-specific NLP query parsing algorithms, as parsing text is partially context-dependent (Eisenstein, 2019).

Second, to go from syntactic parsing to understanding the meaning and intent of a search query, AILR tools must possess the ability to infer and reason beyond the information provided. Here, domain-specific ontologies can greatly facilitate deeper semantic interpretation of search queries. Unlike simple tags or labels, ontologies capture nuanced semantics of the domain, including concept definitions and relationships among concepts (e.g., "ERP" is a kind of "enterprise IT," "design principles" are synonymous with "design guidelines," "UTAUT" is an extension of "TAM"). Research opportunities pertain both to IS-specific and foundational ontologies. Work on IS-specific ontologies has already been undertaken in the past (Alter, 2005; Lukyanenko, 2020), and the need to support AILRs could provide a new impetus to this line of work. Indeed, in fast-paced disciplines like IS, for such ontologies to be useful, they need to be current and up to date, necessitating continuous research attention.
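A hedged sketch of how such a domain ontology could drive query expansion follows; the tiny ontology mirrors the relationship examples in the text (synonymy, is-a, and extension) and is hard-coded purely for illustration, not a real, curated resource.

```python
# Illustrative sketch: expanding a search query with a toy IS ontology.
# The ontology entries mirror the examples in the text and are invented.
ontology = {
    "ERP": {"broader": ["enterprise IT"]},
    "design principles": {"synonyms": ["design guidelines"]},
    "TAM": {"extensions": ["UTAUT"]},
}

def expand(term: str) -> set[str]:
    """Return the term plus ontologically related terms."""
    related = {term}
    for relation_terms in ontology.get(term, {}).values():
        related.update(relation_terms)
    return related

# A query for TAM research would also retrieve UTAUT papers.
query_terms = expand("TAM") | expand("design principles")
print(" OR ".join(sorted(f'"{t}"' for t in query_terms)))
```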
We call for further work on ontology-based indexing to improve the discoverability of scholarly content (cf. Li et al., 2020). This could involve the use of domain ontologies to assign specific labels to papers and their content (e.g., constructs, construct relationships, theories, themes, and methodologies), potentially alleviating the problems of "concept drift" or buzzwords in IS (O'Mara-Eves et al., 2015; vom Brocke et al., 2015).

Further design science and empirical research is needed on foundational ontologies, which commonly provide the basis for domain ontologies, describing primitive constructs and rules for using them in domain ontologies (March and Allen, 2014; Wand and Weber, 1988). With the development of new ontologies constituting a research opportunity, surveys and evaluations of different ontologies will also be needed, especially for the objective of creating an IS-specific domain ontology to guide AILRs.

Enhanced databases. We envision enhancements of IS databases and complementary repositories which can greatly facilitate AILRs. First, recognizing that there are many areas in which scholarly databases could improve, we call for research advancing coverage reports and improving the interoperability of academic databases. A prevalent challenge for literature reviews in the social sciences, in general, and IS, in particular, is the lack of databases comprehensively curating research published in the main outlets, including journals and conferences (vom Brocke et al., 2015), accompanied by the increasing volatility of database indices and search algorithms (Cram et al., 2020). This requires researchers to search multiple sources and apply multiple search techniques (Papaioannou et al., 2009; Templier and Paré, 2018). The ground-breaking paper of Sturm and Sunyaev (2018) illustrates how journal coverage reports could enable substantially more targeted and efficient literature searches. We further emphasize that limited interoperability (accessibility via APIs) is still a major obstacle breaking the data processing pipeline between the database and the local repositories of the research team, introducing manual database queries and duplicate checking as potential sources of errors.

Second, data curation initiatives could benefit from the interplay of supervised ML and crowdsourcing platforms, targeting annotation, quality control, and synthesis. For example, the Cochrane crowd is a section of the larger online medical database which actively solicits volunteers "to help categorise and summarise healthcare evidence."³ Beyond the IS theory wiki,⁴ which is curated by online editors in a manner similar to Wikipedia, we are not aware of any such efforts in the IS discipline. Considering the potential value such repositories may bring, we call for more work on crowdsourcing for research, including on how to motivate volunteer researchers to categorize and analyze IS literature and how to evaluate and improve the quality of the contributions.

Third, we highlight opportunities for advancing complementary database repositories for models and artefacts. To ensure long-term viability and cross-fertilization of AILRs, there are opportunities to design reusable repositories for NLP and ML models, whitelists, and lexicons which can then be shared with the community (Dalgali and Crowston, 2019) and incorporated into AILR routines. This can be especially powerful for capturing recurring patterns in IS literature, such as AI models capable of understanding common dependent variables (e.g., intention to adopt), which could be reused as module components for specific literature reviews (e.g., adoption of electronic health record technologies).

In addition, tools and algorithms are needed to extract and analyze the non-textual subject matter of research papers,
especially the IT artifacts. IS researchers have begun to conduct reviews of the features of IT artifacts, known as "design archaeology" (Gleasure, 2014; Chandra Kruse et al., 2019). Corresponding algorithms for automated reviews of digital artifacts can be based on recent advances in image mining, process mining, or computer vision. This is a unique opportunity native to the IS discipline, which stands to enrich AILRs broadly.

Overall, advancing quality assurance and domain ontologies, as well as the information curated in enhanced databases and local repositories, will enable prospective authors to execute AILRs more efficiently, to expedite ML training, and to enhance ML models by infusing domain knowledge into NLP algorithms. These advances can facilitate research progress through corresponding tools and methods supporting distinct steps of the literature review process.

Level II: Methods and tools

There are vast opportunities for methodological and tool-centric research on AILRs. We outline promising avenues for research and design in each individual step of the review process and offer complementary recommendations on cross-cutting concerns, covering the need for advancing evaluation studies, conceptions of validity, and the notion of transparency in AILRs.

Step 1: Problem formulation. Future design-oriented research targeting problem formulation could facilitate posing innovative questions, discovering unanticipated patterns in the literature, and promoting novel research perspectives on the phenomena under investigation. This problem of discovering novel insights from big data repositories has already been discussed in IS (Rai, 2016) and investigated in various contexts (e.g., Germonprez et al., 2007; Kallinikos and Tempini, 2014; Lukyanenko et al., 2019). The lessons from these studies could be transferred to the retrieval of information from knowledge repositories and the subsequent discovery of novel problems. A second stream of research opportunities relates to facilitating the verification of research gaps (Müller-Bloch and Kranz, 2015). For instance, this could be accomplished based on ML classifiers identifying previous review papers (cf. Tsafnat et al., 2014). While health scientists have dedicated repositories of review papers at their disposal (e.g., the Cochrane library), IS researchers do not have easy access to an overview of published reviews. A corresponding artifact could enable prospective authors to verify whether their research questions have already been addressed and to make more informed statements on related review papers, especially when confronted with a rapidly growing volume of papers and reviews published outside the top-tier journals. Such tools could provide a prioritized list of reviews on related topics (e.g., based on sample overlap), facilitating research gap verification, the development of related work sections, and contribution statements.

Step 2: Literature search. We highlight two opportunities to leverage AI in the search step. First, backward citation searches (wherein the literature is drawn from the references in a focal article) are rarely reported in IS review papers despite initial evidence for their effectiveness (Jalali and Wohlin, 2012; Papaioannou et al., 2009). The tedious task of scanning multiple (in all likelihood overlapping) reference sections would easily lend itself to AI-supported extraction, consolidation, and merging of reference data. Consideration of additional cues (e.g., the frequency and contexts of citations) could even provide a basis for prioritizing or filtering de-duplicated search results. Second, we expect further progress regarding tools aimed at documenting, analyzing, and justifying individual search strategies (cf. Templier and Paré, 2018). IS researchers could use initial work on syntactic search query validation (Russell-Rose and Shokraneh, 2019) to build tools supporting researchers in designing and improving different elements of search strategies. Examples of promising starting points include the analysis and justification of the scope in terms of publication outlets covered and the selection of search terms in database searches.
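A minimal sketch of the consolidation-and-merging idea for backward citation searches follows; the normalization is deliberately crude (lowercased, punctuation-stripped strings) and the reference lists are invented for illustration.

```python
# Illustrative sketch: merging reference lists from several focal papers,
# de-duplicating by a normalized key and counting citation frequency as
# a prioritization cue. The reference data are invented.
import re
from collections import Counter

reference_sections = [
    ["Davis (1989) Perceived usefulness, perceived ease of use...",
     "Venkatesh et al. (2003) User Acceptance of Information Technology"],
    ["Venkatesh et al. (2003) User acceptance of information technology",
     "DeLone and McLean (1992) Information Systems Success"],
]

def normalize(ref: str) -> str:
    # Crude de-duplication key: lowercase and strip punctuation.
    return re.sub(r"[^a-z0-9 ]+", "", ref.lower()).strip()

counts = Counter(
    normalize(ref) for section in reference_sections for ref in section
)

# References cited by several focal papers surface first.
for key, freq in counts.most_common():
    print(f"cited {freq}x: {key}")
```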
Step 3: Inclusion screen. The screening tasks provide the most significant opportunities for advancing AI-based tools. Most importantly, this pertains to supporting the time-consuming first screen (based on titles and abstracts), in which AI-based tools and humans can complement each other in different ways (see Larsen et al., 2019; O'Mara-Eves et al., 2015; van de Schoot et al., 2021). Many promising tools designed for reviews of health research (e.g., Wallace et al., 2012) could serve as an inspiration for design-oriented work in IS. Since IS research does not follow comparable standards regarding research reporting and regulated vocabulary, such as the MeSH terms in the health sciences (cf. O'Mara-Eves et al., 2015), modifications and careful evaluation are necessary. Beyond supporting the screen, further AI-based tools could target two mechanical, tedious tasks intertwined with the screen, namely the acquisition of full-texts between the first and second screen (Thomas et al., 2017; Tsafnat et al., 2014) and the identification of studies reporting results from the same dataset (Templier and Paré, 2018).

Step 4: Quality assessment. There are several opportunities for advancing AI-based tool support for the quality appraisal step. This is underlined by the fact that current research practices regarding the reporting of quality assessment in meta-analyses of IS research are insufficient, even for meta-analyses published in top-tier journals
(Templier and Paré, 2018). While methods papers in IS are beginning to recognize the critical role of quality assessment, corresponding procedures are a more established element in general meta-analysis methods papers (e.g., Higgins and Green, 2008; Hunter and Schmidt, 2014). Corresponding design science research could draw inspiration from risk of bias assessment tools like RobotReviewer (Marshall et al., 2015) and advance AI-supported quality appraisal of non-experimental, observational, and cross-sectional research designs. In a first step, this would be greatly facilitated by classifiers dedicated to detecting the research designs and methods of primary papers. In a second step, design-oriented work could aim at the (partial) automation of quality assessment based on checklists and criteria for observational studies (Shamliyan et al., 2010), surveys (Pinsonneault and Kraemer, 1993), positivist case studies (Dubé and Paré, 2003), or Delphi studies (Paré et al., 2013).

Step 5: Data extraction. Future AI-based tools supporting the data extraction step may adopt two perspectives on extant research. First, there is the view that papers contain data and evidence that have a singular interpretation and should lend themselves to efficient extraction and analysis. Consistent with the positivist paradigm (which holds that there is a single ground truth accessible through empirical studies), corresponding tools may spot relatively isolated fragments of a paper to provide researchers with target categories like methodological characteristics or details of the research design. There are several opportunities for transferring and adapting corresponding tools aimed at extracting study characteristics (Jonnalagadda et al., 2015). Second, consistent with philosophical paradigms recognizing that multiple interpretations may exist simultaneously (e.g., interpretivism, critical social theory, or critical realism), research can be viewed as a discourse in which authors of review papers examine different arguments and possibly diverging interpretations of the same observations (Avison and Malaurent, 2014; Klein and Myers, 1999) in the process of forming a synthesis. The discourse evolving around particular theoretical models, that is, the focus of Larsen et al. (2019), is one very promising way to operationalize this perspective. Future work in this area is unlikely to succeed when considering isolated fragments. Instead, identifying main arguments and influential ideas will require consideration of a broader context. This pertains to an argument's position in the overall structure of the paper (Prester et al., 2020), overarching ontologies (Li et al., 2020), or its relation to previous work and the meaning associated with cited papers (Hassan et al., 2020; Small, 1978). It will require researchers to go beyond basic NLP models and improve NLP algorithms for context-sensitive data extraction, analysis, and synthesis which are capable of capturing and representing higher order linguistic structures and deeper meaning. This research could contribute to work on deep linguistic processing, context extraction, word-sense disambiguation, and other open ML and NLP problems (Dörpinghaus and Stefan, 2019; Raganato et al., 2017; Stanovsky et al., 2017; Wang et al., 2020). We believe this second, discourse-oriented perspective illustrates how tools in the social sciences may differ from those in the natural sciences.

Step 6: Data analysis and interpretation. Regarding the final step, we focus on knowledge integration and inductive theory development. With knowledge integration still posing challenges for prospective authors, we call for the development of discipline-specific algorithms to address issues especially prevalent in IS, such as the similarity assessment of IS measurement items or IS constructs, and IS construct integration. Here, important progress can be made by anchoring solutions in IS domain knowledge. For example, similarity judgments for reflective vs formative constructs rely on different assumptions about the relationships between measurement items, and knowledge of whether a given IS construct is generally reflective or formative can be useful. Hence, different algorithmic similarity metrics can be used depending on the type of constructs involved. For formative constructs, measurement items are expected to be quite dissimilar from one another, as the items represent different dimensions of the higher order construct (e.g., measurement items of information quality's dimension of perceived accuracy are quite different from those which measure perceived information completeness, see, e.g., Xu et al., 2013). In this case, to ensure that these measurement items can be automatically related, a domain-specific ontology can be helpful. In contrast, for reflective constructs, the algorithms can expect to detect high synonymity among the measurement items (e.g., as in the case of items which measure perceived ease of use), which could be a signal that these measurement items belong to the same reflective construct.
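The following sketch illustrates this item-similarity heuristic. TF-IDF cosine similarity stands in for the "algorithmic similarity metrics" mentioned above (one simple choice among many), and the measurement items are paraphrased toy examples, not validated scales.

```python
# Illustrative sketch: mean pairwise similarity among measurement items as
# a cue for reflective (high synonymity) vs formative (dissimilar
# dimensions) constructs. TF-IDF cosine similarity is one simple metric.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mean_item_similarity(items):
    vectors = TfidfVectorizer(stop_words="english").fit_transform(items)
    sims = cosine_similarity(vectors)
    # Average the off-diagonal entries (the pairwise similarities).
    n = len(items)
    return (sims.sum() - n) / (n * (n - 1))

reflective_items = [  # perceived ease of use (toy paraphrases)
    "learning to use the system is easy for me",
    "i find the system easy to use",
    "interacting with the system is easy",
]
formative_items = [  # information quality dimensions (toy paraphrases)
    "the information provided is accurate",
    "the information provided is complete",
    "the information is delivered in a timely manner",
]

# Expected: higher mean similarity for the reflective items.
print(f"reflective: {mean_item_similarity(reflective_items):.2f}")
print(f"formative:  {mean_item_similarity(formative_items):.2f}")
```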
Work on integrating constructs is already being pursued, including such artefacts as CID1 (Larsen and Bong, 2016), ADIT (Larsen et al., 2019), ideational impact classifiers (Prester et al., 2020), RefMod-Miner (Hake et al., 2017), or TheoryOn (Li et al., 2020). Developing theories lies at the core of IS research (Leidner, 2018; Rivard, 2014; Webster and Watson, 2002). AILRs, which can operate on larger volumes of literature, provide an ideal basis for large-scale inductive theory development (Berente et al., 2019; Choudhury et al., 2018; Nelson, 2020), resulting in substantive opportunities for future studies. Here, many open questions remain, for instance, ascertaining which steps of the theory-building process based on literature are most ripe for automation. Another key challenge is the assurance of validity and rigor in the inductive generation of the components needed for theory construction (i.e., constructs, relationships,
and boundary conditions) from heterogenous literature sources which may vary in quality and contain tacit assumptions. Automated discovery of causal chains from literature is a booming practice in medicine and bioinformatics (Hossain et al., 2012). However, it has yet to be widely utilized in IS, which offers a lucrative prospect of finding novel connections among distal IS phenomena (e.g., organizational IT, social media, and the Internet of things). Although, as we stated earlier, we do not envision tools replacing humans, we are excited to see how theory development could benefit from scalable automated pattern discoveries in IS.

All steps. We conclude this section by outlining four aspects relevant to AI-based tools across the six steps of the literature review process: (1) evaluation and validity, (2) transparency and replicability, (3) compatibility for recombination, and (4) usability. These aspects point to the need for methodological research as well as an accompanying discourse in the IS community, both critical pillars for the development of best practices (further discussed in Level III of this agenda).

Evaluation and validity. Evaluation methods and studies of the feasibility, effectiveness, and utility of AILR methods and tools are needed, considering that they operate at the intersection of data, theory, and complex computational software systems. It is unlikely that a single evaluation type (e.g., a behavioral experiment) could lend sufficient and comprehensive support for the various considerations in developing and using AILR tools. Presently, little guidance exists on what constitutes an effective evaluation of these tools, and researchers have started to draw attention to the lack of methodological guidance on performing such evaluations (Li et al., 2020). Drawing inspiration from Li et al.'s (2020) multi-stage evaluation approach, we encourage future work on the evaluation of AILR methods and tools, potentially combining ML experiments, behavioral field and laboratory studies, and applicability checks. As part of reporting AILRs, as well as designing and evaluating AILR tools, researchers invariably make assertions and claims about the properties, behavior, and value of these tools, raising questions of validity in this context. Validity deals with the justification of claims (such as inferences and conclusions) in research studies (Lindzey et al., 1998), including literature reviews (Paré et al., 2016). The field of AI has developed a set of validation procedures to establish the performance of automated classifiers. The most common measures are precision, recall, their harmonic mean (the F-measure), and the area under the receiver operating characteristic curve (AUC), which captures the diagnostic ability of a classifier based on varying discrimination thresholds (O'Mara-Eves et al., 2015). These measures may be used to assess the validity of AILR findings (Larsen and Bong, 2016; Prester et al., 2020).
measure), or the area under the receiver operating charac- many local choices need to be made (e.g., input normali-
teristic curve (AUC), which captures the diagnostic ability zation, dimensionality reduction, missing value imputation,
of a classifier based on varying discrimination thresholds feature engineering, and hyperparameter selection). In the
(O’Mara-Eves et al., 2015). These measures may be used to absence of standardized procedures, these are commonly
assess the validity of AILR findings (Larsen and Bong, performed in an ad hoc manner, with great reliance on the
experience and intuition of researchers, as well as trial and error (Anderson et al., 2013; Duboue, 2020). These choices are then seldom explained or shared, which means that it can be unclear how to reproduce the exact same models given the inputs (Castellanos et al., 2021). This makes it challenging to follow the logic of AILRs and to replicate them using traditional human coders. In short, the lack of transparency in model preparation is a key culprit in the current reproducibility crisis in ML (Hutson, 2018b; Jones, 2018), which could also affect AILRs. This creates an opportunity for research on the most appropriate and effective strategies in the AILR context. To improve the replicability of literature reviews (Cram et al., 2020), research on the transparency and explainability of ML and NLP models is needed. Such efforts can build on extant research in computer science (Castelvecchi, 2016; Gunning and Aha, 2019; Knight, 2017), and we invite IS scholars to contribute to these efforts in the context of AILRs. More broadly, we hope future researchers can begin addressing questions such as: How can AI-based literature identification and search be made more transparent? How can automated data extraction and analysis become more explainable? And how can transparency be improved when AI is used at multiple stages of the literature review process? Until these questions can be answered to the satisfaction of the research community, transparency is likely to remain a persistent constraint on AILRs.
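One modest step in this direction, sketched below under the assumption that a simple bag-of-words logistic regression is used for screening, is to prefer models whose inclusion decisions can be traced back to weighted terms. The corpus and labels are invented toys.

```python
# Minimal sketch of an interpretable screening model: a bag-of-words
# logistic regression whose coefficients expose which terms drive
# inclusion decisions. The corpus and labels are invented toys.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

abstracts = [
    "machine learning for systematic literature reviews",
    "deep learning screening of research abstracts",
    "qualitative case study of hospital workflows",
    "interview study on platform governance",
]
labels = [1, 1, 0, 0]  # 1 = relevant to the review, 0 = not relevant

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(abstracts)
model = LogisticRegression().fit(X, labels)

# Report the most influential terms, making the decision logic auditable.
terms = np.array(vectorizer.get_feature_names_out())
order = np.argsort(model.coef_[0])
print("terms pushing toward exclusion:", terms[order[:3]])
print("terms pushing toward inclusion:", terms[order[-3:]])
```

Reporting such term weights alongside AILR results would let readers, and human coders attempting a replication, audit why particular papers were included or excluded.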
Of special importance is the development of methods for making local decisions appropriate to the AILR context. Using ML and NLP algorithms involves making a large number of specific, local decisions. For example, before running an ML algorithm, a researcher has to specify parameters (hyperparameters). Furthermore, to improve performance, the data (here, the contents of research articles) are typically transformed using a variety of techniques (e.g., normalization, dimensionality reduction, and missing value imputation).
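As a sketch of how such local decisions could at least be made explicit and shareable, the following pipeline declares its preprocessing and hyperparameter choices in a single place; the specific values are illustrative assumptions, not recommendations.

```python
# Minimal sketch: declaring local modeling decisions explicitly so that
# an AILR can be re-run and audited. All choices here are illustrative.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

DECISIONS = {
    "vectorizer__min_df": 2,    # ignore very rare terms (normalization choice)
    "svd__n_components": 100,   # dimensionality reduction choice
    "clf__C": 1.0,              # regularization strength (hyperparameter)
    "random_state": 42,         # fixed seed for replicability
}

pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer(min_df=DECISIONS["vectorizer__min_df"])),
    ("svd", TruncatedSVD(n_components=DECISIONS["svd__n_components"],
                         random_state=DECISIONS["random_state"])),
    ("clf", LogisticRegression(C=DECISIONS["clf__C"])),
])

# Persisting DECISIONS alongside the results turns the "local choices"
# (normalization, dimensionality reduction, hyperparameters) into part
# of the review's audit trail rather than undocumented trial and error.
print(pipeline)
```

The point is not the particular values but the practice: recording every local decision in one shareable structure makes an otherwise ad hoc process inspectable.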
Compatibility for recombination. It is also important to design AILR tools with future recombination in mind (Beller et al., 2018). Presently, a major limitation of many tools, especially those aiming to automate multiple steps of the review process, is that they do not integrate effectively with preexisting components. Thus, while such tools are accessible, they may not be as powerful, as they restrict researchers from incorporating the most effective tools for the task. Consistent with previous calls (Al-Zubidy et al., 2017; Germonprez et al., 2007; O’Connor et al., 2018), we encourage more research on making AILR tools more flexible, modular, and tailorable (which should also contribute to greater AILR transparency). Promising packages and ongoing projects in this area can be found in the Evidence Synthesis Hackathon Series, which was initiated recently.5

Usability. Developing AILR tools is not only a technical problem but also a usability challenge, resulting in ample opportunities for research on human factors in tool use. Poor usability has been a persistent concern in this area (Marshall and Wallace, 2019), with current tools often created on an ad hoc basis, implementing idiosyncratic interfaces which are difficult to understand. The tools also face the challenging tension between offering simple interfaces that are accessible to users without AI knowledge while, at the same time, allowing advanced users to adapt, modify, and combine algorithms. We note that ASReview is a promising exemplar in this regard since it offers a graphical user interface as well as command-line access to the underlying code (van de Schoot et al., 2021). These issues create an opportunity for researchers in human–computer interaction and usability research to study and improve upon the interface design and process flow of AILR tools. As the portfolio of methods and tools supporting AILRs continuously expands, we encourage IS researchers to join forces in review teams, bringing together researchers with expertise in state-of-the-art tools and the broader spectrum of NLP and ML techniques with others who have experience in theory development, for example. There are many interesting and exciting possibilities for integrating the unique strengths of AI with those of human–computer interaction and usability researchers in the collaborative process of conducting AILRs (cf. Raisch and Krakowski, 2020; Seeber et al., 2020).

Level III: Research practice

AILRs require broader considerations which, we believe, should involve the entire IS community, including authors of papers surveyed by AILRs, their reviewers, community thought leaders, and innovators interested in improving the way we conduct research. We highlight two broad streams of discussion pertaining to standardization and sharing.

Standardization debate within IS. To support AILR practice within IS, a discussion is needed on the potential and boundaries of greater standardization within the discipline. Clearly, our discipline, being so diverse, would not be well-served if we began to straitjacket ideas through umbrella standardizations. However, some local cases of standardization (e.g., using common evaluation metrics, such as F-scores or AUCs, in ML studies) may be beneficial. Indeed, many AILRs rely on limited information, such as paper abstracts only (cf. Sidorova et al., 2008), and difficulties in separating relevant elements and sections of a paper may introduce noise into the analysis. Identifying similar entities within papers remains a challenge due to the lack of agreed-upon conventions for describing and presenting, for example, theoretical constructs or measurement items (Endicott et al., 2017). To continue building a cumulative research body of knowledge, the IS discipline
could consider adopting common naming conventions and domain ontologies. For instance, this could mean avoiding giving the same construct different names (see Larsen and Bong, 2016) or consistently naming the standard sections of research papers (e.g., “methods” and “results”), as long as it does not detract from the ability to present the results in a unique manner. The rationale is that even technically perfect tools (like researchers) would struggle to extract and interpret information from sources which use ambiguous and confusing language and presentation. Standardization, however, brings its own challenges and invites additional considerations, especially in disciplines of great diversity, such as IS. Rather than endorsing the need to standardize, we call on the community to engage in debates about its merits and limitations. In particular, we suggest considering which areas of IS and which types of papers are most amenable to standardization and where standardization may bring benefits, while constantly remaining cognizant of the negative effects of standardization, some of which may not be easy to anticipate at the onset. It is important to ensure that, in pursuing the goal of integrated science, we do not alter the spirit of those contributions which purposefully strive for multiple, nuanced, and, at times, contradictory perspectives and interpretations (Avison and Malaurent, 2014; Klein and Myers, 1999).
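As a toy illustration of both the problem and a possible tool-based mitigation, the sketch below flags construct labels that may denote the same construct under different names, using character n-gram similarity; the labels and threshold are invented, and more sophisticated approaches exist (e.g., Larsen and Bong, 2016).

```python
# Toy sketch: flagging construct labels that may denote the same
# construct under different names. Labels and threshold are invented.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

constructs = ["perceived usefulness", "perceived utility",
              "behavioral intention", "intention to use",
              "perceived ease of use"]

# Character n-grams tolerate small wording differences between labels.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4))
vectors = vectorizer.fit_transform(constructs)
sims = cosine_similarity(vectors)

for i, j in combinations(range(len(constructs)), 2):
    if sims[i, j] > 0.5:  # candidate construct-identity pair
        print(f"possible synonyms: {constructs[i]!r} ~ {constructs[j]!r}"
              f" (similarity {sims[i, j]:.2f})")
```

Surface similarity is, of course, only a first filter; semantically identical constructs with dissimilar labels would require ontology-based or embedding-based matching.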
Debate on sharing complementary research outputs. For AILRs to go beyond the text of papers and analyze other research outputs (e.g., IT artifacts, empirical data, and ML models), we need to develop a stronger sharing tradition in the IS discipline. Corresponding calls to improve data and IT artifact sharing practices within the IS discipline are mounting (Lukyanenko and Parsons, 2020; Maass et al., 2018), culminating in the recent MISQ Editorial on transparency, where the sharing of research components is a major recommendation (Burton-Jones et al., 2021). Sharing components of research is not without challenges, such as protecting privacy or the intellectual property rights of software code, or adding to an already long list of things scientists must do. Furthermore, as the IS discipline investigates the properties of IT artifacts (in addition to human behavior), it is not unreasonable to foresee research which uses advanced computational techniques (such as computer vision) to mine properties of IT artifacts and use those as units of LR analysis. Motivated by its potential benefits to AILRs, we call on the community to investigate technical approaches, requisite infrastructure (e.g., at the conference and journal levels), and community practices (e.g., during the review stage) for data, model, and artifact sharing. This includes making data open and publicly accessible, complete with appropriate meta-data to facilitate the identification of the data semantics. Our call joins a chorus of suggestions made by other researchers. Thus, Maass et al. (2018) impel researchers to “play a proactive role in which they prepare data for future problems, needs, or changes. As part of this role, IS researchers will need to anticipate concerns that go beyond a single research project to generate approaches and infrastructures that build capacity for, and facilitate work across, multiple research settings and projects (e.g., to address problems such as data sharing)” (p. 1266). While community-level norms for sharing may take time to emerge (Burton-Jones et al., 2021), authors could lead this effort by voluntarily sharing those components of their own work they deem appropriate. In doing so, authors would make their papers more accessible to the AILR tools of the future and hence increase the exposure and potential impact of their work.
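As one hedged illustration of what machine-readable sharing could look like, the snippet below attaches minimal metadata to a shared research component; the schema and field names are hypothetical and imply no community standard.

```python
# Hypothetical sketch of minimal, machine-readable metadata that could
# accompany a shared research component. This schema is illustrative
# only; no community standard is implied.
import json

shared_component = {
    "type": "ml_model",                      # e.g., dataset, it_artifact, ml_model
    "title": "Screening classifier for review X",
    "version": "1.0.0",
    "license": "CC-BY-4.0",
    "constructs": ["perceived usefulness"],  # semantics of the data/model
    "training_data": "doi:10.0000/example",  # illustrative placeholder DOI
    "preprocessing": ["tfidf", "svd_100"],   # declared local choices
    "contact": "corresponding.author@example.org",
}

print(json.dumps(shared_component, indent=2))
```

Even such a minimal manifest would allow future AILR tools to locate, interpret, and recombine shared components without manual archaeology.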
Concluding remarks

Scholars in many scientific disciplines share excitement about the opportunities of leveraging AI in support of various research tasks. In this essay, we explored how literature reviews can benefit from AI support, summarizing the current state of research and sketching opportunities for future research, design, and action. While some trends target the (partial) automation of repetitive tasks, others are more ambitious, advancing the use of AI in the analysis and interpretation steps. Not unexpectedly, such visionary approaches are met with excitement from some and reservations from others. In this opinionated discourse, we emphasize that AILRs are not an end in themselves but a means to the end of making a strong contribution to knowledge and theory development. We expect top IS journals to continue their tradition of championing papers that thoughtfully integrate previous research streams, develop new theories, or elaborate on existing ones (Leidner, 2018; Rivard, 2014; Webster and Watson, 2002). While AI can certainly automate repetitive tasks and support others, there is no doubt that these contributions require human interpretation and insightful syntheses, as well as novel explanation and theory building. Having surveyed a range of promising examples of AI-based tools for literature reviews, we recognize that much remains to be done to support the more repetitive tasks and to facilitate insightful contributions. We therefore propose a multi-level agenda for AILR-centric research, design, and action. Our main ambition is to foster a vibrant and constructive AILR tradition in IS, which offers exciting opportunities for the entire research community, including authors and reviewers, as well as external stakeholders from other disciplines and industry. Especially for design science researchers, there is significant potential for advancing AI-based tools and methods beyond the IS discipline. We hope that our vision encourages scholars to engage in debates and reflections on how AI can be leveraged for the progress of research in IS and its neighboring disciplines.
Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Guy Paré: https://orcid.org/0000-0001-7425-1994

Notes

1. We use the term “tool” in a broad sense, covering applications offering graphical user interfaces (GUI), statistical packages, as well as programming libraries, for example.
2. For the sake of completeness, we recognize that some tools support multiple steps of the review process (e.g., Covidence, Rayyan QCRI, Parsifal, SRDB.PRO, and SESRA). These tools tend to focus on data, workflow, and collaboration management functionality without necessarily drawing on AI capabilities. In this commentary, we focus on tools supporting individual steps because they tend to be more amenable to code inspection and extension (i.e., published under open source, non-commercial licenses), as well as independent validation.
3. https://crowd.cochrane.org/
4. https://is.theorizeit.org/wiki/Main_Page
5. https://www.eshackathon.org/
References

Adams CE, Polzmacher S and Wolff A (2013) Systematic reviews: work that needs to be done and not to be done. Journal of Evidence-Based Medicine 6(4): 232–235.
Alter S (2005) Architecture of Sysperanto: a model-based ontology of the IS field. Communications of the Association for Information Systems 15(1): 1–40.
Alvesson M and Sandberg J (2011) Generating research questions through problematization. Academy of Management Review 36(2): 247–271.
Al-Zubidy A, Carver JC, Hale DP, et al. (2017) Vision for SLR tooling infrastructure: prioritizing value-added requirements. Information and Software Technology 91: 72–81.
Anderson MR, Antenucci D, Bittorf V, et al. (2013) Brainwash: a data system for feature engineering. In: Proceedings of the biennial conference on innovative data systems research, Asilomar, CA, USA, 2013.
Antons D and Breidbach CF (2017) Big data, big insights? Advancing service innovation and design with machine learning. Journal of Service Research 21(1): 17–39.
Antons D, Breidbach CF, Joshi AM, et al. (2021) Computational literature reviews: method, algorithms, and roadmap. Organizational Research Methods: 1094428121991230.
Avison D and Malaurent J (2014) Is theory king?: questioning the theory fetish in Information Systems. Journal of Information Technology 29(4): 327–336.
Balducci B and Marinova D (2018) Unstructured data in marketing. Journal of the Academy of Marketing Science 46(4): 557–590.
Bandara W, Furtmueller E, Gorbacheva E, et al. (2015) Achieving rigour in literature reviews: insights from qualitative data analysis and tool-support. Communications of the Association for Information Systems 34(8): 154–204.
Batini C, Rula A, Scannapieco M, et al. (2015) From data quality to big data quality. Journal of Database Management 26(1): 60–82.
Bax L, Yu L-M, Ikeda N, et al. (2007) A systematic comparison of software dedicated to meta-analysis of causal studies. BMC Medical Research Methodology 7(1): 1–9.
Beller E, Clark J, Tsafnat G, et al. (2018) Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Systematic Reviews 7(1): 77.
Berente N, Seidel S and Safadi H (2019) Research commentary: data-driven computationally intensive theory development. Information Systems Research 30(1): 50–64.
Boell SK and Wang B (2019) wwwlitbaskets.io, an IT artifact supporting exploratory literature searches for Information Systems research. In: Proceedings of the Pacific Asia conference on information systems (eds KK Wei, WW Huang, JK Lee, et al.), X’ian, China, 8–12 July 2019.
Burton-Jones A, Boh WF, Oborn E, et al. (2021) Editor’s comments: advancing research transparency at MIS Quarterly: a pluralistic approach. MIS Quarterly 45(2): iii–xviii.
Carver JC, Hassler E, Hernandes E, et al. (2013) Identifying barriers to the systematic literature review process. In: International symposium on empirical software engineering and measurement, Baltimore, MD, USA, 10–11 October 2013.
Castellanos A, Castillo A, Tremblay MC, et al. (2021) Improving machine learning performance using conceptual modeling. In: AAAI spring symposium: combining machine learning with knowledge engineering, 2021.
Castelvecchi D (2016) Can we open the black box of AI? Nature 538(7623): 20–23.
Choudhury P, Allen R and Endres M (2018) Developing theory using machine learning methods. Available at: http://ssrn.com/abstract=3251077.
Cooper HM (1988) Organizing knowledge syntheses: a taxonomy of literature reviews. Knowledge in Society 1(1): 104–126.
Cram WA, Templier M and Paré G (2020) (Re)considering the concept of literature review reproducibility. Journal of the Association for Information Systems 21(5): 1103–1114.
Dalgali A and Crowston K (2019) Sharing open deep learning models. In: Proceedings of the Hawaii international conference on system sciences, Grand Wailea, HI, USA, 8–11 January 2019.
Dang Y, Zhang Y, Chen H, et al. (2009) Arizona Literature Mapper: an integrated approach to monitor and analyze global bioterrorism research literature. Journal of the American Society for Information Science and Technology 60(7): 1466–1485.
Dörpinghaus J and Stefan A (2019) Knowledge extraction and applications utilizing context data in knowledge graphs. In: Proceedings of the federated conference on computer science and information systems, Leipzig, Germany, 1–4 September 2019, pp. 265–272.
Dubé L and Paré G (2003) Rigor in information systems positivist case research: current practices, trends, and recommendations. MIS Quarterly 27(4): 597–636.
Duboue P (2020) The Art of Feature Engineering: Essentials for Machine Learning. Cambridge, UK: Cambridge University Press.
Eisenstein J (2019) Introduction to Natural Language Processing. Cambridge, MA: MIT Press.
Endicott J, Larsen K, Lukyanenko R, et al. (2017) Integrating scientific research: theory and design of discovering similar constructs. In: AIS SIGSAND symposium, Cincinnati, OH, USA, 2017, pp. 1–7.
Evans JA and Foster JG (2011) Metaknowledge. Science 331(11): 721–725.
Germonprez M, Hovorka D and Collopy F (2007) A theory of tailorable technology design. Journal of the Association for Information Systems 8(6): 351–367.
Gleasure R (2014) Conceptual design science research? How and why untested meta-artifacts have a place in IS. In: Proceedings of the international conference on design science research in information systems and technology, Miami, FL, USA, 2014, pp. 99–114.
Goodfellow I, Bengio Y and Courville A (2016) Deep Learning. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press.
Gunning D and Aha D (2019) DARPA’s Explainable Artificial Intelligence (XAI) program. AI Magazine 40(2): 44–58.
Hake P, Fettke P, Neumann G, et al. (2017) Extracting business objects and activities from labels of German process models. In: Proceedings of the international conference on design science research in information systems and technology, Karlsruhe, Germany, 30 May–1 June 2017, pp. 21–38.
Harrison H, Griffin SJ, Kuhn I, et al. (2020) Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Medical Research Methodology 20(7): 1–12.
Hartling L, Ospina M, Liang Y, et al. (2009) Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. British Medical Journal 339(1): 1–6.
Hassan NR, Prester J and Wagner G (2020) Seeking out clear and unique Information Systems concepts: a natural language processing approach. In: Proceedings of the European conference on information systems (eds MLF Rowe and R El Amrani), Marrakech, Morocco, 15–17 June 2020.
Higgins JPT and Green S (2008) Cochrane Handbook for Systematic Reviews of Interventions. Chichester, UK: John Wiley & Sons.
Holzinger A (2016) Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Informatics 3(2): 119–131.
Hossain MS, Gresock J, Edmonds Y, et al. (2012) Connecting the dots between PubMed abstracts. PLoS One 7(1): 1–23.
Hunter JE and Schmidt FL (2014) Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. 2nd edition. Thousand Oaks, CA: Sage.
Hutson M (2018a) Has artificial intelligence become alchemy? Science 360(6388): 478–479.
Hutson M (2018b) Artificial intelligence faces reproducibility crisis. Science 359(6377): 725–726.
Jalali S and Wohlin C (2012) Systematic literature studies: database searches vs. backward snowballing. In: Proceedings of the ACM-IEEE international symposium on empirical software engineering and measurement, Lund, Sweden, 20–21 September 2012, pp. 29–38.
Johnson CD, Bauer BC and Niederman F (2019) The automation of management and business science. Academy of Management Perspectives 35(2): 292–309.
Jones M (2018) How do we address the reproducibility crisis in artificial intelligence? Forbes. Available at: https://www.forbes.com/sites/forbestechcouncil/2018/10/26/how-do-we-address-the-reproducibility-crisis-in-artificial-intelligence/#54236c5b7688.
Jonnalagadda SR, Goyal P and Huffman MD (2015) Automating data extraction in systematic reviews: a systematic review. Systematic Reviews 4(1): 78.
Kallinikos J and Tempini N (2014) Patient data as medical facts: social media practices as a foundation for medical knowledge creation. Information Systems Research 25(4): 817–833.
Kenett RS and Shmueli G (2016) Information Quality: The Potential of Data and Analytics to Generate Knowledge. Hoboken, NJ: John Wiley & Sons.
King RD, Rowland J, Oliver SG, et al. (2009) The automation of science. Science 324(5923): 85–89.
Kitchenham BA and Charters S (2007) Guidelines for performing systematic literature reviews in software engineering. EBSE technical report.
Klein HK and Myers MD (1999) A set of principles for conducting and evaluating interpretive field studies in information systems. MIS Quarterly 23(1): 67–94.
Knight W (2017) DARPA is funding projects that will try to open up AI’s black boxes. MIT Technology Review. Available at: https://www.technologyreview.com/2017/04/13/152590/the-financial-world-wants-to-open-ais-black-boxes/ (accessed 25 September 2020).
Kobayashi VB, Mol ST, Berkers HA, et al. (2017) Text mining in organizational research. Organizational Research Methods 21(3): 733–765.
Kohl C, McIntosh EJ, Unger S, et al. (2018) Online tools supporting the conduct and reporting of systematic reviews and systematic
maps: a case study on CADIMA and review of existing tools. Environmental Evidence 7(8): 1–17.
Chandra Kruse L, Seidel S and vom Brocke J (2019) Design archaeology: generating design knowledge from real-world artifact design. In: Proceedings of the international conference on design science research in information systems and technology, Worcester, MA, USA, 4–6 June 2019, pp. 32–45.
Kunc M, Mortenson MJ and Vidgen R (2018) A computational literature review of the field of system dynamics from 1974 to 2017. Journal of Simulation 12(2): 115–127.
Larsen KR and Bong CH (2016) A tool for addressing construct identity in literature reviews and meta-analyses. MIS Quarterly 40(3): 529–551.
Larsen K, Hovorka D, Dennis AR, et al. (2019) Understanding the elephant: the discourse approach to boundary identification and corpus construction for theory review articles. Journal of the Association for Information Systems 20(7): 887–928.
Larsen K, Lukyanenko R, Muller R, et al. (2020) Validity in design science research. In: Proceedings of the international conference on design science research in information systems and technology, Kristiansand, Norway, 2–4 December 2020.
Laurell C, Sandström C, Berthold A, et al. (2019) Exploring barriers to adoption of virtual reality through social media analytics and machine learning: an assessment of technology, network, price and trialability. Journal of Business Research 100: 469–474.
Leidner DE (2018) Review and theory symbiosis: an introspective retrospective. Journal of the Association for Information Systems 19(6): 552–567.
Li J, Larsen K, Abbasi A, et al. (2020) TheoryOn: a design framework and system for unlocking behavioral knowledge through ontology learning. MIS Quarterly 44(4): 1733–1772.
Lindberg A (2020) Developing theory through integrating human and machine pattern recognition. Journal of the Association for Information Systems 21(1): 90–116.
Lindzey G, Gilbert D and Fiske ST (1998) The Handbook of Social Psychology. New York: Oxford University Press.
Lukyanenko R (2020) A journey to BSO: evaluating earlier and more recent ideas of Mario Bunge as a foundation for information and software development. In: Exploring modeling methods for systems analysis and development, Grenoble, France, 2020, pp. 1–15.
Lukyanenko R and Parsons J (2020) Research perspectives: design theory indeterminacy: what is it, how can it be reduced, and why did the polar bear drown? Journal of the Association for Information Systems 21(5): 1343–1369.
Lukyanenko R, Parsons J, Wiersma YF, et al. (2019) Expecting the unexpected: effects of data collection design choices on the quality of crowdsourced user-generated content. MIS Quarterly 43(2): 623–648.
Maass W, Parsons J, Purao S, et al. (2018) Data-driven meets theory-driven research in the era of big data: opportunities and challenges for Information Systems research. Journal of the Association for Information Systems 19(12): 1253–1273.
Manning CD and Schütze H (1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
March ST and Allen GN (2014) Toward a social ontology for conceptual modeling. Communications of the Association for Information Systems 34(70): 1347–1359.
Marshall IJ and Wallace BC (2019) Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Systematic Reviews 8(1): 163.
Marshall IJ, Kuiper J and Wallace BC (2015) RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association 23(1): 193–201.
Mitchell TM (1997) Machine Learning. Burr Ridge, IL: McGraw-Hill.
Mortenson MJ and Vidgen R (2016) A computational literature review of the technology acceptance model. International Journal of Information Management 36(6): 1248–1259.
Müller-Bloch C and Kranz J (2015) A framework for rigorously identifying research gaps in qualitative literature reviews. In: Proceedings of the international conference on information systems (eds T Carte, A Heinzl and C Urquhart), Fort Worth, Texas, USA, 16 December 2015.
Nakagawa S, Samarasinghe G, Haddaway NR, et al. (2019) Research weaving: visualizing the future of research synthesis. Trends in Ecology & Evolution 34(3): 224–238.
Nelson LK (2020) Computational grounded theory: a methodological framework. Sociological Methods & Research 49(1): 3–42.
O’Connor AM, Tsafnat G, Gilbert SB, et al. (2018) Moving toward the automation of the systematic review process: a summary of discussions at the second meeting of the International Collaboration for the Automation of Systematic Reviews (ICASR). Systematic Reviews 7(3): 1–5.
O’Mara-Eves A, Thomas J, McNaught J, et al. (2015) Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic Reviews 4.
Papaioannou D, Sutton A, Carroll C, et al. (2009) Literature searching for social science systematic reviews: consideration of a range of search techniques. Health Information & Libraries Journal 27(2): 114–122.
Paré G, Cameron A-F, Poba-Nzaou P, et al. (2013) A systematic assessment of rigor in information systems ranking-type Delphi studies. Information & Management 50(5): 207–217.
Paré G, Tate M, Johnstone D, et al. (2016) Contextualizing the twin concepts of systematicity and transparency in information systems literature reviews. European Journal of Information Systems 25(6): 493–508.
Pinsonneault A and Kraemer K (1993) Survey research methodology in management information systems: an assessment. Journal of Management Information Systems 10(2): 75–105.
Prester J, Wagner G, Schryen G, et al. (2020) Classifying the ideational impact of Information Systems review articles: a
content-enriched deep learning approach. Decision Support Systems 140: 113432.
Raganato A, Bovi CD and Navigli R (2017) Neural sequence learning models for word sense disambiguation. In: Proceedings of the conference on empirical methods in natural language processing, Copenhagen, Denmark, 7–11 September 2017, pp. 1156–1167.
Rai A (2016) Editor’s comments: synergies between big data and theory. MIS Quarterly 40(2): iii–ix.
Raisch S and Krakowski S (2020) Artificial intelligence and management: the automation-augmentation paradox. Academy of Management Review 46(1): 192–210.
Rivard S (2014) Editor’s comments: the ions of theory construction. MIS Quarterly 38(2): iii–xiv.
Russell SJ and Norvig P (2016) Artificial Intelligence: A Modern Approach. Malaysia: Pearson Education Limited.
Russell-Rose T and Shokraneh F (2019) 2Dsearch: facilitating reproducible and valid searching in evidence synthesis. BMJ Evidence-Based Medicine 24(Suppl 1): A36.
Schmiedel T, Müller O and vom Brocke J (2019) Topic modeling as a strategy of inquiry in organizational research: a tutorial with an application example on organizational culture. Organizational Research Methods 22(4): 941–968.
Schryen G, Wagner G, Benlian A, et al. (2020) A knowledge development perspective on literature reviews: validation of a new typology in the IS field. Communications of the Association for Information Systems 46: 134–168.
Seeber I, Bittner E, Briggs RO, et al. (2020) Machines as teammates: a research agenda on AI in team collaboration. Information & Management 57(2): 1–22.
Shamliyan T, Kane RL, Dickinson S, et al. (2010) A systematic review of tools used to assess the quality of observational studies that examine incidence or prevalence and risk factors for diseases. Journal of Clinical Epidemiology 63(10): 1061–1070.
Sidorova A, Evangelopoulos N, Valacich JS, et al. (2008) Uncovering the intellectual core of the information systems discipline. MIS Quarterly 32(3): 467–482.
Small HG (1978) Cited documents as concept symbols. Social Studies of Science 8(3): 327–340.
Stanovsky G, Eckle-Kohler J, Puzikov Y, et al. (2017) Integrating deep linguistic features in factuality prediction over unified datasets. In: Proceedings of the annual meeting of the Association for Computational Linguistics, Vancouver, Canada, 30 July–4 August 2017, pp. 352–357.
Sturm B and Sunyaev A (2018) Design principles for systematic search systems: a holistic synthesis of a rigorous multi-cycle design science research journey. Business & Information Systems Engineering 61(1): 91–111.
Swanson DR and Smalheiser NR (1997) An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence 91(2): 183–203.
Tate WL, Ellram LM and Kirchoff JF (2010) Corporate social responsibility reports: a thematic analysis related to supply chain management. Journal of Supply Chain Management 46(1): 19–44.
Taulli T and Oni M (2019) Artificial Intelligence Basics. 1st edition. Berkeley, CA: Apress.
Templier M and Paré G (2018) Transparency in literature reviews: an assessment of reporting practices across review types and genres in top IS journals. European Journal of Information Systems 27(5): 503–550.
Thilakaratne M, Falkner K and Atapattu T (2019) A systematic review on literature-based discovery: general overview, methodology, & statistical analysis. ACM Computing Surveys 52(6): 1–34.
Thomas J, Noel-Storr A, Marshall I, et al. (2017) Living systematic reviews: 2. Combining human and machine effort. Journal of Clinical Epidemiology 91: 31–37.
Tkaczyk D, Collins A, Sheridan P, et al. (2018) Machine learning vs. rules and out-of-the-box vs. retrained. In: Proceedings of the ACM/IEEE joint conference on digital libraries, Fort Worth, Texas, USA, 3–7 June 2018.
Tremblay MC, van der Meer D and Beck R (2018) The effects of the quantification of faculty productivity: perspectives from the design science research community. Communications of the Association for Information Systems 43: 625–661.
Tsafnat G, Glasziou P, Choong MK, et al. (2014) Systematic review automation technologies. Systematic Reviews 3: 1–15.
van de Schoot R, de Bruin J, Schram R, et al. (2021) An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence 3: 125–133.
van Dinter R, Tekinerdogan B and Catal C (2021) Automation of systematic literature reviews: a systematic literature review. Information and Software Technology 136: 106589.
vom Brocke J, Simons A, Riemer K, et al. (2015) Standing on the shoulders of giants: challenges and recommendations of literature search in information systems research. Communications of the Association for Information Systems 37(9): 205–224.
Wahyudi A, Kuk G and Janssen M (2018) A process pattern model for tackling and improving big data quality. Information Systems Frontiers 20(3): 457–469.
Wallace BC, Small K, Brodley CE, et al. (2012) Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In: Proceedings of the ACM SIGHIT international health informatics symposium, Miami, Florida, USA, 28–30 January 2012, pp. 819–824.
Wand Y and Weber R (1988) An ontological analysis of some fundamental information systems concepts. In: Proceedings of the international conference on information systems, pp. 213–226.
Wang Z, Nayfeh T, Tetzlaff J, et al. (2020) Error rates of human reviewers during abstract screening in systematic reviews. PLoS One 15(1): e0227742.
van Zoonen W and van der Meer TGLA (2016) Social media research: the application of supervised machine learning in organizational communication research. Computers in Human Behavior 63: 132–141.
Webster J and Watson RT (2002) Analyzing the past to prepare for the future: writing a literature review. MIS Quarterly 26(2): xiii–xxiii.
Whetten DA (1989) What constitutes a theoretical contribution? Academy of Management Review 14(4): 490–495.
Xu J, Benbasat I and Cenfetelli RT (2013) Integrating service quality with system and information quality: an empirical test in the e-service context. MIS Quarterly 37(3): 777–794.

Author Biographies

Gerit Wagner is a postdoctoral fellow at HEC Montréal. His research focuses on literature reviews, the impact of research methods, digital health, and digital platforms for knowledge work. His research has been published in international journals, including the Journal of Strategic Information Systems, Journal of Medical Internet Research, Information & Management, and Decision Support Systems. He is a member of the Association for Information Systems and regularly serves as a reviewer for Information Systems journals and conferences.

Roman Lukyanenko is an associate professor of information systems at HEC Montréal, Canada. He obtained his PhD from Memorial University of Newfoundland, Canada. In his research, Roman investigates and develops innovative information technology solutions to support the management of natural resources, the development of smart cities, and decision making in healthcare systems. Roman has published over 120 scientific conference and journal articles, including in leading scientific journals such as Nature, MIS Quarterly, Information Systems Research, and Conservation Biology. Roman’s recent book, “A Message Almost Lost”, offers a new perspective on humanity’s existential issues. Roman is the current president of AIS SIGSAND and a co-developer of https://sigsand.com/.

Guy Paré is Professor of Information Technology and holds the Research Chair in Digital Health at HEC Montréal. His current research interests involve the barriers to adoption, effective use, and impacts of e-health technologies, as well as literature review approaches and methods. His publications have appeared in top-ranked journals including MIS Quarterly, Journal of Information Technology, European Journal of Information Systems, Journal of the Association for Information Systems, …
