Artificial Intelligence For Caries Detection: Value of Data and Information




Article  in  Journal of Dental Research · August 2022


DOI: 10.1177/00220345221113756



Research Reports: Clinical

Journal of Dental Research
1–7
© International Association for Dental Research and American Association for Dental, Oral, and Craniofacial Research 2022
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/00220345221113756
journals.sagepub.com/home/jdr

Artificial Intelligence for Caries Detection: Value of Data and Information

F. Schwendicke1, J. Cejudo Grano de Oro1, A. Garcia Cantu1, H. Meyer-Lueckel2, A. Chaurasia3, and J. Krois1

Abstract
If it increases practitioners’ diagnostic accuracy, medical artificial intelligence (AI) may lead to better treatment decisions at lower costs,
but uncertainty remains around the resulting cost-effectiveness. In the present study, we assessed how enlarging the data set used for
training an AI for caries detection on bitewings affects cost-effectiveness and also determined the value of information by reducing the
uncertainty around other input parameters (namely, the costs of AI and the population’s caries risk profile). We employed a convolutional
neural network and trained it on 10%, 25%, 50%, or 100% of a labeled data set containing 29,011 teeth without and 19,760 teeth with
caries lesions stemming from bitewing radiographs. We employed an established health economic modeling and analytical framework to
quantify cost-effectiveness and value of information. We adopted a mixed public–private payer perspective in German health care; the
health outcome was tooth retention years. A Markov model, following posterior teeth over the lifetime of an initially 12-y-old
individual, and Monte Carlo microsimulations were employed. With an increasing amount of data used to train the AI, sensitivity and
specificity increased nonlinearly; increasing the data set from 10% to 25% had the largest impact on accuracy and, consequently,
cost-effectiveness. In the base-case scenario, AI was more effective (tooth retention for a mean [2.5%–97.5%] 62.8 [59.2–65.5] y) and less
costly (378 [284–499] euros) than dentists without AI (60.4 [55.8–64.4] y; 419 [270–593] euros), with considerable uncertainty. The
economic value of reducing the uncertainty around AI’s accuracy or costs was limited, while information on the population’s risk profile
was more relevant. When developing dental AI, informed choices about the data set size may be recommended, and research toward
individualized application of AI for caries detection seems warranted to optimize cost-effectiveness.

Keywords: AI, caries detection/diagnosis/prevention, computer simulation, dental informatics, economic evaluation, radiology

1 Department of Oral Diagnostics, Digital Health and Health Services Research, Charité–Universitätsmedizin Berlin, Berlin, Germany
2 Department of Restorative, Preventive and Pediatric Dentistry, zmk bern, University of Bern, Bern, Switzerland
3 Department of Oral Medicine and Radiology, King George’s Medical University, Lucknow, India

A supplemental appendix to this article is available online.

Corresponding Author:
F. Schwendicke, Department of Oral Diagnostics, Digital Health and Health Services Research, Charité–Universitätsmedizin Berlin, Germany, Aßmannshauser Str. 4-6, Berlin, 14197, Germany.
Email: falk.schwendicke@charite.de

Introduction

Artificial intelligence (AI) and applications from the subfield of deep learning have rapidly entered the medical arena. In particular, image analysis using convolutional neural networks (CNNs) has been shown to have the potential for increasing practitioners’ reliability and accuracy. CNNs learn the statistical patterns inherent in imagery by repeatedly digesting pairs of images and image labels (e.g., “this image contains a certain pathology”), with labels usually provided by medical experts, and are eventually able to assess unseen data (LeCun et al. 2015). For detecting caries lesions, we found a CNN to yield diagnostic accuracies superior to individual dentists in a diagnostic accuracy study (Cantu et al. 2020) and confirmed this in a randomized controlled trial (Mertens et al. 2021).

The detection of a pathology like a caries lesion itself does not convey any tangible value to patients or the health care system. Instead, health benefits (and further costs) emanate from the subsequent (correctly or incorrectly assigned) treatment. For caries detection on radiographs, a CNN has been found cost-effective in a modeling study, where a Markov model was used to follow detected (or nondetected) and treated (or untreated) lesions over the patients’ lifetime (Schwendicke et al. 2020). However, we also demonstrated the uncertainty involved in this cost-effectiveness.

Quantifying uncertainty is relevant for decision makers: clinicians want to know the risks of falsely relying on one instead of the other possible treatment option, in our case, using AI versus not using AI for caries detection. Health care payers may want to assess the chances of saving money when incentivizing one option over the other. Researchers desire to understand the required efforts to reduce this uncertainty. Eventually, all parties are interested in the value of information (VOI).
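
The notion of value of information introduced above can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' health economic model: it assumes that each simulation run yields one net monetary benefit (NMB) per strategy, and it computes the expected value of perfect information as the mean of the per-simulation best NMB minus the best mean NMB. The function name `evpi`, the two hypothetical strategies, and the sampled cost distributions are illustrative assumptions only.

```python
import random


def evpi(nmb_samples):
    """Expected value of perfect information.

    nmb_samples: list of tuples, one tuple per simulation run,
    holding one net monetary benefit (NMB) per strategy.
    """
    n = len(nmb_samples)
    n_strategies = len(nmb_samples[0])
    # Imperfect information: commit to the strategy with the best mean NMB.
    mean_nmb = [sum(run[d] for run in nmb_samples) / n for d in range(n_strategies)]
    best_mean = max(mean_nmb)
    # Perfect information: the best strategy could be chosen in every run.
    mean_of_best = sum(max(run) for run in nmb_samples) / n
    return mean_of_best - best_mean


# Hypothetical example: at a willingness to pay of 0, NMB equals negative cost.
# The cost distributions below are invented for illustration.
rng = random.Random(0)
samples = [(-rng.gauss(380, 55), -rng.gauss(420, 80)) for _ in range(1000)]
value = evpi(samples)
print(f"EVPI of the hypothetical example: {value:.2f} euros")
```

If one strategy is best in every run, the function returns 0, which reflects that further information could not change the decision.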
VOI analyses quantify the costs of making the wrong decision (more money spent than required and/or lower health benefit than possible) due to uncertainty (Ford et al. 2012), that is, they translate uncertainty into monetary value and allow quantifying the value of further information to reduce this uncertainty. For AI applications, one primary source of uncertainty stems from its diagnostic performance. Further sources are uncertain costs or the risk profile of the specific target population, for example.

Increasing the amount of data an AI model is trained on tends to increase its diagnostic performance, while in parallel, one would expect this to reduce uncertainty around the performance estimate. Assuming that the performance gains and uncertainty reductions are not linear, the monetary value of increasing the data set would not be linear either. Moreover, these effects can be assumed to be different in different risk groups or associated with other uncertain parameters (like the costs of AI, which may grow if training is more resource intense). We aimed to quantify the value of data used to train an AI for caries detection on dental radiographs and also to assess the VOI of knowing the precise costs of the AI as well as the target population’s caries risk profile.

Methods

Study Design

In a previous model-based cost-effectiveness evaluation (Schwendicke et al. 2020) building on a diagnostic accuracy study (Cantu et al. 2020), we showed that using a CNN to detect caries lesions on bitewing radiographs had a high chance of being cost-effective. In the present study, we trained a CNN on a data set of cropped tooth images stemming from 3,826 bitewing radiographs and employed this health economic modeling framework for the described analyses. Reporting of this study follows the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) (Husereau et al. 2013).

Data Input

To vary the amount of training data and value their contribution to performance gains and certainty and, indirectly, monetary benefit, we used the imagery data set employed in our previous diagnostic accuracy study where we had trained a CNN for caries detection (Cantu et al. 2020), consisting of a total of 3,686 and 140 retrospectively collected bitewings, respectively. For the present study, each image in the training data set had been cropped tooth-wise (showing 1 tooth only) by a previously developed deep learning segmentation model, yielding 29,011 tooth crops without caries lesions and 19,760 tooth crops with caries lesions, respectively. The size of the tooth determined the size of the crop. Similarly, the test data set contained 692 tooth crops without caries lesions and 401 tooth crops with caries. Data collection had been ethically approved (ethics committee of Charité Berlin, EA4/080/18). Images had been labeled by 3 expert dentists, as well as reviewed and revised by a fourth expert. The reference test was constructed by the union of all labels.

Model Training and Testing

To assess the impact of using more training data on performance and cost-effectiveness, the number of tooth crops employed for training/validation was incrementally increased from 10% of the total data set to 25%, 50%, and 100%, respectively, resulting in 4 different models, whose classification accuracy (true and false positive or negative findings) was employed to inform the health economic model (see below). We performed 5-fold cross-validation where the validation data set was a random sample from the training data. We used the Resnet-18 architecture pretrained on the ImageNet data set as a feature extraction module and a classification head with 2 output neurons followed by the Softmax function. Further details can be found in the Appendix.

Testing was performed on the overall test data set and on subgroups of different lesion depths (E2: lesions into the inner enamel half, D1: lesions into the outer third of dentin; D2–D3: lesions into the middle or inner third of dentin, assessed by 2 examiners in agreement).

Setting, Perspective, Population, Horizon

We adopted a mixed public–private payer perspective in German health care (see Appendix). A population of posterior permanent teeth in initially 12-y-old individuals was modeled (TreeAge Pro 2019 R1.1; TreeAge Software). The initial age determined the horizon via the remaining lifetime of the individual (see below). The horizon was not varied across simulations.

We assumed the teeth’s proximal surfaces to start the simulation at a 1) sound, 2) initially carious (E2, D1), or 3) advanced carious status (D2–D3); the prevalence for these states had been estimated before (Schwendicke, Paris, and Stolpe 2015; Schwendicke et al. 2020) and was independent from the prevalence of lesions in our image data set used for model development. Only 1 lesion per tooth was modeled.

Besides the uncertainty stemming from the performance of the model, the caries risk profile of the population the AI is applied in has been found to introduce uncertainty (Schwendicke et al. 2020). We modeled 2 populations: 1 with low risk (low prevalence of caries lesions) and 1 with high risk (high prevalence). In the base-case analysis, we did not specify in which of these the AI was employed (i.e., introduced maximum uncertainty and quantified the VOI of reducing this uncertainty). The construction of both cohorts is described in the Appendix.

Comparators

Similar to the original health economic study (Schwendicke et al. 2020), we compared dentists’ diagnostic accuracy when using biannual visual-tactile caries detection plus radiographic
caries detection on bitewings every 2 y with that of biannual visual-tactile caries detection plus CNN-based AI for radiographic caries detection. As both dentists and AI may show different accuracy depending on the lesion stage, we used lesion depth–specific accuracies for E2, D1, and D2–D3 lesions. While visual-tactile means allowed detecting advanced (D2–D3) lesions with some accuracy, leading to restorative care (Schwendicke, Paris, and Stolpe 2015), initial lesions (E2–D1) were assumed to be only detectable using radiographs. The accuracy of dentists was built on a meta-analysis (Schwendicke, Tzschoppe, and Paris 2015), while as described, the accuracy (and the associated uncertainty) of the AI emanated from training on data sets of different size, leading to different accuracy and uncertainty and, subsequently, treatment decisions and costs.

Cost-Effectiveness Model and Assumptions

We used a Markov simulation model, modeling posterior teeth and their proximal surfaces over their lifetime. E2–D1 lesions were assumed to be detected only radiographically and managed using microinvasive care (caries infiltration) to arrest them. If undetected or unarrested, these lesions could progress to D2–D3 lesions, which would at some point be restored using a composite restoration. Restorations could fail and be replaced or repaired, and after repeated failure, a crown was to be placed, which again could fail and be replaced once. In parallel, endodontic complications could occur, which would be treated using root canal treatment, which was also assumed to fail with some chance and then required nonsurgical and eventually surgical retreatment. If no further treatment options remained, an extraction was assumed, with teeth being replaced with implant-supported single crowns. Simulation was performed in discrete annual cycles. The transition probabilities between these different health states are shown in Appendix Table 1. The model is summarized in Appendix Figure 1.

Input Variables

Further input variables (see Appendix) were largely built on data from large cohort studies or systematic reviews and had been validated in various health economic evaluations before (Schwendicke et al. 2013; Schwendicke, Meyer-Lueckel, et al. 2014; Schwendicke, Paris, and Stolpe 2015; Schwendicke, Stolpe, et al. 2015).

Health Outcomes, Costs, and Discounting

Our health outcome was the time a tooth was retained (in years), mainly as valid data to assign valuations to other health states of retained teeth (e.g., nonrestored, filled, crowned tooth) are not common, while increasing research in this field may allow for more detailed consideration of cost-utility (instead of only cost-effectiveness) in the future (Hettiarachchi et al. 2018). Costs were estimated using the German public and private dental fee catalogues, Bewertungsmaßstab (BEMA) and Gebührenordnung für Zahnärzte (GOZ), and included subgroups of costs for diagnostics and treatments, as well as costs covered by insurances or out-of-pocket expenses. Costs of AI were assumed to vary between 4 and 12 euros per application (see Appendix). Costs occurring over the lifetime (i.e., in the future) were discounted at 3% per annum (IQWiG 2017).

Analytical Methods

We first performed cost-effectiveness analysis using Monte Carlo microsimulations, with 1,000 independent teeth being followed over the mean expected lifetime (which was 66 y) (statista 2022) in annual cycles. We randomly sampled transition probabilities from uniform or triangular distributions (Briggs et al. 2002), as outlined in the Appendix. The probability that a strategy was acceptable to payers at different willingness-to-pay ceiling thresholds was also explored. In addition, we performed a range of sensitivity analyses.

Cost-effectiveness analyses indicate which strategies may be most cost-effective but accept the involved uncertainties. Reducing these uncertainties could lead to health gains or cost reductions from improved resource allocation (Claxton 1999). VOI allows assessing the foregone benefits and costs emanating from imperfect information. The VOI is estimated using the net monetary benefit (NMB), calculated as

NMB = λ × Δe – Δc,

with λ denoting the ceiling threshold of willingness to pay, that is, the additional costs (c) a decision maker is willing to bear for gaining an additional unit of effectiveness (e) (Drummond et al. 2005). For our analyses, we assumed λ = 0 as there are no agreed-on paying thresholds defined for an additional year of tooth retention, but also as this threshold seemed justifiable from a payer’s perspective.

VOI was then estimated as NMB(perfect information) – NMB(imperfect information). To estimate how perfect knowledge would change the NMB, one can identify the strategy with the highest NMB at each simulation and compare the average NMB of these “ideal” strategies with the NMB under imperfect information. We estimated the VOI of having perfect information on all uncertain parameters (expected value of perfect information, EVPI), as well as the VOI for reducing uncertainty in specific parameters (expected value of partial perfect information, EVPPI), namely, the AI’s accuracy (amount of training data), the costs of AI, and the population’s risk profile (Ford et al. 2012). The EVPI estimates the value of simultaneously eliminating all uncertainty in an analysis, while the EVPPI can assess which parameters contribute most to the overall uncertainty.

Results

Study Parameters and Performance of the CNN

The input parameters for our study are shown in Appendix Table 1. With an increasing amount of data used to train the AI,
both sensitivity and specificity increased. Notably, this increase was not linear; the increase was largest when increasing the data set from 10% to 25%, and limited afterwards (see Table 2 in Appendix). The resulting percentages of true-negative (for sound surfaces) and true-positive (for carious ones) findings are displayed in Figure 1.

Figure 1. Mean true-positive (TP) and true-negative (TN) rates (in %, y-axis) for sound and carious surfaces (lines) and standard deviations (shaded areas) of artificial intelligence models trained on different proportions of the overall data set (x-axis).

Base-Case Scenario

In the base-case scenario (uncertain accuracy of the AI, uncertain risk profile of the population, uncertain costs of AI), AI was more effective (tooth retention for a mean [2.5%–97.5%] 62.8 [59.2–65.5] y) and less costly (378 [284–499] euros) than dentists without AI (60.4 [55.8–64.4] y; 419 [270–593] euros). Figure 2 shows the cost-effectiveness plane (Fig. 2A), with AI being more effective and less costly in most simulations. This was also reflected in the incremental cost-effectiveness plane (Fig. 2B). The high cost-effectiveness acceptability was found regardless of a payer’s willingness to pay (Fig. 2C).

Figure 2. Cost-effectiveness and acceptability of the base case. (A) Cost-effectiveness plane. The costs and effectiveness of artificial intelligence (AI) versus no AI are plotted for 1,000 sampled individuals in each group. (B) Incremental cost-effectiveness. Incremental costs and effectiveness of AI compared with no AI are plotted. Quadrants indicate comparative cost-effectiveness (e.g., lower right: lower costs and higher effectiveness). Inserted cross-tabulation: Percentage of samples lying in different quadrants. (C) Cost-effectiveness acceptability. We plotted the probability of comparators being acceptable in terms of their cost-effectiveness depending on the willingness-to-pay threshold of a payer. The range of willingness to pay was expanded from 0 to 100 euros and did not considerably change beyond this threshold.

Sensitivity Analyses

A range of sensitivity analyses was performed (Table 1). In low-risk populations, the cost-effectiveness of AI was lower compared with the base case (and vice versa for high-risk populations). The amount of data used for training showed a relevant effect on costs; in low-risk populations, AI was more effective but also more costly when only 10% or 25% of the data were used for training, while if more data were used for training, it was both more effective and less costly. In high-risk populations, AI was more effective and less costly regardless of the amount of data. The impact of varying the costs of AI was limited. Discounting at different rates changed the overall costs but did not change the ranking of strategies.

Value of Information

The EVPI and the EVPPI at different willingness-to-pay thresholds of a payer are shown in Figure 3. Both EVPI and EVPPI decreased with increasing willingness to pay. The EVPI at a threshold of 0 euros was 12.40 euros and decreased to a
Table. Cost-Effectiveness in the Base-Case and Sensitivity Analyses.

| Analysis | Dentists with AI: Cost (Euros) | Dentists with AI: Effectiveness (y) | Dentists without AI: Cost (Euros) | Dentists without AI: Effectiveness (y) |
| --- | --- | --- | --- | --- |
| Base case (uncertain accuracies, uncertain AI costs, uncertain risk) | 378 (284–499) | 62.8 (59.2–65.5) | 419 (270–593) | 60.4 (55.8–64.4) |
| 10% training data, low risk, AI costs 8 euros | 379 (309–456) | 63.8 (61.5–65.8) | 326 (260–392) | 62.4 (60.0–64.4) |
| 25% training data, low risk, AI costs 8 euros | 333 (261–410) | 63.8 (60.9–65.6) | 326 (260–392) | 62.4 (60.0–64.4) |
| 50% training data, low risk, AI costs 8 euros | 332 (261–413) | 64.1 (61.7–65.9) | 326 (260–392) | 62.4 (60.0–64.4) |
| 100% training data, low risk, AI costs 8 euros | 323 (250–391) | 64.1 (62.1–65.7) | 326 (260–392) | 62.4 (60.0–64.4) |
| 10% training data, high risk, AI costs 8 euros | 451 (370–550) | 61.0 (58.1–63.8) | 514 (437–609) | 57.9 (54.5–60.9) |
| 25% training data, high risk, AI costs 8 euros | 425 (353–506) | 61.1 (58.4–63.8) | 514 (437–609) | 57.9 (54.5–60.9) |
| 50% training data, high risk, AI costs 8 euros | 404 (329–483) | 61.8 (59.1–64.1) | 514 (437–609) | 57.9 (54.5–60.9) |
| 100% training data, high risk, AI costs 8 euros | 392 (318–470) | 61.9 (59.7–63.9) | 514 (437–609) | 57.9 (54.5–60.9) |
| Low costs for AI (4.00 euros/analysis) | 371 (275–488) | 62.8 (59.2–65.5) | 419 (270–593) | 60.1 (55.1–64.2) |
| High costs for AI (12.00 euros/analysis) | 392 (284–492) | 62.8 (59.2–65.5) | 419 (270–593) | 60.1 (55.1–64.2) |
| Discounting rate 1% | 630 (454–856) | 62.8 (59.2–65.5) | 745 (468–1,050) | 60.4 (55.8–64.4) |
| Discounting rate 5% | 260 (195–333) | 62.8 (59.2–65.5) | 270 (177–373) | 60.4 (55.8–64.4) |

Mean and 2.5% to 97.5% percentiles are shown. The rationale behind modeling a lower and upper bound of artificial intelligence (AI) costs of 4.00 and 12.00 euros is provided in the Appendix. The range of discounting rates follows recommendations for cost-effectiveness studies in our setting (IQWiG 2017).
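
As a worked illustration of how the table entries translate into incremental quantities, the sketch below takes the mean values of the base-case row and computes the incremental cost, the incremental effectiveness, and the incremental net monetary benefit λ × Δe – Δc. It uses the reported means only and ignores the sampled uncertainty; the function name `incremental_nmb` and its parameter names are illustrative assumptions and not part of the authors' model.

```python
def incremental_nmb(cost_new, effect_new, cost_ref, effect_ref, wtp):
    """Incremental net monetary benefit of a new strategy versus a reference.

    wtp is the willingness-to-pay threshold (lambda) in euros per
    additional year of tooth retention.
    """
    delta_cost = cost_new - cost_ref        # negative means the new strategy saves money
    delta_effect = effect_new - effect_ref  # positive means more tooth retention years
    return wtp * delta_effect - delta_cost, delta_cost, delta_effect


# Base-case means from the table: dentists with AI versus dentists without AI.
nmb_at_0, d_cost, d_effect = incremental_nmb(378, 62.8, 419, 60.4, wtp=0)
nmb_at_100, _, _ = incremental_nmb(378, 62.8, 419, 60.4, wtp=100)

print(f"Incremental cost: {d_cost} euros")
print(f"Incremental effectiveness: {d_effect:.1f} y")
print(f"Incremental NMB at a threshold of 0 euros: {nmb_at_0:.0f} euros")
print(f"Incremental NMB at a threshold of 100 euros: {nmb_at_100:.0f} euros")
```

With the base-case means, AI costs 41 euros less and retains teeth 2.4 y longer, so the incremental NMB is positive at a threshold of 0 euros and grows as the threshold increases.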

lower plateau of 5.60 euros at a higher willingness to pay. The EVPPI of training the AI with more data (affecting performance and uncertainty) was 0.87 euros at a threshold of 0 euros and flattened out to 0 euros; that of the risk profile (caries prevalence) of the population was 6.61 euros at a threshold of 0 euros and also decreased toward 0 euros at higher willingness to pay. The EVPPI of the costs of AI was 0 euros regardless of the threshold (and is hence not shown in Fig. 3).

Figure 3. Value-of-information analysis. The overall expected value of perfect information (EVPI) and the expected value of partial perfect information (EVPPI) for specific parameters were plotted against the willingness-to-pay threshold of a payer. The EVPI and EVPPI indicate the monetary value of being able to make better decisions (avoid more costly or less effective decisions) based on better overall or partial information. EVPPI was estimated for the available data for training the artificial intelligence (AI) (affecting accuracy and its uncertainty), the risk profile (caries prevalence) of the population of interest, and the costs of AI (4–12 euros per application). For the latter, the EVPPI was 0 euros regardless of the threshold and hence not shown. The range of willingness to pay was expanded from 0 to 100 euros.

Discussion

While studies on the accuracy of AI applications for medical purposes are widespread, there are only few health economic evaluations of medical AI, and most of these suffer from methodological shortcomings (Wolff et al. 2020). In oral and dental research, a similar increase in studies on AI is notable, while assessments of the value of AI for dental patients, providers, or payers are scarce. Cost-effectiveness models allow determining the potential long-term health effects and resulting costs and thereby translate accuracy into tangible value.

The present study assessed the value of enlarging the training data set used for developing an AI and, indirectly, the resulting accuracy gains (which may be nonlinear and also differ for sensitivity and specificity or different lesion stages) and uncertainty reductions. We further assessed the value of knowing the costs of the AI and the population’s risk profile. In high-risk (high-prevalence) populations, even moderate sensitivity gains of AI may lead to considerable cost-effectiveness, while in low-risk populations, false-positive detections (i.e., specificity) will be more relevant.

In the present study, we showed that the benefit of more training data was not linearly increasing but saturated after limited increases in data and that in certain (high-risk) populations, AI was also cost-effective when only minimal amounts of data were used for training. It is advisable that instead of increasing data sets on a noninformed (random) basis, researchers should identify data points that contribute to the heterogeneity of the data set and thereby increase accuracy and generalizability more efficiently. Moreover, it should be highlighted that, as expected, gains in sensitivity and specificity were not identical and were further lesion stage specific, all of which had a joint impact on cost-effectiveness.

We further explored the value of reducing these (and other) uncertainties in our analysis. At 12.40 euros per individual, however, the monetary impact of eliminating all parameter uncertainty (EVPI) was limited compared with the observed lifetime costs. Moreover, and against our expectations, we showed that the value of knowing what accuracy gains are
generated by which training data set size was small and that uncertainty around the costs of AI was also irrelevant. Instead, the population’s risk profile and a range of other joint uncertainties (which we did not explore in detail) were relevant. Identifying the economic value of increasing information on specific parameters helps to make informed decisions about research and development: for instance, knowledge on the caries prevalence in a specific patient pool or patients’ risk profile (e.g., by using caries risk assessment) may support a more targeted decision toward using AI or not and thereby optimize the cost-effectiveness of AI.

This study has a number of strengths and limitations. First, and as a strength, this is the first study assessing the value of training data for dental AI applications and generally one of few VOI analyses in dentistry. Our study can inform researchers, funding agencies, and developers of AI toward which uncertainties have more or less impact on health and costs. Second, the employed Markov model and the analytic framework have been validated before; they allow extrapolating accuracy data into long-term health and economic outcomes. Third, and as a limitation, our analysis was setting specific, and so will be our results to some degree. Notably, cost estimation using German fee items has been found to closely reflect the true treatment costs and to yield estimates comparable with those from other health care settings (Schwendicke, Graetz, et al. 2014; Schwendicke et al. 2018). Fourth, construction of the reference test for training and testing the model was performed as described, with the chosen strategy being one (albeit frequently chosen) option among others. Also, we assumed early lesions to be detected radiographically, not visually, while a number of studies found visual assessment to have moderate sensitivity for detecting early proximal lesions, too (Foros et al. 2021). Fifth, the accuracy values assumed in our control group (dentists without AI) stemmed from a systematic review that also confirmed that many of the included diagnostic accuracy studies suffered from bias and limited applicability. Notably, we have investigated the impact of different accuracy values in the control group in a previous cost-effectiveness study (Schwendicke et al. 2020) and did not find the introduced variances in accuracy to change our conclusion. Moreover, we have assessed the cost-effectiveness of AI for this purpose not only against systematically reviewed and synthesized data but also recent data from a prospective controlled trial (Schwendicke et al. 2022). In the present study, our focus was not on the comparative cost-effectiveness but the uncertainty around it. Last, our simulation simplified decision making in practice; dentists may deviate from AI detections and apply a range of therapies beyond those assumed in our study. The latter point is relevant, as the assigned treatment has been shown to affect cost-effectiveness (Schwendicke et al. 2020).

In conclusion, and within the limitations of this study, increasing the amount of data for training an AI to detect caries lesions on bitewings improved cost-effectiveness. Notably, limited increases in data led to significant increases in cost-effectiveness, and enlarging the data set even further was of limited benefit. There was considerable uncertainty around the cost-effectiveness. The economic value of reducing this uncertainty, specifically around the AI’s accuracy or costs, was limited, though. Instead, the risk profile of the population of interest was more important. When developing dental AI, informed choices about the data set size may be recommended, and research toward individualized application of AI for caries detection seems warranted to optimize cost-effectiveness.

Author Contributions

F. Schwendicke, contributed to conception and design, data acquisition, analysis, and interpretation, drafted and critically revised the manuscript; J. Cejudo Grano de Oro, contributed to design, data acquisition, analysis, and interpretation, critically revised the manuscript; A. Garcia Cantu, contributed to data interpretation, critically revised the manuscript; H. Meyer-Lueckel, contributed to data analysis and interpretation, critically revised the manuscript; A. Chaurasia, contributed to data interpretation, critically revised the manuscript; J. Krois, contributed to conception and design, data acquisition, analysis, and interpretation, critically revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.

Acknowledgments

We thank the dentists for their effort of labeling the image data.

Declaration of Conflicting Interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: F. Schwendicke and J. Krois are cofounders of a Charité startup on dental image analysis, dentalXrai Ltd. The conduct, analysis, and interpretation of this study and its findings were unrelated to this.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Data Availability Statement

Data used in this study can be made available if needed within data protection regulation boundaries.

ORCID iDs

F. Schwendicke https://orcid.org/0000-0003-1223-1669
J. Krois https://orcid.org/0000-0002-6010-8940

References

Briggs AH, O’Brien BJ, Blackhouse G. 2002. Thinking outside the box: recent advances in the analysis and presentation of uncertainty in cost-effectiveness studies. Annu Rev Public Health. 23(1):377–401.
Cantu AG, Gehrung S, Krois J, Chaurasia A, Rossi JG, Gaudin R, Elhennawy K, Schwendicke F. 2020. Detecting differently deep caries lesions on bitewing radiographs using a fully convolutional neural network. J Dent. 100:103425.
Claxton K. 1999. The irrelevance of inference: a decision-making approach to the stochastic evaluation of health care technologies. J Health Econ. 18(3):341–364.
Drummond M, Sculpher M, Torrance G, O’Brien B, Stoddart G. 2005. Methods for economic evaluation of health care programmes. Oxford University Press.
Ford E, Adams J, Graves N. 2012. Development of an economic model to assess the cost-effectiveness of hawthorn extract as an adjunct treatment for heart failure in Australia. BMJ Open. 2(5):e001094.
Foros P, Oikonomou E, Koletsi D, Rahiotis C. 2021. Detection methods for early caries diagnosis: a systematic review and meta-analysis. Caries Res. 55(4):247–259.
Hettiarachchi RM, Kularatna S, Downes MJ, Byrnes J, Kroon J, Lalloo R, Johnson NW, Scuffham PA. 2018. The cost-effectiveness of oral health interventions: a systematic review of cost-utility analyses. Community Dent Oral Epidemiol. 46(2):118–124.
Husereau D, Drummond M, Petrou S, Carswell C, Moher D, Greenberg D, Augustovski F, Briggs AH, Mauskopf J, Loder E. 2013. Consolidated health economic evaluation reporting standards (CHEERS)—explanation and elaboration: a report of the ISPOR health economic evaluation publication guidelines good reporting practices task force. Value Health. 16(2):231–250.
IQWiG. 2017. General Methods (Allgemeine Methoden) [accessed 2022 Jun 29]. https://www.iqwig.de/methoden/allgemeine-methoden_version-5-0.pdf.
LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature. 521(7553):436–444.
Mertens S, Krois J, Cantu AG, Arsiwala LT, Schwendicke F. 2021. Artificial intelligence for caries detection: randomized trial. J Dent. 115:103849.
Schwendicke F, Engel AS, Graetz C. 2018. Long-term treatment costs of chronic periodontitis patients in Germany. J Clin Periodontol. 45(9):1069–1077.
Schwendicke F, Graetz C, Stolpe M, Dorfer CE. 2014. Retaining or replacing molars with furcation involvement: a cost-effectiveness comparison of different strategies. J Clin Periodontol. 41(11):1090–1097.
Schwendicke F, Mertens S, Cantu AG, Chaurasia A, Meyer-Lueckel H, Krois J. 2022. Cost-effectiveness of AI for caries detection: randomized trial. J Dent. 119:104080.
Schwendicke F, Meyer-Lueckel H, Stolpe M, Dörfer CE, Paris S. 2014. Costs and effectiveness of treatment alternatives for proximal caries lesions. PLoS One. 9(1):e86992.
Schwendicke F, Paris S, Stolpe M. 2015. Detection and treatment of proximal caries lesions: milieu-specific cost-effectiveness analysis. J Dent. 43(6):647–655.
Schwendicke F, Rossi JG, Göstemeyer G, Elhennawy K, Cantu AG, Gaudin R, Chaurasia A, Gehrung S, Krois J. 2020. Cost-effectiveness of artificial intelligence for proximal caries detection. J Dent Res. 100(4):369–376.
Schwendicke F, Stolpe M, Meyer-Lueckel H, Paris S. 2015. Detecting and treating occlusal caries lesions: a cost-effectiveness analysis. J Dent Res. 94(2):272–280.
Schwendicke F, Stolpe M, Meyer-Lueckel H, Paris S, Dörfer CE. 2013. Cost-effectiveness of one- and two-step incomplete and complete excavations. J Dent Res. 90(10):880–887.
Schwendicke F, Tzschoppe M, Paris S. 2015. Radiographic caries detection: a systematic review and meta-analysis. J Dent. 43(8):924–933.
statista. 2022. Sterbetafel 2018/2020 [accessed 2022 Jun 28]. https://de.statista.com/statistik/daten/studie/1783/umfrage/durchschnittliche-weitere-lebenserwartung-nach-altersgruppen/.
Wolff J, Pauling J, Keck A, Baumbach J. 2020. The economic impact of artificial intelligence in health care: systematic review. J Med Internet Res. 22(2):e16866.
