
Comparing the diagnostic performance of methods used in a full-factorial design multi-reader multi-case studies

  • Original paper
  • Computational Statistics

Abstract

In radiology, patients are frequently diagnosed from a radiologist's subjective interpretation of an image. Such diagnoses may be biased and may differ significantly among evaluators (i.e., readers) because of differences in education and experience. One way to address this problem is a multi-reader multi-case study design, in which multiple readers evaluate the same images. Several methods, both model-based and bootstrap-based, are available for analyzing multi-reader multi-case studies. In this study, we aimed to compare the performance of the available methods on a mammogram dataset. We also conducted a comprehensive simulation study to extend the results to a broader range of scenarios. In the simulation set-up, we considered the effects of the number of samples and readers, the data structure (i.e., correlation structures and variance components), and the overall accuracy of the diagnostic tests (AUC). The results showed that the type-I error rates of the model-based methods approached the nominal level as the number of samples and readers increased. Bootstrap-based methods, on the other hand, were generally conservative; however, they performed best when the sample size was small and the AUC was high. In conclusion, the performance of the methods was not the same under all conditions and was affected by the factors considered in the simulation study. Using a single method under all scenarios is therefore not advisable, because it may lead to biased conclusions.

Code availability

All source code, written in R, is publicly available on GitHub at https://github.com/basolmerve/MRMC-Simulation-ArticleSupplementary.git.

Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments, which improved the quality of our manuscript.

Funding

This study was not supported by any institution/organization.

Author information

Corresponding author

Correspondence to Merve Basol.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 92 KB)

Appendix

Defining data correlation and reader variance

Consider the following full factorial experimental design.

$$\begin{aligned} Y_{ijk} ={}&\mu + \tau _i + R_j + C_k + \left( \tau R \right) _{ij} + \left( \tau C \right) _{ik} \nonumber \\&+ \left( RC \right) _{jk} + \left( \tau R C \right) _{ijk} + \varepsilon _{ijk}, \nonumber \\&i = 1, 2, \dots , t; \quad j = 1, 2, \dots , r; \quad k = 1, 2, \dots , n \end{aligned}$$
(9)
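
As an illustration of how scores could be generated from the model in Eq. (9), the following Python sketch draws each random effect from a zero-mean normal distribution. The function name `simulate_scores`, the dictionary keys, and the normality assumption are hypothetical choices made for this example and are not taken from the authors' R code. The three-way interaction is folded into the error term here, because the two share the same indices and cannot be separated without replicated readings.

```python
import random


def simulate_scores(t, r, n, mu, tau, sd, seed=0):
    """Draw Y[i][j][k] for t tests, r readers and n cases following Eq. (9).

    tau : list of t fixed test effects.
    sd  : dict of standard deviations with the (hypothetical) keys
          "R", "C", "tR", "tC", "RC" and "eps"; "eps" also absorbs (tau R C).
    """
    rng = random.Random(seed)

    def draw(sigma):
        return rng.gauss(0.0, sigma)

    reader = [draw(sd["R"]) for _ in range(r)]
    case = [draw(sd["C"]) for _ in range(n)]
    test_reader = [[draw(sd["tR"]) for _ in range(r)] for _ in range(t)]
    test_case = [[draw(sd["tC"]) for _ in range(n)] for _ in range(t)]
    reader_case = [[draw(sd["RC"]) for _ in range(n)] for _ in range(r)]

    return [
        [
            [
                mu + tau[i] + reader[j] + case[k]
                + test_reader[i][j] + test_case[i][k] + reader_case[j][k]
                + draw(sd["eps"])
                for k in range(n)
            ]
            for j in range(r)
        ]
        for i in range(t)
    ]
```

Calling `simulate_scores` twice with the same `seed` reproduces the same array, which is convenient when comparing methods on identical simulated data.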

From the model in Eq. (9), the decision variable \(Y_{ijk}\) has variance

$$\begin{aligned} \sigma ^2 = \sigma _{C}^{2} + \sigma _{\tau C}^{2} + \sigma _{RC}^{2} + \sigma _{\varepsilon }^{2} \end{aligned}$$
(10)

for both diseased and non-diseased subjects, i.e., equal-variance components. To create unequal-variance components, Hillis (2012) modified the variance components in Eq. (10) by defining

$$\begin{aligned} \sigma _{*(1)}^2 = \dfrac{1}{b^2} \sigma _{*(0)}^2 \end{aligned}$$
(11)

so that, for example, \(\sigma _{C(1)}^2 = \frac{1}{b^2} \sigma _{C(0)}^2\), where \(b = \frac{\sigma _{(0)}}{\sigma _{(1)}}\) is the sigma ratio, for some \(b > 0\). Using the variance components in Eq. (10), we define

$$\begin{aligned} \rho _{WR} = \dfrac{\sigma ^{2}_{C} + \sigma ^{2}_{\tau C} + \sigma ^{2}_{RC}}{\sigma ^{2}_{C} + \sigma ^{2}_{\tau C} + \sigma ^{2}_{RC} + \sigma ^{2}_{\varepsilon }}, \qquad \rho _{BR} = \dfrac{\sigma ^{2}_{C} + \sigma ^{2}_{\tau C}}{\sigma ^{2}_{C} + \sigma ^{2}_{\tau C} + \sigma ^{2}_{RC} + \sigma ^{2}_{\varepsilon }} \end{aligned}$$
(12)

where \(\rho _{WR}\) and \(\rho _{BR}\) are used to define the first letter (i.e., data correlation) and \(\sigma _R^2\) or \(\sigma _{\tau R}^2\) are used to define the second letter (i.e., reader variance) of the data structures given in Supplementary Tables S1 and S2. For example, HH stands for high data correlation and high reader variance. The correlations in Eq. (12) can be estimated using the variance components from either the diseased or the non-diseased group. For more details, see Roe and Metz (1997a) and Hillis (2012).
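
The definitions in Eqs. (11) and (12) can be checked numerically with a short Python sketch. The function names and the variance-component values below are hypothetical and chosen only for illustration; they are not values from the Supplementary Tables.

```python
def correlations(var_c, var_tc, var_rc, var_eps):
    """Return (rho_WR, rho_BR) as defined in Eq. (12)."""
    total = var_c + var_tc + var_rc + var_eps  # sigma^2 from Eq. (10)
    rho_wr = (var_c + var_tc + var_rc) / total
    rho_br = (var_c + var_tc) / total
    return rho_wr, rho_br


def scale_components(components_0, b):
    """Apply Eq. (11): divide every group-(0) variance component by b^2."""
    if b <= 0:
        raise ValueError("b must be positive")
    return {name: value / b**2 for name, value in components_0.items()}


# Hypothetical variance components for group (0); not values from the paper.
group_0 = {"var_c": 0.3, "var_tc": 0.3, "var_rc": 0.2, "var_eps": 0.2}
group_1 = scale_components(group_0, b=0.8)

rho_0 = correlations(**group_0)
rho_1 = correlations(**group_1)
```

Because Eq. (11) rescales every component by the same factor, `rho_0` and `rho_1` agree up to rounding, which is consistent with the remark that either group can be used to estimate the correlations.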

Test statistics for the ANOVA models of the DBM and OR methods

The DBM and OR models are based on three-way and two-way mixed-effect ANOVA models, respectively. The statistical significance of each component in the DBM and OR models is evaluated via an F statistic calculated from the mean squares.

The DBM model. The F statistic for the significance of the test effect \(\tau\) in the ANOVA model Eq. (3) is obtained using the mean squares (MS) as

$$\begin{aligned} F_{DBM} = \dfrac{MS\left( \tau \right) }{MS\left( \tau R \right) + MS\left( \tau C \right) - MS\left( \tau R C\right) } \end{aligned}$$
(13)

where the denominator degrees of freedom, \(df_2\), are calculated using Satterthwaite's approximation (Satterthwaite 1946), as given in Eq. (14).

$$\begin{aligned} df_2 = \dfrac{\left[ MS\left( \tau R \right) + MS\left( \tau C \right) - MS\left( \tau R C\right) \right] ^2}{\dfrac{MS\left( \tau R \right) ^2}{\left( t - 1\right) \left( r - 1\right) } + \dfrac{MS\left( \tau C \right) ^2}{\left( t - 1 \right) \left( n - 1 \right) } + \dfrac{MS\left( \tau R C \right) ^2}{\left( t - 1 \right) \left( r - 1\right) \left( n-1 \right) }} \end{aligned}$$
(14)
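
Eqs. (13) and (14) translate directly into code. The following Python sketch assumes the four mean squares have already been computed elsewhere; the function name `dbm_f_test` and its argument names are hypothetical, and the numerator degrees of freedom are taken to be \(t - 1\), as is usual for a test effect with \(t\) levels.

```python
def dbm_f_test(ms_t, ms_tr, ms_tc, ms_trc, t, r, n):
    """Return (F, df1, df2) for the DBM test effect, following Eqs. (13)-(14).

    ms_t, ms_tr, ms_tc, ms_trc : mean squares MS(tau), MS(tau R),
                                 MS(tau C) and MS(tau R C).
    t, r, n                    : numbers of tests, readers and cases.
    """
    denom = ms_tr + ms_tc - ms_trc
    if denom <= 0:
        raise ValueError("non-positive denominator; F statistic is undefined")

    f_stat = ms_t / denom  # Eq. (13)
    df1 = t - 1
    # Satterthwaite approximation for the denominator df, Eq. (14).
    df2 = denom**2 / (
        ms_tr**2 / ((t - 1) * (r - 1))
        + ms_tc**2 / ((t - 1) * (n - 1))
        + ms_trc**2 / ((t - 1) * (r - 1) * (n - 1))
    )
    return f_stat, df1, df2
```

The sketch stops at the statistic and its degrees of freedom. A p-value would additionally need the F distribution, for example `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.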

The OR model. The corrected F statistic for the significance of the test effect \(\tau\) in the ANOVA model Eq. (4) is

$$\begin{aligned} F_{OR} = \dfrac{MS\left( \tau \right) }{MS\left( \tau R \right) + r\left( {\widehat{Cov}}_2 - {\widehat{Cov}}_3 \right) } \end{aligned}$$
(15)

with degrees of freedom \(df_1\) and \(df_2\). Here, \(df_1\) equals \((t - 1)\) and \(df_2\) is calculated as in Eq. (16).

$$\begin{aligned} df_2 = \dfrac{\left\{ MS\left( \tau R \right) + r\left( {\widehat{Cov}}_2 - {\widehat{Cov}}_3 \right) \right\} ^2}{\dfrac{MS \left( \tau R \right) ^2}{(t - 1)(r - 1)}} \end{aligned}$$
(16)

where the covariance estimates are calculated from Eq. (5).
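
The OR statistic and its degrees of freedom in Eqs. (15) and (16) can be sketched in the same way. The function name `or_f_test` and its argument names are hypothetical, and the covariance estimates are assumed to be supplied by the caller, since Eq. (5) is not reproduced in this appendix.

```python
def or_f_test(ms_t, ms_tr, cov2, cov3, t, r):
    """Return (F, df1, df2) for the OR test effect, following Eqs. (15)-(16).

    ms_t, ms_tr : mean squares MS(tau) and MS(tau R).
    cov2, cov3  : covariance estimates Cov_2 and Cov_3 (assumed given).
    t, r        : numbers of tests and readers.
    """
    denom = ms_tr + r * (cov2 - cov3)
    if denom <= 0:
        raise ValueError("non-positive denominator; F statistic is undefined")

    f_stat = ms_t / denom  # Eq. (15)
    df1 = t - 1
    df2 = denom**2 / (ms_tr**2 / ((t - 1) * (r - 1)))  # Eq. (16)
    return f_stat, df1, df2
```

When `cov2` equals `cov3`, the correction term vanishes and `df2` reduces to \((t - 1)(r - 1)\), the degrees of freedom of the test-by-reader interaction.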

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Basol, M., Goksuluk, D. & Karaagaoglu, E. Comparing the diagnostic performance of methods used in a full-factorial design multi-reader multi-case studies. Comput Stat 38, 1537–1553 (2023). https://doi.org/10.1007/s00180-022-01309-1

  • DOI: https://doi.org/10.1007/s00180-022-01309-1
