Prediction of infectious disease epidemics via weighted density ensembles
Fig 4
Permutation test results for pairwise comparisons of the mean log scores for each method.
For each combination of 3 prediction targets, 11 regions, and 5 test-phase seasons, we calculated the mean log score for all predictions made by each method in weeks before the predicted event occurred. Panel A presents the overall mean of these values for each method; higher mean log scores indicate better performance. Panel B displays the difference in mean log scores for each pair of methods. Positive values indicate that the method on the vertical axis outperformed the method on the horizontal axis on average. A permutation test was used to obtain approximate p-values for these differences (see S1 Text for details). For reference, a Bonferroni correction at a familywise significance level of 0.05 for all pairwise comparisons leads to a significance cutoff of approximately 0.0014.
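To make the comparison in Panel B concrete, the following is a minimal sketch of one way such a test could be run. It is not the procedure from S1 Text, which is not reproduced here. It assumes a paired sign-flip permutation test, where each pair of values is the mean log score of two methods for the same target, region, and season. The function names `paired_permutation_pvalue` and `bonferroni_cutoff` and all parameters are hypothetical. The Bonferroni helper assumes nine methods, which gives 36 pairwise comparisons and 0.05 / 36 ≈ 0.0014; the count of nine is inferred from the stated cutoff and is not given in the caption.

```python
import random
from math import comb


def paired_permutation_pvalue(scores_a, scores_b, n_perm=10000, seed=0):
    """Approximate two-sided p-value for the mean paired difference.

    Assumption: scores_a[i] and scores_b[i] are mean log scores of two
    methods for the same target/region/season combination. Under the null
    hypothesis the method labels are exchangeable within each pair, so the
    sign of each paired difference can be flipped at random.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    extreme = 0
    for _ in range(n_perm):
        # Randomly swap the two method labels within each pair.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


def bonferroni_cutoff(n_methods, alpha=0.05):
    """Per-comparison cutoff when all pairs of methods are compared."""
    return alpha / comb(n_methods, 2)
```

A positive `observed` value corresponds to the first method outperforming the second on average, matching the sign convention of Panel B. A difference would be flagged under the Bonferroni correction only if its p-value falls below `bonferroni_cutoff(9)`; with a cutoff this small, `n_perm` needs to be well above 1,000 for the smallest attainable p-value to fall below it.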