Evaluation and comparison of multi-omics data integration methods for cancer subtyping
Fig 4
Clinical-based performance on Dataset group #1 (Nine-cancer Datasets).
Abbreviations are the same as in Fig 3. For each method, we calculated the -log10(log-rank test p-value) of the subtyping result for every possible k, combination, and cancer. (A) Clinical-based performance based on the k suggested by each method. The upper plot shows each method's average ranking in its ability to cluster patients into clinically significant subtypes. Each data point in a box was calculated as follows: with the cancer and combination fixed, we ranked the -log10(p-value) across all methods, so that the ranking represents each method's ability to cluster patients into clinically significant subtypes using that combination. Each method therefore had 11 (combinations) × 9 (cancers) = 99 rankings, which we used to compare the methods. The lower plot shows the cumulative number of significant p-values. We used 1.301 as the threshold, which corresponds to a p-value of 0.05 before the transformation, to decide whether a subtyping result was clinically significant, and counted the significant results. (B) Clinical-based performance based on all possible k. The two plots have the same meaning as in (A), but they were calculated slightly differently. Each data point in a box of the upper plot was calculated as follows: with the cancer, combination, and k fixed, we ranked the -log10(p-value) across all methods. Each combination therefore had 7 rankings, one for each possible k, and we averaged these 7 rankings to represent the method's ability using that combination. For the lower plot, we counted the number of significant p-values for each combination over all possible k and accumulated the per-combination averages to draw the plot.
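The ranking and counting steps described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`rank_methods`, `panel_a`, `panel_b_rank`) and the nested-dictionary input layout are hypothetical. The sketch also assumes details the legend does not state: that rank 1 goes to the largest -log10(p-value), that tied scores share their mean rank, and that a result counts as significant when its score is strictly above -log10(0.05) ≈ 1.301. The lower plot of panel (B) is not sketched.

```python
import math
from collections import defaultdict

# Threshold from the legend: -log10(0.05) is about 1.301.
SIGNIFICANCE = -math.log10(0.05)


def rank_methods(scores):
    """Rank methods by -log10(p), 1 = largest score (assumed direction).

    Tied scores share the mean of the ranks they span (an assumption).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every method tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for method in ordered[i:j + 1]:
            ranks[method] = mean_rank
        i = j + 1
    return ranks


def panel_a(results):
    """Panel (A): one ranking per (cancer, combination) at the suggested k.

    results: {(cancer, combination): {method: -log10(p)}}
    Returns ({method: [ranks]}, {method: number of significant results}).
    """
    rankings = defaultdict(list)
    significant = defaultdict(int)
    for scores in results.values():
        for method, rank in rank_methods(scores).items():
            rankings[method].append(rank)
        for method, score in scores.items():
            significant[method] += int(score > SIGNIFICANCE)
    return dict(rankings), dict(significant)


def panel_b_rank(results_by_k):
    """Panel (B), upper plot: rank within each k, then average over k.

    results_by_k: {(cancer, combination): {k: {method: -log10(p)}}}
    Returns {method: [mean rank per (cancer, combination)]}.
    """
    rankings = defaultdict(list)
    for per_k in results_by_k.values():
        totals = defaultdict(float)
        for scores in per_k.values():
            for method, rank in rank_methods(scores).items():
                totals[method] += rank
        for method, total in totals.items():
            rankings[method].append(total / len(per_k))
    return dict(rankings)


# Toy input with made-up scores and placeholder names, for illustration only.
toy = {
    ("cancer1", "comb1"): {"m1": 2.0, "m2": 0.5, "m3": 1.5},
    ("cancer1", "comb2"): {"m1": 1.0, "m2": 1.0, "m3": 3.0},
}
toy_rankings, toy_significant = panel_a(toy)
```

With the real data, each list returned by `panel_a` would hold 99 rankings per method (11 combinations × 9 cancers), and each (cancer, combination) entry passed to `panel_b_rank` would hold 7 values of k.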