Research synthesis, e.g., by meta-analysis, is increasingly considered for high-dimensional data from molecular research, such as gene and protein expression data, especially because most studies and experiments are performed with very small sample sizes. In contrast to most clinical and epidemiological trials, raw data are often available for high-dimensional expression studies. Therefore, direct data merging followed by a joint analysis of selected studies can be an alternative to meta-analysis by P value or effect-size merging or, more generally, the merging of results. While several methods for meta-analysis of differential expression studies have been proposed, meta-analysis of gene set tests has very rarely been considered, although gene set tests are standard in the analysis of individual gene expression studies. In this work, we compare the different strategies for research synthesis of gene set tests, in particular the "early merging" of data cleaned of batch effects versus the "late merging" of individual results. In simulation studies and in examples of manipulated real-world data, we found that in most scenarios, early merging has a higher sensitivity for detecting gene set enrichment than late merging. However, in scenarios with few studies, large batch effects, and moderate to large sample sizes, late merging is more sensitive than early merging.
© 2018 John Wiley & Sons, Ltd.
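
To make the distinction between the two strategies concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes that late merging is done with Fisher's method for combining per-study P values (one of several possible P value merging rules) and that batch effects are removed before early merging by simple per-study, per-gene mean centering. The function names `fisher_combine`, `center_per_study`, and `early_merge` are hypothetical and chosen only for illustration.

```python
import math


def fisher_combine(p_values):
    """Late merging (assumed rule): combine per-study P values with Fisher's method."""
    k = len(p_values)
    # Fisher statistic: -2 * sum(log p), chi-square with 2k degrees of freedom under H0.
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Closed-form chi-square survival function for even degrees of freedom (2k):
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def center_per_study(study):
    """Crude batch adjustment (assumption): subtract each gene's mean within one study.

    `study` is a list of samples; each sample is a list of expression values per gene.
    """
    n_genes = len(study[0])
    means = [sum(sample[g] for sample in study) / len(study) for g in range(n_genes)]
    return [[sample[g] - means[g] for g in range(n_genes)] for sample in study]


def early_merge(studies):
    """Early merging: pool the batch-adjusted samples of all studies into one data set."""
    merged = []
    for study in studies:
        merged.extend(center_per_study(study))
    return merged
```

In this sketch, early merging would be followed by a single gene set test on the output of `early_merge`, whereas late merging would run the gene set test in each study separately and pass the resulting P values to `fisher_combine`.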