A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

Figure 5

Overview of the motif analysis pipeline.

In the first step, the pipeline searches each input set of DNA sequences for motifs using complementary motif discovery algorithms, and filters the discovered motifs according to their abundance in that input set. In the second step, redundancy among the newly discovered motifs is reduced by clustering and merging similar motifs. Both steps are performed separately for each input set (top boxes). The motifs found in all input sets are then clustered and merged to create a global non-redundant set of motifs, which are associated with known motifs from pre-existing libraries. Finally, the refined motif set is ranked and filtered according to each motif's abundance in each input set.
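The control flow described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Every name here (`toy_similarity`, `cluster_and_merge`, `run_pipeline`, `threshold`, `min_abundance`) is hypothetical. The similarity function is a simple placeholder, not the Bayesian score of the paper, and it assumes equal-length count matrices with no alignment or reverse-complement handling. The greedy clustering stands in for whatever clustering procedure is actually used. Motif discovery and the association with known motif libraries are left out, and the abundance measure is supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

# A motif as a count matrix: one row per position, counts for A, C, G, T.
Motif = List[List[float]]


def frequencies(motif: Motif) -> List[List[float]]:
    """Normalize each position's counts to frequencies (assumes non-empty columns)."""
    return [[c / sum(col) for c in col] for col in motif]


def toy_similarity(a: Motif, b: Motif) -> float:
    """Placeholder score in [0, 1]; NOT the paper's Bayesian score.

    Assumption: motifs of different lengths are treated as dissimilar.
    """
    if len(a) != len(b):
        return 0.0
    fa, fb = frequencies(a), frequencies(b)
    # Mean per-column total variation distance (half the L1 distance).
    dist = sum(
        0.5 * sum(abs(x - y) for x, y in zip(ca, cb)) for ca, cb in zip(fa, fb)
    ) / len(a)
    return 1.0 - dist


def merge(a: Motif, b: Motif) -> Motif:
    """Merge two equal-length motifs by summing their counts."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def cluster_and_merge(
    motifs: List[Motif], similarity: Callable[[Motif, Motif], float], threshold: float
) -> List[Motif]:
    """Greedy stand-in for clustering: fold each motif into the first
    representative it resembles, otherwise start a new cluster."""
    reps: List[Motif] = []
    for m in motifs:
        for i, r in enumerate(reps):
            if similarity(m, r) >= threshold:
                reps[i] = merge(r, m)
                break
        else:
            reps.append(m)
    return reps


def run_pipeline(
    discovered: Dict[str, List[Motif]],
    abundance: Callable[[Motif, str], float],
    similarity: Callable[[Motif, Motif], float] = toy_similarity,
    threshold: float = 0.8,
    min_abundance: float = 0.1,
) -> Tuple[List[Motif], Dict[str, List[int]]]:
    """`discovered` maps each input set name to motifs already found by
    external discovery tools; `abundance(motif, set_name)` is caller-supplied."""
    # Per input set: filter by abundance, then reduce redundancy.
    per_set = {}
    for name, motifs in discovered.items():
        kept = [m for m in motifs if abundance(m, name) >= min_abundance]
        per_set[name] = cluster_and_merge(kept, similarity, threshold)

    # Across input sets: build one global non-redundant set.
    pooled = [m for motifs in per_set.values() for m in motifs]
    global_set = cluster_and_merge(pooled, similarity, threshold)

    # Rank and filter the global motifs (by index) within each input set.
    ranking = {}
    for name in discovered:
        scored = [(abundance(m, name), i) for i, m in enumerate(global_set)]
        ranking[name] = [i for s, i in sorted(scored, reverse=True) if s >= min_abundance]
    return global_set, ranking
```

Passing the similarity and abundance functions as arguments keeps the two-level structure (per-set, then global) visible while leaving the scoring itself open, so a different comparison score can be substituted without changing the flow.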

doi: https://doi.org/10.1371/journal.pcbi.1000010.g005