Abstract
Sequence alignment is a fundamental problem in computational biology and is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences \(A=a_1a_2\ldots a_n\) and \(B=b_1b_2\ldots b_n\) with a given distance function and a constrained sequence \(C=c_1c_2\ldots c_k\), our goal is to find the optimal sequence alignment of A and B w.r.t. the constraint C. We investigate several variants of this problem. If \(C=c^k\), i.e., all characters in C are the same, the optimal constrained pairwise sequence alignment can be computed in \(O(\min \{kn^2,(t-k)n^2\})\) time, where t is the minimum number of occurrences of character c in A and B. For the variant called GB-CPSA, in which the alignment score between any two consecutive constrained characters in the final alignment is upper bounded by some value, we give a dynamic programming algorithm with time complexity \(O(kn^4/\log n)\). For the constrained center-star sequence alignment (CCSA), we prove that it is NP-hard to achieve the optimal alignment even over the binary alphabet. Furthermore, we show a negative result for CCSA: there is no polynomial-time algorithm that approximates CCSA within any constant ratio.
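To make the problem statement concrete, the following is a minimal sketch of a straightforward dynamic program for the general constrained pairwise problem. It is not the algorithm of this paper for \(C=c^k\) or for GB-CPSA. It assumes a unit-cost distance function (match 0, substitution 1, gap 1), and the function name `constrained_alignment_cost` and its parameters are hypothetical. An alignment is taken to satisfy C if, in order, each \(c_h\) appears in a column where A and B both carry \(c_h\).

```python
import math


def constrained_alignment_cost(a, b, c, sub_cost=1, gap_cost=1):
    """Minimum alignment cost of a and b such that the constraint c
    appears, in order, in columns where both sequences carry c[h].

    Assumption: unit-style costs (match 0, substitution sub_cost,
    gap gap_cost). Returns math.inf if no alignment satisfies c.
    """
    n, m, k = len(a), len(b), len(c)
    # D[h][i][j]: best cost of aligning a[:i] with b[:j] while the
    # first h constraint characters have already been placed.
    D = [[[math.inf] * (m + 1) for _ in range(n + 1)] for _ in range(k + 1)]
    D[0][0][0] = 0
    for h in range(k + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if h == 0 and i == 0 and j == 0:
                    continue
                best = math.inf
                if i > 0:  # a[i-1] aligned with a gap
                    best = min(best, D[h][i - 1][j] + gap_cost)
                if j > 0:  # b[j-1] aligned with a gap
                    best = min(best, D[h][i][j - 1] + gap_cost)
                if i > 0 and j > 0:
                    same = a[i - 1] == b[j - 1]
                    # ordinary match or substitution column
                    best = min(best, D[h][i - 1][j - 1] + (0 if same else sub_cost))
                    # column that realises constraint character c[h-1]
                    if h > 0 and same and a[i - 1] == c[h - 1]:
                        best = min(best, D[h - 1][i - 1][j - 1])
                D[h][i][j] = best
    return D[k][n][m]
```

The table has \((k+1)(n+1)^2\) entries for two sequences of length n, each filled in constant time, so this sketch runs in \(O(kn^2)\) time. For example, `constrained_alignment_cost("aab", "baa", "")` is the plain edit distance, while passing the constraint `"b"` forces the two occurrences of `b` into the same column and raises the cost.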
Acknowledgments
The authors thank the anonymous referees for their helpful comments to improve the presentation of this paper. This work was supported by NSFC (61433012, U1435215, 11171086), HK RGC Grant (HKU 7114/13E, HKU 7164/12E, HKU 7111/12E), HKU small project funding 201309176064, Natural Science Foundation of Hebei A2013201218, Chinese Academy of Sciences research Grant (No. KGZD-EW-103-5(9)), Fundamental Research Foundation of Northwestern Polytechnical University in China (Grant No. JC201164), Fundamental Research Funds for the Central Universities (Grant No. 3102015ZY081), and China Postdoctoral Science Foundation (Grant No. 2012M521803).
Additional information
A preliminary version of this paper appeared in the Proceedings of the 8th International Frontiers of Algorithmics Workshop (FAW 2014) Lecture Notes in Computer Science, Volume 8497, 2014, pp 309–319.
About this article
Cite this article
Zhang, Y., Chan, J.WT., Chin, F.Y.L. et al. Constrained pairwise and center-star sequences alignment problems. J Comb Optim 32, 79–94 (2016). https://doi.org/10.1007/s10878-015-9914-6
DOI: https://doi.org/10.1007/s10878-015-9914-6