Abstract
In the minimum common string partition (MCSP) problem two related input strings are given. “Related” refers to the property that both strings consist of the same set of letters appearing the same number of times in each of the two strings. The MCSP seeks a minimum cardinality partitioning of one string into non-overlapping substrings that is also a valid partitioning for the second string. This problem has applications in bioinformatics e.g. in analyzing related DNA or protein sequences. For strings with lengths less than about 1000 letters, a previously published integer linear programming (ILP) formulation yields, when solved with a state-of-the-art solver such as CPLEX, satisfactory results. In this work, we propose a new, alternative ILP model that is compared to the former one. While a polyhedral study shows the linear programming relaxations of the two models to be equally strong, a comprehensive experimental comparison using real-world as well as artificially created benchmark instances indicates substantial computational advantages of the new formulation.
Similar content being viewed by others
Notes
Note that we confirmed, in this context, that in all cases the values of the LP relaxations concerning \(\textsc {Ilp}_{\mathrm {cb}}\) and \(\textsc {Ilp}_{\mathrm {cs}}\) were equal.
References
Blum, C., Lozano, J.A., Pinacho Davidson, P.: Iterative probabilistic tree search for the minimum common string partition problem. In: Blesa, M.J., Blum, C., Voss, S. (eds.) Proceedings of HM 20104—9th International Workshop on Hybrid Metaheuristics. Lecture Notes in Computer Science, vol. 8457, pp. 154–154. Springer, Berlin (2014)
Blum, C., Lozano, J.A., Pinacho Davidson, P.: Mathematical programming strategies for solving the minimum common string partition problem. Eur. J. Oper. Res. 242(3), 769–777 (2015)
Chen, X., Zheng, J., Fu, Z., Nan, P., Zhong, Y., Lonardi, S., Jiang, T.: Assignment of orthologous genes via genome rearrangement. IEEE/ACM Trans. Comput. Biol. Bioinform. 2(4), 302–315 (2005)
Chrobak, M., Kolman, P., Sgall, J.: The greedy algorithm for the minimum common string partition problem. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) Proceedings of APPROX 2004—7th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems. Lecture Notes in Computer Science, vol. 3122, pp. 84–95. Springer, Berlin (2004)
Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. ACM Trans. Algorithms 3(2), 1–19 (2007)
Damaschke, P.: Minimum common string partition parameterized. In: Crandall, K.A., Lagergren, J. (eds.) Proceedings of WABI 2008—8th International Workshop on Algorithms in Bioinformatics. Lecture Notes in Computer Science, vol. 5251, pp. 87–98. Springer, Berlin (2008)
Ferdous, S.M., Sohel Rahman, M.: Solving the minimum common string partition problem with the help of ants. In: Tan, Y., Shi, Y., Mo, H. (eds.) Proceedings of ICSI 2013—4th International Conference on Advances in Swarm Intelligence. Lecture Notes in Computer Science, vol. 7928, pp. 306–313. Springer, Berlin (2013)
Ferdous, S.M., Sohel Rahman, M.: A MAX–MIN ant colony system for minimum common string partition problem (2014). arXiv:1401.4539
Fu, B., Jiang, H., Yang, B., Zhu, B.: Exponential and polynomial time algorithms for the minimum common string partition problem. In: Wang, W., Zhu, X., Du, D.Z. (eds.) Proceedings of COCOA 2011—5th International Conference on Combinatorial Optimization and Applications. Lecture Notes in Computer Science, vol. 6831, pp. 299–310. Springer, Berlin (2011)
Gallardo, J.E.: A multilevel probabilistic beam search algorithm for the shortest common supersequence problem. PLoS One 7(12) (2012)
Garey, M.R., Johnson, D.S.: Computers and intractability; a guide to the theory of NP-completeness. W. H. Freeman, San Francisco (1979)
Goldstein, A., Kolman, P., Zheng, J.: Minimum common string partition problem: Hardness and approximations. In: Fleischer, R., Trippen, G. (eds.) Proceedings of ISAAC 2004—15th International Symposium on Algorithms and Computation. Lecture Notes in Computer Science, vol. 3341, pp. 484–495. Springer, Berlin (2005)
Goldstein, I., Lewenstein, M.: Quick greedy computation for minimum common string partitions. In: Giancarlo, R., Manzini, G. (eds.) Proceedings of CPM 2011—22nd Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 6661, pp. 273–284. Springer, Berlin (2011)
He, D.: A novel greedy algorithm for the minimum common string partition problem. In: Mandoiu, I., Zelikovsky, A. (eds.) Proceedings of ISBRA 2007—Third International Symposium on Bioinformatics Research and Applications. Lecture Notes in Computer Science, vol. 4463, pp. 441–452. Springer, Berlin (2007)
Hsu, W.J., Du, M.W.: Computing a longest common subsequence for a set of strings. BIT Numer. Math. 24(1), 45–59 (1984). doi:10.1007/BF01934514
Jiang, H., Zhu, B., Zhu, D., Zhu, H.: Minimum common string partition revisited. J. Comb. Optim. 23(4), 519–527 (2012)
Kaplan, H., Shafrir, N.: The greedy algorithm for edit distance with moves. Inf. Process. Lett. 97(1), 23–27 (2006)
Kolman, P.: Approximating reversal distance for strings with bounded number of duplicates. In: Jedrzejowicz, J., Szepietowski, A. (eds.) Proceedings of MFCS 2005—30th International Symposium on Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 3618, pp. 580–590. Springer, Berlin (2005)
Kolman, P., Waleń, T.: Reversal distance for strings with duplicates: linear time approximation using hitting set. In: Erlebach, T., Kaklamanis, C. (eds.) Proceedings of WAOA 2007—4th International Workshop on Approximation and Online Algorithms. Lecture Notes in Computer Science, vol. 4368, pp. 279–289. Springer, Berlin (2007)
Meneses, C., Oliveira, C., Pardalos, P.: Optimization techniques for string selection and comparison problems in genomics. IEEE Eng. Med. Biol. Mag. 24(3), 81–87 (2005)
Mousavi, S., Babaie, M., Montazerian, M.: An improved heuristic for the far from most strings problem. J. Heuristics 18, 239–262 (2012)
Shapira, D., Storer, J.A.: Edit distance with move operations. In: Apostolico, A., Takeda, M. (eds.) Proceedings of CPM 2002—13th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2373, pp. 85–98. Springer, Berlin (2002)
Smith, T., Waterman, M.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
Acknowledgments
C. Blum acknowledges support by grant TIN2012-37930-02 of the Spanish Government. In addition, support is acknowledged from IKERBASQUE (Basque Foundation for Science). Our experiments have been executed in the High Performance Computing environment managed by RDlab (http://rdlab.lsi.upc.edu) and we would like to thank them for their support.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Blum, C., Raidl, G.R. Computational performance evaluation of two integer linear programming models for the minimum common string partition problem. Optim Lett 10, 189–205 (2016). https://doi.org/10.1007/s11590-015-0921-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11590-015-0921-4