Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly
Pages 476 - 483
Abstract
In the context of genome assembly, the contig orientation problem is described as the problem of removing sufficient edges from the scaffold graph so that the remaining subgraph assigns a consistent orientation to all sequence nodes in the graph. This problem can also be phrased as a weighted MAX-CUT problem. The performance of MAX-CUT heuristics in this application is untested. We present a greedy heuristic solution to the contig orientation problem and compare its performance to a weighted MAX-CUT semi-definite programming heuristic solution on several graphs. We note that the contig orientation problem can be used to identify inverted repeats and inverted haplotypes, as these represent sequences whose orientation appears ambiguous in the conventional genome assembly framework.
References
[1]
S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic acids research, 25(17):3389--3402, 1997.
[2]
F. Antonacci, J. M. Kidd, T. Marques-Bonet, M. Ventura, P. Siswara, Z. Jiang, and E. E. Eichler. Characterization of six human disease-associated inversion polymorphisms. Human molecular genetics, 18(14):2555--2566, 2009.
[3]
V. Bansal, A. Bashir, and V. Bafna. Evidence for large inversion polymorphisms in the human genome from hapmap data. Genome research, 17(2):219--230, 2007.
[4]
S. Batzoglou, D. B. Jaffe, K. Stanley, J. Butler, S. Gnerre, E. Mauceli, B. Berger, J. P. Mesirov, and E. S. Lander. Arachne: a whole-genome shotgun assembler. Genome research, 12(1):177--189, 2002.
[5]
P. Bodily, J. Price, M. Clement, and Q. Snell. Scaffoldscaffolder: An aggressive scaffold finishing algorithm. In Proceedings of the 2012 International Conference on Bioinformatics & Computational Biology, pages 385--390. CSREA Press, 2012.
[6]
J. Butler, I. MacCallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, C. Nusbaum, and D. B. Jaffe. Allpaths: de novo assembly of whole-genome shotgun microreads. Genome research, 18(5):810--820, 2008.
[7]
A. Dayarian, T. P. Michael, and A. M. Sengupta. Sopra: Scaffolding algorithm for paired reads via statistical optimization. BMC bioinformatics, 11(1):345, 2010.
[8]
C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 107--114. IEEE, 2001.
[9]
N. Donmez and M. Brudno. Scarpa: scaffolding reads with practical algorithms. Bioinformatics, 29(4):428--434, 2013.
[10]
M. X. Goemans and D. P. Williamson. .879-approximation algorithms for max cut and max 2sat. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 422--431. ACM, 1994.
[11]
M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115--1145, 1995.
[12]
W. Huang, L. Li, J. R. Myers, and G. T. Marth. Art: a next-generation sequencing read simulator. Bioinformatics, 28(4):593--594, 2012.
[13]
R. M. Karp. Reducibility among combinatorial problems. Springer, 1972.
[14]
S. Khot, G. Kindler, E. Mossel, and R. O'Donnell. Optimal inapproximability results for max-cut and other 2-variable csps? SIAM Journal on Computing, 37(1):319--357, 2007.
[15]
J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical society, 7(1):48--50, 1956.
[16]
B. Langmead and S. L. Salzberg. Fast gapped-read alignment with bowtie 2. Nature Methods, 9(4):357--359, 2012.
[17]
B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, et al. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol, 10(3):R25, 2009.
[18]
R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome research, 20(2):265--272, 2010.
[19]
M. W. M. Muskens, A. P. A. Vissers, J. N. M. Mol, and J. M. Kooter. Role of inverted dna repeats in transcriptional and post-transcriptional gene silencing. Plant Molecular Biology, 43(2-3):243--260, 2000.
[20]
J. Nijkamp, W. Winterbach, M. van den Broek, J.-M. Daran, M. Reinders, and D. de Ridder. Integrating genome assemblies with maia. Bioinformatics, 26(18):i433--i439, 2010.
[21]
N. Okuda. Hapmaker: Synthetic haplotype generator. preprint, 2013.
[22]
M. Pop, D. S. Kosack, and S. L. Salzberg. Hierarchical scaffolding with bambus. Genome research, 14(1):149--159, 2004.
[23]
F. Rendl, G. Rinaldi, and A. Wiegele. Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Mathematical Programming, 121(2):307--335, 2010.
[24]
D. C. Rio and G. M. Rubin. Identification and purification of a drosophila protein that binds to the terminal 31-base-pair inverted repeats of the p transposable element. Proceedings of the National Academy of Sciences, 85(23):8929--8933, 1988.
[25]
S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the ACM (JACM), 23(3):555--565, 1976.
[26]
D. R. Zerbino and E. Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome research, 18(5):821--829, 2008.
[27]
M. C. Zody, Z. Jiang, H.-C. Fung, F. Antonacci, L. W. Hillier, M. F. Cardone, T. A. Graves, J. M. Kidd, Z. Cheng, A. Abouelleil, et al. Evolutionary toggling of the mapt 17q21. 31 inversion region. Nature genetics, 40(9):1076--1083, 2008.
- Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly
Recommendations
Recent duplication, evolution and assembly of the human genome
RECOMB '02: Proceedings of the sixth annual international conference on Computational biologyIt has been estimated that 5% of the human genome consists of interspersed duplicated material that has arisen over the last 30 million years of evolution. Two categories of recent duplicated segments can be distinguished: segmental duplications between ...
Comments
Please enable JavaScript to view thecomments powered by Disqus.Information & Contributors
Information
Published In

September 2013
987 pages
ISBN:9781450324342
DOI:10.1145/2506583
Copyright © 2013 ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 22 September 2013
Check for updates
Qualifiers
- Tutorial
- Research
- Refereed limited
Conference
BCB'13
Sponsor:
Acceptance Rates
BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
Overall Acceptance Rate 254 of 885 submissions, 29%
Contributors
Other Metrics
Bibliometrics & Citations
Bibliometrics
Article Metrics
- 0Total Citations
- 96Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)1
Reflects downloads up to 17 Feb 2025
Other Metrics
Citations
View Options
Login options
Check if you have access through your login credentials or your institution to get full access on this article.
Sign in