A Nearly Linear-Time General Algorithm for Genome-Wide Bi-allele Haplotype Phasing

Will Casey & Bud Mishra

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2913)

Included in the following conference series:

International Conference on High-Performance Computing

Abstract

The determination of feature maps, such as STS (sequence-tagged site), SNP (single nucleotide polymorphism) or RFLP (restriction fragment length polymorphism) maps, for each chromosome copy, or haplotype, of an individual has important potential applications in genetics, clinical biology and association studies. We consider the problem of reconstructing the two haplotypes of a diploid individual from genotype data generated by mapping experiments, and present an algorithm to recover them. The problem of optimizing existing methods of SNP phasing with a population of diploid genotypes has been investigated in [7] and found to be NP-hard. In contrast, using single-molecule methods, we show that although the haplotypes are not known and the data are further confounded by the mapping error model, reasonable assumptions on the mapping process allow us to recover the co-associations of allele types across consecutive loci and to estimate the haplotypes with an efficient algorithm. The haplotype reconstruction algorithm has two stages. Stage I detects polymorphic marker types by modifying an EM algorithm for Gaussian mixture models; an example is given for RFLP sizing. Stage II addresses phasing and presents a local maximum-likelihood method for inferring the haplotypes of an individual. The algorithm runs in time nearly linear in the number of polymorphic loci. Results of the algorithm on simulated RFLP sizing data are encouraging and suggest that the method will prove practical for haplotype phasing.
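The abstract names the two stages without giving their details. As a rough illustration only, the sketch below makes several assumptions that are not from the paper: each bi-allelic marker shows its two alleles as two fragment-size clusters; every marker is heterozygous; each single-molecule map is read from one chromosome copy and spans consecutive loci; and allele calls err independently with a symmetric rate `eps`. The function names (`em_two_gaussians`, `call_allele`, `phase_consecutive`) are hypothetical. Stage I is stood in for by a textbook EM fit of a two-component Gaussian mixture, and Stage II by chaining a local likelihood comparison (same phase vs. switched phase) between each pair of adjacent loci, which takes time linear in the number of loci plus observations. This is not the authors' algorithm or error model.

```python
import math


def em_two_gaussians(xs, iters=200):
    """Fit a 1-D two-component Gaussian mixture to fragment sizes by plain EM.

    Returns (means, variances, weights) ordered so that allele 0 has the smaller mean.
    """
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                                # start the means at the extremes
    var = [((hi - lo) / 4) ** 2 or 1.0] * 2      # broad initial variances
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each size
        resp = []
        for x in xs:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            s = sum(p) or 1e-300                 # guard against total underflow
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate weights, means and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    order = sorted((0, 1), key=lambda k: mu[k])
    return [mu[k] for k in order], [var[k] for k in order], [w[k] for k in order]


def call_allele(x, mu, var, w):
    """Assign a measured size to the mixture component with the larger posterior."""
    scores = [math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
              - (x - mu[k]) ** 2 / (2 * var[k]) for k in (0, 1)]
    return 0 if scores[0] >= scores[1] else 1


def phase_consecutive(molecules, n_loci, eps=0.05):
    """Chain local maximum-likelihood phase decisions between adjacent loci.

    molecules: list of dicts {locus_index: allele_call}, each from one chromosome copy.
    eps: assumed per-call error rate, 0 < eps < 0.5.
    Returns one haplotype as a 0/1 list; the other is its complement.
    """
    q = (1 - eps) ** 2 + eps ** 2    # P(two calls agree | loci in the same phase)
    agree = [0] * (n_loci - 1)
    disagree = [0] * (n_loci - 1)
    for mol in molecules:
        for i, a in mol.items():
            b = mol.get(i + 1)
            if b is None or i + 1 >= n_loci:
                continue
            if a == b:
                agree[i] += 1
            else:
                disagree[i] += 1
    hap = [0]                         # fix the first locus arbitrarily
    for i in range(n_loci - 1):
        ll_same = agree[i] * math.log(q) + disagree[i] * math.log(1 - q)
        ll_flip = agree[i] * math.log(1 - q) + disagree[i] * math.log(q)
        hap.append(hap[-1] if ll_same >= ll_flip else 1 - hap[-1])
    return hap
```

Because `eps < 0.5` makes `q > 0.5`, each local comparison reduces to a majority vote between agreeing and disagreeing molecules; the log-likelihood form is kept only to show where a richer error model could be substituted. Adjacent loci with no spanning molecule produce a tie and are left in the same phase here; splitting the haplotype into separately phased blocks at such gaps would be a more cautious choice.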

Work reported in this paper is funded by grants from the NSF QuBIC program, DARPA, an HHMI biomedical support research grant, the US DOE, the US Air Force, NIH, and the New York State Office of Science, Technology & Academic Research.

Similar content being viewed by others

A general model for likelihood computations of genetic marker data accounting for linkage, linkage disequilibrium, and mutations

Article 26 November 2014

Accurate genome-wide phasing from IBD data

Article Open access 23 November 2022

Accurate, scalable and integrative haplotype estimation

Article Open access 28 November 2019

References

[1] Anantharaman, T.S., Mishra, B., Schwartz, D.C.: Genomics via Optical Mapping II: Ordered Restriction Maps. Journal of Computational Biology 4(2), 91–118 (1997)
[2] Bafna, V., Gusfield, D., Lancia, G., Yooseph, S.: Haplotyping as Perfect Phylogeny: A Direct Approach. Technical Report CSE-2002-21, UC Davis
[3] Casey, W., Mishra, B., Wigler, M.: Placing Probes on the Genome with Pairwise Distance Data. In: Gascuel, O., Moret, B.M.E. (eds.) WABI 2001. LNCS, vol. 2149, pp. 52–68. Springer, Heidelberg (2001)
[4] Clark, A.: Inference of Haplotypes from PCR-Amplified Samples of Diploid Populations. Mol. Biol. Evol. 7, 111–122 (1990)
[5] Dempster, A., Laird, N.M., Rubin, D.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Series B 39, 1–38 (1977)
[6] Excoffier, L., Slatkin, M.: Maximum-Likelihood Estimation of Molecular Haplotype Frequencies in a Diploid Population. Mol. Biol. Evol. 12, 921–927 (1995)
[7] Gusfield, D.: Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms. Journal of Computational Biology 8(3), 305–323 (2001)
[8] Ma, J., Xu, L., Jordan, M.: Asymptotic Convergence Rate of the EM Algorithm for Gaussian Mixtures. Neural Computation 12(12), 2881–2907 (2000)
[9] Mitra, R., Church, G.: In Situ Localized Amplification and Contact Replication of Many Individual DNA Molecules. Nucleic Acids Research 27(24), e34 (1999)
[10] Niu, T., Qin, Z., Xu, X., Liu, J.: Bayesian Haplotype Inference for Multiple Linked Single-Nucleotide Polymorphisms. Am. J. Hum. Genet. 70, 156–169 (2002)
[11] Parida, L., Mishra, B.: Partitioning Single-Molecule Maps into Multiple Populations: Algorithms and Probabilistic Analysis. Discrete Applied Mathematics (The Computational Molecular Biology Series) 104(1–3), 203–227 (2000)
[12] Roweis, S., Ghahramani, Z.: A Unifying Review of Linear Gaussian Models. Neural Computation 11(2), 305–345 (1999)
[13] Stephens, M., Smith, N., Donnelly, P.: A New Statistical Method for Haplotype Reconstruction from Population Data. Am. J. Hum. Genet. 68, 978–989 (2001)
[14] Tarjan, R.E.: Data Structures and Network Algorithms. CBMS 44. SIAM, Philadelphia (1983)
[15] Weir, B.: Genetic Data Analysis II. Sinauer Associates, Sunderland (1996)

Author information

Authors and Affiliations

Courant Institute of Mathematical Sciences, 251 Mercer St., New York, New York, USA
Will Casey & Bud Mishra
Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, USA
Bud Mishra
Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai, 400 005, India
Bud Mishra

Editor information

Editors and Affiliations

University of Southern California, Los Angeles, CA 90089-2562, USA
Timothy Mark Pinkston
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, USA
Viktor K. Prasanna

About this paper

Cite this paper

Casey, W., Mishra, B. (2003). A Nearly Linear-Time General Algorithm for Genome-Wide Bi-allele Haplotype Phasing. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24596-4_22

DOI: https://doi.org/10.1007/978-3-540-24596-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20626-2
Online ISBN: 978-3-540-24596-4
eBook Packages: Springer Book Archive