Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment

Guang-Ming Tan^1,2,
Lin Xu^1,2,
Dong-Bo Bu¹,
Sheng-Zhong Feng¹ &
…
Ning-Hui Sun¹

226 Accesses
4 Citations
Explore all metrics

Abstract

MegaBlast is one of the most important programs in NCBI BLAST (Basic Local Alignment Search Tool) toolkits. However, MegaBlast is computation and I/O intensive. It consumes a great deal of memory which is proportional to the size of the query sequences set and subject (database) sequences set of product. This paper proposes a new strategy for optimizing MegaBlast. The new strategy exchanges the query and subject sequences sets, and builds a hash table based on new subject sequences. It overlaps I/O with computation, shortens the overall time and reduces the cost of memory, since the memory here is only proportional to the size of subject sequences set. The optimized algorithm is suitable to be parallelized in cluster systems. The parallel algorithm uses query segmentation method. As our experiments shown, the parallel program which is implemented with MPI has fine scalability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

HCudaBLAST: an implementation of BLAST on Hadoop and Cuda

Article Open access 16 November 2017

Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented Using PCJ Library

Pairwise alignment of nucleotide sequences using maximal exact matches

Article Open access 21 May 2019

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Gusfield D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1999.
http://www.genome.washington.edu/uwgc.
Pearson W R. Searching protein sequences libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithm. Genomics, 1991, 11: 635–650.
Article Google Scholar
Smith T F, Waterman M S. Identification of common molecular subsequences. Journal of Molecular Biology, 1981, 147(1): 195–197.
Article Google Scholar
Altschul S F, Gish W, Miller W et al. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215: 403–410.
Article Google Scholar
Galison F. The Fasta and Blast programs. 2000, http://bioweb.pasteur.fr/seqanal/blast/.
Tao J, Xu Y, Zhang M Q. Current Topics in Computational Molecular Biology. Tsinghua University Press, The MIT Press, 2002.
Google Scholar
Zheng Z, Schwartz S, Wagner L et al. A greedy algorithm for aligning DNA sequence. Jounal of Computational Biology, 2000, 7(12, 1/2): 203–214.
Article Google Scholar
Singh R K, Dettloff W D, Chi V L et al. BioSCAN: A network-sharable computational resource for searching biose-quence databases. Computer Applications in the Biosciences, 1996, 12(3): 191–196.
Google Scholar
http://www.timelogic.com/.
http://www.sun.com/products-n-solutions/edu/events/archive/hpc.
A E Darling, L Carey, Feng Wu-Chun. The design, implementation, and evaluation of mpiBlast. In ClusterWorld Conference & Expo in Conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution, California, USA, June 2003, pp.1–14.
Bjornason R, Sherman A, Weston S et al. Turboblast: A parallel implementation of blast based on the turbohub process integration architecture. In Proc. the 1st IEEE Workshop on High-Performance Computational Biology, Florida, USA, April 2002, pp.164–171.
http://oprofile.sourceforge.net/news/.
http://mpiblast.lanl.gov.
Hwang K, Xu Z. Scalable Parallel Computing: Technology, Architecture, Programming. McGraw-Hill, 1999.
Grama A, Gupta A, Karypis K, Kumar V. Introduction to Parallel Computing. Second Edition, Addison Wesley, 2003.

Download references

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, P.R. China
Guang-Ming Tan, Lin Xu, Dong-Bo Bu, Sheng-Zhong Feng & Ning-Hui Sun
Graduate University of Chinese Academy of Sciences, Beijing, 100039, P.R. China
Guang-Ming Tan & Lin Xu

Authors

Guang-Ming Tan
View author publications
You can also search for this author in PubMed Google Scholar
Lin Xu
View author publications
You can also search for this author in PubMed Google Scholar
Dong-Bo Bu
View author publications
You can also search for this author in PubMed Google Scholar
Sheng-Zhong Feng
View author publications
You can also search for this author in PubMed Google Scholar
Ning-Hui Sun
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Guang-Ming Tan.

Additional information

Supported by the National Natural Science Foundation of China under Grant No. 60372040, Knowledge Innovative Project of Chinese Academy of Sciences under Grant No. KSCX2-SW-233 and 863 Grid Node of Hong Kong University under Grant No. 2002AA104530.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Tan, GM., Xu, L., Bu, DB. et al. Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment. J Comput Sci Technol 21, 973–978 (2006). https://doi.org/10.1007/s11390-006-0973-0

Download citation

Received: 14 May 2005
Revised: 21 February 2006
Issue Date: November 2006
DOI: https://doi.org/10.1007/s11390-006-0973-0

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

HCudaBLAST: an implementation of BLAST on Hadoop and Cuda

Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented Using PCJ Library

Pairwise alignment of nucleotide sequences using maximal exact matches

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

HCudaBLAST: an implementation of BLAST on Hadoop and Cuda

Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented Using PCJ Library

Pairwise alignment of nucleotide sequences using maximal exact matches

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation