Abstract
MegaBlast is one of the most important programs in NCBI BLAST (Basic Local Alignment Search Tool) toolkits. However, MegaBlast is computation and I/O intensive. It consumes a great deal of memory which is proportional to the size of the query sequences set and subject (database) sequences set of product. This paper proposes a new strategy for optimizing MegaBlast. The new strategy exchanges the query and subject sequences sets, and builds a hash table based on new subject sequences. It overlaps I/O with computation, shortens the overall time and reduces the cost of memory, since the memory here is only proportional to the size of subject sequences set. The optimized algorithm is suitable to be parallelized in cluster systems. The parallel algorithm uses query segmentation method. As our experiments shown, the parallel program which is implemented with MPI has fine scalability.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Gusfield D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1999.
Pearson W R. Searching protein sequences libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithm. Genomics, 1991, 11: 635–650.
Smith T F, Waterman M S. Identification of common molecular subsequences. Journal of Molecular Biology, 1981, 147(1): 195–197.
Altschul S F, Gish W, Miller W et al. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215: 403–410.
Galison F. The Fasta and Blast programs. 2000, http://bioweb.pasteur.fr/seqanal/blast/.
Tao J, Xu Y, Zhang M Q. Current Topics in Computational Molecular Biology. Tsinghua University Press, The MIT Press, 2002.
Zheng Z, Schwartz S, Wagner L et al. A greedy algorithm for aligning DNA sequence. Jounal of Computational Biology, 2000, 7(12, 1/2): 203–214.
Singh R K, Dettloff W D, Chi V L et al. BioSCAN: A network-sharable computational resource for searching biose-quence databases. Computer Applications in the Biosciences, 1996, 12(3): 191–196.
http://www.sun.com/products-n-solutions/edu/events/archive/hpc.
A E Darling, L Carey, Feng Wu-Chun. The design, implementation, and evaluation of mpiBlast. In ClusterWorld Conference & Expo in Conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution, California, USA, June 2003, pp.1–14.
Bjornason R, Sherman A, Weston S et al. Turboblast: A parallel implementation of blast based on the turbohub process integration architecture. In Proc. the 1st IEEE Workshop on High-Performance Computational Biology, Florida, USA, April 2002, pp.164–171.
Hwang K, Xu Z. Scalable Parallel Computing: Technology, Architecture, Programming. McGraw-Hill, 1999.
Grama A, Gupta A, Karypis K, Kumar V. Introduction to Parallel Computing. Second Edition, Addison Wesley, 2003.
Author information
Authors and Affiliations
Corresponding author
Additional information
Supported by the National Natural Science Foundation of China under Grant No. 60372040, Knowledge Innovative Project of Chinese Academy of Sciences under Grant No. KSCX2-SW-233 and 863 Grid Node of Hong Kong University under Grant No. 2002AA104530.
Rights and permissions
About this article
Cite this article
Tan, GM., Xu, L., Bu, DB. et al. Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment. J Comput Sci Technol 21, 973–978 (2006). https://doi.org/10.1007/s11390-006-0973-0
Received:
Revised:
Issue Date:
DOI: https://doi.org/10.1007/s11390-006-0973-0