Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment

  • Short Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

MegaBlast is one of the most important programs in NCBI BLAST (Basic Local Alignment Search Tool) toolkits. However, MegaBlast is computation and I/O intensive. It consumes a great deal of memory which is proportional to the size of the query sequences set and subject (database) sequences set of product. This paper proposes a new strategy for optimizing MegaBlast. The new strategy exchanges the query and subject sequences sets, and builds a hash table based on new subject sequences. It overlaps I/O with computation, shortens the overall time and reduces the cost of memory, since the memory here is only proportional to the size of subject sequences set. The optimized algorithm is suitable to be parallelized in cluster systems. The parallel algorithm uses query segmentation method. As our experiments shown, the parallel program which is implemented with MPI has fine scalability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  1. Gusfield D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1999.

  2. http://www.genome.washington.edu/uwgc.

  3. Pearson W R. Searching protein sequences libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithm. Genomics, 1991, 11: 635–650.

    Article  Google Scholar 

  4. Smith T F, Waterman M S. Identification of common molecular subsequences. Journal of Molecular Biology, 1981, 147(1): 195–197.

    Article  Google Scholar 

  5. Altschul S F, Gish W, Miller W et al. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215: 403–410.

    Article  Google Scholar 

  6. Galison F. The Fasta and Blast programs. 2000, http://bioweb.pasteur.fr/seqanal/blast/.

  7. Tao J, Xu Y, Zhang M Q. Current Topics in Computational Molecular Biology. Tsinghua University Press, The MIT Press, 2002.

    Google Scholar 

  8. Zheng Z, Schwartz S, Wagner L et al. A greedy algorithm for aligning DNA sequence. Jounal of Computational Biology, 2000, 7(12, 1/2): 203–214.

    Article  Google Scholar 

  9. Singh R K, Dettloff W D, Chi V L et al. BioSCAN: A network-sharable computational resource for searching biose-quence databases. Computer Applications in the Biosciences, 1996, 12(3): 191–196.

    Google Scholar 

  10. http://www.timelogic.com/.

  11. http://www.sun.com/products-n-solutions/edu/events/archive/hpc.

  12. A E Darling, L Carey, Feng Wu-Chun. The design, implementation, and evaluation of mpiBlast. In ClusterWorld Conference & Expo in Conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution, California, USA, June 2003, pp.1–14.

  13. Bjornason R, Sherman A, Weston S et al. Turboblast: A parallel implementation of blast based on the turbohub process integration architecture. In Proc. the 1st IEEE Workshop on High-Performance Computational Biology, Florida, USA, April 2002, pp.164–171.

  14. http://oprofile.sourceforge.net/news/.

  15. http://mpiblast.lanl.gov.

  16. Hwang K, Xu Z. Scalable Parallel Computing: Technology, Architecture, Programming. McGraw-Hill, 1999.

  17. Grama A, Gupta A, Karypis K, Kumar V. Introduction to Parallel Computing. Second Edition, Addison Wesley, 2003.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Guang-Ming Tan.

Additional information

Supported by the National Natural Science Foundation of China under Grant No. 60372040, Knowledge Innovative Project of Chinese Academy of Sciences under Grant No. KSCX2-SW-233 and 863 Grid Node of Hong Kong University under Grant No. 2002AA104530.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Tan, GM., Xu, L., Bu, DB. et al. Improvement of Performance of MegaBlast Algorithm for DNA Sequence Alignment. J Comput Sci Technol 21, 973–978 (2006). https://doi.org/10.1007/s11390-006-0973-0

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-006-0973-0

Keywords

Navigation