DOI: 10.1145/1345206.1345262
poster

Semantics-based distributed I/O for mpiBLAST

Published: 20 February 2008

Abstract

BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation so that different worker processes can search unique segments of the database in parallel. After searching, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance on clusters with fast local filesystems, its I/O processing remains a scalability concern, especially on systems with limited I/O capabilities, such as distributed filesystems spread across a wide-area network. We therefore present ParaMEDIC, a novel environment that uses application-specific semantic information to compress I/O data and improve performance in distributed environments. Specifically, for mpiBLAST, ParaMEDIC partitions worker processes into compute workers and I/O workers. Instead of writing their output directly to the filesystem, compute workers process it using semantic knowledge about the application to generate metadata, and write that metadata to the filesystem. I/O workers, which physically reside closer to the actual storage, then process this metadata to re-create the actual output and write it to the filesystem. This approach allows ParaMEDIC to reduce I/O time, accelerating mpiBLAST by as much as 25-fold.
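
The compute-worker and I/O-worker split described in the abstract can be illustrated with a toy sketch. The abstract does not say what the metadata contains, so this example assumes it is just the query plus the identifiers of the matching database sequences, and that the I/O worker can read the same database near the storage. The names `DATABASE`, `Metadata`, `search`, `compute_worker`, and `io_worker` are hypothetical, and the substring match is only a stand-in for a real BLAST search. None of this is mpiBLAST or ParaMEDIC code.

```python
from dataclasses import dataclass

# Hypothetical toy database: sequence id -> sequence text.
DATABASE = {
    "seq1": "ACGTACGTGG",
    "seq2": "TTGACCAGTA",
    "seq3": "GGACGTTTAC",
}


@dataclass(frozen=True)
class Metadata:
    """Compact record a compute worker writes instead of the full output."""
    query: str
    hit_ids: tuple


def search(query, database):
    """Stand-in for a BLAST search: exact substring match over each sequence.

    Returns verbose report lines (id, match position, full sequence).
    """
    report = []
    for seq_id in sorted(database):
        pos = database[seq_id].find(query)
        if pos >= 0:
            report.append(f"{seq_id}\t{pos}\t{database[seq_id]}")
    return report


def compute_worker(query, database):
    """Search the whole database, but keep only the ids of the hits."""
    hits = search(query, database)
    return Metadata(query, tuple(line.split("\t")[0] for line in hits))


def io_worker(meta, database):
    """Re-create the full output by searching only the sequences in the metadata."""
    subset = {seq_id: database[seq_id] for seq_id in meta.hit_ids}
    return search(meta.query, subset)


full = search("ACGT", DATABASE)          # what a worker would normally write
meta = compute_worker("ACGT", DATABASE)  # what crosses the slow link instead
rebuilt = io_worker(meta, DATABASE)      # regenerated next to the storage
```

Under these assumptions `rebuilt` equals `full`, while only the short `meta` record has to travel between the compute site and the storage site. The I/O worker repeats some search work, but only over the few sequences that actually matched.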

Cited By

  • (2011) OrthoInspector: comprehensive orthology analysis and visual exploration. BMC Bioinformatics 12:1. DOI: 10.1186/1471-2105-12-11. Online publication date: 10-Jan-2011
  • (2010) Data parallelism in bioinformatics workflows using Hydra. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (507-515). DOI: 10.1145/1851476.1851550. Online publication date: 21-Jun-2010
  • (2010) Global-scale distributed I/O with ParaMEDIC. Concurrency and Computation: Practice and Experience 22:16 (2266-2281). DOI: 10.1002/cpe.1590. Online publication date: 23-Apr-2010

    Published In

    PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2008
    308 pages
    ISBN: 9781595937957
    DOI: 10.1145/1345206
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. distributed i/o
    2. mpiblast

    Qualifiers

    • Poster

    Conference

    PPoPP08

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Citations

    • (2009) Semantic enabled metadata management in PetaShare. International Journal of Grid and Utility Computing 1:4 (275-286). DOI: 10.1504/IJGUC.2009.027917. Online publication date: 1-Aug-2009
