DOI: 10.1145/2063384.2063423
research-article

I/O streaming evaluation of batch queries for data-intensive computational turbulence

Published: 12 November 2011

Abstract

We describe a method for evaluating computational turbulence queries, including Lagrange polynomial interpolation, based on partial sums, which allows the underlying data to be accessed in any order and in parts. We exploit these properties to stream data from disk in a single pass and concurrently evaluate batch queries. The combination of sequential I/O and data sharing improves performance by an order of magnitude when compared with direct evaluation of each query. The technique also supports distributed evaluation of queries in a database cluster, assembling the partial sums from each node at the query mediator. Interpolation is fundamental to computational turbulence: over 95% of queries use these routines. The partial sums method allows the JHU Turbulence Database Cluster to realize scale and throughput for our scientists' data-intensive workloads.
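The partial-sums idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional, unit-spaced grid rather than the three-dimensional turbulence fields, and the names `lagrange_weights`, `stencil`, `build_index`, and `stream_evaluate`, as well as the `(offset, values)` chunk layout, are hypothetical. Each query is a weighted sum over its stencil, so every data value can be folded into the running sums of all queries that need it, whenever that value happens to be read.

```python
from collections import defaultdict


def lagrange_weights(x, nodes):
    """Lagrange basis weights l_i(x) for the given node positions."""
    weights = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        weights.append(w)
    return weights


def stencil(x, order, n):
    """Indices of `order` consecutive nodes around x on a grid 0..n-1 (assumed layout)."""
    start = int(x) - order // 2 + 1
    start = max(0, min(start, n - order))  # clamp the stencil to the grid
    return list(range(start, start + order))


def build_index(queries, order, n):
    """Map each grid index to the (query id, weight) pairs that need its value."""
    needed = defaultdict(list)
    for qid, x in enumerate(queries):
        nodes = stencil(x, order, n)
        for node, w in zip(nodes, lagrange_weights(x, nodes)):
            needed[node].append((qid, w))
    return needed


def stream_evaluate(chunks, queries, order, n):
    """One pass over (offset, values) chunks, in any order, updating all queries."""
    needed = build_index(queries, order, n)
    partial = [0.0] * len(queries)
    for offset, values in chunks:
        for k, value in enumerate(values):
            # One read of `value` is shared by every query whose stencil covers it.
            for qid, w in needed.get(offset + k, ()):
                partial[qid] += w * value
    return partial


# Hypothetical data: samples of a cubic, split into four chunks read in reverse order.
def f(t):
    return 0.5 * t**3 - 2.0 * t**2 + t + 3.0


n = 16
data = [f(i) for i in range(n)]
chunks = [(off, data[off:off + 4]) for off in range(0, n, 4)]
queries = [0.5, 5.3, 9.75, 14.7]

results = stream_evaluate(reversed(chunks), queries, order=4, n=n)

# Distributed variant: two "nodes" each see half the chunks; a mediator adds the sums.
node_a = stream_evaluate(chunks[:2], queries, order=4, n=n)
node_b = stream_evaluate(chunks[2:], queries, order=4, n=n)
merged = [a + b for a, b in zip(node_a, node_b)]
```

Because addition is commutative and associative, the chunk order does not change the result beyond floating-point rounding, and sums computed on separate subsets of the data can simply be added together, which mirrors the mediator step the abstract describes.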

Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011
866 pages
ISBN:9781450307710
DOI:10.1145/2063384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O streaming
  2. data-intensive computing
  3. database clusters
  4. query evaluation
  5. query optimization
  6. software for high-throughput computing

Qualifiers

  • Research-article

Conference

SC '11

Acceptance Rates

SC '11 Paper Acceptance Rate 74 of 352 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 6
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 12 Nov 2024.

Cited By

  • (2015) Particle tracking in open simulation laboratories. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-11. DOI: 10.1145/2807591.2807645. Online publication date: 15-Nov-2015
  • (2015) The Johns Hopkins Turbulence Databases: An Open Simulation Laboratory for Turbulence Research. Computing in Science & Engineering, 17:5, pp. 10-17. DOI: 10.1109/MCSE.2015.103. Online publication date: Sep-2015
  • (2015) A Web services accessible database of turbulent channel flow and its use for testing a new integral wall model for LES. Journal of Turbulence, 17:2, pp. 181-215. DOI: 10.1080/14685248.2015.1088656. Online publication date: 2-Dec-2015
  • (2012) Data-intensive spatial filtering in large numerical simulation datasets. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-9. DOI: 10.5555/2388996.2389078. Online publication date: 10-Nov-2012
