MapReduce: a flexible data processing tool

Published: 01 January 2010

Abstract

MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.

Published In

Communications of the ACM  Volume 53, Issue 1
Amir Pnueli: Ahead of His Time
January 2010
142 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/1629175
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Popular
  • Refereed
