DOI: 10.1145/1923947.1923990
research-article

Web data processing on the cloud

Published: 01 November 2010

Abstract

Cloud computing is emerging as a highly scalable, fault-tolerant, and cost-effective way to process large amounts of information on the Web. Thanks in part to new data processing and storage systems designed with the Cloud in mind (such as MapReduce [1], HDFS [2], and Cassandra [3]), it is quickly gaining acceptance as a viable platform for organizations that need to store, process, and publish large amounts of data. MapReduce is attractive for processing data on the Cloud largely because of its simplicity and flexibility. Implementations of MapReduce usually expose a simple API with which the programmer describes which part of the processing is done in parallel on independent pieces of the input (the Map phase) and which part is done after the intermediate data has been grouped by key, so that all values for a given key are brought together on a single machine (the Reduce phase). MapReduce does not rely on a pre-existing data model, so it can process any kind of information regardless of how that information is modeled. Cloud applications have notably been used to perform off-line analytical processing such as analyzing web request logs, computing user recommendations, and understanding scenes in images [4].
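To make the division of labour between the two phases concrete, here is a minimal single-process Python sketch that counts requests per URL in web request logs, one of the off-line analyses mentioned above. It is an illustration under stated assumptions, not the paper's method and not the API of Hadoop or any other framework: the names `map_request_log`, `reduce_count`, and `run_mapreduce` are hypothetical, the input is assumed to be in Apache Common Log Format, and a thread pool stands in for the cluster nodes that would run map tasks in parallel. Note that the map function interprets raw text lines itself; no schema is declared up front.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical user-supplied functions (not a real framework API).
# Assumption: each input line is in Apache Common Log Format, e.g.
#   10.0.0.1 - - [13/Aug/2010:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 512


def map_request_log(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: parse one raw log line and emit (path, 1).

    No data model is imposed beforehand; the mapper decides how to read
    the raw text and simply emits nothing for lines it cannot parse.
    """
    parts = line.split('"')
    if len(parts) < 3:
        return
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
    if len(request) >= 2:
        yield request[1], 1


def reduce_count(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine every value that was grouped under one key."""
    return key, sum(values)


def run_mapreduce(
    splits: List[List[str]],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, shuffle (group by key), reduce."""

    def map_split(split: List[str]) -> List[Tuple[str, int]]:
        return [pair for line in split for pair in mapper(line)]

    # Map phase: each input split is processed independently, here on threads.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_split, splits))

    # Shuffle: group intermediate values by key so that every value for a
    # given key reaches the same reduce call.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one call per distinct key.
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    logs = [
        ['10.0.0.1 - - [13/Aug/2010:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 512',
         '10.0.0.2 - - [13/Aug/2010:10:00:01 -0400] "GET /about.html HTTP/1.1" 200 256'],
        ['10.0.0.3 - - [13/Aug/2010:10:00:02 -0400] "GET /index.html HTTP/1.1" 304 0',
         'malformed line'],
    ]
    print(sorted(run_mapreduce(logs, map_request_log, reduce_count).items()))
    # [('/about.html', 1), ('/index.html', 2)]
```

In a real deployment the shuffle moves intermediate data across the network and the reducers also run in parallel, each handling its own partition of the keys; the sketch keeps everything in memory only to show where the grouping by key happens.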

References

[1] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. OSDI '04, pp. 137-150.
[2] Hadoop Distributed File System. http://hadoop.apache.org/hdfs/. Accessed August 13, 2010.
[3] A. Lakshman and P. Malik. Cassandra - A Decentralized Structured Storage System. LADIS '09.
[4] S. Chen and S. W. Schlosser. Map-Reduce Meets Wider Varieties of Applications. Intel Research Pittsburgh Tech Report IRP-TR-08-05, May 2008.
[5] Jaql. http://code.google.com/p/jaql/. Accessed August 13, 2010.
[6] MongoDB. http://www.mongodb.org/. Accessed August 13, 2010.
[7] Xadoop. http://www.xadoop.org/. Accessed August 13, 2010.
[8] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. SIGMOD '08, pp. 1099-1110.
[9] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. PVLDB '09, pp. 922-933. http://db.cs.yale.edu/hadoopdb/hadoopdb.html. Accessed August 13, 2010.
[10] S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: integrating R and Hadoop. SIGMOD '10, pp. 987-998.
[11] F. Ableson. Using XML and JSON with Android, Part 1: Explore the benefits of JSON and XML in Android applications. http://www.ibm.com/developerworks/xml/library/x-andbene1/. Accessed August 12, 2010.
[12] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. NSDI '10.

Information

Published In

CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
November 2010
482 pages

Publisher

IBM Corp.

United States

Qualifiers

  • Research-article

Conference

CASCON '10
CASCON '10: Center for Advanced Studies on Collaborative Research
November 1 - 4, 2010
Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 338
  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 18 Nov 2024