research-article

Web data processing on the cloud

Authors:

Shahan Khatchadourian, Mariano Consens, Jerome Simeon

CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research

Pages 356 - 358

https://doi.org/10.1145/1923947.1923990

Published: 01 November 2010

Abstract

Cloud computing is emerging as a highly scalable, fault-tolerant, and cost-effective way to process large amounts of information on the Web. Thanks in part to new data processing paradigms and systems designed with the Cloud in mind (such as MapReduce [1], HDFS [2], and Cassandra [3]), it is quickly gaining acceptance as a viable platform for organizations that need to store, process, and publish large amounts of data. MapReduce is attractive for processing data on the Cloud largely because of its simplicity and flexibility. Implementations of MapReduce usually expose a simple API for describing which part of the processing is done in parallel (the Map phase) and which part is done after the intermediate data has been grouped by key, so that all values for a given key are processed together on a single machine (the Reduce phase). MapReduce does not rely on a pre-existing data model, which makes it possible to process any kind of information regardless of how it is structured. Cloud applications have notably been used to perform off-line analytical processing such as analyzing web request logs, computing user recommendations, and understanding scenes in images [4].
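To make the two phases concrete, the following is a minimal, single-process sketch in Python of a MapReduce-style job that counts requests per URL in a web request log, one of the analytical tasks mentioned above. It is not the API of Hadoop or of any other MapReduce implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the whitespace-separated log format are hypothetical, and the grouping step that a real framework performs across machines is only simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from itertools import chain


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: runs independently on each input record, so it can be parallelized.
    # Hypothetical log format (an assumption): "<client-ip> <method> <url> <status>".
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1  # emit (url, 1) for every well-formed request line
    # Malformed lines emit nothing and are silently skipped.


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Simulated shuffle: group all intermediate values by key. A real framework
    # routes each key to one reducer machine; here a dict stands in for that.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: sees every value for one key together and aggregates them.
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    # Chain the phases: map every line, group by key, then reduce each group.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = [
        "10.0.0.1 GET /index.html 200",
        "10.0.0.2 GET /about.html 200",
        "10.0.0.1 GET /index.html 304",
    ]
    print(run_job(logs))
```

With the three sample lines, `run_job` counts two requests for `/index.html` and one for `/about.html`. Because the mapper parses each raw line itself, no schema has to be declared up front, which mirrors the point that MapReduce does not depend on a pre-existing data model.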

References

[1] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Pages 137-150. OSDI'04.

[2] Hadoop Distributed File System. http://hadoop.apache.org/hdfs/. Accessed August 13, 2010.

[3] A. Lakshman and P. Malik. Cassandra - A Decentralized Structured Storage System. LADIS'09.

[4] S. Chen and S. W. Schlosser. Map-Reduce Meets Wider Varieties of Applications. Intel Research Pittsburgh Tech Report, IRP-TR-08-05, May 2008.

[5] Jaql. http://code.google.com/p/jaql/. Accessed August 13, 2010.

[6] MongoDB. http://www.mongodb.org/. Accessed August 13, 2010.

[7] Xadoop. http://www.xadoop.org/. Accessed August 13, 2010.

[8] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. Pages 1099-1110. SIGMOD '08.

[9] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Pages 922-933. PVLDB'09. http://db.cs.yale.edu/hadoopdb/hadoopdb.html. Accessed August 13, 2010.

[10] S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: integrating R and Hadoop. Pages 987-998. SIGMOD '10.

[11] F. Ableson. Using XML and JSON with Android, Part 1: Explore the benefits of JSON and XML in Android applications. http://www.ibm.com/developerworks/xml/library/x-andbene1/. Accessed August 12, 2010.

[12] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. NSDI'10.


Information

Published In

CASCON '10: Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research

November 2010

482 pages

Conference Chairs: Joanna Ng (IBM Canada Lab, Toronto), Christian Couturier (National Research Council Canada)

Editors: Hausi A. Müller (University of Victoria), Arthur Ryman (IBM Canada), Anatol W. Kark (National Research Council Canada)

Program Chairs: Hausi A. Müller, Arthur Ryman

Publisher

IBM Corp.

United States

Conference

CASCON '10: Center for Advanced Studies on Collaborative Research

November 1-4, 2010

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Bibliometrics

Article Metrics

Total Citations: 0

Total Downloads: 338

Downloads (Last 12 months): 2

Downloads (Last 6 weeks): 0

Reflects downloads up to 18 Nov 2024