Web data processing on the cloud
Pages 356 - 358
Abstract
Cloud computing is emerging as a highly scalable, fault-tolerant, and cost-effective way to process large amounts of information on the Web. Thanks in part to new data processing paradigms designed with the Cloud in mind (such as MapReduce [1], HDFS [2], and Cassandra [3]), it is quickly gaining acceptance as a viable platform for organizations that need to store, process, and publish large amounts of data. MapReduce is attractive for processing data on the Cloud largely because of its simplicity and flexibility. Implementations of MapReduce usually include a simple API used to describe which part of the processing is done in parallel (the Map phase) and which part is done after grouping data on a single machine (the Reduce phase). MapReduce does not rely on a pre-existing data model, which makes it possible to process any kind of information independently of the model. Cloud applications have notably been used to perform off-line analytical processing such as analyzing web request logs, computing user recommendations, and understanding scenes in images [4].
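As an illustration of the Map and Reduce phases described above, the following is a minimal single-process sketch in Python, not the API of Hadoop or of any other MapReduce implementation. It counts requests per path in a web request log. The log format, the sample lines, and the names map_request, reduce_count, and run_map_reduce are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_request(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (path, 1) for one log line.

    Each line is handled independently of the others, which is what
    lets a real framework run this step in parallel.
    Assumed (hypothetical) log format: "<METHOD> <PATH> <STATUS>".
    """
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_count(path: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine every value that was grouped under one key."""
    return path, sum(counts)


def run_map_reduce(
    records: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, group by key, then reduce, all in one process.

    A real framework distributes these steps across machines; this
    helper only imitates the data flow.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # grouping step between the two phases
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Made-up sample input, not data from the paper.
LOG_LINES = [
    "GET /index.html 200",
    "GET /about.html 200",
    "POST /login 302",
    "GET /index.html 304",
]

hits = run_map_reduce(LOG_LINES, map_request, reduce_count)
print(hits)
```

Neither function depends on a schema for the input: the map function alone decides how to interpret each record, which mirrors the point that MapReduce does not rely on a pre-existing data model.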
References
[1] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Pages 137--150. OSDI'04.
[2] Hadoop Distributed File System. http://hadoop.apache.org/hdfs/. Accessed August 13, 2010.
[3] A. Lakshman and P. Malik. Cassandra - A Decentralized Structured Storage System. LADIS'09.
[4] S. Chen and S. W. Schlosser. Map-Reduce Meets Wider Varieties of Applications. Intel Research Pittsburgh Tech Report, IRP-TR-08-05, May 2008.
[5] Jaql. http://code.google.com/p/jaql/. Accessed August 13, 2010.
[6] MongoDB. http://www.mongodb.org/. Accessed August 13, 2010.
[7] Xadoop. http://www.xadoop.org/. Accessed August 13, 2010.
[8] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. Pages 1099--1110. SIGMOD '08.
[9] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Pages 922--933. PVLDB'09. http://db.cs.yale.edu/hadoopdb/hadoopdb.html. Accessed August 13, 2010.
[10] S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: integrating R and Hadoop. Pages 987--998. SIGMOD '10.
[11] F. Ableson. Using XML and JSON with Android, Part 1: Explore the benefits of JSON and XML in Android applications. http://www.ibm.com/developerworks/xml/library/x-andbene1/. Accessed August 12, 2010.
[12] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. NSDI'10.
Information
Published In
November 2010
482 pages
- Conference Chairs:
- Joanna Ng,
- Christian Couturier,
- Editors:
- Hausi A. Müller,
- Arthur Ryman,
- Anatol W. Kark,
- Program Chairs:
- Hausi A. Müller,
- Arthur Ryman
Publisher
IBM Corp.
United States
Publication History
Published: 01 November 2010
Qualifiers
- Research-article
Conference
CASCON '10
CASCON '10: Center for Advanced Studies on Collaborative Research
November 1 - 4, 2010
Toronto, Ontario, Canada
Acceptance Rates
Overall Acceptance Rate 24 of 90 submissions, 27%