tutorial

Scalable Data Analytics Using R: Single Machines to Hadoop Spark Clusters

Authors:

Mengyue Zhao

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Page 2115

https://doi.org/10.1145/2939672.2945398

Published: 13 August 2016

Abstract

R is one of the most popular languages in the data science, statistics, and machine learning (ML) communities. However, when it comes to scalable data analysis and ML using R, many data scientists are hindered by (a) the limited set of functions available to handle large datasets efficiently, and (b) a lack of knowledge about the appropriate computing environments for scaling R scripts from desktop exploratory analysis to elastic, distributed cloud services. In this tutorial we will discuss solutions that demonstrate the use of distributed compute environments and end-to-end solutions for R. We will present the topics through presentations and worked-out examples with sample code. In addition, we will provide a public code repository that attendees will be able to access and adapt to their own practice. We believe this tutorial will be of strong interest to a large and growing community of data scientists and developers using R for data analysis and modeling.

Supplementary Material

Part 1 of 2: kdd2016_tutorial_scalable_r_on_spark_01-acm.mp4 (1248.45 MB)

Part 2 of 2: kdd2016_tutorial_scalable_r_on_spark_02-acm.mp4 (1429.49 MB)

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2016

2176 pages

ISBN:9781450342322

DOI:10.1145/2939672

General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)

Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Conference

KDD '16

KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2016

San Francisco, California, USA

Acceptance Rates

KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Bibliometrics

Total Citations: 0

Total Downloads: 365

Downloads (Last 12 months): 3

Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Nov 2024
