Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2939672.2945398acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
tutorial

Scalable Data Analytics Using R: Single Machines to Hadoop Spark Clusters

Published: 13 August 2016 Publication History

Abstract

R is one of the most popular languages in the data science, statistical and machine learning (ML) community. However, when it comes to scalable data analysis and ML using R, many data scientists are blocked or hindered by (a) its limitations of available functions to handle large datasets efficiently, and (b) knowledge about the appropriate computing environments to scale R scripts from desktop exploratory analysis to elastic and distributed cloud services. In this tutorial we will discuss solutions that demonstrate the use of distributed compute environments and end to end solutions for R. We will present the topics through presentations and worked-out examples with sample code. In addition, we will provide a public code repository that attendees will be able to access and adapt to their own practice. We believe this tutorial will be of strong interest to a large and growing community of data scientists and developers using R for data analysis and modeling.

Supplementary Material

Part 1 of 2 (kdd2016_tutorial_scalable_r_on_spark_01-acm.mp4)
Part 2 of 2 (kdd2016_tutorial_scalable_r_on_spark_02-acm.mp4)

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 August 2016

Check for updates

Author Tags

  1. HADOOP
  2. R
  3. SQL
  4. advanced analytics
  5. distributed systems
  6. hierarchical time series
  7. learning curves
  8. machine learning
  9. parallel computing
  10. predictive analytics
  11. scalability
  12. spark
  13. statistical computing
  14. statistical modeling
  15. visualization
  16. yarn

Qualifiers

  • Tutorial

Conference

KDD '16
Sponsor:

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 365
    Total Downloads
  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media