Session 20

Introduction to Big Data

Class Objective:

The goal of this session is to provide an overview of how data processing can be scaled with Spark.

Readings (To be done before class):

Create a DataBricks Community Edition Account
Gentle Introduction To Spark - Download ebook
Review the Hadoop Ecosystem

In Class Exercises

On the DataBricks platform, you should execute both the Introduction to Apache Spark on Databricks notebook and the Databricks for Data Scientists notebook.

Concepts from these will be included in the final.

Click on DataBricks Import.

Notebooks.

01-intro-mapreduce.ipynb https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/10-big-data/01-intro-mapreduce.ipynb

02-intro-spark.ipynb https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/10-big-data/02-intro-spark.ipynb

Gentle Introduction to DataBricks https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/346304/2168141618055043/484361/latest.html

03-spark-questions.ipynb (Due 4/4 11:59 PM) https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/10-big-data/03-spark-questions.ipynb