
Class 12

Introduction to Big Data

Class Objective:

This class provides an overview of how data processing can be scaled with Apache Spark.

Readings (To be done before class):

Create a Databricks Community Edition account
Gentle Introduction to Spark (download the ebook)
Review the Hadoop Ecosystem

In-Class Activities

01-intro-mapreduce.ipynb https://raw.githubusercontent.com/jkuruzovich/techfundamentals-fall2017-materials/master/classes/12-big-data/01-intro-mapreduce.ipynb

02-intro-spark.ipynb https://raw.githubusercontent.com/jkuruzovich/techfundamentals-fall2017-materials/master/classes/12-big-data/02-intro-spark.ipynb

Python Gentle Introduction. https://docs.databricks.com/_static/notebooks/gentle-introduction-to-apache-spark.html

Apache Spark on Databricks for Data Engineers https://docs.databricks.com/_static/notebooks/databricks-for-data-engineers.html

Word Count: this can help. You don't need to install the library, though.

Setup Instructions

Spark https://raw.githubusercontent.com/jkuruzovich/techfundamentals-fall2017-materials/master/classes/12-big-data/03-spark-questions.ipynb
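
As a warm-up for the word count exercise, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python, so it runs without a cluster or any extra library. It is not taken from the class notebooks; the function names and sample lines are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted counts by word, standing in for the shuffle step."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: reduce(lambda a, b: a + b, counts)
            for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, not from the course data.
sample_lines = [
    "spark scales data processing",
    "Spark builds on ideas from MapReduce",
]
counts = word_count(sample_lines)
print(counts["spark"])  # prints 2
```

In Spark the same idea is usually expressed on an RDD with `flatMap`, `map`, and `reduceByKey`, with the framework handling the grouping across machines.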