The goal here is to give an overview of how data processing can be scaled with Spark.
Create a Databricks Community Edition account
Gentle Introduction To Spark - Download ebook
Review the Hadoop Ecosystem
01-intro-mapreduce.ipynb https://raw.githubusercontent.com/jkuruzovich/techfundamentals-fall2017-materials/master/classes/12-big-data/01-intro-mapreduce.ipynb
02-intro-spark.ipynb https://raw.githubusercontent.com/jkuruzovich/techfundamentals-fall2017-materials/master/classes/12-big-data/02-intro-spark.ipynb
Python Gentle Introduction. https://docs.databricks.com/_static/notebooks/gentle-introduction-to-apache-spark.html
Apache Spark on Databricks for Data Engineers https://docs.databricks.com/_static/notebooks/databricks-for-data-engineers.html
Word Count: this can help. You don’t need to install the library, though. Setup Instructions Spark https://raw.githubusercontent.com/jkuruzovich/techfundamentals-fall2017-materials/master/classes/12-big-data/03-spark-questions.ipynb
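To make the word count exercise concrete before opening the notebooks, here is a minimal sketch of the MapReduce pattern in plain Python, with no Spark install required. It is not taken from the course notebooks. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle step: group the emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the grouped counts to get one total per word."""
    return {
        word: reduce(lambda a, b: a + b, counts)
        for word, counts in groups.items()
    }


def word_count(lines):
    """Run the three steps in sequence on an iterable of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["spark makes big data simple", "big data big compute"]
    print(word_count(sample))
```

In PySpark the same pipeline is usually written on an RDD with `flatMap` to split lines into words, `map` to build the `(word, 1)` pairs, and `reduceByKey` to sum them. Spark then handles the shuffle across the cluster, which this sketch does in a single process.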