Project 2

Project 2 is due 12-14 at 11:59 PM. This is an individual project, but, as with the homework, you can ask others for advice directly or in the Slack channel.

Project Objective

The goal is to develop an understanding of how individuals approach data science projects by working through the entire process, from exploratory data analysis to modeling and evaluation. This project builds on what you learned in Project 1 and focuses on modeling.

Project Selection

Ideally, you will continue the work you did on Project 1. Please see me if you want to switch projects.

Deliverables

Your goal is to write a 3-4 page report (1-inch margins, single-spaced) on your modeling of the Kaggle project. The report should systematically determine the role of different independent variables in prediction and compare multiple algorithms.

  1. Executive Summary. This should be a 1-page summary, in your own words, of the problem, data, and findings.
  2. Analysis of relevance of independent variables (features).
  3. Analysis of performance of different model types (different algorithms).
  4. Appendix. A GitHub repository containing all modeling code, well commented.

For the Appendix, please submit Jupyter Notebooks with detailed comments on your code here: https://classroom.github.com/a/k1Zpj47C
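
As one possible way to approach the analysis of feature relevance, the sketch below estimates permutation importance: shuffle one feature at a time in a held-out set and measure how much the error grows. Everything in it is an assumption made for illustration: the synthetic data, the feature names x0, x1, x2, and the small k-nearest-neighbours model stand in for your own Kaggle data and models, and the helper names are hypothetical.

```python
import random


def knn_predict(train_X, train_y, row, k=5):
    """Average the targets of the k training rows closest to `row`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_X, train_y, X, y):
    """Mean squared error of the k-NN model on the rows in X."""
    errors = [
        (knn_predict(train_X, train_y, row) - target) ** 2
        for row, target in zip(X, y)
    ]
    return sum(errors) / len(errors)


def permutation_importance(train_X, train_y, valid_X, valid_y, rng):
    """Increase in validation MSE when each feature column is shuffled."""
    baseline = mse(train_X, train_y, valid_X, valid_y)
    importances = []
    for j in range(len(valid_X[0])):
        column = [row[j] for row in valid_X]
        rng.shuffle(column)
        # Rebuild the validation rows with only column j shuffled.
        shuffled = [
            row[:j] + [value] + row[j + 1:]
            for row, value in zip(valid_X, column)
        ]
        importances.append(mse(train_X, train_y, shuffled, valid_y) - baseline)
    return importances


# Synthetic stand-in data (an assumption, not the Kaggle data):
# x0 drives the target strongly, x1 weakly, x2 not at all.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(300)]
y = [3.0 * x0 + 0.5 * x1 + rng.gauss(0, 0.1) for x0, x1, _ in X]
train_X, valid_X = X[:200], X[200:]
train_y, valid_y = y[:200], y[200:]

importances = permutation_importance(train_X, train_y, valid_X, valid_y, rng)
for name, score in zip(["x0", "x1", "x2"], importances):
    print(f"{name}: MSE increase when shuffled = {score:.3f}")
```

A large increase suggests the model relies on that feature; a value near zero suggests it does not. In a real notebook you would more likely use a library routine, for example scikit-learn's `sklearn.inspection.permutation_importance`, rather than hand-rolled helpers like these.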

Project Evaluation Metrics

The rubric below describes an ideal project. The instructor will evaluate projects subjectively against it.

  • Formatting (10 points). The student presented the report in a format that indicated professionalism and care in the organization, writing, and presentation of the overall report.

  • Executive summary (20 points, 1 page). The student presented the modeling results in a rich and interesting way, with a clear account of the key predictors and key algorithms used.

  • Analysis of relevance of independent variables (25 points, 1.5-2 pages). The student clearly justified the value of different independent variables. Where possible, the student explored feature creation.

  • Analysis of performance of different model types (25 points, 1.5-2 pages). Where relevant, the student consulted outside resources when describing specific algorithms, and those sources add clarity. There is evidence of some model tuning.

  • Commented Code (20 points, as needed). Clearly commented code has been provided in the assigned Jupyter notebook.
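
To make "comparing model types" and "evidence of some model tuning" concrete, here is a minimal sketch that scores several candidates with the same k-fold cross-validation and picks the one with the lowest error. The data are synthetic, the two model families (a mean baseline and 1-D k-nearest neighbours with different values of k) are chosen only to keep the example short, and all function names are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def fit_mean_baseline(train_x, train_y):
    """Baseline model: always predict the training mean."""
    average = mean(train_y)
    return lambda x: average


def make_knn(k):
    """Return a fit function for 1-D k-nearest-neighbours regression."""
    def fit(train_x, train_y):
        def predict(x):
            nearest = sorted(
                range(len(train_x)), key=lambda i: abs(train_x[i] - x)
            )[:k]
            return mean([train_y[i] for i in nearest])
        return predict
    return fit


def cross_val_mse(fit, xs, ys, folds=5):
    """Average held-out mean squared error over `folds` interleaved folds."""
    fold_errors = []
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        predict = fit(train_x, train_y)
        fold_errors.append(
            mean([(predict(xs[i]) - ys[i]) ** 2 for i in test_idx])
        )
    return mean(fold_errors)


# Synthetic stand-in data (an assumption, not the Kaggle data).
rng = random.Random(0)
xs = [rng.random() for _ in range(150)]
ys = [3.0 * x + rng.gauss(0, 0.3) for x in xs]

# Every candidate is scored with the same folds so the numbers are comparable.
candidates = {
    "mean baseline": fit_mean_baseline,
    "kNN k=1": make_knn(1),
    "kNN k=5": make_knn(5),
    "kNN k=15": make_knn(15),
}
scores = {name: cross_val_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: cross-validated MSE = {score:.3f}")
print("Lowest error:", best)
```

Reporting a table like this, with a simple baseline alongside the tuned models, is one way to show both the comparison and the tuning. With scikit-learn, `cross_val_score` and `GridSearchCV` cover the same ground with far less code.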

Project Submission

  • The project is to be submitted to the LMS.

NOTE: Copying and pasting from the Kaggle description is plagiarism. You will be reported to the Associate Dean’s office and receive a 0 on the project.