The goal is to develop an understanding of how individuals approach data science projects, seeing the entire process from exploratory data analysis to modeling and evaluation.
Ongoing.
If there is another Kaggle contest you would like to do, just ask in the #projects Slack channel. Make sure that existing solutions (kernels) are available and that the contest has business relevance. Avoid image-based data and projects that only require visualizations.
NOTE: If you copy and paste from the Kaggle description, that is plagiarism, and you will be reported to the Associate Dean’s office and receive a 0 on the project grade.
The rubric below describes an ideal project. Projects will be evaluated subjectively by the instructor according to this rubric.
Formatting (10 points). The student presented the report in a format that indicated professionalism and care in the organization, writing, and presentation of the overall report.
Executive summary (20 points, 1 page). The student presented the modeling results in a way that is rich and interesting. The key predictors and key algorithms used are clearly represented.
Data description and initial processing (40 points, 3 pages). The student was able to clearly present an overall picture of the data using techniques presented in the class. This includes the basic structure, field-by-field descriptions, visualizations, and basic statistics. Where necessary, the student has adequately used techniques for cleaning the data or generating new features.
Modeling and Evaluation (30 points, 2 pages). There is a clear, insightful comparison of approaches, and the predictive characteristics of the different models are clearly compared in a table with appropriate conclusions. Outside resources are consulted in the description of specific algorithms where relevant.
Analysis of relevance of independent variables (25 points, 1.5-2 pages). The student was able to clearly justify the value of the different independent variables. Where possible, exploration of feature creation is provided.
Analysis of performance of different model types (25 points, 1.5-2 pages). Outside resources are consulted in the description of specific algorithms where relevant. These sources add clarity, and there is evidence of some model tuning.
Commented Code (20 points, as needed). Clearly commented code has been provided in the assigned Jupyter notebook.
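As an illustration of the field-by-field description and basic statistics mentioned in the rubric, here is a minimal sketch using only the Python standard library. The function name describe_fields, the sample columns (age, city, income), and the sample values are all hypothetical and stand in for whatever Kaggle data you choose; in practice you would likely use the tools covered in class instead.

```python
import csv
import io
import statistics


def describe_fields(rows):
    """Summarize each column: non-missing count, missing count, basic stats."""
    summary = {}
    for field in rows[0].keys():
        values = [row[field] for row in rows]
        # Treat empty strings and None as missing values.
        present = [v for v in values if v not in ("", None)]
        info = {"count": len(present), "missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            # At least one value is not a number: treat the field as categorical.
            info["type"] = "categorical"
            info["unique"] = len(set(present))
        else:
            info["type"] = "numeric"
            if numbers:
                info["mean"] = statistics.mean(numbers)
                info["median"] = statistics.median(numbers)
                info["min"] = min(numbers)
                info["max"] = max(numbers)
        summary[field] = info
    return summary


# Hypothetical sample data standing in for a Kaggle training file.
SAMPLE_CSV = """age,city,income
34,Albany,52000
29,Troy,
41,Albany,61000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = describe_fields(rows)
for field, info in summary.items():
    print(field, info)
```

A summary like this is only a starting point for the data description; the report should still explain what each field means and why missing values were handled the way they were.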