CS 4320: Machine Learning
Assignment: Support Vector Classification
Use the heart_failure_clinical_records_dataset data set on Kaggle. You’ll need to identify which features are categorical and which are numerical.
Use hyper parameter search with cross validation to create a decision tree classification model and a support vector classification model to obtain the best F1 scores possible.
It is expected that you will use the Titanic hyper parameter search with cross validation decision tree source code as a starting point for your code development.
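As a rough illustration of what a hyper parameter search with cross validation can look like for a support vector classifier, here is a minimal sketch. It is not the course starting code: the synthetic data from `make_classification`, the scaling step, the grid values, and the choice of 5 folds are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); the assignment uses the Kaggle data set.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Scale features before the SVC; the step names "scale" and "svc" are arbitrary.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC()),
])

# Illustrative grid only (assumption); choose your own hyper parameters and ranges.
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", 0.1],
}

# scoring="f1" ranks candidates by F1; cv=5 uses 5 cross validation folds.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

best_params = search.best_params_   # the hyper parameters selected
best_cv_f1 = search.best_score_     # mean cross validation F1 of the best candidate
print(best_params, round(best_cv_f1, 3))
```

The same pattern applies to a decision tree by swapping the estimator and the grid keys.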
Create a report that includes the data exploration plots and analysis.
The report will also include, for each type of model (decision tree and SVC), which hyper parameters were used in the search, the range or set of values used for each hyper parameter, the hyper parameters selected, the number of cross validation sets, the F1 cross-validation score obtained, the training F1 score of the model when trained on all training data, and finally, the F1 score of the model on the testing data.
Include a comparison of the cross validation and full training F1 scores between the two models, and which model you would select, based only on those scores. Finally, discuss whether the F1 scores on the testing data support your decision or not.
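One possible way to collect the three F1 scores for each model is sketched below. The synthetic data, the grids, the stratified split, and the `results` dictionary layout are assumptions for illustration, not a required structure.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with the prepared Kaggle data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # stratify is an assumption
)

# Illustrative estimators and grids (assumptions).
candidates = {
    "decision tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]},
    ),
    "svc": (
        SVC(),
        {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # With the default refit=True, the best candidate is retrained on all
    # of the training data, so search.predict uses that refitted model.
    search = GridSearchCV(estimator, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_f1": search.best_score_,
        "train_f1": f1_score(y_train, search.predict(X_train)),
        "test_f1": f1_score(y_test, search.predict(X_test)),
    }

for name, scores in results.items():
    print(name, scores)
```

In this sketch the test score is computed in the same loop for brevity; in practice you would look at it only after choosing between the models on the cross validation and training scores.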
Required Steps
- Download your data.
- Explore and analyze your data.
- Split the data 80%/20%, for training/testing.
- Write (or modify) a Python program using sklearn to process the training data and fit classification models with a hyper parameter search and cross validation.
- AFTER finding your best fit models, measure each model’s F1 score on your test data.
- Create a report with the contents mentioned above.
- Commit and push your code in the git repository.
- Submit the report (as PDF) to Canvas.
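The split and processing steps above might be sketched as follows. The column names (`num_a`, `cat_a`, and so on) and the toy data frame are hypothetical placeholders, not the real data set's columns, and scaling plus one-hot encoding is just one reasonable preprocessing choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame (assumption); load the downloaded CSV with pd.read_csv instead.
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "num_a": rng.normal(60.0, 10.0, n),
    "num_b": rng.normal(1.5, 0.5, n),
    "cat_a": np.arange(n) % 2,
    "cat_b": np.arange(n) % 3,
    "label": np.arange(n) % 2,
})

# You decide which columns are numerical and which are categorical.
numerical = ["num_a", "num_b"]
categorical = ["cat_a", "cat_b"]

X = data[numerical + categorical]
y = data["label"]

# 80%/20% split for training/testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale numerical columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numerical),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Fit the preprocessing on the training data only, then reuse it on the test data.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
print(X_train_prepared.shape, X_test_prepared.shape)
```

Fitting the transformer only on the training split keeps information from the test data out of the model fitting steps.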
Last Updated 01/16/2023