Week 3 – Friday: Enhancing Predictive Model Accuracy: A Deep Dive into 5-Fold Cross-Validation with CDC Data

In today’s blog, I worked on the K-fold cross-validation method and applied it to the CDC data set. I used the same predictive model as before, with % Diabetic as the dependent variable and % Obesity and % Inactivity as the independent variables. The data set has 354 rows, and I went through the libraries needed to implement the cross-validation.

The analysis uses a linear regression model with K = 5, so I am performing 5-fold cross-validation on the data frame. The data is split into five folds; the model is trained on four of them and tested on the remaining one, and this is repeated until every fold has served as the test set once.
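The setup described above can be sketched roughly as follows. This is only an illustration: the post does not name the library used, so scikit-learn is an assumption here, and the data is synthetic (random numbers standing in for the 354 CDC rows), so the scores it prints will not match the ones reported in this post. The variable names are hypothetical as well.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the CDC data (assumption: the real values are not used here).
rng = np.random.default_rng(0)
n_rows = 354
obesity = rng.normal(18.0, 1.0, n_rows)      # hypothetical % Obesity values
inactivity = rng.normal(16.0, 2.0, n_rows)   # hypothetical % Inactivity values
diabetic = 1.5 + 0.2 * obesity + 0.25 * inactivity + rng.normal(0.0, 0.7, n_rows)

X = np.column_stack([obesity, inactivity])   # independent variables
y = diabetic                                 # dependent variable (% Diabetic)

# K = 5: each fold is used once as the test set, the other four as training data.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn reports MSE as a negative score ("higher is better"), so flip the sign.
neg_mse = cross_val_score(
    LinearRegression(), X, y, cv=kfold, scoring="neg_mean_squared_error"
)
mse_scores = -neg_mse
mean_mse = mse_scores.mean()

print("Cross-validation MSE scores:", mse_scores)
print("Mean MSE:", mean_mse)
```

Shuffling before splitting is a choice made for this sketch; without it, the folds are taken in row order, which can matter if the rows are sorted in some way.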

Next, I calculated the mean squared error (MSE), which is the average squared difference between the observed (actual) values and the predicted values. It is a standard metric for regression problems and summarizes the overall accuracy of a predictive model; a lower MSE means the predictions are closer to the actual values.
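To make the definition concrete, here is a small sketch of the MSE calculation in plain Python. The four actual and predicted values are made up for illustration and are not taken from the CDC data.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

# Made-up example values (not from the CDC data set).
actual = [7.0, 8.5, 6.0, 9.0]
predicted = [7.5, 8.0, 6.0, 10.0]

# Squared errors are 0.25, 0.25, 0.0 and 1.0, so the MSE is 1.5 / 4.
mse = mean_squared_error(actual, predicted)
print(mse)  # 0.375
```

Because the errors are squared, one large miss (the last pair here) contributes more to the MSE than several small ones.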

Finally, I ran the 5-fold cross-validation. The cross-validation MSE scores for the five folds are 0.6616154, 0.49679872, 0.54263043, 0.6213607 and 0.65976824, and the mean MSE is 0.5964346993680371.
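As a quick check, the mean MSE is simply the average of the five fold scores. The sketch below recomputes it from the rounded scores listed above, so it agrees with the reported mean only up to rounding; the variable names are my own.

```python
from statistics import fmean

# The five fold scores as reported above (already rounded).
fold_mse = [0.6616154, 0.49679872, 0.54263043, 0.6213607, 0.65976824]

mean_mse = fmean(fold_mse)                 # average over the 5 folds
spread = max(fold_mse) - min(fold_mse)     # gap between worst and best fold

print(f"Mean MSE: {mean_mse:.6f}")
print(f"Lowest fold MSE:  {min(fold_mse)}")
print(f"Highest fold MSE: {max(fold_mse)}")
print(f"Spread across folds: {spread:.6f}")
```

The gap between the lowest and highest fold score gives a rough idea of how much the error depends on which rows end up in the test fold.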

In my next blog, I will analyze this process further and look at how the data is distributed across the 5 folds.
