Today, I discussed two things with my group members: the preparation of the project report, and the tasks each of us has performed individually. This lets us assemble our work in the required order, so that the report provides complete guidance and analysis for our 2018 CDC Data Analysis.
As described in my last blog, I performed the Breusch-Pagan test and plotted the residuals against the fitted values to check the linear regression model for heteroskedasticity. In the scatter plot, the residuals spread out more at higher fitted values, which indicates that heteroskedasticity is present. A visual pattern alone does not show whether this effect is statistically significant, so I used the Breusch-Pagan test to determine whether there is significant evidence of heteroskedasticity.
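The blog does not show the code behind the Breusch-Pagan test, so here is a minimal sketch in Python with NumPy. It assumes a simple regression with a single predictor and uses synthetic data (not the CDC dataset) in which the noise grows with x. It computes the n·R² (Koenker) form of the statistic: fit the model, regress the squared residuals on the same predictor, and multiply the auxiliary R² by the sample size. With one predictor, the statistic is compared with a chi-squared distribution with 1 degree of freedom. The function names `ols_fit` and `breusch_pagan_1d` are hypothetical.

```python
import math
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares; returns the coefficients and the fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

def breusch_pagan_1d(x, y):
    """Koenker (n * R^2) form of the Breusch-Pagan test for one predictor.

    Returns (LM statistic, p-value). With one predictor the reference
    distribution is chi-squared with 1 degree of freedom, whose survival
    function is erfc(sqrt(LM / 2)).
    """
    X = np.column_stack([np.ones_like(x), x])  # intercept + predictor
    _, fitted = ols_fit(X, y)
    u2 = (y - fitted) ** 2                     # squared residuals
    # Auxiliary regression: squared residuals on the same regressors
    _, u2_hat = ols_fit(X, u2)
    r2_aux = 1 - np.sum((u2 - u2_hat) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = len(y) * r2_aux
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value

# Synthetic example: the noise standard deviation grows with x (heteroskedastic)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 500)
y = 2 + 3 * x + rng.normal(0, x)
lm, p = breusch_pagan_1d(x, y)
print(f"LM = {lm:.1f}, p-value = {p:.2e}")
```

A small p-value (for example, below 0.05) is evidence against constant variance. For models with several predictors, statsmodels provides `het_breuschpagan` in `statsmodels.stats.diagnostic`, which may be the more practical choice.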
After that, I implemented K-fold cross-validation, a machine learning technique for assessing the performance of a predictive model. The dataset is divided into K = 5 folds, and the model is trained and tested K times. In each iteration, a different fold serves as the test set, while the remaining four folds are used for training.
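Below is a minimal sketch of 5-fold cross-validation for a linear regression, written in Python with NumPy. It uses synthetic data rather than the CDC dataset, and the helper names (`k_fold_indices`, `r_squared`, `k_fold_linear_regression`) are hypothetical. The indices are shuffled once and split into five roughly equal folds. Each fold is held out exactly once while ordinary least squares is fitted on the other four, and the held-out fold is scored with R-squared.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

def k_fold_linear_regression(X, y, k=5, seed=0):
    """Fit OLS on k-1 folds, score on the held-out fold; repeat k times."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the i-th one
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        scores.append(r_squared(y[test_idx], X[test_idx] @ beta))
    return scores

# Synthetic example with a roughly linear relationship
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 1.5 + 2 * x + rng.normal(0, 1, 200)
scores = k_fold_linear_regression(x, y, k=5)
print([round(s, 3) for s in scores], round(float(np.mean(scores)), 3))
```

The same procedure is available in scikit-learn through `KFold(n_splits=5, shuffle=True, random_state=...)` together with `cross_val_score(LinearRegression(), X, y, cv=..., scoring="r2")`. The hand-written version simply makes each train/test split explicit.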
Finally, to evaluate the performance of the linear regression model, I calculated the R-squared value. R-squared is a metric widely used in regression analysis to measure how well a model fits the data. It gives the proportion of the variance in the dependent variable that is explained by the model's independent variables. The value I obtained shows how well the regression model fits the data when the dataset is divided into five subsets (folds).
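For reference, here is the standard definition of R-squared, written for a held-out fold F_k. The blog does not say how the five fold scores were combined. Averaging them, as shown in the second formula, is a common convention and is an assumption here. Note that on held-out data R-squared can be negative if the model predicts worse than the fold's own mean.

```latex
R^2_k = 1 - \frac{\sum_{i \in F_k} (y_i - \hat{y}_i)^2}{\sum_{i \in F_k} (y_i - \bar{y}_{F_k})^2},
\qquad
\overline{R^2} = \frac{1}{K} \sum_{k=1}^{K} R^2_k, \quad K = 5
```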