Week 2 – Friday: Analysing Heteroskedasticity and Predictive Linear Models for Diabetes, Obesity, and Inactivity

So far in this blog, I have built two predictive linear models, %Diabetic against %Obesity and %Diabetic against %Inactivity, and analysed them by plotting histograms.

Now, I have performed the regression analysis and the Breusch-Pagan test on the updated Excel sheet, which lets us run statistical tests on our data frames. The Breusch-Pagan test is a statistical test used in econometrics and regression analysis to determine whether heteroskedasticity exists in a regression model. Heteroskedasticity means that the variance of the residuals (the estimated errors) is not constant across all levels of the independent variables. In plainer language, the dispersion of the residuals changes as the values of the independent variables change.
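To make the definition concrete, here are the two cases written in symbols for a model with a single predictor. The notation (y for %Diabetic, x for %Obesity or %Inactivity, epsilon for the error term) is my own shorthand for this sketch and is not taken from the spreadsheet.

```latex
\begin{align*}
  y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n \\
  \text{Homoskedasticity:}\quad
    \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma^2 \quad \text{for all } i \\
  \text{Heteroskedasticity:}\quad
    \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma_i^2 \quad \text{(changes with } x_i\text{)}
\end{align*}
```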

Following the same idea, I used the matplotlib library to plot the residuals on a scatter plot. From the plot, we can see that for all the variables, %Diabetic, %Obesity, and %Inactivity, the data points are denser over some ranges, and it is in those ranges that the model's predictions are best supported.
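A residual plot of this kind can be sketched roughly as follows. This is a minimal illustration and not the exact code behind the plot described above: the function names are hypothetical, and the six data points are made-up placeholders, not CDC values. The line is fitted with plain Python so that only the plotting step needs matplotlib.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fitted_and_residuals(x, y):
    """Return the fitted values and the residuals (observed minus fitted)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


def plot_residuals(fitted, residuals, path="residuals.png"):
    """Scatter the residuals against the fitted values and save the figure."""
    # Imported here so the numeric part above runs without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(fitted, residuals, s=12, alpha=0.7)
    ax.axhline(0, color="red", linewidth=1)  # reference line at zero residual
    ax.set_xlabel("Fitted %Diabetic")
    ax.set_ylabel("Residual")
    ax.set_title("Residuals vs fitted values")
    fig.savefig(path)
    plt.close(fig)


# Made-up placeholder values for illustration only (NOT the CDC data).
inactivity = [14.0, 15.5, 16.2, 17.8, 18.1, 19.4]
diabetic = [6.9, 7.4, 7.2, 8.3, 8.0, 9.1]

fitted, residuals = fitted_and_residuals(inactivity, diabetic)
```

Calling `plot_residuals(fitted, residuals)` then writes the scatter plot to `residuals.png`. If the spread of the points around the zero line widens or narrows as the fitted values increase, that is a visual hint of heteroskedasticity.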

Moreover, I wrote code to compute the Breusch-Pagan test statistic and its p-value. The test compares the null hypothesis of homoskedasticity, that the variance of the residuals is constant, against the alternative hypothesis of heteroskedasticity. A small p-value (conventionally below 0.05) is evidence against constant variance. The plot of the residuals against the fitted values gives a visual check of the same question.
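The code itself is not shown in this post, so here is a rough sketch of how the test can be computed for a model with one predictor. It uses the studentized (Koenker) form of the statistic, LM = n times the R-squared of a regression of the squared residuals on the predictor, compared against a chi-square distribution with one degree of freedom. The data are synthetic and generated only for illustration, and all function names are hypothetical. In practice the `het_breuschpagan` function in `statsmodels.stats.diagnostic` would be the usual choice; the pure-Python version below just makes each step visible.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """R-squared of the least-squares line of y on x."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test for one predictor.

    Returns (LM statistic, p-value). Assumes the squared residuals are
    not all identical, otherwise the auxiliary R-squared is undefined.
    """
    intercept, slope = fit_line(x, y)
    squared_resid = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: do the squared residuals depend on x?
    lm = max(0.0, len(x) * r_squared(x, squared_resid))
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Synthetic data for illustration only (NOT the CDC data).
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
# Error spread grows with x: heteroskedastic by construction.
y_hetero = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
# Error spread is the same everywhere: homoskedastic by construction.
y_homo = [2 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

lm_hetero, p_hetero = breusch_pagan(x, y_hetero)
lm_homo, p_homo = breusch_pagan(x, y_homo)
print(f"heteroskedastic data: LM = {lm_hetero:.2f}, p = {p_hetero:.4g}")
print(f"homoskedastic data:   LM = {lm_homo:.2f}, p = {p_homo:.4g}")
```

For the data whose error spread grows with x, the p-value should come out very small, which rejects the null hypothesis of constant variance. With more than one predictor, the auxiliary regression uses all of them and the degrees of freedom equal the number of predictors.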

Next, I will look at the dataset by geographic region, and I am going to gather more information on the 2018 CDC data, in which obesity and inactivity are treated as possible causes of diabetes.

Project MTH522

 

 
