In today's post I report on my progress with the CDC data analysis. I have run a correlation analysis across the three data frames: Diabetes, Obesity, and Inactivity. Linking the datasets through the key they have in common, the FIPS code, surfaced some useful information. I wrote code to merge the three datasets on that key and saved the result as a new Excel sheet for more thorough analysis.
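As a rough illustration of the merge step, here is a minimal sketch using pandas. The column names (`FIPS`, `pct_diabetes`, `pct_obesity`, `pct_inactivity`), the helper `merge_on_fips`, and the tiny data frames are assumptions made up for the example; they are not the actual CDC column headers or values.

```python
import pandas as pd

# Made-up stand-ins for the three CDC sheets (values are illustrative only).
diabetes = pd.DataFrame({"FIPS": [1001, 1003, 1005], "pct_diabetes": [9.1, 8.4, 11.2]})
obesity = pd.DataFrame({"FIPS": [1001, 1005, 1007], "pct_obesity": [18.2, 19.5, 17.9]})
inactivity = pd.DataFrame({"FIPS": [1001, 1005, 1009], "pct_inactivity": [16.0, 17.3, 15.1]})


def merge_on_fips(*frames):
    """Inner-join any number of frames on the shared FIPS column.

    An inner join keeps only the counties present in every frame.
    """
    merged = frames[0]
    for frame in frames[1:]:
        merged = merged.merge(frame, on="FIPS", how="inner")
    return merged


merged = merge_on_fips(diabetes, obesity, inactivity)
print(merged)

# Optional: save for further analysis (needs an Excel writer such as openpyxl).
# merged.to_excel("merged_cdc.xlsx", index=False)
```

With an inner join, a county that is missing from any one of the three sheets drops out of the result, which is why the merged table can be much smaller than the individual sheets.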
After the merge I found 354 rows that contain values for all three variables: % diabetes, % obesity, and % inactivity. For these rows I wrote code using the libraries pandas, numpy, scipy.stats, and matplotlib, and obtained the mean, median, standard deviation, skewness, and kurtosis for each of % Diabetic, % Obesity, and % Inactivity.
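The five summary statistics can be computed along these lines. This is only a sketch: `describe_column` is a hypothetical helper, the input list is toy data, and the choices of sample standard deviation and excess (Fisher) kurtosis are assumptions about the conventions used.

```python
import pandas as pd
from scipy import stats


def describe_column(values):
    """Return mean, median, std, skewness, and kurtosis for one column."""
    s = pd.Series(values, dtype=float).dropna()
    arr = s.to_numpy()
    return {
        "mean": s.mean(),
        "median": s.median(),
        "std": s.std(),  # sample standard deviation (ddof=1), the pandas default
        "skewness": stats.skew(arr),  # biased estimator, the scipy default
        "kurtosis": stats.kurtosis(arr),  # excess kurtosis: a normal distribution gives 0
    }


# Toy data, not CDC values.
summary = describe_column([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Applied to the merged table, the same helper would be called once per column, for example `describe_column(merged["pct_diabetes"])` under the column names assumed above.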
Additionally, I want to compare % Obesity and % Inactivity, since changes in either may be associated with an increase or decrease in % Diabetic in the future. The CDC data we have covers only the year 2018, so the relationship between obesity and inactivity is what I will use to judge whether % Diabetic is likely to increase or decrease.
Moreover, I am going to analyse % Obesity and % Inactivity against % Diabetic and calculate the R-squared value.
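One way the R-squared calculation could look is an ordinary least-squares fit of % Diabetic on the predictors. This is a sketch under assumptions: the post does not say which model will be used, `r_squared` is a hypothetical helper, and the numbers below are invented for illustration.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X (with intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single predictor becomes one column
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)  # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Invented values, not CDC data.
pct_obesity = [18.0, 19.5, 17.9, 20.1, 21.3, 19.0]
pct_inactivity = [15.0, 17.3, 15.1, 16.8, 18.2, 16.0]
pct_diabetes = [8.1, 9.4, 8.0, 9.0, 10.2, 8.7]

r2_obesity = r_squared(pct_obesity, pct_diabetes)
r2_both = r_squared(np.column_stack([pct_obesity, pct_inactivity]), pct_diabetes)
print(f"R^2 (obesity only): {r2_obesity:.3f}")
print(f"R^2 (obesity + inactivity): {r2_both:.3f}")
```

R-squared is the share of the variation in % Diabetic that the fitted line explains. Adding a second predictor can never lower it in a least-squares fit, so comparing the one-predictor and two-predictor values shows how much inactivity adds beyond obesity.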
Project MTH522