Hello everyone. For the past few days I have been working on a data mining project on personal loan acceptance in the banking sector. The data I collected is from Thera Bank and covers major attributes such as age, income category, mortgage, securities Read More…
BOSTON’S ECONOMIC DIARY Tracing the Contours of Growth and Challenge: A Detailed Account from 2013-2019
In “Boston’s Economic Diary,” we embark on a comprehensive journey through the economic heartbeat of Boston from 2013 to 2019. This detailed study utilizes a spectrum of data to paint a vivid picture of the city’s economic growth and challenges during this period. Read More…
Week 13 Friday – Building the AR+I+MA Model in the Time Series
An essential component of the analysis is applying the Moving Average (MA) and Autoregressive (AR) models to the time series. AR and MA models are widely used in time series analysis because they provide a framework for understanding and forecasting temporal trends in data. Using the AR model primarily involves using the modeling Read More…
Week 13 Wednesday – Making the Time Series Stationary by Implementing the Differencing Technique
Moving forward in the analysis, after performing the transformation I used the differencing technique, applying lags (shifts), to make the time series stationary. Below, the first plot shows the original time series data. To get a plot like that, I first differenced the data with a lag of 1, but I didn’t achieve Read More…
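As a minimal sketch of lag-based differencing (this is not the code from the post; the function name `difference` and the sample values are made up for illustration, and a library routine could be used instead):

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly counts with an accelerating trend (not the Logan data).
passengers = [100, 104, 109, 115, 122, 130]

first_diff = difference(passengers, lag=1)   # still trending upward
second_diff = difference(first_diff, lag=1)  # trend removed

print(first_diff)   # [4, 5, 6, 7, 8]
print(second_diff)  # [1, 1, 1, 1]
```

In this toy series one round of differencing still leaves a trend, so a second round is applied. Whether a real series needs more than one round is usually judged from plots or a stationarity test.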
Week 13 Monday – Analysis and Visualization of ACF and PACF to build the pre-model for AR and MA
Working on time series analysis of the economic indicators dataset, I believe I have uncovered patterns in the Logan Passengers and Logan International Flights variables. So far in my analysis I have computed the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF). Autocorrelation Function (ACF): The correlation between a time series and its lagged values, or earlier data, Read More…
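To make the ACF definition concrete, here is a small sketch that computes the sample autocorrelation at each lag by hand. It is an illustration only, not the code from the post: the function name `acf` and the toy series are assumptions, and it uses the common estimator that divides by the lag-0 sum of squared deviations.

```python
def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series (hypothetical values, not the Logan data).
values = [1, 2, 3, 4, 5]
print(acf(values, max_lag=2))  # [1.0, 0.4, -0.1]
```

The lag-0 value is always 1, since a series is perfectly correlated with itself. The later lags show how quickly that correlation decays.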
An Updated report for: BENEATH THE BADGE An Insightful Exploration of Police Shooting Incidents in the USA
Below is an updated report on police shooting incidents in the USA, based on data from The Washington Post.
Week 12 Friday – Deep Dive into Time Series Analysis and Unveiling the Patterns for Logan International Flights
So far in the blog, I have given an explanation and overview of time series analysis. Now I am going to perform time series analysis on the Logan Passengers and Logan International Flights variables, try to build predictive models, and produce forecasts based on the analysis. The Read More…
Week 12 Wednesday – An Overview of Time Series Analysis
In this blog, I will give an overview of time series analysis: its objectives, how to analyze a time series, and its limitations. What is Time Series Analysis? Time series analysis is a method of examining a collection of data points gathered over time. Rather than only Read More…
Week 12 Monday – Time Series approach for the analysis and predictive model
Going deeper into time series analysis, I have gone through topics such as autocorrelation, forecasting, and cyclical analysis. Also, while thinking about regression analysis, I concluded that conventional linear regression may not be the best option for time series research, particularly if the data shows seasonality, trends, Read More…
Week 11 Monday – Unveiling the Economic Indicators Data for the City of Boston
In this blog, I present my analysis of the economic indicators dataset for the city of Boston, using the data provided by Analyze Boston. I am searching the dataset for variables to which I can apply time series analysis accordingly Read More…
Week 10 Friday – Unraveling the Topics on the Analyze Boston website
So far in the blog, I have been exploring the Analyze Boston website, a data hub covering different topics. With the natural environment in mind, I decided to look at a few environmental datasets. There, I discovered nine datasets, including “Clough House Archaeology,” “Blue Bike System Data,” “Building Energy Reporting,” Read More…
Week 10 Wednesday – Exploration of the Analyze Boston Data hub and Time Series Analysis
I am currently choosing the data based on my desired outcomes. Last time I considered taking the economic route, but it has a drawback: there are fewer data points. Today I checked the “approved building permits” in this dataset in an effort to verify that construction Read More…
Week 10 Monday – Hovering over the “Analyze Boston” Open Data Hub
I explored the Analyze Boston website, which provides data for the city of Boston between January 2013 and December 2019 from the Boston Planning and Development Agency (BPDA). Analyze Boston is the open data hub for the City of Boston, where you can obtain statistics, data, and maps about Read More…
BENEATH THE BADGE: An Insightful Exploration of Police Shooting Incidents in the USA
In “Beneath the Badge,” we delve deep into the critical and often contentious subject of police shooting incidents in the United States. This comprehensive study leverages detailed data analysis to uncover the underlying patterns and key insights into these incidents. Key Highlights of the Analysis: Demographic Trends: A closer look at the age profiles of Read More…
Week 9 Friday – Clustering Analysis Across Variables and Exploring City-Based Encounters
In today’s blog, I analysed the variables Race, Armed, City, and State. First, I applied a clustering technique to identify the data points that make the clusters easy to understand. I grouped the data into 3 clusters by assessing the total shootings that happened in the different states and cities, and while Read More…
Week 9 Wednesday – Unveiling Patterns: Clustering Analysis of Police Encounters by Officer’s ID
In today’s blog I will explain my work on the second dataset, in which I applied a clustering technique to find clusters in the data based on police officers and the encounters they were involved in. I grouped the data into K clusters by assessing the ID of the cops Read More…
Week 9 Monday – Visualizing Police Encounters: Correlation Analysis, Heat Maps, and Clusters Based on Officer IDs
Before moving forward: in my second-to-last blog I showed the killings by police broken down by gender. Later on, I worked on creating a heat map to show the remaining variables, the ANOVA between them, and the correlation between the two datasets. While Read More…
Week 8 Friday – Uncovering Patterns and Relationships between the Variables in the Police Shooting Dataset
So far in the blog, I have performed K-means clustering on the first dataset. In this blog I used a heat map to find the correlation between the variables age, body camera, and signs of mental illness. To build the heat map, I used the correlation method of assigning the annotation between Read More…
Week 8 Wednesday – Unveiling Gender Disparities in Police-Involved Shootings
In today’s blog, I performed K-means clustering on both of the given datasets. While performing the technique, I got stuck on the first dataset, where I need to find the correlation between the variance for gender, race, city, and state, and also for the police departments, where we can link the Read More…
Week 8 Monday – How T-Tests, ANOVA, and Bayes’ Theorem Uncover Statistical Insights
In today’s blog I will mainly talk about techniques for interpreting the police shooting data across the different states of the United States, based on the variables in the datasets: the t-test, analysis of variance (ANOVA), and Bayes’ Theorem. t-test – The means of two Read More…
Week 7 Friday – How Data Analysis Unveils Insights: Correlations of Variables and Clustering Techniques
So far in the blog, I have talked about the correlations between variables in the data and how the shootings in each state depend on various factors and on people’s age group. Later I worked on the Monte Carlo approximation technique, as we can analyse the data to the approximate numerical Read More…
Week 7 Monday – A Deep Dive into Monte Carlo Estimation and K-Means Clustering
So far in my project I have analysed the different variables in the dataset and am now applying the Monte Carlo approximation method to estimate the number of people killed in different age groups; later, the same analysis can be extended to race. Moreover, in the project, I have Read More…
Week 6 Friday – Unraveling Patterns in Police Shootings: Exploring Data Clustering Techniques
In this blog, I mainly talk about clustering in the analysis of the data. Clustering is a technique in the field of machine learning that aims to group data points that share similarities. There are several approaches to clustering, each with its own characteristics and applications. I followed the data to find the pattern of the Read More…
Week 6 Wednesday – Unveiling Disparities: Analysing Racial Police Shootings on the Eastern Coast Using Monte Carlo Estimation
Moving forward in the blog, I assumed that the analysis of the police shooting data from The Washington Post relies on whether the people killed were armed or unarmed and on race, Black versus white. Moving forward in the project, I am working on the analysis of the race based Read More…
Week 6 Monday – Uncovering Patterns in Police Shooting Data: Exploring Geospatial Analysis and Clustering Techniques
Today I went through the topics I can use for my analysis of the police shooting data. I also followed the PDF file in which geolocations are given to locate the killings in the United States by latitude and longitude. Geo-positions, also known as geographic coordinates, are a Read More…
Week 5 Friday – Connecting the Dots: Understanding the Correlation Between Variables in Police Shootings Data Sets
In this blog I mainly talk about the different variables, and the correlation between them, in the dataset provided by The Washington Post. There are two datasets that give information about fatal police shootings. The second dataset provides information about the cops, which of them are named for an Read More…
Week 5 Wednesday – The Fatal Police Shootings in the United States: A Glimpse of the Data Analysis
As this is the first blog of project 2, I would like to focus on the datasets on police shootings published by The Washington Post. The Post states that, on average, police in the United States shoot and kill more than 1,000 people every year. As far as I looked Read More…
Final Report on 2018 CDC Dataset for Diabetic, Obesity, and Inactivity as per U.S. Counties/FIPS
In this blog, I am wrapping up the project work that my team members and I completed over four weeks. I am excited to share my final report and each individual’s contributions. This project is titled: “UNLOCKING PUBLIC HEALTH: An Analysis of the CDC Data on Diabetes, Obesity and Read More…
Week 4 – Friday: Analysing Predictive Models and Uncovering Insights through Regression Analysis
So far in my blog, I have successfully created a predictive model using linear regression and measured how well the model fits the data. Today I decided to analyse my predictive model by exchanging the variables where I can assume and persist that my Read More…
Week 4 – Wednesday: Glimpse of my project analysis and the report in progress
Today, I discussed the project report with my group members, along with the tasks we each performed individually, so that we can assemble our work in the required order and provide complete guidance and analysis for our 2018 CDC data analysis. In the last blog, I successfully Read More…
Week 4 – Monday: Assessing the Performance of Linear Regression Model & Uncovering the Power of R-squared and Training Error by using 5-Fold Cross-Validation
In my last blog, I successfully implemented 5-fold cross-validation and obtained the scores for the linear regression model used in my analysis. In this blog, I computed the training error by fitting the model to the entire dataset and plotted the training data on a scatter plot. To check Read More…
Week 3 – Friday: Enhancing Predictive Model Accuracy: A Deep Dive into 5-Fold Cross-Validation with CDC Data
In today’s blog, I worked on the K-fold cross-validation method and applied it to my analysis of the CDC dataset. I ran the process on the same predictive model, with % Diabetic as the dependent variable and % Obesity and % Inactivity as the independent variables. With the data set of 354 rows, I Read More…
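To illustrate how 5-fold cross-validation partitions 354 rows, here is a rough standard-library sketch. It is not the code from the post (a library splitter may well have been used there); the helper name `k_fold_indices` and the seed are assumptions for illustration, and model fitting is left out.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_rows, k)  # the first `extra` folds get one more row
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


folds = k_fold_indices(354, k=5)
fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)  # [71, 71, 71, 71, 70]

for i, test_idx in enumerate(folds):
    # Every fold serves once as the test set; the other four form the training set.
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    print(f"fold {i}: train={len(train_idx)} rows, test={len(test_idx)} rows")
```

Each row lands in exactly one test fold, so every observation is used for testing once and for training four times.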
Week 3 – Wednesday: Assessing Model Performance: Cross-Validation, Training Error, and Test Error in CDC Data Analysis
So far in the project, I have completed the Breusch-Pagan test and found no significant evidence of heteroskedasticity in the data. In today’s blog, I will talk about cross-validation, training error, and test error. An essential tool in data analysis and machine learning, cross-validation is used to Read More…
Week 3 – Monday: Assessing Heteroskedasticity in Regression Analysis for the CDC dataset
This blog is mainly about heteroskedasticity in statistical analysis. While analysing the data with the three variables % Diabetic, % Obesity, and % Inactivity, joined on the common factor FIPS, I came to the conclusion that significant evidence of heteroskedasticity is detected in the statistical analysis of the given data Read More…
Week 2 – Friday: Analysing Heteroskedasticity and Predictive Linear Models for Diabetes, Obesity, and Inactivity
So far in the blog, I have built two predictive linear models, %Diabetic-%Obesity and %Diabetic-%Inactivity, and analysed them by plotting histograms. Now I have performed the regression analysis and the Breusch-Pagan test on the updated Excel sheet, in which we can analyse the statistical test for our data frames. To determine Read More…
Week 2 – Wednesday: Exploring the link between %Diabetic – %Inactivity and %Diabetic – %Obesity
For today’s analysis, I compared two variables, % Inactivity and % Obesity, to the third variable, % Diabetic. I successfully plotted smoothed histograms for the predictive models (% Diabetic to % Inactivity and % Diabetic to % Obesity). The data shows variation between the two plots and there Read More…
Week 2 – Monday: New Excel sheet with the common factor FIPS and comparing %Obese and %Inactive by scatter plot
In today’s blog I will report my progress on the CDC data analysis. I have successfully performed the correlation analysis between the three data frames: Diabetes, Obesity, and Inactivity. I found some useful information after analysing the datasets using their common factor, the FIPS code. I wrote a Read More…
Week 1 – Friday
Today’s work was all about analysing %Obesity and %Inactivity in the CDC datasets. I went through the same concepts to understand the common factors related to linear regression and the behaviour of the plot for both datasets, obesity and inactivity. Similarly, I wrote code to find out Read More…
Week 1 – Wednesday
As we discussed the CDC datasets in class today, my work on the project has moved on to analysing the data from the different U.S. counties and the correlation between %diabetes and %inactivity. Moreover, for further analysis of the data, and to understand the residual and scatter plots, I have Read More…
Week 1 – Monday
In today’s class I learned new topics related to linear regression and some concepts related to the Centers for Disease Control and Prevention (CDC) datasets, such as skewness, kurtosis, and heteroskedasticity. Linear regression is an important tool in data analysis and statistics which can be used Read More…