Data limitation
This project considers more than 10 large US cities for crime analysis. However, unlike demographic and transportation data, most of the features used for predictive modeling are neither city-specific nor policy-related. As a result, the project is of limited use as a reference for policy-making departments that want to see which policies increase crime and which improve safety. Future analysis should therefore incorporate other crime-related datasets, such as demographic and transportation data.
Model limitation
The predictive modeling does not investigate the relationships among the input features. In other words, the correlation between each pair of features should be examined in future work to improve accuracy. Furthermore, the problem of imbalanced labels in the output feature has not been addressed in this project. A method for handling imbalanced labels, such as resampling or class weighting, should be applied so that the model is not dominated by the most frequent classes and does not overfit to them.
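As a rough illustration of the two checks suggested above, the sketch below computes a Pearson correlation between two features and derives per-class weights that are inversely proportional to class frequency. It is not the method used in this project. The helper names (`pearson`, `balanced_class_weights`), the feature values, and the crime labels are all hypothetical and chosen only for the example.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a weighted loss pays
    more attention to them during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical toy data (not taken from the project's datasets).
feature_a = [1, 2, 3, 4, 5]
feature_b = [2, 4, 6, 8, 10]              # perfectly linear in feature_a
labels = ["theft"] * 8 + ["assault"] * 2  # imbalanced output labels

print(pearson(feature_a, feature_b))      # 1.0
print(balanced_class_weights(labels))     # {'theft': 0.625, 'assault': 2.5}
```

A correlation close to 1 or -1 between two features suggests that one of them is largely redundant. The resulting weights could be passed to any model that accepts per-class weights, although whether weighting or resampling works better would need to be tested on the actual data.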
Moreover, the spatial-temporal pattern analysis was performed only for Los Angeles. Because both location and time are crucial to understanding crime, the spatial-temporal analysis needs to be extended to other cities. Other spatially distributed factors should also be considered in the spatial-temporal analysis.
For the relationship between education/health and crime, it is hard to identify a single overall pattern that fits all cities. Each city may have its own characteristics, so a proper model would need to take every influential factor into account. At the same time, it is difficult to capture and unify all of these factors. A more comprehensive method should therefore be considered.