Exploratory Analysis
Crime Occurrence vs. Rainy Weather
The plot above shows the number of crimes of each type that occurred in rainy and non-rainy weather. The histogram makes clear that, overall, more crimes occur in non-rainy weather than in rainy weather. How rainy weather affects individual crime types is analyzed further in the hypothesis testing section of the predictive analysis.
Different Crime Types vs. Moon Phase
This plot shows the occurrences of each crime type during different moon phases. Assault, Burglary, Drug, Robbery, Sexual and Theft crimes appear to occur most often during the New Moon. Death crimes occur most often during the Last Quarter, and Fraud occurs most often during the Waxing Crescent.
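The peak phase per crime type can also be read off a count table instead of the plot. The following is a minimal sketch on made-up records, not the project's data; the column names `crime_type` and `moon_phase` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy records; column names and values are illustrative assumptions.
records = pd.DataFrame({
    "crime_type": ["Assault", "Assault", "Assault", "Fraud",
                   "Fraud", "Fraud", "Death", "Death"],
    "moon_phase": ["New Moon", "New Moon", "Full Moon", "Waxing Crescent",
                   "Waxing Crescent", "New Moon", "Last Quarter", "Last Quarter"],
})

# Rows are crime types, columns are moon phases, cells are occurrence counts.
counts = pd.crosstab(records["crime_type"], records["moon_phase"])

# For each crime type, the moon phase with the highest count.
peak_phase = counts.idxmax(axis=1)
print(peak_phase)
```

Because moon phases differ in how many days they cover, dividing each column by the number of days in that phase before taking the maximum may give a fairer comparison.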
2017 US City Crime Counts vs. Education
The geographic map above shows crime occurrence in 2017 for one of the major cities of interest (San Francisco). Red marks the locations where crime happened most frequently, and lighter colors mark the locations where crime happened less frequently. In most cases, the red areas correspond to the city's downtown area. The labels on the map are the education-level rankings per zip code. An interesting pattern is that in most downtown areas, the education level tends to be lower than in other areas. Hypothesis testing will be conducted to check whether there is statistical evidence to support this pattern.
Different Crime Counts vs. Day of the Week
The histograms above show the total counts per crime type by day of the week. Several interesting patterns appear. For instance, there seems to be a correlation between drug and sexual crime, as well as between burglary and theft; further statistical tests will be conducted in the additional analysis section. In addition, for certain crime types such as assault and death, weekends tend to have higher occurrence, while for fraud, weekdays seem to have higher occurrence. For drug crime, Wednesday seems to have the highest occurrence. Further analysis of day of the week and crime type will also be conducted in the additional analysis section.
Scatter Plot of Crime Types
The scatter plot above shows the relationships between different crime types. Among the 8 crime types, some pairs appear to be positively correlated, for example Assault and Burglary, Assault and Drug, and Drug and Fraud.
Several pairs of attributes show some degree of correlation: Burglary and Theft have a correlation of 0.34, Robbery and Theft 0.39, and Theft and Assault 0.47. The other correlations are close to zero and thus of less interest.
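Pairwise correlations like these can be computed from a table of per-day counts. The sketch below uses a small made-up table and assumes Pearson correlation, which the text does not specify; the 0.3 cutoff for a "notable" pair is likewise an illustrative assumption.

```python
import pandas as pd

# Hypothetical daily counts per crime type; the numbers are made up for illustration.
daily = pd.DataFrame({
    "Theft":    [10, 12, 15, 11, 18, 20, 14],
    "Burglary": [3, 4, 5, 3, 6, 7, 4],
    "Fraud":    [5, 2, 4, 6, 3, 5, 4],
})

# Pairwise correlation matrix (pandas uses Pearson by default).
corr = daily.corr()

# Keep each unordered pair once and report those at or above the assumed cutoff.
threshold = 0.3
cols = list(corr.columns)
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) >= threshold
]

for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.2f}")
```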
Cluster Analysis
K-means: Moon Illumination vs. Assault
The x-axis is the mean daily Assault occurrence, and the y-axis is the Moon Illumination. There are two clusters in the plot, one with higher moon illumination and one with relatively lower moon illumination. For most points, the assault occurrence lies between 0.008 and 0.02.
The score of the k-means method is 0.625.
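A two-cluster k-means fit of this kind might look like the sketch below. It runs on synthetic data, not the project's data. The text does not say which score was reported; the silhouette coefficient used here is one common choice and is an assumption, as is standardizing the two features before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: assault rates in the range quoted in the text,
# and moon illumination (percent) drawn from a low and a high group.
assault = rng.uniform(0.008, 0.02, size=200)
illumination = np.concatenate([rng.uniform(0, 30, 100), rng.uniform(70, 100, 100)])
X = np.column_stack([assault, illumination])

# Assumption: put both features on a comparable scale before clustering.
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means with two clusters and assign each point a cluster label.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

# Silhouette coefficient lies in [-1, 1]; higher means better-separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"silhouette score: {score:.3f}")
```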
DBSCAN: Robbery vs. School Rate
The centroids of the two estimated clusters make the pattern quite clear: the higher the School Rate, the lower the Robbery Occurrence. Also, the cluster on the right is much denser than the one on the left, which indicates that the group on the right represents common situations, while the group on the left represents an unusual condition (areas where the robbery rate is extremely low). The School Rate ranges of the two groups do not differ much.
The score of DBSCAN is 0.840, which is relatively high.
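For illustration, a density-based clustering with one dense and one sparse group can be sketched as follows. The data is synthetic and assumed to be already standardized; the `eps` and `min_samples` values are illustrative, and scoring with the silhouette coefficient on non-noise points is an assumption, since the text does not name the score.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic, already-standardized points: a large dense group on the right
# and a smaller group on the left (columns: robbery, school rate).
dense = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(150, 2))
sparse = rng.normal(loc=[-2.0, 0.0], scale=0.1, size=(30, 2))
X = np.vstack([dense, sparse])

# eps and min_samples are illustrative values, not tuned settings.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN labels noise points as -1; exclude them when counting clusters.
n_clusters = len(set(labels.tolist()) - {-1})

# Score only the points that were assigned to a cluster.
mask = labels != -1
score = silhouette_score(X[mask], labels[mask])

# DBSCAN has no built-in centroids; the per-cluster mean serves as one here.
centroids = {int(k): X[labels == k].mean(axis=0) for k in set(labels.tolist()) if k != -1}
print(n_clusters, round(float(score), 3), centroids)
```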
Hierarchical: Drug & Max Temperature
The x-axis is Drug crime, and the y-axis is the Max Temperature. There are roughly six small clusters, which can be further grouped into two larger clusters of three subgroups each.
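The nesting of six small clusters inside two larger ones corresponds to cutting the same hierarchy at two different levels. The sketch below shows this on synthetic points in arbitrary standardized units; Ward linkage is an assumption, since the text does not state which linkage was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic data: six tight blobs arranged as two well-separated groups of three.
centers = [(0, 0), (1, 0), (0.5, 1), (6, 6), (7, 6), (6.5, 7)]
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in centers])

# Build the merge tree once (Ward linkage is an assumed choice).
Z = linkage(X, method="ward")

# Cut the same tree at two levels: six fine clusters and two coarse clusters.
fine = fcluster(Z, t=6, criterion="maxclust")
coarse = fcluster(Z, t=2, criterion="maxclust")

# Count how many fine clusters fall inside each coarse cluster.
subgroups = {int(c): len(set(fine[coarse == c].tolist())) for c in set(coarse.tolist())}
print(subgroups)
```

`scipy.cluster.hierarchy.dendrogram(Z)` can be used to draw the tree and choose cut levels by eye.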