Precisely CrimeIndex (US) is a block-group-level product produced by Precisely data scientists. Context CrimeIndex is created by applying Precisely CrimeIndex (US) block group data to other relevant Precisely Boundaries. Values are assigned to larger geographies through aggregation, summarization, and weighting by the percentage of block group overlap.
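The overlap weighting described above can be illustrated with a minimal sketch. It assumes the weight is simply the fraction of each block group that falls inside the target boundary and that the boundary value is a weighted mean; the function name, identifiers, and sample values are hypothetical and are not taken from the product.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (block_group_id, overlap_fraction) pairs for one target
# boundary. overlap_fraction is the share of the block group that falls inside
# the boundary (0.0 to 1.0). All names and values here are illustrative only.
Overlap = Tuple[str, float]


def overlap_weighted_index(
    block_group_index: Dict[str, float],
    overlaps: List[Overlap],
) -> float:
    """Return the overlap-weighted mean index for one target boundary."""
    weighted_sum = 0.0
    total_weight = 0.0
    for block_group_id, fraction in overlaps:
        if block_group_id not in block_group_index:
            continue  # skip block groups that have no index value
        weighted_sum += block_group_index[block_group_id] * fraction
        total_weight += fraction
    if total_weight == 0.0:
        raise ValueError("no overlapping block groups with index values")
    return weighted_sum / total_weight


# Made-up block group values and overlaps for a single boundary.
block_group_index = {"bg_a": 100.0, "bg_b": 200.0, "bg_c": 50.0}
overlaps = [("bg_a", 1.0), ("bg_b", 0.5), ("bg_c", 0.5)]
boundary_index = overlap_weighted_index(block_group_index, overlaps)
print(boundary_index)
```

A production workflow might also weight by population or another measure within the overlap; the source does not say which, so area share is used here only as the simplest assumption.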
The most robust publicly available crime dataset in the US is the annual Uniform Crime Reporting Program (UCR) data produced by the FBI. The FBI’s Uniform Crime Reporting Program is a nationwide, cooperative statistical effort of more than 18,000 city, university and college, county, state, tribal, and federal law enforcement agencies voluntarily reporting data on crimes brought to their attention. The UCR administrative records are the most nationally representative small-area crime data that are publicly available. The UCR data have limitations, including possible coverage bias in both the number of agencies that report and the number of crimes reported.
Precisely data scientists combined the UCR, incident data, and Precisely Location Intelligence data and employed a multi-level statistical model to estimate crime rates by crime type at the block group unit of analysis.
First, the UCR data were extracted from the FBI website and the incident data were sourced from multiple agencies. Second, the names associated with the UCR crime reporting entities were matched to Precisely’s inventory of geographic data using exact and probabilistic matching techniques. Disparate incident datasets were analyzed and combined, and incident descriptions were matched to UCR crime types using multiple string-matching techniques. Once matched, the incident data were aggregated to the block group level. Outlier analysis of both the UCR and incident datasets was carried out to detect and remove erroneous data.
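As an illustration of the name-matching step, the sketch below tries an exact match on normalized names first and then falls back to a similarity-based match. It uses the `difflib` similarity ratio from the Python standard library as a simple stand-in; the source does not specify which probabilistic or string-matching techniques were actually used, and the agency names, function names, and cutoff are hypothetical.

```python
import difflib
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def match_agency(name: str, known_names: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the best-matching known name, or None if nothing is close enough."""
    lookup: Dict[str, str] = {normalize(k): k for k in known_names}
    key = normalize(name)
    # Step 1: exact match on the normalized name.
    if key in lookup:
        return lookup[key]
    # Step 2: fuzzy fallback using difflib's similarity ratio (an assumption,
    # standing in for whatever probabilistic matching the product uses).
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


# Made-up reference names for illustration.
known = ["Springfield Police Department", "Shelby County Sheriff's Office"]
exact_hit = match_agency("springfield   police department", known)
fuzzy_hit = match_agency("Springfeild Police Department", known)
no_hit = match_agency("Gotham Harbor Patrol", known)
print(exact_hit, fuzzy_hit, no_hit)
```

Running the exact pass first keeps the fuzzy pass for the residual cases only, which makes unmatched or ambiguous names easier to review by hand.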
After these data were linked to Precisely’s Location Intelligence data and erroneous data were removed, per capita crime rates were calculated. Next, statistical techniques were used to impute crime rates for a small number of areas of the US where UCR crime statistics were unavailable.
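The rate calculation and imputation can be sketched as follows. The per-100,000 scaling and the median fill are assumptions made for illustration: the source states only that per capita rates were calculated and that statistical techniques were used for imputation. All names and figures are hypothetical.

```python
import statistics
from typing import Dict, Optional


def rate_per_100k(crime_count: int, population: int) -> Optional[float]:
    """Crimes per 100,000 residents; None when the population is missing or zero."""
    if population <= 0:
        return None
    return crime_count * 100_000 / population


def impute_missing(rates: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Fill missing rates with the median of the observed rates.

    The median is a simple stand-in for the unspecified imputation technique.
    """
    observed = [r for r in rates.values() if r is not None]
    if not observed:
        raise ValueError("no observed rates to impute from")
    fill = statistics.median(observed)
    return {area: (fill if r is None else r) for area, r in rates.items()}


# Made-up counts and populations for four areas.
rates = {
    "area_1": rate_per_100k(50, 10_000),
    "area_2": rate_per_100k(30, 20_000),
    "area_3": None,  # UCR statistics unavailable for this area
    "area_4": rate_per_100k(10, 5_000),
}
completed = impute_missing(rates)
print(completed)
```

A model-based imputation that borrows information from neighboring or similar areas would be a natural refinement, but that goes beyond what the source describes.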
The prepared data were next combined with Precisely proprietary datasets. These datasets were used to understand and predict the relationship between crime (using UCR aggregate data and incident data) and geodemographic location data. A series of regression models were developed to determine the most relevant features for different crime types and different geographies. These models were then deployed at the block group level to predict where crime rates are likely to be higher or lower.
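The fit-then-predict pattern described above can be shown with a minimal single-feature sketch. It uses ordinary least squares on one hypothetical location feature; the actual models, features, and fitting procedure are proprietary and are not described in the source, so every name and number below is an assumption.

```python
from typing import List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("feature has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x_new: float) -> float:
    """Apply the fitted line to a new feature value."""
    return intercept + slope * x_new


# Made-up training data: one hypothetical feature and an observed crime rate.
feature_values = [1.0, 2.0, 3.0, 4.0]
observed_rates = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(feature_values, observed_rates)

# Score a block group that was not in the training data.
predicted_rate = predict(intercept, slope, 5.0)
print(intercept, slope, predicted_rate)
```

In practice a model of this kind would use many features and be fitted separately per crime type and geography, as the text indicates; the single-feature case only shows the mechanics.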
No Census-based race, ethnicity, gender, or language datasets were used to develop the Precisely CrimeIndex (US) or Context CrimeIndex products.
The final estimates were derived by combining the macro-level UCR crime statistics with the regression models built from both the UCR and incident data. The final composite crime score is a linear combination of the state-specific crime distributions by crime type from the UCR data and the combined macro-and-regression models. The qualitative categories were derived from the final quantitative scores using the percentile distribution above and below the national average, which accounts for the non-normal distribution of crime rates. State-level indices have also been calculated so that users can compare block groups within a state.
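The scoring and categorization steps can be sketched as below. The blending weight, the category labels, and the 50th-percentile split within each side of the national average are all assumptions chosen for illustration; the source does not publish the actual weights, cut points, or category names.

```python
from typing import List


def composite_score(macro_value: float, model_value: float, macro_weight: float) -> float:
    """Linear combination of a macro-level value and a model-based value."""
    if not 0.0 <= macro_weight <= 1.0:
        raise ValueError("macro_weight must be between 0 and 1")
    return macro_weight * macro_value + (1.0 - macro_weight) * model_value


def categorize(scores: List[float], national_average: float) -> List[str]:
    """Label each score by its percentile rank on its own side of the average.

    Ranking the two sides separately avoids assuming a symmetric (normal)
    distribution of scores. Labels and cut points are hypothetical.
    """
    below = [s for s in scores if s < national_average]
    above = [s for s in scores if s >= national_average]

    def label(score: float) -> str:
        if score < national_average:
            group, names = below, ("lowest", "below average")
        else:
            group, names = above, ("above average", "highest")
        rank = sum(1 for s in group if s <= score) / len(group)
        return names[0] if rank <= 0.5 else names[1]

    return [label(s) for s in scores]


# Made-up, right-skewed scores; the mean stands in for the national average.
scores = [10.0, 20.0, 30.0, 40.0, 200.0, 400.0]
national_average = sum(scores) / len(scores)
categories = categorize(scores, national_average)
blended = composite_score(100.0, 200.0, 0.5)
print(categories, blended)
```

The same categorization could be run on the block groups of a single state, with the state average in place of the national one, to support the within-state comparison mentioned above.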