This section describes the methodology used to create PSYTE™ US Geodemographic Data.
Data selection and processing
PSYTE™ US Geodemographic Data is developed using Precisely's GroundView and Property Attributes data products, along with the US Census Bureau's 2020 decennial census Demographic and Housing Characteristics (DHC) data and the latest available American Community Survey (ACS) data.
DHC data inputs offer a range of variables at the Census Block level. Many ACS variables are available at the Block Group level, while some are published only at the Census Tract level or higher. For ACS inputs, a variety of statistical methods are used to model selected variables down to the Block level, using data from Precisely's existing products combined with ACS geodemographic data and Public Use Microdata Sample (PUMS) data from the Census Bureau. Adjustments are then applied to control the modeled values so that they remain consistent with the higher-level data. Selected data from Precisely's Property Attributes products are tabulated to Census Blocks.
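The idea of controlling modeled Block-level values to higher-level data can be illustrated with a simple proportional adjustment. The following is a minimal sketch only, not the production method: the function name, identifiers, and values are hypothetical, and it assumes a single variable scaled so that Blocks sum to a Block Group total.

```python
from collections import defaultdict

def control_to_totals(block_estimates, block_to_group, group_totals):
    """Scale block-level estimates so each block group sums to its control total."""
    # Sum the modeled estimates within each block group.
    modeled_sums = defaultdict(float)
    for block_id, value in block_estimates.items():
        modeled_sums[block_to_group[block_id]] += value

    controlled = {}
    for block_id, value in block_estimates.items():
        group_id = block_to_group[block_id]
        modeled = modeled_sums[group_id]
        # If nothing was modeled in the group, the blocks stay at zero.
        factor = group_totals[group_id] / modeled if modeled > 0 else 0.0
        controlled[block_id] = value * factor
    return controlled

# Hypothetical values for illustration only (not real census data).
block_estimates = {"B1": 30.0, "B2": 50.0, "B3": 20.0, "B4": 10.0}
block_to_group = {"B1": "G1", "B2": "G1", "B3": "G2", "B4": "G2"}
group_totals = {"G1": 100.0, "G2": 24.0}

controlled = control_to_totals(block_estimates, block_to_group, group_totals)
```

After the adjustment, the Block values within each Block Group keep their relative proportions while summing to the published total.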
Exploratory data analysis and data reduction processes are employed to aid in the selection of key variables that describe and differentiate American society. These key variables include, but are not limited to, life stage variables such as age and household composition, property variables (dwelling type, tenure, property value), socioeconomic variables (income, education, occupation), and population density.
Cluster methods
Prior to clustering, processed data is standardized and a series of processes is run to detect and manage outliers. Exploratory k-means cluster analysis is carried out using different combinations of data, standardization methods, and sample sizes. Each iteration is analyzed to determine cluster suitability using statistical methods that measure intra-cluster similarity and inter-cluster variance, along with cluster population size, to test whether each cluster represents a meaningful portion of the US population. Spatial analysis at the state and county level is conducted to ensure clusters are not geographically specific. These iterations are repeated numerous times in a "champion-challenger" process.
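The general shape of one such iteration can be sketched as follows. This is an illustrative example under stated assumptions, not the actual pipeline: it uses z-score standardization with clipping as a stand-in for outlier management, a plain Lloyd's-algorithm k-means, and a population-share check. All function names, the two input columns, and the values are hypothetical.

```python
import random
import statistics

def standardize(rows, clip=3.0):
    """Z-score each column, then clip values to +/- `clip` standard deviations."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [max(-clip, min(clip, (x - m) / s)) for x, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [statistics.fmean(dim) for dim in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def population_shares(labels, populations, k):
    """Share of total population falling in each cluster."""
    total = sum(populations)
    shares = [0.0] * k
    for lab, pop in zip(labels, populations):
        shares[lab] += pop / total
    return shares

# Hypothetical block records: [median age, median household income].
rows = [[25, 40000], [27, 42000], [26, 41000],
        [60, 90000], [62, 95000], [61, 92000]]
populations = [100, 120, 80, 90, 110, 100]

labels, centroids = kmeans(standardize(rows), k=2)
shares = population_shares(labels, populations, k=2)
```

In a champion-challenger setting, a run like this would be repeated with different inputs and settings, and a candidate would replace the current best solution only if its quality measures and cluster population shares compare favorably.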
The final cluster solution minimizes within-cluster variance and maximizes between-cluster variance, with each cluster representing a subset of the US population sharing similar attributes in terms of income, wealth, life stage, household composition, and property type. Detailed analysis of cluster-level statistics across hundreds of variables is performed to determine each cluster's key features, and to build the order and structure of the PSYTE™ US geodemographic segments.
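Within-cluster and between-cluster variance can be compared using the standard sum-of-squares decomposition, in which the total sum of squares equals the within-cluster part plus the between-cluster part. The sketch below is illustrative only; the function name and the sample points are hypothetical, and the specific statistics used for PSYTE™ US are not stated in this document.

```python
import statistics

def variance_decomposition(points, labels):
    """Return (within_ss, between_ss) for a labeled set of points."""
    overall = [statistics.fmean(dim) for dim in zip(*points)]

    # Group points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        centroid = [statistics.fmean(dim) for dim in zip(*members)]
        # Within: squared distance of each member to its own centroid.
        within += sum(
            sum((x - c) ** 2 for x, c in zip(point, centroid))
            for point in members
        )
        # Between: squared distance of the centroid to the overall mean,
        # weighted by cluster size.
        between += len(members) * sum(
            (c - o) ** 2 for c, o in zip(centroid, overall)
        )
    return within, between

# Hypothetical standardized points and two candidate labelings.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = variance_decomposition(points, [0, 0, 1, 1])
poor = variance_decomposition(points, [0, 1, 0, 1])
```

The first labeling groups nearby points together, so most of the total variation lies between clusters. The second labeling splits them poorly, so most of the variation stays within clusters.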
The resulting segmentation system provides cluster code assignments to over 5 million of the approximately 8 million Census Blocks that cover the United States. A large majority of non-coded blocks represent unpopulated or sparsely populated blocks with insufficient data. Blocks composed primarily of institutional group quarters are not part of the cluster analysis and are therefore classified as not coded.
PSYTE™ US Geodemographic Data combines proven, rigorous, quantitative methods and experience-driven qualitative methods that optimize the interconnectivity of US demographics, socioeconomics, location, lifestyle, and property attributes.
Groups
Each segment is associated with one of 12 groups, with Group 01 representing segments with the highest income and wealth, and Group 12 representing segments with the lowest income and wealth. Clusters within each group are sequenced according to relative age/life stage, from younger to older clusters. A two-digit group code (01 to 12) is used as part of the cluster code identifier. Refer to the PSYTE™ US geodemographic segment labels section of this document for additional information.
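As an illustration of how the group code can be read from a cluster code, the sketch below assumes identifiers follow the pattern seen in segment 03.4, that is, a two-digit group code, a period, and a position within the group. This pattern is an assumption based on that single example, the function name is hypothetical, and the codes other than 03.4 are made up for the sort example.

```python
import re

# Assumed format: two-digit group code, a period, then a position number.
CODE_PATTERN = re.compile(r"^(\d{2})\.(\d+)$")

def parse_cluster_code(code):
    """Split a cluster code such as '03.4' into (group, position)."""
    match = CODE_PATTERN.match(code)
    if not match:
        raise ValueError(f"Unrecognized cluster code: {code!r}")
    group, position = int(match.group(1)), int(match.group(2))
    if not 1 <= group <= 12:
        raise ValueError(f"Group code out of range 01 to 12: {code!r}")
    return group, position

# Sorting by (group, position) orders codes by group first,
# then by position within the group.
codes = ["12.1", "03.4", "01.2", "03.1"]
ordered = sorted(codes, key=parse_cluster_code)
```

Under this assumption, sorting by the parsed values lists segments from Group 01 to Group 12, and within each group in their sequenced order.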
Drill-down variables
Drill-down variables categorize Census Blocks across several demographic and socioeconomic themes, using a set of qualitative labels to contextualize the relative scoring of certain characteristics. This enables users to build smaller, case-specific sub-segments of the major PSYTE™ US geodemographic groups and segments, if desired. For example, a user wishing to identify the population locations of PSYTE™ US segment 03.4 (Professional Urbanites) most likely to own their property outright could do so using the drill-down variable dv_tenure, value OWN.
Refer to the PSYTE™ US drill-down variables section of this document for additional information.
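The segment 03.4 example above can be sketched as a simple filter. The drill-down variable dv_tenure and the value OWN come from this document. The record layout, the field names block_id and segment, the value RENT, the segment code 05.2, and the function name are hypothetical and used for illustration only.

```python
def select_blocks(blocks, segment, **drilldowns):
    """Return blocks in `segment` whose drill-down fields match all given values."""
    return [
        block for block in blocks
        if block["segment"] == segment
        and all(block.get(name) == value for name, value in drilldowns.items())
    ]

# Hypothetical block records (field names and most values are assumptions).
blocks = [
    {"block_id": "A", "segment": "03.4", "dv_tenure": "OWN"},
    {"block_id": "B", "segment": "03.4", "dv_tenure": "RENT"},
    {"block_id": "C", "segment": "05.2", "dv_tenure": "OWN"},
]

# Blocks in segment 03.4 whose tenure drill-down value is OWN.
owners = select_blocks(blocks, "03.4", dv_tenure="OWN")
```

Additional drill-down variables could be passed as further keyword arguments to narrow the sub-segment.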
Conclusion
PSYTE™ US Geodemographic Data goes beyond a standard census-based segmentation system, drawing on regularly refreshed data inputs such as Property Attributes and annually updated ACS data to evolve clusters each year, in line with shifting property and demographic characteristics.
Sixty-three clusters, organized into 12 groups, provide users with a total solution for understanding the core traits of US consumer households. The quantitative and qualitative methodology used to create this product results in a data-driven and identifiable geodemographic segmentation system that increases business decision-making capacity by providing actionable, realistic market intelligence.