This section describes the methodology used to create PSYTE™ HD Canada geodemographic data.
Introduction
PSYTE™ HD geodemographic data is a small-area, data-driven geodemographic segmentation system for Canada. It was developed to empower users with the ability to quickly and effectively identify, understand, and target the unique characteristics of customers, custom trade areas, markets, and neighbourhoods.
PSYTE™ HD geodemographic data categorizes the demographic, economic, geographic, and household characteristics of Canadian society into 56 clusters within 12 major groups. It was originally developed for all Census 2011 Dissemination Areas (DAs). DAs are the smallest areas for which robust census data is published, which makes them a de facto neighbourhood base. In 2018, clusters were translated from the 2011 DA geographic grid to the 2016 DA grid. They were then validated and updated with Census 2016 DA-level information (for approximately 56,000 DAs), resulting in the dissolution of six clusters and the creation of six new clusters. PSYTE™ HD geodemographic data is refined at the 6-character postal code unit of analysis (more than 850,000 postal codes). Each postal code is associated with one representative DA, but a postal code may have a different cluster assignment than its parent DA.
Data selection
PSYTE™ HD geodemographic data was initially developed using input data from Statistics Canada's 2011 Census and National Household Survey (NHS; a national self-reported household survey), IHS Global Canada vehicle data, and Precisely's annual demographic data products. These inputs were combined in the 2014 version of PSYTE™ HD, standardized, and analyzed to determine key variables that describe and differentiate Canadian society. Key variables include, but are not limited to:
- Life-stage variables (age, marital status, etc.)
- Housing variables (dwelling type, tenure, shelter costs, etc.)
- Socioeconomic variables (income, wealth, home value, etc.)
Variables were statistically combined and weighted, where appropriate, to optimize cluster formation.
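Standardizing puts variables measured in dollars, years, and proportions on a common scale before they are weighted. A minimal sketch, assuming z-score standardization across areas and hypothetical variable names, values, and weights (the source does not publish its actual variables or weights):

```python
from statistics import mean, pstdev

def standardize_and_weight(rows, weights):
    """Convert each variable to z-scores across areas, then multiply by a weight.

    rows:    one dict per area (e.g. per DA), mapping variable name -> raw value.
    weights: hypothetical analyst-chosen weight per variable.
    """
    out = [{} for _ in rows]
    for var, w in weights.items():
        values = [r[var] for r in rows]
        mu, sd = mean(values), pstdev(values)
        for i, v in enumerate(values):
            # A constant variable carries no information, so it scores 0 everywhere.
            z = (v - mu) / sd if sd > 0 else 0.0
            out[i][var] = w * z
    return out

# Hypothetical inputs for three DAs; names, values and weights are illustrative only.
das = [
    {"median_age": 35.0, "pct_owners": 0.60, "median_income": 70000.0},
    {"median_age": 45.0, "pct_owners": 0.80, "median_income": 90000.0},
    {"median_age": 40.0, "pct_owners": 0.40, "median_income": 50000.0},
]
weights = {"median_age": 1.0, "pct_owners": 1.0, "median_income": 2.0}
scaled = standardize_and_weight(das, weights)
```

After standardization, no variable dominates distance calculations merely because of its units; the weight (2.0 on income in this made-up example) then deliberately increases one variable's influence.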
The Canadian Census is conducted every five years, so the Census 2016 data series was used to update the clusters in 2018. The Census 2016 program consisted of both short- and long-form questionnaires, covering topics such as age and family structure, as well as a range of social and economic information (immigration, citizenship, ethnic origin, visible minorities, education, and occupation). The mandatory long-form questionnaire was reintroduced in 2016. Also, for the first time in 2016, the census program gathered income data solely from administrative data sources, specifically Canada Revenue Agency tax and benefit records. The 2016 data was used to verify the continued validity of most 2014 PSYTE™ HD geodemographic segments and to identify clusters that were no longer represented in a statistically significant way. Six new clusters were built to replace the dissolved clusters from the previous PSYTE™ HD geodemographic segmentation system.
Cluster methods
Geodemographic segmentation systems are developed using both quantitative and qualitative methods. The initial exploratory data analysis, data mining, and data reduction process used principal component analysis (PCA), correlation analysis, and factor analysis. Before clusters were created, the input data were examined to identify the key drivers of differentiation: the set of variables that together explain the most variance in Canadian society. This allowed for clusters that were mutually exclusive and distinct. This work was undertaken by an expert team of demographers, economists, statisticians, geographers, and data strategy consultants with extensive experience in Census data and in developing geodemographic segmentation systems.
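Correlation analysis is one of the data-reduction tools named above. The sketch below flags pairs of variables so strongly correlated that one of them adds little new information. The variable names, values, and the 0.9 cutoff are hypothetical, and the sketch does not attempt PCA or factor analysis.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def redundant_pairs(data, threshold=0.9):
    """Return (var_a, var_b, r) for pairs whose |r| meets the hypothetical threshold."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical values for five areas.
data = {
    "median_income": [50, 60, 70, 80, 90],
    "avg_home_value": [300, 340, 410, 450, 500],
    "median_age": [41, 33, 45, 30, 38],
}
pairs = redundant_pairs(data)
print(pairs)
```

Here income and home value move almost in lockstep, so an analyst might keep one, combine them, or down-weight both; median age is nearly uncorrelated with either and is kept as a separate dimension.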
The core quantitative techniques were k-means and Ward's hierarchical cluster analysis. In the first stage, dissemination areas were grouped into atoms, or mini-clusters, using k-means cluster analysis. Next, the atoms were aggregated into clusters using Ward's hierarchical clustering.
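The two-stage design can be sketched in pure Python. This is a minimal illustration, not Precisely's implementation. It assumes Euclidean distance on standardized inputs, made-up DA profiles, and deterministic farthest-point seeding for k-means. That seeding is chosen only so the example is reproducible; the source does not say how k-means was initialized. The atom and cluster counts are also hypothetical and far smaller than in a production system.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, cents):
    """Index of the nearest centroid for each point."""
    return [min(range(len(cents)), key=lambda j: sq_dist(p, cents[j])) for p in points]

def farthest_point_init(points, k):
    """Start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    cents = [list(points[0])]
    while len(cents) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in cents))
        cents.append(list(nxt))
    return cents

def kmeans(points, k, iters=100):
    """Lloyd's k-means. Returns (centroids, labels)."""
    cents = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, cents)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty atom's old centroid rather than dividing by zero.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else cents[j])
        if new == cents:
            break
        cents = new
    return cents, assign(points, cents)

def ward_merge(cents, sizes, n_clusters):
    """Merge atoms with Ward's criterion until n_clusters remain.
    Returns one list of atom indices per cluster."""
    groups = [[i] for i in range(len(cents))]
    cents, sizes = [list(c) for c in cents], list(sizes)
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                na, nb = sizes[a], sizes[b]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * sq_dist(cents[a], cents[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = sizes[a], sizes[b]
        cents[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(cents[a], cents[b])]
        sizes[a] += nb
        groups[a] += groups[b]
        del cents[b], sizes[b], groups[b]
    return groups

def two_stage(points, n_atoms, n_clusters):
    """k-means into atoms, then Ward's aggregation of atoms into clusters."""
    cents, labels = kmeans(points, n_atoms)
    keep = [j for j in range(n_atoms) if j in labels]  # drop empty atoms
    sizes = [labels.count(j) for j in keep]
    groups = ward_merge([cents[j] for j in keep], sizes, n_clusters)
    atom_to_cluster = {keep[i]: c for c, g in enumerate(groups) for i in g}
    return [atom_to_cluster[lab] for lab in labels]

# Hypothetical standardized 2-D profiles for 12 DAs in three loose groups.
das = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3),
       (9.9, 0.2), (10.1, 0.0), (10.0, 0.4), (9.8, 0.1)]
labels = two_stage(das, n_atoms=6, n_clusters=3)
```

Running Ward's method on a modest number of atoms rather than on every DA is what makes the two-stage design practical. The naive pairwise search above grows roughly cubically with the number of items merged, whereas each k-means pass grows only linearly with the number of DAs.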
Precisely used an iterative, big-data approach to identify the optimal solution given the available input data. From atoms to clusters, Precisely simulated hundreds of cluster solutions and evaluated each one individually using data mining and mathematical techniques such as logistic regression, the cubic clustering criterion (CCC), and combinatorics. The most promising solutions were thematically mapped, and geostatistics were used to analyze their spatial patterns. Each solution was analyzed across the Canadian geographic spectrum, from provinces down to Census Subdivisions (CSDs).
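The source names the cubic clustering criterion among its evaluation tools. CCC's formula is fairly involved, so the sketch below substitutes a simpler and widely used internal criterion, the Calinski-Harabasz pseudo-F statistic. It illustrates the general idea of scoring many candidate solutions and keeping the best one. The points and candidate labelings are hypothetical, and this is not Precisely's scoring procedure.

```python
def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo-F: (BSS / (k - 1)) / (WSS / (n - k)).
    Higher values indicate tighter, better-separated clusters."""
    n, dims = len(points), range(len(points[0]))
    clusters = sorted(set(labels))
    k = len(clusters)
    grand = [sum(p[d] for p in points) / n for d in dims]
    bss = wss = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        cen = [sum(p[d] for p in members) / len(members) for d in dims]
        bss += len(members) * sum((cen[d] - grand[d]) ** 2 for d in dims)
        wss += sum(sum((p[d] - cen[d]) ** 2 for d in dims) for p in members)
    return (bss / (k - 1)) / (wss / (n - k))

# Hypothetical points and three hypothetical candidate solutions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "A": [0, 0, 0, 1, 1, 1],
    "B": [0, 0, 1, 1, 2, 2],
    "C": [0, 1, 0, 1, 0, 1],
}
scores = {name: pseudo_f(points, labs) for name, labs in candidates.items()}
best = max(scores, key=scores.get)
```

A numeric score like this only ranks solutions. As the paragraph above describes, the shortlisted solutions still need mapping and review before one is chosen.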
The final cluster solution minimized within-cluster variance and maximized between-cluster variance. Qualitative techniques were used throughout development to understand why and how the various cluster simulations differed from one another, to determine which set of clusters was most practical in sociodemographic terms (accounting, most importantly, for end-user use cases), and to select the optimal solution.
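For reference, the standard variance decomposition behind the first sentence is given below. The notation is introduced here and is not from the source: x_i is the standardized input vector for area i, C_k is cluster k with n_k members and centroid μ_k, x̄ is the overall mean, and K is the number of clusters.

```latex
\mathrm{TSS} = \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2
  = \underbrace{\sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2}_{\mathrm{WSS}}
  + \underbrace{\sum_{k=1}^{K} n_k \lVert \mu_k - \bar{x} \rVert^2}_{\mathrm{BSS}},
\qquad
R^2 = \frac{\mathrm{BSS}}{\mathrm{TSS}}
```

Because TSS is fixed by the data, any solution that lowers WSS raises BSS by exactly the same amount. Minimizing within-cluster variance and maximizing between-cluster variance are therefore the same objective, and R² is the share of total variance that the cluster solution explains.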
Many of the features and benefits of the original PSYTE™ HD geodemographic system (built in 2014 using 2011 Census and other data) have been maintained. Most of the original clusters were validated with Census 2016 inputs. However, six clusters were determined to be statistically inadequate given the updated data and were dissolved. Six new clusters were developed to better reflect changes in the demographic and socioeconomic landscape of Canadian neighbourhoods. Note that, because of the updated income data in the 2016 Census, some original clusters received new cluster numbers under the new cluster sort, which was based on a household income ranking at the time of development.
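The renumbering effect can be illustrated with a simple rank-and-number step. The source says only that the sort was based on a household income ranking. This sketch assumes one income measure per cluster and stable internal cluster IDs, both hypothetical, as are the IDs and incomes shown.

```python
def renumber_by_income(avg_income):
    """Number clusters 1..K from highest to lowest household income.

    avg_income: dict mapping a (hypothetical) internal cluster ID to an income measure.
    Ties keep the dict's insertion order because sorted() is stable.
    """
    ranked = sorted(avg_income, key=avg_income.get, reverse=True)
    return {cid: rank for rank, cid in enumerate(ranked, start=1)}

# Hypothetical internal IDs and incomes before and after an income data update.
income_before = {"c_x": 120000, "c_y": 110000, "c_z": 95000}
income_after = {"c_x": 125000, "c_y": 131000, "c_z": 97000}
before = renumber_by_income(income_before)
after = renumber_by_income(income_after)
```

Cluster c_y keeps its name and character but moves from number 2 to number 1 because its income now ranks highest. This is the kind of change the note above describes.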
To improve the accuracy of PSYTE™ HD geodemographic cluster assignments for FSALDUs (6-character postal codes, made up of the Forward Sortation Area and the Local Delivery Unit), a cluster reassignment process was used to identify postal codes whose geodemographic characteristics differ statistically from those of their parent DAs and to assign them a more appropriate cluster. Because Statistics Canada does not publish census data at the FSALDU unit of analysis, a cluster assignment model was developed using multivariate statistical techniques, including household-level data aggregated to the 6-character postal code.
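The source does not publish its cluster assignment model. The following is only a sketch of one plausible reassignment rule, and every threshold in it is hypothetical. A postal code keeps its parent DA's cluster unless two conditions hold: it has enough households for a stable profile, and its aggregated household profile sits much closer to another cluster's centroid. The centroids, profiles, `min_households`, and `margin` are all illustrative assumptions.

```python
from math import dist  # Python 3.8+

def assign_postal_code(pc_profile, n_households, parent_cluster, centroids,
                       min_households=10, margin=0.5):
    """Return a cluster for a postal code, defaulting to its parent DA's cluster.

    Hypothetical rule: reassign only when the postal code has at least
    `min_households` households and its nearest centroid is closer than
    `margin` times the distance to the parent cluster's centroid.
    """
    if n_households < min_households:
        return parent_cluster  # too few households for a reliable profile
    nearest = min(centroids, key=lambda c: dist(pc_profile, centroids[c]))
    if nearest == parent_cluster:
        return parent_cluster
    if dist(pc_profile, centroids[nearest]) < margin * dist(pc_profile, centroids[parent_cluster]):
        return nearest  # clearly different from the parent DA
    return parent_cluster

# Hypothetical standardized cluster centroids.
centroids = {"cluster_a": (1.0, 1.0), "cluster_b": (-1.0, -1.0)}
result = assign_postal_code((-0.9, -1.1), 25, "cluster_a", centroids)
```

The margin makes the rule conservative: a postal code that is only marginally closer to another cluster keeps its parent DA's assignment.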
PSYTE™ HD geodemographic data combines proven, rigorous quantitative methods and experience-driven qualitative methods that optimize the interconnectivity of Canadian demographics, location, lifestyle, and household consumption.
Cluster validation
Primary use cases for PSYTE™ HD geodemographic data include customer profiling, site selection, physical and digital marketing, identifying untapped and underutilized markets, and helping to determine customer potential. Validation is an important part of developing a geodemographic cluster solution. Once a set of clusters is finalized, the solution must be tested and benchmarked using the same type of data that customers use with PSYTE™ HD geodemographic data. A precise geodemographic segmentation system will accurately describe and discriminate among the data it is compared against. In developing PSYTE™ HD geodemographic data, Precisely's data development team profiled customer data against PSYTE™ HD geodemographic clusters. This data-intensive, statistically rigorous approach helped to refine and finalize the product and to demonstrate that PSYTE™ HD geodemographic data is an accurate geodemographic segmentation system that users can rely on to improve business decision-making.
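Profiling customer data against clusters is commonly summarized with a penetration index. This index is each cluster's share of customers divided by its share of a base population, times 100, so values above 100 mean over-representation. The sketch assumes this common convention (the source does not state Precisely's exact validation metric) and uses hypothetical counts.

```python
def profile_index(customer_counts, base_counts):
    """Index = (customer share in cluster / base share in cluster) * 100."""
    total_c = sum(customer_counts.values())
    total_b = sum(base_counts.values())
    index = {}
    for cluster, base in base_counts.items():
        cust_share = customer_counts.get(cluster, 0) / total_c
        base_share = base / total_b
        index[cluster] = round(100 * cust_share / base_share)
    return index

# Hypothetical customer and base household counts by cluster.
customers = {"c1": 300, "c2": 100, "c3": 100}
base = {"c1": 2000, "c2": 2000, "c3": 6000}
idx = profile_index(customers, base)
```

Cluster c1 holds 20% of base households but 60% of customers, for an index of 300. A segmentation that discriminates well produces indexes far from 100 for real customer files; one that does not produces indexes clustered near 100.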
Major groups
Each PSYTE™ HD geodemographic cluster is associated with one of 12 major groups. Each major group is associated with one of 3 predominant settlement types and one of 5 socioeconomic status-based identifiers. Major groups provide users with a macro-level approach to understanding the relationship between human settlement and affluence.
The major groups are:
- P1 – Primary-Metropolitan Affluent
- P2 – Primary-Metropolitan Comfortable
- P3 – Primary-Metropolitan Mid-Scale
- P4 – Primary-Metropolitan Lower Middle
- P5 – Primary-Metropolitan Downscale
- S1 – Secondary-Metropolitan & Suburban Affluent
- S2 – Secondary-Metropolitan & Suburban Comfortable
- S3 – Secondary-Metropolitan & Suburban Mid-Scale
- S4 – Secondary-Metropolitan & Suburban Lower Middle
- S5 – Secondary-Metropolitan & Suburban Downscale
- T1 – Rural & Other Comfortable
- T2 – Rural & Other Downscale
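Each major group code encodes two attributes: a settlement type (the letter) and a socioeconomic identifier. The digit is not itself the socioeconomic tier; T2 is Downscale, not Comfortable. An explicit lookup table built from the list above is therefore safer than parsing the number. The function names below are hypothetical.

```python
SETTLEMENT = {
    "P": "Primary-Metropolitan",
    "S": "Secondary-Metropolitan & Suburban",
    "T": "Rural & Other",
}

# Socioeconomic identifier for each major group, taken from the list above.
MAJOR_GROUPS = {
    "P1": "Affluent", "P2": "Comfortable", "P3": "Mid-Scale",
    "P4": "Lower Middle", "P5": "Downscale",
    "S1": "Affluent", "S2": "Comfortable", "S3": "Mid-Scale",
    "S4": "Lower Middle", "S5": "Downscale",
    "T1": "Comfortable", "T2": "Downscale",
}

def describe(code):
    """Return (settlement type, socioeconomic identifier) for a major group code."""
    return SETTLEMENT[code[0]], MAJOR_GROUPS[code]

def groups_with_status(status):
    """All major group codes sharing one socioeconomic identifier."""
    return sorted(code for code, s in MAJOR_GROUPS.items() if s == status)
```

For example, `groups_with_status("Downscale")` returns `["P5", "S5", "T2"]`, which gives one comparable tier across all three settlement types.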
Conclusion
Precisely continues its well-established and well-respected record of geodemographic systems development with this version of PSYTE™ HD geodemographic data, drawn from demographics and consumer behaviours. PSYTE™ HD Canada geodemographic data is a 56-cluster solution (including one unclassified cluster consisting mostly of areas with low household and population counts). Compared to the original PSYTE™ HD geodemographic data (built in 2014 on 2011 Census and other data), 50 clusters retained their names and general characteristics, although their descriptions were updated with new data in 2018. The 56 clusters, organized into 12 major groups, provide users with a total solution for understanding the core dynamics of Canadian consumer households. The precise, proven, and robust quantitative and qualitative methodology used to create PSYTE™ HD geodemographic data results in a reliable, accurate, and identifiable geodemographic segmentation system that increases business decision-making capacity by providing actionable, realistic market intelligence.