Methodology statement - complete - Latest

GroundView: Complete US Demographic Estimates and Projections Data Suite Product Guide

Product type: Data
Portfolio: Enrich
Product family: Enrich Demographics > Demographic Estimates and Projections
Product: GroundView Demographics > Complete
Version: Latest
Language: English
Product name: GroundView: Complete
Title: GroundView: Complete US Demographic Estimates and Projections Data Suite Product Guide
Copyright: 2024
First publish date: 2016
Last updated: 2024-09-24
Published on: 2024-09-24T14:32:40.312524

The methodologies employed in the build process of Estimates and Projections combine traditional demographic techniques with innovative proprietary processes and geostatistical algorithms that enhance data accuracy and relevance. If you require information beyond what is included in this document, contact your Precisely account manager.

Overview

The basic methodology for the demographic estimates and projections combines top-down and bottom-up phases. The top-down phase begins at the national, state, and county units of analysis. National, state, and county estimates and projections serve as control totals for the bottom-up sub-county estimates and projections. Generally, the unit of analysis for variables is either population or households, or some combination of these variables.

Input data sources used for estimates in the top-down phase include Census Bureau national population projections by age, sex, race, and Hispanic origin, along with the latest Census Bureau county population estimates and population estimates by demographic characteristics.

The bottom-up phase of the estimation and projection methodology begins with the most recent decennial census block level data. To estimate current household counts, Precisely uses a combination of Precisely updated address and postal products along with US Postal Service statistics.

The objective of the bottom-up phase is to produce a preliminary estimate of total households for the current and projected year that accurately reflects both growth and decline at the block group unit of analysis. The bottom-up household estimates and projections are then reconciled with the top-down results using a mathematical technique that ensures block group (bottom-up) household counts sum to their parent county (top-down) household counts.
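
The source does not name the reconciliation technique. As a minimal sketch, assuming simple proportional scaling with largest-remainder rounding (an illustrative choice, not necessarily Precisely's method), preliminary block group household counts can be forced to sum exactly to a county control total. The function name, block group labels, and counts are hypothetical.

```python
def reconcile_to_control(preliminary, control_total):
    """Scale preliminary counts so they sum exactly to control_total.

    Assumption: proportional scaling followed by largest-remainder rounding.
    """
    total = sum(preliminary.values())
    if total == 0:
        raise ValueError("preliminary counts sum to zero; cannot scale")
    scaled = {k: v * control_total / total for k, v in preliminary.items()}
    rounded = {k: int(s) for k, s in scaled.items()}  # floor for non-negative values
    shortfall = control_total - sum(rounded.values())
    # Hand the remaining units to the areas with the largest fractional parts.
    by_remainder = sorted(scaled, key=lambda k: scaled[k] - rounded[k], reverse=True)
    for k in by_remainder[:shortfall]:
        rounded[k] += 1
    return rounded


# Hypothetical bottom-up block group households and a top-down county total.
bottom_up = {"BG-1": 410, "BG-2": 265, "BG-3": 330}
county_control = 1050
reconciled = reconcile_to_control(bottom_up, county_control)
print(reconciled, sum(reconciled.values()))
```

Largest-remainder rounding is used here only so that the integer counts still add up to the control total after scaling.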

Once the estimate and projection for total households is established, estimates of average household size, household population and group quarters are combined and modeled with census data to produce population estimates based on changes in households and household characteristics. The demographic estimation methodology employs time-series data at multiple levels of geography from national to neighborhood.

Population and household characteristics

Estimates of population characteristics such as the population distributions by age, sex, race, and Hispanic origin follow a similar top-down/bottom-up process. At the block group level, the latest information from the US Census Bureau’s American Community Survey (ACS) – the largest household survey in the U.S. – is integrated with census data to produce timely, accurate small-area estimates. Population characteristics are estimated using demographic estimation and projection models to produce a preliminary set of estimates, which are then reconciled to top-down estimates using iterative proportional fitting (IPF).
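
Iterative proportional fitting can be illustrated with a small two-way table. The sketch below is a generic textbook version, not Precisely's implementation: it alternately rescales rows and columns of a preliminary table until both sets of control totals are matched. The seed table and control totals are hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Fit a 2-D table to row and column control totals by alternate scaling."""
    table = [list(row) for row in seed]
    for _ in range(max_iter):
        # Scale each row to its control total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Scale each column to its control total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once rows also match.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


# Hypothetical preliminary table: rows are age groups, columns are sexes.
seed = [[40.0, 60.0], [30.0, 70.0]]
row_targets = [120.0, 90.0]   # top-down totals by age group
col_targets = [100.0, 110.0]  # top-down totals by sex
fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(v, 2) for v in row])
```

The row and column controls must share the same grand total (210 here); otherwise the procedure cannot satisfy both at once.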

Note: To maintain the comparability of demographic variables over time, Precisely has structured these estimates and projections to align with available 2020 Census inputs. Also, some variables related to ancestry groups in the ACS are published by the Census Bureau at the census tract level only. This includes detailed origin groups, as defined by specific countries, for Asians and Hispanics. For these data sets, census tract proportional distributions have been applied to the underlying block group populations. While this method assumes that the distribution holds constant, change over time for trade areas and higher geographies is reflected in the overall differential growth among block groups.
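
The tract-to-block-group step described in this note can be sketched as follows. This is an illustration under stated assumptions: the group names, counts, and function name are hypothetical, and each block group is assumed to lie within a single tract.

```python
def apply_tract_distribution(tract_counts, bg_population):
    """Split each block group's population using the tract-level proportions."""
    tract_total = sum(tract_counts.values())
    if tract_total == 0:
        raise ValueError("tract distribution is empty")
    return {
        bg: {group: pop * count / tract_total for group, count in tract_counts.items()}
        for bg, pop in bg_population.items()
    }


# Hypothetical tract-level counts for detailed origin groups.
tract_counts = {"Group A": 300, "Group B": 150, "Group C": 50}
# Hypothetical block group populations of the parent category.
bg_population = {"BG-1": 80, "BG-2": 20}
bg_detail = apply_tract_distribution(tract_counts, bg_population)
print(bg_detail["BG-1"])
```

Every block group in the tract receives the same proportions, so differences over time at higher geographies come only from differential growth in the block group totals.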

2020 Base Year Estimates

Census 2020 – The US Census Bureau conducted its latest decennial Census in 2020 to count the residents of the United States and its territories. The Census gathers information from households through a limited number of questions on age, sex, race, Hispanic or Latino origin, household type, family type, relationship to householder, group quarters population, housing occupancy, and housing tenure. The Census Bureau tabulates the Census 2020 results into data products at various levels of geography, including the Demographic and Housing Characteristics File (DHC). The Disclosure Avoidance System (DAS) introduced complexities in the DHC that required the Census Bureau to rethink how it designed and applied the new confidentiality protections to the extensive set of tabulations in the DHC, as well as the subsequent data processing and quality assurance. After considerable delays (mainly because of COVID-19 impacts on data collection and the need to meet more stringent confidentiality requirements), the 2020 DHC data was released with data attributes roughly equivalent to those of the Summary File 1 product from the 2010 Census.

The American Community Survey (ACS) is a continuously updated sample survey of the US population. The ACS collects critical socioeconomic data, including income and home value, as well as housing and demographic data, on an annual basis. Each year, generally in December, the Census Bureau produces block group data with a five-year reference period.

Precisely has utilized the DHC files and 2022 American Community Survey results to create estimates of the Census 2020 Base Year, with a mid-2020 reference date.

To address inconsistencies between the ACS and the Census 2020 variables, and to overcome the challenges of estimating multidimensional data with sparse samples, Precisely developed the Base Year (2020) Estimate dataset. This dataset has evolved over three releases: GroundView 2022 used 2020 ACS (2016-2020) inputs, GroundView 2023 replaced them with 2021 ACS (2017-2021) inputs, and GroundView 2024 replaced those with 2022 ACS (2018-2022) inputs. Because the midpoint of the 2022 ACS five-year period (2018-2022) is 2020, we expect the 2022 ACS to remain the primary five-year ACS input for 2020 Base Year Estimates variables going forward.

The basic methodology to produce the 2020 Base Year Estimates data is:
  • Running proprietary geospatial enhancement routines to improve the data from neighboring or higher geographic levels for select variables not published in the survey data.
  • Normalizing results from the five-year period estimates to 2020 Census base counts. This step assumes that the distributions and summary measures from the ACS five-year period estimates fairly represent characteristics in 2020.
  • Controlling the 2020 estimates at the block group (BG) level to official county-level July 1 estimates from the Census Bureau.
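
The normalization and control steps in the list above can be sketched as follows. This is a simplified illustration, assuming plain share-based normalization and a single ratio adjustment per county; the function names and all figures are hypothetical.

```python
def normalize_to_base(acs_counts, census_base_total):
    """Apply the ACS five-year distribution to a 2020 Census base count."""
    acs_total = sum(acs_counts.values())
    return {k: census_base_total * v / acs_total for k, v in acs_counts.items()}


def control_to_county(bg_totals, county_estimate):
    """Ratio-adjust block group totals so they sum to the county estimate."""
    factor = county_estimate / sum(bg_totals.values())
    return {bg: total * factor for bg, total in bg_totals.items()}


# Hypothetical ACS tenure counts for one block group and its 2020 Census households.
acs_tenure = {"owner": 300, "renter": 200}
normalized = normalize_to_base(acs_tenure, census_base_total=550)

# Hypothetical block group populations and an official county July 1 estimate.
bg_population = {"BG-1": 1000, "BG-2": 1500}
controlled = control_to_county(bg_population, county_estimate=2525)
print(normalized, controlled)
```

The first function preserves the ACS proportions while matching the census count; the second preserves each block group's share of the county.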

Current year demographic characteristics

Current year estimates of population and household characteristics are updated annually using the most current ACS data, normalized in a manner similar to the methodology described in the section above. This process produces updated counts implied by the distribution of each characteristic and by the current year estimated base population and households.

Household income

Precisely’s income estimates and projections are based on the five-year American Community Survey (ACS) and the ACS Public Use Microdata Sample (ACS_PUMS) data. The lowest level of geography in the ACS_PUMS is the Public Use Microdata Area (PUMA). PUMAs cover areas with a population of 100,000 or more (not exceeding 200,000) and are defined by census tracts that represent one county, multiple counties, or a portion of a county. For each block group, ACS household income data is published as sixteen income categories in four age cohorts.

The Census Bureau defines income in the past twelve months as the sum of the amounts reported separately for wage or salary income; net self-employment income; interest, dividends, or net rental or royalty income or income from estates and trusts; Social Security or Railroad Retirement income; Supplemental Security Income (SSI); public assistance or welfare payments; retirement, survivor, or disability pensions; and all other income. Household income “includes income of the householder and all other people 15 years and older in the household, whether or not they are related to the householder” (www.census.gov).

To estimate and project household income and household income by age of householder, Precisely uses the latest available time series of ACS 5-year data. Precisely employs a top-down (PUMA level) and then bottom-up (block group level) approach. All PUMAs, regardless of population size, have ACS 5-year estimates. For block groups, only 5-year ACS data are available.

The most recent ACS data is 2022 data. Precisely estimates income for 2024 and projects income for 2029. Income estimates and projections are presented in current year US dollars. For example, 2024 median household income represents the money received as if it were 2024; 2029 median household income represents the money received as if it were 2029. Household income by age of householder estimates and projections include sixteen income categories for seven age cohorts.

Precisely calculates median and mean (average) income based on the age of householder by household income distributions. Aggregate household income is calculated as average household income multiplied by the total households. Two types of per capita income are calculated. The population approach is calculated as total aggregate income divided by total population. The household approach is calculated as total aggregate income divided by total household population.
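Note: Total population also includes the group quarters population, so it is at least as large as household population. The population approach therefore yields a per capita income no higher than the household approach.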
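
The aggregate and per capita calculations described above amount to the following arithmetic. The function name and input figures are hypothetical and serve only to show the two per capita approaches side by side.

```python
def income_summary(avg_household_income, households, total_population, household_population):
    """Aggregate income and both per capita measures for one area."""
    aggregate = avg_household_income * households
    return {
        "aggregate_income": aggregate,
        # Population approach: divide by total population (includes group quarters).
        "per_capita_population": aggregate / total_population,
        # Household approach: divide by household population only.
        "per_capita_household": aggregate / household_population,
    }


# Hypothetical block group: 500 households, 1,250 residents, 50 in group quarters.
summary = income_summary(
    avg_household_income=80_000,
    households=500,
    total_population=1_250,
    household_population=1_200,
)
print(summary)
```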

Consumer Spend Potential

The Consumer Spend Potential (CSP) dataset provides estimates of aggregate household expenditures for consumer goods including food, automobiles and insurance. Each year, the US Bureau of Labor Statistics (BLS) conducts the Consumer Expenditure Survey (CES). The CES is the largest nationally representative survey of consumer expenditure data by demographic characteristics, including income, tenure and region. It consists of two surveys: the Diary Survey and the Interview Survey. The Diary Survey collects data on everyday purchases such as food or gasoline. The Interview Survey collects data about large expenditures and regular purchases. The CES covers most household expenditure categories and is updated continuously to reflect changes in consumer preferences and habits.

CSP data is partitioned via a hierarchical schema with implicit nesting. For example, the categorization of expenditures for apples might be:

Food → Food at Home → Fruits and Vegetables → Fresh Fruit → Apples

For the majority of consumer expenditure variables, Precisely’s CSP dataset follows the item hierarchy of the CES. Please refer to the CSP descriptions in the groundview_complete_usa_vyyyy_variables.xlsx workbook, which explicitly documents the hierarchical relationships among the hundreds of consumer expenditure variables.
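
One way to work with a nested category scheme like this is to store each item as a path and roll values up to every ancestor. The sketch below assumes that each parent category equals the sum of its children, which the source implies but does not state; the sibling categories and dollar values are hypothetical.

```python
from collections import defaultdict


def roll_up(leaf_values):
    """Sum leaf-level values into every ancestor category along each path."""
    totals = defaultdict(float)
    for path, value in leaf_values.items():
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += value
    return dict(totals)


# Hypothetical average household expenditures keyed by hierarchy path.
leaves = {
    ("Food", "Food at Home", "Fruits and Vegetables", "Fresh Fruit", "Apples"): 60.0,
    ("Food", "Food at Home", "Fruits and Vegetables", "Fresh Fruit", "Bananas"): 40.0,
    ("Food", "Food Away from Home"): 300.0,
}
totals = roll_up(leaves)
print(totals[("Food",)])
print(totals[("Food", "Food at Home", "Fruits and Vegetables", "Fresh Fruit")])
```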

The most recent CES survey data is combined with Precisely’s data and a conditional probability model (based on various geodemographic characteristics) to produce estimated average household expenditures by expenditure type and by block group. The average block group estimates are adjusted for inflation to current-year levels using inflation estimates. Users should note that a single, consistent inflation adjustment is applied to all expenditure categories because forecasting inflation by expenditure type is problematic. For example, the inflation rate for computers (technology) may differ from that for apples (agriculture). To obtain aggregate expenditures by block group, the average household expenditures are multiplied by total current year households. For each CES survey, Precisely provides current year estimates for all expenditure items provided by CES so long as the data is not missing or too small to yield valuable insight. Precisely follows BLS definitions for expenditures. Please refer to the groundview_complete_usa_data_schema.xlsx workbook for a listing of the CSP category definitions.
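
The inflation adjustment and aggregation steps can be sketched as simple arithmetic. The category names, dollar amounts, and inflation factor below are hypothetical; the only points taken from the text are that one factor is applied to every category and that averages are multiplied by current year households.

```python
def aggregate_expenditures(avg_by_category, inflation_factor, households):
    """Inflate average household spending and scale it to the block group total."""
    return {
        category: avg * inflation_factor * households
        for category, avg in avg_by_category.items()
    }


# Hypothetical survey-year average household expenditures for one block group.
avg_by_category = {"Food": 9_000.0, "Apparel": 1_800.0}
aggregates = aggregate_expenditures(
    avg_by_category,
    inflation_factor=1.04,  # assumed single factor applied to all categories
    households=500,         # current year household estimate
)
print(aggregates)
```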

Household Wealth (Net Worth) and Financial Assets

The estimation process for the wealth and financial assets begins with an analysis of the Federal Reserve Board’s latest triennial Survey of Consumer Finances. Household income and home value at the block group level from the latest GroundView estimates and projections were used to model and estimate average wealth and financial assets for each block group. The models were calibrated to reflect trends as derived from the Survey of Consumer Finances. Distributions were derived from mathematical functions that project the likely probability distributions based on levels of average wealth and financial assets at the local level. Wealth results are presented as mean and median estimates, as well as household distributions. Similarly, results of financial assets are presented as means, medians, and household distributions. The following are definitions of concepts and component parts:

Definitions

Household wealth or net worth is the difference between total assets and total liabilities at the household level. Assets include financial assets, vehicles, primary residence, investment real estate, business assets, and a residual category of non-financial assets.

Financial assets include transaction accounts (for example, checking and savings accounts), certificates of deposit, savings bonds, bonds, stocks, mutual funds, retirement accounts, cash value of life insurance, and a residual category of other managed financial assets. The concept of financial assets is a subset of household wealth.

The following are the components of Financial Assets (FA):

  • Transaction accounts
  • Certificates of deposit
  • Savings bonds
  • Stocks
  • Bonds
  • Mutual funds
  • Retirement accounts
  • Cash value of life insurance
  • Other managed assets
  • All other financial assets

The following are components of Non-Financial Assets:

  • Vehicles
  • Primary residence
  • Investment real estate
  • Business assets
  • Other non-financial assets

The following are components of Liabilities:

  • Home mortgage
  • Home equity loan
  • Lines of credit (secured by home)
  • Installment loans
  • Other lines of credit
  • Credit card balances
  • All other debt

The calculation for total assets is Financial Assets + Non-Financial Assets.

The calculation for wealth (net worth) is Total Assets - Total Liabilities. Negative values are set to zero.
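
The two calculations above, including the zero floor, can be written out as follows. The component names mirror the lists in this section, while the dollar amounts and function name are hypothetical.

```python
def net_worth(financial_assets, non_financial_assets, liabilities):
    """Total assets minus total liabilities, with negative results set to zero."""
    total_assets = sum(financial_assets.values()) + sum(non_financial_assets.values())
    total_liabilities = sum(liabilities.values())
    return max(0, total_assets - total_liabilities)


# Hypothetical household with positive net worth.
positive = net_worth(
    financial_assets={"transaction_accounts": 12_000, "retirement_accounts": 90_000},
    non_financial_assets={"vehicles": 18_000, "primary_residence": 310_000},
    liabilities={"home_mortgage": 220_000, "credit_card_balances": 4_000},
)

# Hypothetical household whose liabilities exceed its assets.
floored = net_worth(
    financial_assets={"transaction_accounts": 1_500},
    non_financial_assets={"vehicles": 6_000},
    liabilities={"installment_loans": 25_000},
)
print(positive, floored)
```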

Daytime population

Daytime population is the estimated number of people who are in a given area during the daytime. Daytime population has two components: at-home population and at-work population (total employees). At-home population is the current estimate of the number of persons aged 16+ who are not in the labor force, and are therefore presumed to be at home during the day, plus the population under 16 years of age. At-work population is based on Precisely’s Business Summary Data, which contains estimates of the number of persons who work in the given block group. The sum of the at-home population and the at-work population yields an estimate of the number of persons in the block group during the day.
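
As a minimal sketch of the definition above, daytime population is the sum of the two components. The parameter names and counts are hypothetical.

```python
def daytime_population(pop_under_16, pop_16_plus_not_in_labor_force, employees_in_area):
    """At-home population plus at-work population for one block group."""
    at_home = pop_under_16 + pop_16_plus_not_in_labor_force
    at_work = employees_in_area
    return at_home + at_work


# Hypothetical block group: 220 children, 310 adults not in the labor force,
# and 900 people employed at businesses located in the block group.
daytime = daytime_population(220, 310, 900)
print(daytime)
```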

Socio-economic score (SES)

The socio-economic score is a comparative index value ranging from 1 to 100 that indicates the overall socio-economic status of an area. Four variables were used to produce the SES: Median Household Income, Median Home Value, Occupational Level (percent white collar), and Educational Attainment (the percent of the population aged 25+ with educational degrees earned beyond a high school diploma).

Each block group was given a score for each of these categories based on how it ranked against all other block groups nationwide. Once these scores were determined, an overall score for each block group was calculated by combining the individual scores using an unweighted average. Finally, the overall scores were indexed on the 100-point scale.
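
The source does not specify how ranks are converted to scores or how the final indexing is done. The sketch below assumes percentile ranks for both steps, which is one plausible reading rather than the documented method; the block group labels and indicator values are hypothetical.

```python
from bisect import bisect_right
from math import ceil


def rank_scores(values):
    """Score each area from 1 to 100 by percentile rank (ties share the higher rank)."""
    ordered = sorted(values.values())
    n = len(ordered)
    return {k: ceil(100 * bisect_right(ordered, v) / n) for k, v in values.items()}


def ses_index(indicators):
    """Unweighted average of per-indicator scores, re-indexed to a 100-point scale."""
    components = [rank_scores(values) for values in indicators.values()]
    areas = components[0].keys()
    averages = {a: sum(c[a] for c in components) / len(components) for a in areas}
    return rank_scores(averages)


# Hypothetical indicator values for four block groups.
indicators = {
    "median_household_income": {"A": 40_000, "B": 60_000, "C": 80_000, "D": 120_000},
    "median_home_value": {"A": 150_000, "B": 200_000, "C": 350_000, "D": 300_000},
    "pct_white_collar": {"A": 30, "B": 50, "C": 65, "D": 80},
    "pct_beyond_high_school": {"A": 25, "B": 45, "C": 70, "D": 75},
}
ses = ses_index(indicators)
print(ses)
```

With only four areas the scores fall on multiples of 25; a nationwide ranking would fill the full 1 to 100 range.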

ZIP Code data

ZIP Codes are represented as polygons and point locations in the ZIP geography layer.

GroundView demographic data are aggregated to the ZIP Code polygons as represented by Precisely’s current ZIP Code boundary product. Polygon ZIP Codes generally represent areas served by the US Postal Service and are defined for the purpose of efficient mail delivery. The GroundView demographic data are applied to polygon ZIP Codes (as represented by Precisely’s boundary definitions) using Census Block-level apportionment of GroundView Block Group data.
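
Block-level apportionment can be sketched as follows. The assumptions here are illustrative: each census block carries a weight (for example, its household count), each block is assigned wholly to one polygon ZIP Code, and block group values are split among blocks in proportion to those weights. The identifiers and values are hypothetical.

```python
from collections import defaultdict


def apportion_to_zip(bg_values, blocks):
    """Distribute block group values to ZIP Codes via block-level weights.

    blocks: iterable of (block_id, block_group_id, zip_code, weight) tuples.
    """
    bg_weight = defaultdict(float)
    for _, bg, _, weight in blocks:
        bg_weight[bg] += weight
    zip_totals = defaultdict(float)
    for _, bg, zip_code, weight in blocks:
        if bg_weight[bg] > 0:
            zip_totals[zip_code] += bg_values[bg] * weight / bg_weight[bg]
    return dict(zip_totals)


# Hypothetical block group household estimates and block assignments.
bg_households = {"BG-1": 600, "BG-2": 400}
blocks = [
    ("b1", "BG-1", "ZIP-A", 50),
    ("b2", "BG-1", "ZIP-B", 100),
    ("b3", "BG-2", "ZIP-B", 40),
    ("b4", "BG-2", "ZIP-B", 60),
]
zip_households = apportion_to_zip(bg_households, blocks)
print(zip_households)
```

Because every block's share is allocated to exactly one ZIP Code, the ZIP totals sum to the block group totals.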

Point ZIP Codes may represent a business location or a Post Office with P.O. Boxes used by residential or business customers. Some point ZIP Codes are defined by the US Postal Service as a residential post office (RPO), where residents pick up their mail at the Post Office because it is efficient, because mail delivery to the home may not be possible, or for other reasons. Before the 2022 GroundView demographic data, Precisely assigned households to RPOs based on USPS delivery counts to those RPOs, in order to provide demographic data for as many ZIP Codes as possible. In some cases, these were in rural areas and may have represented a significant proportion of households. The physical location of RPO households was assumed to be the enclosing ZIP Code, that is, the polygon ZIP Code that contains the RPO Post Office. The population and household characteristics of RPO households were assumed to mirror those of the population and households of the enclosing ZIP Code.

Starting in GroundView 2022 and continuing with subsequent GroundView releases, point ZIP Codes have their demographic data values set to zero, because the true areal extent served by these point ZIP Codes is not considered to be accurately known. Only polygon ZIP Codes are attributed with demographic data (i.e., population and households).

The ZIP Code geography layer used to join with the Update Profile dataset includes both point and polygon spatial objects, resulting in a full roster representation of ZIP Codes across the U.S.

Puerto Rico demographic estimates and projections

Puerto Rico estimates are based on the most current data from the Census Bureau. Themed datasets and variables for Puerto Rico are identified in the groundview_complete_usa_vyyyy_variables.xlsx workbook.

Additional information

Due to updates in the source data, improvements made to methodologies, and geographic changes, users are urged to use caution when making year-over-year comparisons. In general, census year (base year) to current year average annual change is more stable than year-over-year change.
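
The source does not define how average annual change is computed. As a hedged illustration, the sketch below shows two common definitions, an arithmetic average and a compound rate, applied to a hypothetical base year (2020) and current year (2024) value.

```python
def average_annual_change(base_value, current_value, years):
    """Arithmetic average change per year between base and current values."""
    return (current_value - base_value) / years


def compound_annual_rate(base_value, current_value, years):
    """Constant yearly growth rate that carries the base value to the current value."""
    return (current_value / base_value) ** (1 / years) - 1


# Hypothetical population: 10,000 in the 2020 base year, 10,800 in 2024.
base, current, years = 10_000, 10_800, 4
avg_change = average_annual_change(base, current, years)
rate = compound_annual_rate(base, current, years)
print(avg_change, round(rate, 4))
```

Either measure smooths over single-year revisions in source data and methodology, which is why it tends to be steadier than a year-over-year difference.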