What is the average area of Hex 9 (Uber H3 Level 9) polygons?
The following table compares H3 resolutions 0 through 11, including Level 9:
H3 Resolution | Average Hexagon Area (km2) | Average Hexagon Edge Length (km) | Number of Unique Cells |
0 | 4,250,546.8477000 | 1,107.712591000 | 122 |
1 | 607,220.9782429 | 418.676005500 | 842 |
2 | 86,745.8540347 | 158.244655800 | 5,882 |
3 | 12,392.2648621 | 59.810857940 | 41,162 |
4 | 1,770.3235517 | 22.606379400 | 288,122 |
5 | 252.9033645 | 8.544408276 | 2,016,842 |
6 | 36.1290521 | 3.229482772 | 14,117,882 |
7 | 5.1612932 | 1.220629759 | 98,825,162 |
8 | 0.7373276 | 0.461354684 | 691,776,122 |
9 | 0.1053325 | 0.174375668 | 4,842,432,842 |
10 | 0.0150475 | 0.065907807 | 33,897,029,882 |
11 | 0.002149643 | 0.02491056 | 237,279,209,162 |
(Source: https://h3geo.org/docs/core-library/restable/)
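As a rough illustration of how the table can be used, the sketch below (plain Python, no H3 library required) stores the average areas from the table, checks that each resolution has about one seventh the area of the previous one, and estimates how many Level 9 cells would cover a given area. The function names are hypothetical, and real coverage counts will differ because cell sizes vary across the globe and cells on the boundary of an area are only partly inside it.

```python
# Average hexagon areas (km2) for H3 resolutions 0-11, copied from the table above.
AVG_HEX_AREA_KM2 = {
    0: 4250546.8477000,
    1: 607220.9782429,
    2: 86745.8540347,
    3: 12392.2648621,
    4: 1770.3235517,
    5: 252.9033645,
    6: 36.1290521,
    7: 5.1612932,
    8: 0.7373276,
    9: 0.1053325,
    10: 0.0150475,
    11: 0.002149643,
}


def area_ratio(resolution: int) -> float:
    """Ratio of the parent resolution's average area to this resolution's."""
    return AVG_HEX_AREA_KM2[resolution - 1] / AVG_HEX_AREA_KM2[resolution]


def approx_cell_count(area_km2: float, resolution: int) -> int:
    """Order-of-magnitude estimate of average-sized cells needed to cover an area.

    Ignores pentagons, latitude-dependent cell sizes and boundary effects.
    """
    return round(area_km2 / AVG_HEX_AREA_KM2[resolution])


for res in range(1, 12):
    print(f"res {res}: parent/child area ratio = {area_ratio(res):.2f}")

print(approx_cell_count(100.0, 9))
```

Under this approximation, a 100 km2 area corresponds to roughly 949 Level 9 hexagons.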
Is there a correlation between polygon size and data volatility?
Level 9 hexagons cover an area of approximately 0.1 km2. The obvious upside of such a fine spatial resolution is that it allows patterns to be identified at a detailed level and small areas of particular interest to be isolated. The downside is that less data is collected per spatial unit when that unit is small.
A smaller sample size (compared to larger administrative areas) can make metrics more volatile. It can also make metrics such as dwell time shorter and less insightful: what is in reality one long dwell can cross into neighboring hexagons and be split into several shorter dwell times.
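The dwell-splitting effect can be sketched with a few hypothetical location pings. The sketch assumes a dwell is measured from the first ping in a cell until the first ping in the next cell (or until the last ping, for the final run); this is an illustrative definition, not necessarily how dwell time is computed in the dataset. When GPS jitter moves pings back and forth across a hexagon border, one 60-minute stay becomes four short dwells, while a single coarser area keeps it as one.

```python
from itertools import groupby


def dwell_segments(pings):
    """Split time-ordered (minute, cell_id) pings into (cell_id, dwell_minutes).

    Each run of consecutive pings in the same cell becomes one segment. A
    segment lasts until the first ping of the next run, or until the run's
    last ping if it is the final run (an illustrative definition).
    """
    runs = [(cell, [t for t, _ in group]) for cell, group in groupby(pings, key=lambda p: p[1])]
    segments = []
    for i, (cell, times) in enumerate(runs):
        end = runs[i + 1][1][0] if i + 1 < len(runs) else times[-1]
        segments.append((cell, end - times[0]))
    return segments


# Hypothetical pings from one device staying near the border of hexagons A and B.
pings = [(0, "A"), (10, "A"), (20, "B"), (30, "A"), (40, "A"), (50, "B"), (60, "B")]

print(dwell_segments(pings))                           # small hexagons: four short dwells
print(dwell_segments([(t, "X") for t, _ in pings]))    # one larger area: one long dwell
```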
Another downside of small hexagons is that time series analysis can suffer from data gaps. If a hexagon receives data in some periods but not in others, comparisons over time become much harder. OAs increase in size as population density decreases, which ensures they have enough data in every period to make time series analysis possible.
Smaller hexagons work best where data is dense, mostly in urban areas and areas with high visitation rates. Generally, whenever a metric appears highly volatile in a smaller hexagon, it is advisable to use an OA for the analysis instead.
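One way to apply this advice is to flag hexagons whose per-period series has gaps or is highly volatile. The sketch below is a hypothetical check, not a Precisely rule: it uses the coefficient of variation (standard deviation divided by mean) as the volatility measure, and the 0.5 cutoff is an arbitrary assumption to be tuned per metric.

```python
from statistics import mean, pstdev


def needs_coarser_unit(series, max_cv=0.5):
    """Return True if a hexagon's per-period series suggests using an OA instead.

    series: per-period metric values, with None for periods without data.
    max_cv: hypothetical coefficient-of-variation cutoff (an assumption).
    """
    if any(v is None for v in series):
        return True  # gaps make period-over-period comparison unreliable
    avg = mean(series)
    if avg == 0:
        return True  # no signal to compare against
    return pstdev(series) / avg > max_cv


dense_hex = [120, 115, 130, 125]   # steady counts: keep the hexagon
sparse_hex = [3, None, 1, 7]       # missing period: use an OA
volatile_hex = [1, 9, 2, 8]        # swings wildly: use an OA

for name, series in [("dense", dense_hex), ("sparse", sparse_hex), ("volatile", volatile_hex)]:
    print(name, needs_coarser_unit(series))
```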
What does the value 9999999 in the ORIGIN_AREA_ID and rank fields represent?
In the ORIGIN_AREA_ID field, 9999999 identifies an aggregate origin that groups origins with very low PERCENT_POP_T values, shortening the long tail of origins. Origins are folded into this aggregate only after the cumulative sum of PERCENT_POP_T exceeds 90% and each remaining origin contributes less than 1%.
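The aggregation rule can be sketched as follows, assuming origins for one destination are sorted by PERCENT_POP_T in descending order before the cumulative sum is taken. The function name and input layout are hypothetical; only the 9999999 value and the 90% and 1% thresholds come from the description above. Because the list is sorted, once one origin past the 90% mark falls below 1%, every origin after it does too.

```python
LONG_TAIL_ID = 9999999  # aggregate origin used in ORIGIN_AREA_ID


def collapse_long_tail(origins, cum_threshold=90.0, tail_share=1.0):
    """Fold low-share origins into a single LONG_TAIL_ID row.

    origins: list of (origin_area_id, percent_pop_t) for one destination.
    An origin starts the tail once the cumulative share of the origins
    before it exceeds cum_threshold and its own share is below tail_share.
    """
    ordered = sorted(origins, key=lambda o: o[1], reverse=True)
    kept, cumulative = [], 0.0
    for i, (area_id, pct) in enumerate(ordered):
        if cumulative > cum_threshold and pct < tail_share:
            tail_total = sum(p for _, p in ordered[i:])
            kept.append((LONG_TAIL_ID, tail_total))
            break
        kept.append((area_id, pct))
        cumulative += pct
    return kept


# Hypothetical origins for one destination (shares sum to 100%).
origins = [(1, 50.0), (2, 30.0), (3, 11.0), (4, 5.0), (5, 0.9),
           (6, 0.8), (7, 0.8), (8, 0.6), (9, 0.5), (10, 0.4)]
print(collapse_long_tail(origins))
```

Origin 4 is kept even though the 90% mark has already been passed, because its 5% share is above the 1% cutoff.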
In rank columns, the value 9999999 is used in two cases. First, it is the rank given to the aggregate origin described above, which is excluded from the ranking of regular origins. Second, it is used when an origin shows no flows to the destination for some day parts, week parts, or ORIGIN_AREA_TYPE values. Wherever flows are present, the origin is ranked normally; wherever there are no flows to rank, it receives a rank of 9999999.
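A minimal sketch of this ranking behavior for a single slice (for example, one day part) might look like the following. The function name, the dictionary input, and the tie-breaking by origin ID are assumptions for illustration; the sentinel value and the two cases where it applies come from the description above.

```python
LONG_TAIL_ID = 9999999  # aggregate origin in ORIGIN_AREA_ID
NOT_RANKED = 9999999    # rank for the aggregate origin and for origins without flows


def rank_origins(all_origins, flows):
    """Rank origins by flow for one slice (day part, week part, or ORIGIN_AREA_TYPE).

    all_origins: every origin ID for the destination.
    flows: origin ID -> flow in this slice; missing or zero means no flow.
    """
    rankable = [o for o in all_origins if o != LONG_TAIL_ID and flows.get(o, 0) > 0]
    # Highest flow first; ties broken by origin ID (an assumption).
    rankable.sort(key=lambda o: (-flows[o], o))
    ranks = {origin: position for position, origin in enumerate(rankable, start=1)}
    for origin in all_origins:
        ranks.setdefault(origin, NOT_RANKED)
    return ranks


all_origins = [1, 2, 3, LONG_TAIL_ID]
morning = {1: 40, 2: 25, 3: 10, LONG_TAIL_ID: 5}
night = {1: 12, 3: 30}  # origin 2 has no night-time flows

print(rank_origins(all_origins, morning))
print(rank_origins(all_origins, night))
```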
Is there a minimum number of mobility data transactions for an area that can be used in a dataset?
Yes. Every aggregation in the dataset must contain more than 10 data points. However, it is important to understand that a batch of 11 transactions does not necessarily come from 11 individual devices. By using 12 months of data in all calculations (except seasonality) and applying the >10 point minimum filter, Precisely reduces the risk that origin-destination results attributable to an individual device for a location are introduced into the dataset.
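The >10 point minimum filter can be sketched as a simple threshold on the number of data points per aggregation. The row layout and field names below are hypothetical; the `devices` field is included only to show that the filter counts data points, not distinct devices, so an aggregation built from 11 points contributed by 3 devices still passes.

```python
# Aggregations need more than 10 data points (the ">10 point minimum filter").
MIN_POINTS_EXCLUSIVE = 10


def passes_minimum(data_points: int) -> bool:
    """True if an aggregation has enough data points to be kept."""
    return data_points > MIN_POINTS_EXCLUSIVE


# Hypothetical aggregated origin-destination rows for one destination.
rows = [
    {"origin": 1, "destination": 7, "data_points": 11, "devices": 3},
    {"origin": 2, "destination": 7, "data_points": 10, "devices": 10},
    {"origin": 3, "destination": 7, "data_points": 250, "devices": 41},
]

kept = [row for row in rows if passes_minimum(row["data_points"])]
print([row["origin"] for row in kept])
```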