Day 30: The Final Map
I am using today as an excuse to look at some data that are new (to me at least). They popped into my social media feed a month or so back, but I have not yet had the chance to explore them.
These are the UK small area gross value added (GVA) estimates. What interested me was the number of caveats attached to using these data, specifically:
The building blocks statistics are not directly comparable across nations because the levels of composition can vary hugely. This is because some small areas contain mainly (or exclusively) households, and others contain heavy industries.
Further, the building blocks geographies are defined differently, which calls for caution when comparing and/or interpreting the statistics.
The small areas statistics can appear quite volatile, but are more stable when aggregated to form larger geographic areas.
These all sounded like interesting challenges to me!
Exploring the Data
Load packages
library(readxl)     # read the ONS Excel workbook
library(tidyverse)  # dplyr, ggplot2 and friends
library(sf)         # spatial data handling
library(janitor)    # clean_names()
library(magrittr)   # %<>% assignment pipe
library(kableExtra) # table styling
library(viridis)    # viridis colour palettes
library(httr)       # GET() and write_disk() for the download
This reads the data for England and cleans up the column names.
# URL of the ONS small area GVA workbook
url <- "https://www.ons.gov.uk/file?uri=/economy/grossvalueaddedgva/datasets/uksmallareagvaestimates/1998to2022/uksmallareagvaestimates1998to2022.xlsx"

# Download the file to a temporary location
temp_file <- tempfile(fileext = ".xlsx")
GET(url, write_disk(temp_file, overwrite = TRUE))
Response [https://www.ons.gov.uk/file?uri=/economy/grossvalueaddedgva/datasets/uksmallareagvaestimates/1998to2022/uksmallareagvaestimates1998to2022.xlsx]
Date: 2024-11-14 10:21
Status: 200
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;charset=utf-8
Size: 13.9 MB
<ON DISK> /tmp/RtmpvWl2Xb/file28179bc44.xlsx
# Read the Excel file
gva <- read_excel(temp_file, sheet = "Table 1", skip = 1) %>% clean_names()

# Subset to the LSOA code and the annual GVA columns
gva %<>%
  select(lsoa_code, x1998:x2022)
We can then create a function that produces an index rebased to the earliest year of data, in this case 1998 (1998 = 100).
calculate_index <- function(df) {
  # Get the base year (1998) values
  base_values <- df %>%
    select(x1998) %>%
    pull()

  # Divide every year column by the 1998 value and scale to 100
  df_index <- df %>%
    mutate(
      across(
        starts_with("x"),
        ~ (.x / base_values) * 100
      )
    )
  return(df_index)
}

gva_index <- calculate_index(gva)
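To make the rebasing concrete, here is a toy check of the same logic on two made-up LSOAs. The codes and GVA values are invented for illustration and are not from the ONS file. Each year's value is divided by that area's own 1998 value and multiplied by 100, so x1998 always becomes 100. Note that an area with zero GVA in 1998 would produce Inf or NaN.

```r
library(dplyr)

# Reproduces the logic of calculate_index() above: rebase x-prefixed year columns to 1998 = 100
calculate_index <- function(df) {
  base_values <- df %>% select(x1998) %>% pull()
  df %>% mutate(across(starts_with("x"), ~ (.x / base_values) * 100))
}

# Hypothetical toy data: invented codes and made-up GVA values
toy_gva <- tibble(
  lsoa_code = c("E_TOY_1", "E_TOY_2"),
  x1998 = c(50, 200),
  x2022 = c(150, 100)
)

toy_index <- calculate_index(toy_gva)
toy_index
```

The first toy area trebles (index 300), and the second halves (index 50).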
Make a Map
First we read in the LSOA polygons and remove unwanted attributes. It is worth noting that the LSOA codes supplied with the GVA data are the 2011 versions! It would be fantastic if the formal code names (e.g. LSOA11CD) were used on all government data, because otherwise you often only discover this later, when lots of matches fail!
# Download the 2011 LSOA boundaries (generalised, clipped)
lsoa_sf <- st_read("https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/LSOA_Dec_2011_Boundaries_Generalised_Clipped_BGC_EW_V3/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
Reading layer `OGRGeoJSON' from data source
`https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/LSOA_Dec_2011_Boundaries_Generalised_Clipped_BGC_EW_V3/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson'
using driver `GeoJSON'
Simple feature collection with 34753 features and 11 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -6.418622 ymin: 49.86474 xmax: 1.763571 ymax: 55.81107
Geodetic CRS: WGS 84
# Subset to England and remove unwanted columns
lsoa_sf %<>%
  filter(startsWith(LSOA11CD, "E")) %>%
  select(LSOA11CD)
Next we can join the GVA index to the polygons.
# Join
lsoa_sf %<>%
  left_join(gva_index, by = c("LSOA11CD" = "lsoa_code"))
Deleting source `output_file.gpkg' using driver `GPKG'
Writing layer `output_file' to data source `output_file.gpkg' using driver `GPKG'
Writing 32844 features with 26 fields and geometry type Multi Polygon.
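Because the GVA file uses 2011 LSOA codes, it is worth checking how many rows actually matched before mapping. Below is a minimal sketch using dplyr::anti_join(). The two toy tables are hypothetical stand-ins for lsoa_sf and gva_index, and their codes and values are invented.

```r
library(dplyr)

# Hypothetical stand-ins for the boundary codes and the GVA index
polygon_codes <- tibble(LSOA11CD = c("E_A", "E_B", "E_C"))
gva_codes <- tibble(lsoa_code = c("E_A", "E_B", "E_Z"), x2022 = c(120, 95, 300))

# Polygons with no GVA match (these would be NA after the left_join)
unmatched_polygons <- anti_join(polygon_codes, gva_codes,
                                by = c("LSOA11CD" = "lsoa_code"))

# GVA rows whose code is not in the boundary file (silently dropped by the left_join)
unmatched_gva <- anti_join(gva_codes, polygon_codes,
                           by = c("lsoa_code" = "LSOA11CD"))

nrow(unmatched_polygons)
nrow(unmatched_gva)
```

With the real objects, the same check could be run on st_drop_geometry(lsoa_sf) before the join, so the geometry is not carried through. A large count in either direction would suggest a code-vintage mismatch.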
And then create a map.
breaks <- c(-Inf, 27, 43, 60, 76, 92, 108, 124, 140, 157, 173, Inf)
labels <- c("11 - 27", "27 - 43", "43 - 60", "60 - 76",
            "76 - 92", "92 - 108", "108 - 124", "124 - 140",
            "140 - 157", "157 - 173", "173 - 4845")

# Create a categorical variable from the 2022 index
lsoa_sf$category <- cut(lsoa_sf$x2022,
                        breaks = breaks,
                        labels = labels,
                        right = FALSE) # left-inclusive intervals
# Define colors corresponding to each range
colors <- c(
  "11 - 27" = "#d7191c",
  "27 - 43" = "#e65538",
  "43 - 60" = "#f59053",
  "60 - 76" = "#fdbe74",
  "76 - 92" = "#fedf99",
  "92 - 108" = "#ffffbf",
  "108 - 124" = "#ddf1b4",
  "124 - 140" = "#bce4a9",
  "140 - 157" = "#91cba8",
  "157 - 173" = "#5ea7b1",
  "173 - 4845" = "#2b83ba"
)

# Plot with manual colors
ggplot(data = lsoa_sf) +
  geom_sf(aes(fill = category), color = NA) +
  scale_fill_manual(values = colors) +
  labs(fill = "2022 Index (base 1998)") +
  # Display in British National Grid
  coord_sf(crs = st_crs(27700)) +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
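The cut() call above uses right = FALSE, so each class is the interval [lower, upper). A value exactly on a break therefore falls into the class above it, even though the labels "11 - 27" and "27 - 43" both show 27. The quick check below reuses the same breaks and labels with a few made-up index values.

```r
# Same breaks and labels as the map above
breaks <- c(-Inf, 27, 43, 60, 76, 92, 108, 124, 140, 157, 173, Inf)
labels <- c("11 - 27", "27 - 43", "43 - 60", "60 - 76",
            "76 - 92", "92 - 108", "108 - 124", "124 - 140",
            "140 - 157", "157 - 173", "173 - 4845")

# Made-up index values sitting on or just below break points
test_values <- c(26.9, 27, 43, 172.9, 173, 4845)

# right = FALSE gives left-closed, right-open intervals: [a, b)
binned <- cut(test_values, breaks = breaks, labels = labels, right = FALSE)
as.character(binned)
```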
The patterns are quite noisy. Exploring some of the more extreme values reveals areas that are likely statistical anomalies, in line with the warnings about how these data should be used. In other areas, however, this does not appear to be the case. The following area shows a fall in GVA relative to 1998 and is the location of Thoresby Colliery, which closed in 2015, within the comparison period.
GVA and Retail Centres
Next I thought it would be interesting to use the GVA data to explore some aspects of retail. First we import the CDRC Retail Centre definitions. This is all quite rough, so with more time I would do this analysis a little more thoroughly!
# Get retail centres
retail_sf <- st_read("Retail_Boundaries_UK.gpkg")
Reading layer `Retail_Boundaries_UK' from data source
`/home/rstudio/alexsingleton.github.io/content/blog/2024-11-30-30DMC_The_Final_Map/Retail_Boundaries_UK.gpkg'
using driver `GPKG'
Simple feature collection with 6423 features and 8 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 33363.7 ymin: 10471.87 xmax: 655167.5 ymax: 1142036
Projected CRS: OSGB36 / British National Grid
We then use a spatial join and an intersection to find where the retail centres overlap the LSOAs.
# Ensure both are in the same coordinate reference system (CRS)
lsoa_sf <- st_transform(lsoa_sf, st_crs(retail_sf))

# Join
points_with_retail <- st_join(retail_sf, lsoa_sf, join = st_intersects)

# Filter to England
points_with_retail %<>% filter(Country == "England")

# Perform spatial intersection to get overlapping areas
overlap_sf <- st_intersection(retail_sf, lsoa_sf %>% st_make_valid())

# Calculate the area of the intersected geometries
overlap_sf$overlap_area <- st_area(overlap_sf)
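The overlap areas are used as weights in the summaries that follow. Here is a minimal sketch of the idea on toy geometry. One made-up retail centre straddles two made-up LSOAs. The shapes, codes and index values are invented, and the make_rect() helper is hypothetical. With no CRS set, the areas are plain planar units rather than square metres.

```r
library(sf)

# Hypothetical helper: build a rectangle polygon from its corners
make_rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(matrix(
    c(xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, xmin, ymin),
    ncol = 2, byrow = TRUE
  )))
}

# One toy centre covering half of each of two toy LSOAs
centre <- st_sf(RC_Name = "Toy Centre", geometry = st_sfc(make_rect(0, 0, 2, 2)))
lsoas <- st_sf(
  LSOA11CD = c("E_A", "E_B"),
  x2022 = c(150, 300),
  geometry = st_sfc(make_rect(-1, 0, 1, 2), make_rect(1, 0, 3, 2))
)

# Declare attributes constant so st_intersection() does not warn about them
st_agr(centre) <- "constant"
st_agr(lsoas) <- "constant"

# One row per centre/LSOA overlap, carrying attributes from both layers
overlap <- st_intersection(centre, lsoas)
overlap$overlap_area <- as.numeric(st_area(overlap))

# Area-weighted index: each LSOA counts in proportion to its overlap
weighted_index <- sum(overlap$x2022 * overlap$overlap_area) / sum(overlap$overlap_area)
weighted_index
```

Each toy LSOA overlaps the centre by 2 units, so the weighted index is the simple mean of 150 and 300, i.e. 225.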
We can then analyse the changes in GVA by retail centre type to identify which types have experienced the most growth since 1998. For each type, the 2022 index is averaged across the LSOAs that its centres overlap, weighted by the area of overlap. Our classification distinguishes more traditional retail agglomerations from those designed to be concentrated, such as retail parks and shopping centres. This distinction is reflected in the statistics and represents a general evolution in retail since 1998.
overlap_sf %>%
  filter(!is.na(Classification)) %>% # Remove rows with no classification
  group_by(Classification) %>%
  summarise(
    # Overlap-area-weighted mean of the 2022 index (1998 = 100)
    Weighted_Average_Index = as.numeric(sum(x2022 * overlap_area, na.rm = TRUE) / sum(overlap_area))
  ) %>%
  arrange(desc(Weighted_Average_Index)) %>%
  st_drop_geometry() %>%
  kable(align = 'lc') %>%
  kable_styling(bootstrap_options = "responsive", full_width = FALSE)
Classification | Weighted_Average_Index |
---|---|
Small Shopping Centre | 652.1828 |
Regional Centre | 359.3353 |
Large Shopping Centre | 321.3384 |
Small Retail Park | 270.9027 |
Large Retail Park | 252.0647 |
District Centre | 246.5589 |
Major Town Centre | 244.6297 |
Small Local Centre | 243.0592 |
Town Centre | 237.2312 |
Local Centre | 236.3928 |
Market Town | 226.3048 |
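One detail of the weighted average used above is worth checking. na.rm = TRUE only removes missing terms from the numerator. sum(overlap_area) still includes the overlap area of any LSOA whose x2022 is NA, for example an unmatched code. Base R's weighted.mean() with na.rm = TRUE drops both the missing value and its weight. The numbers below are made up purely to show the difference. I have not checked whether this affects any row of the table.

```r
# Made-up index values and overlap areas; the third LSOA has no GVA match (NA)
x2022 <- c(100, 300, NA)
overlap_area <- c(1, 1, 2)

# Formula used in the summarise() above: NA dropped from the numerator only
formula_above <- sum(x2022 * overlap_area, na.rm = TRUE) / sum(overlap_area)

# weighted.mean() drops the NA value and its weight together
weighted_mean <- weighted.mean(x2022, overlap_area, na.rm = TRUE)

c(formula_above = formula_above, weighted_mean = weighted_mean)
```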
We can then look at some of these patterns by individual retail centre, restricted to the two largest types of traditional retail centre (Regional Centres and Major Town Centres). Some interesting patterns emerge.
results <- overlap_sf %>%
  filter(!is.na(RC_Name)) %>% # Remove rows with no centre name
  filter(Classification %in% c("Regional Centre", "Major Town Centre")) %>%
  group_by(RC_Name) %>%
  summarise(
    # Overlap-area-weighted mean of the 2022 index (1998 = 100)
    Weighted_Average_Index = as.numeric(sum(x2022 * overlap_area, na.rm = TRUE) / sum(overlap_area))
  ) %>%
  arrange(desc(Weighted_Average_Index))
# Display the table
results %>%
  st_drop_geometry() %>%
  kable(align = 'lc') %>%
  kable_styling(bootstrap_options = "responsive", full_width = FALSE) %>%
  scroll_box(height = "300px")
RC_Name | Weighted_Average_Index |
---|---|
Bradford; Bradford (Yorkshire and The Humber; England) | 603.7709 |
Bristol City; Bristol (South West; England) | 458.7708 |
Worthing; Worthing (South East; England) | 454.0608 |
Manchester City; Manchester (North West; England) | 439.3316 |
Leeds City, Leeds (Yorkshire and The Humber; England) | 418.5523 |
St Albans; St Albans (East of England; England) | 408.0652 |
Nottingham; Nottingham (East Midlands; England) | 399.1771 |
London; London (London; England) | 378.7947 |
Reading; Reading (South East; England) | 370.4064 |
Oxford; Oxford (South East; England) | 354.4344 |
Warrington; Warrington (North West; England) | 342.4695 |
Brighton and Hove; Brighton and Hove (South East; England) | 325.4165 |
Sheffield City; Sheffield (Yorkshire and The Humber; England) | 324.4665 |
Hounslow; Hounslow (London; England) | 318.0803 |
Bournemouth; Bournemouth, Christchurch and Poole (South West; England) | 304.0225 |
Leicester; Leicester (East Midlands; England) | 302.4936 |
Stoke; Stoke-on-Trent (West Midlands; England) | 301.3301 |
Newcastle City; Newcastle upon Tyne (North East; England) | 296.0757 |
Lancaster; Lancaster (North West; England) | 289.9517 |
Southampton; Southampton (South East; England) | 287.6918 |
Birmingham City; Birmingham (West Midlands; England) | 284.9352 |
Liverpool City; Liverpool (North West; England) | 281.1173 |
Exeter; Exeter (South West; England) | 280.9092 |
Wigan; Wigan (North West; England) | 278.9805 |
Knightsbridge; Kensington and Chelsea (London; England) | 276.0378 |
Northampton; Northampton (East Midlands; England) | 274.4235 |
Plymouth; Plymouth (South West; England) | 274.1781 |
Hull; Kingston upon Hull (Yorkshire and The Humber; England) | 272.6131 |
Chester; Cheshire West and Chester (North West; England) | 269.3055 |
Lincoln; Lincoln (East Midlands; England) | 266.6141 |
Guildford; Guildford (South East; England) | 265.0940 |
Stockport; Stockport (North West; England) | 264.3075 |
Canterbury; Canterbury (South East; England) | 261.6701 |
Bath; Bath and North East Somerset (South West; England) | 258.9825 |
Doncaster; Doncaster (Yorkshire and The Humber; England) | 257.4575 |
Ipswich; Ipswich (East of England; England) | 257.2136 |
Middlesbrough; Middlesbrough (North East; England) | 256.9389 |
Worcester; Worcester (West Midlands; England) | 256.2160 |
Watford; Watford (East of England; England) | 253.4625 |
Bolton; Bolton (North West; England) | 252.8540 |
Hastings; Hastings (South East; England) | 248.2206 |
Barnsley; Barnsley (Yorkshire and The Humber; England) | 246.2507 |
Cheltenham; Cheltenham (South West; England) | 246.2328 |
Truro; Cornwall (South West; England) | 244.5830 |
York; York (Yorkshire and The Humber; England) | 242.5009 |
Derby; Derby (East Midlands; England) | 239.0582 |
Blackpool; Blackpool (North West; England) | 237.6552 |
Maidstone; Maidstone (South East; England) | 236.2865 |
Royal Leamington Spa; Warwick (West Midlands; England) | 233.6452 |
Sutton; Sutton (London; England) | 225.4427 |
Preston; Preston (North West; England) | 225.3978 |
Kingston upon Thames; London (London; England) | 223.7936 |
Peterborough; Peterborough (East of England; England) | 219.9911 |
Cambridge; Cambridge (East of England; England) | 219.5252 |
Romford; London (London; England) | 215.6034 |
Scarborough; Scarborough (Yorkshire and The Humber; England) | 214.0865 |
Huddersfield; Kirklees (Yorkshire and The Humber; England) | 213.8794 |
Norwich; Norwich (East of England; England) | 212.4205 |
Croydon; London (London; England) | 210.9787 |
Colchester; Colchester (East of England; England) | 210.7075 |
Royal Tunbridge Wells; Tunbridge Wells (South East; England) | 206.5099 |
Hereford; County of Herefordshire (West Midlands; England) | 205.7771 |
Ealing; Ealing (London; England) | 205.1098 |
Wood Green; London (London; England) | 204.8174 |
Loughborough; Charnwood (East Midlands; England) | 201.8812 |
Bedford; Bedford (East of England; England) | 201.6262 |
Ilford; London (London; England) | 200.4038 |
Southport; Sefton (North West; England) | 195.6721 |
Coventry; Coventry (West Midlands; England) | 195.0813 |
Shrewsbury; Shropshire (West Midlands; England) | 194.3616 |
Wakefield; Wakefield (Yorkshire and The Humber; England) | 192.3007 |
Milton Keynes; Milton Keynes (South East; England) | 191.0835 |
Carlisle; Carlisle (North West; England) | 188.4729 |
Sunderland; Sunderland (North East; England) | 183.6281 |
Darlington; Darlington (North East; England) | 181.8393 |
Eastbourne; Eastbourne (South East; England) | 181.5299 |
King's Lynn; King's Lynn and West Norfolk (East of England; England) | 177.0160 |
Taunton; Somerset West and Taunton (East of England; England) | 174.7752 |
Wolverhampton; Wolverhampton (West Midlands; England) | 159.6873 |
Bromley; London (London; England) | 151.8955 |
Salisbury; Wiltshire (South West; England) | 134.6113 |
Swindon; Swindon (South West; England) | 127.8705 |
Luton; Luton (East of England; England) | 104.4664 |
We can also map these index values. My takeaway is that these data are potentially very interesting. After this fairly rough-and-ready exploration, I suspect they may well make their way into a retail paper over the next couple of months!
# Calculate centroids of the polygons
results$centroid <- st_centroid(results$geom)

# Extract the coordinates of centroids into separate columns
results <- results %>%
  mutate(
    centroid_x = st_coordinates(centroid)[, 1],
    centroid_y = st_coordinates(centroid)[, 2]
  )
# Create a ggplot map with a combined legend
ggplot() +
  # Plot the LSOA polygons as a grey base layer
  geom_sf(data = lsoa_sf, fill = "lightgray", color = NA) +
  # Plot the centroids, mapping the index to both size and colour
  geom_point(
    data = results,
    aes(
      x = centroid_x,
      y = centroid_y,
      size = Weighted_Average_Index,
      color = Weighted_Average_Index
    )
  ) +
  # Same name and a legend guide on both scales, so ggplot2 merges them into one legend
  scale_size_continuous(name = "Weighted Average Index", range = c(1, 8)) +
  scale_color_viridis_c(name = "Weighted Average Index", guide = "legend") +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.spacing = unit(0, "mm"),
    plot.margin = unit(rep(0, 4), "mm"),
    legend.position = "right"
  )
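The map relies on explicit centroid_x / centroid_y columns, so that plain geom_point() can position each retail centre. Here is a small sketch of that centroid step on two made-up rectangles. The names, index values and the make_rect() helper are hypothetical. The geometry column is called geom to mirror the results object above.

```r
library(sf)
library(dplyr)

# Hypothetical helper: build a rectangle polygon from its corners
make_rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(matrix(
    c(xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, xmin, ymin),
    ncol = 2, byrow = TRUE
  )))
}

# Two toy "retail centres" with invented index values
toy_results <- st_sf(
  RC_Name = c("Centre A", "Centre B"),
  Weighted_Average_Index = c(250, 400),
  geom = st_sfc(make_rect(0, 0, 2, 2), make_rect(10, 10, 14, 12))
)

# Same steps as above: centroid column, then its X and Y coordinates
toy_results$centroid <- st_centroid(toy_results$geom)
toy_results <- toy_results %>%
  mutate(
    centroid_x = st_coordinates(centroid)[, 1],
    centroid_y = st_coordinates(centroid)[, 2]
  )

st_drop_geometry(toy_results)[, c("RC_Name", "centroid_x", "centroid_y")]
```

The rectangle centroids are (1, 1) and (12, 11). An alternative design would be to pass the centroid geometries straight to geom_sf(), which avoids the extra coordinate columns.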