Learning Objectives

By the end of this practical lab you will be able to:

Making maps in R

Like most analysis task within R there are a few different ways in we can make maps. In this practical lab we will introduce various ways in which we can map the attribute of areas and contextualize these to build a richer cartography. We will illustrate this through base R, and a series of packages including ggplot2 and tmap.

Choropleth mapping in base R

Before we can create a map in R, we first need to import some spatial data. We will read in two shapefiles, the first containing polygons that will later be used to create a choropleth map, and the second some street segments that will be used to provide context.

#Load required package
library(rgdal)
#Read polygons (creates a SpatialPolygonsDataFrame object)
LSOA <- readOGR("./data", "E06000042",verbose = FALSE)
#Read lines (creates a SpatialLinesDataFrame object)
roads <- readOGR("./data", "Road",verbose = FALSE)

As was shown in the previous practical (see 2. Data Manipulation in R), we can use the plot() function which is built into base R to show the outlines of the polygons contained within the “LSOA” object.

plot(LSOA)

This map shows the Lower Layer Super Output Area boundaries for the city of Milton Keynes, UK. The attributes of the data frame are the overall and domain scores for the 2015 Index of Multiple Deprivation.

We will shade in this map using the overall IMD score which is stored in the column “imd_score”. There are a total of 152 values.

LSOA@data$imd_score
##   [1] 23.67  8.87 11.27 29.07 25.04 24.39 23.05 44.18 13.58 31.45 28.06
##  [12] 11.24  9.27 21.73 24.61 35.13 25.02 13.16 34.23 18.52 15.09 30.68
##  [23] 11.35  7.11  4.06 16.97 29.42 11.23 12.75 33.90 62.07 46.73 47.05
##  [34] 15.33 27.67 10.17  6.02  9.99 12.46 14.92 17.61  9.97 19.87  9.48
##  [45]  8.59 10.58  9.27  5.42 20.20  9.98 11.37 21.20 20.41  4.09 20.14
##  [56]  7.20 11.93 18.29 23.38  7.42  8.90  9.85  3.58 12.60 11.10  9.39
##  [67]  2.56 11.16 24.20  7.49 18.56  7.00  9.68  6.21  7.36  4.46  6.59
##  [78]  8.62  9.41  3.34  9.28  1.90  7.35  6.53  9.01 12.12 16.47 30.96
##  [89]  9.48 39.29 17.60  8.71 22.82 47.38 51.00 12.55 16.56  7.58  9.23
## [100]  5.17 13.96 12.66 35.89 12.82 13.10 20.41 10.89 13.41 12.90  3.04
## [111] 11.83  6.18 25.41 10.79 12.34 15.24  6.99 27.73 37.39 18.43 22.97
## [122] 20.46 42.72 12.65 19.01 61.15 58.57 61.85 54.43 26.96 41.06 38.49
## [133] 29.50 34.99 37.55 13.66  9.73  7.59 16.12  9.08  5.45 13.51 12.00
## [144]  6.24  7.75  7.58  5.22  9.11  6.42 18.70 13.01 16.09

The first step is to find suitable breakpoints for the data contained in the imd_score column. The continuous data needs to be assigned into categories so different colors can be applied on a choropleth map. There are numerous ways of doing this such as jenks, standard deviations or equal intervals. In this example we use a new function classIntervals() from the “classInt” package to find the ranges needed to divide the the imd_score into five categories. In this example we use the style “fisher” to specify jenks as the break point.

install.packages("classInt")
# Load package
library(classInt)
#Create breaks
breaks <- classIntervals(LSOA@data$imd_score, n = 5, style = "fisher")

If we print the object created by the classIntervals() function, a summary is printed showing you what breaks have been assigned, and how many areas are within these ranges.

breaks
## style: fisher
##   one of 18,671,940 possible partitions of this variable into 5 classes
##   [1.9,10.685) [10.685,19.44) [19.44,32.675) [32.675,49.19)  [49.19,62.07] 
##             55             47             29             15              6

We need to choose some colors that we will assign to each of the break points in the data. We will now use another package called “RColorBrewer” which provides a series of color pallets that are suitable for mapping. You can have a look at the color pallets online: http://colorbrewer2.org. Each of these pallets are named; and you can see all the available pallets as follows.

install.packages("RColorBrewer")
#Load package
library(RColorBrewer)
#Display all pallets
display.brewer.all()

We will then choose six colors from the pallet “YlOrRd”, and print them to the terminal. You will see that the colors are stored as hex values.

my_colours <- brewer.pal(6, "YlOrRd")
my_colours
## [1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#BD0026"

We can then use the function findColours() to select the appropriate color for each of the numbers we intend to map, depending on where these fit within the break points we calculated.

colours_to_map <- findColours(breaks, my_colours)

We can then create a basic map using this list of colors and the plot() function again.

plot(LSOA,col=colours_to_map)

We might also want to create a map without the borders, and this can be controlled with an additional parameter which is set to “NA”

plot(LSOA,col=colours_to_map,border = NA)

We can also add additional layers onto the map using a further parameter (“add”) which is set to “TRUE”. Without the “add=TRUE”, every time plot() is called, the previous plot is replaced. Two further parameters are used, “col” to specify the line color, and “lwd” the line width.

plot(LSOA,col=colours_to_map,border = NA)
plot(roads,add=TRUE, col="#6B6B6B",lwd=0.3)

Another feature that is very common to see on a map is a legend which tells you what values the colors used on the map correspond to. This combines the legend() function with a further function leglabs() (from the maptools package) to create a legend:

install.packages("maptools")
library(maptools)
# Plot choropleth
plot(LSOA,col=colours_to_map,border = NA)
# Plot roads
plot(roads,add=TRUE, col="#6B6B6B",lwd=0.3)
# Add legend
legend("bottomleft" ,legend = leglabs(breaks$brks, between = " to "), fill = my_colours, bty = "n",cex=0.6)

We will now add some points to the map by creating a new Spatial Points Data Frame as we demonstrated in the previous practical (see 2. Data Manipulation in R):

#Read in the location of schools
schools <- read.csv("./data/Milton_Keynes_SS_13.csv")
schools
##       URN                       SCHNAME    PCODE TOTPUPS AC5EM13 Easting
## 1  136468                Denbigh School  MK5 6EX    1388      80  483107
## 2  136844           The Hazeley Academy  MK8 0PT    1461      71  481650
## 3  110531              Lord Grey School  MK3 6EW    1464      54  485839
## 4  135665     The Milton Keynes Academy  MK6 5LA    1267      53  486120
## 5  136454               Oakgrove School MK10 9JQ    1249      65  488622
## 6  137052               Ousedale School MK16 0BJ    2195      78  486822
## 7  110532          The Radcliffe School MK12 5BT    1145      59  480805
## 8  110517     St Paul's Catholic School  MK6 5EN    1770      56  485623
## 9  136730      Shenley Brook End School  MK5 7ZT    1521      68  483160
## 10 138439      Sir Herbert Leon Academy  MK2 3HQ     676      49  487516
## 11 110526            Stantonbury Campus MK14 6BN    1984      53  484254
## 12 136842                   Walton High  MK7 7WH    1496      64  490091
## 13 110567 The Webber Independent School MK14 6DP     140      70  484790
##    Northing
## 1    236985
## 2    236327
## 3    234073
## 4    237569
## 5    238624
## 6    243183
## 7    240797
## 8    237130
## 9    235011
## 10   232076
## 11   241007
## 12   236608
## 13   241024

The new object “schools” contains 13 secondary schools within Milton Keynes, including their Easting and Northing, which are the coordinates of the schools in the British National Grid projection. Before you can make a spatial data frame, you need to check that there are no records with blank spatial references (i.e. Easting and Northing) - in this example, we use the subset() function.

The SpatialPointsDataFrame() function is then specified with the coordinates of the school “coords”, which are specified as matrix of values - cbind() is used to “column bind” the Easting and Northing lists together, i.e. so each row is a location. The “data” parameter specifies any attribute data - in this case we just use the original data frame. Finally, the “proj4string” is specified using the CRS() function. These are standard lookups to known as coordinate systems.

# Remove those schools without Easting or Northing
schools <- subset(schools, Easting != "" | Northing != "")

# Create the SpatialPointsDataFrame
schools_SDF <- SpatialPointsDataFrame(coords = cbind(schools$Easting, schools$Northing), data = schools, proj4string = CRS("+init=epsg:27700"))

We can now plot these locations on our map.

plot(LSOA,col=colours_to_map,border = NA)
plot(roads,add=TRUE, col="#6B6B6B",lwd=0.3)
plot(schools_SDF, pch = 19, cex = 0.4, col = "#442200",add=TRUE)

Plotting a subset of the map

Much in the same way that we can subset a data frame, we can also create subsets of the plot. For example, suppose we just wanted to view a map for a a specific Ward in Milton Keynes.

First we will read in the Ward boundaries.

WARD <- readOGR("./data", "england_cmwd_2011Polygon")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "england_cmwd_2011Polygon"
## with 23 features
## It has 4 fields

There are 22 wards, which are named as follows.

WARD$name
##  [1] Bletchley and Fenny Stratford Bradwell                     
##  [3] Campbell Park                 Danesborough                 
##  [5] Denbigh                       Eaton Manor                  
##  [7] Emerson Valley                Furzton                      
##  [9] Hanslope Park                 Linford North                
## [11] Linford South                 Loughton Park                
## [13] Middleton                     Newport Pagnell North        
## [15] Newport Pagnell South         Olney                        
## [17] Sherington                    Stantonbury                  
## [19] Stony Stratford               Walton Park                  
## [21] Whaddon                       Wolverton                    
## [23] Woughton                     
## 23 Levels: Bletchley and Fenny Stratford Bradwell ... Woughton

We can plot and label these as follows. The text() function applies text to the map, specifying three lists including the Easting, Northing and the text labels. The Easting and Northing are derived using the coordinates() function, which for spatial polygons takes the centre of the polygon extent.

plot(WARD)
text(coordinates(WARD)[, 1], coordinates(WARD)[, 2], labels = WARD@data$name, cex = 0.4)

In the following example we can use the attributes of the spatial data frame to plot just a small area of the total spatial polygons data frame object. For example, to plot just Loughton Park we can use the square brackets to just select a single row that matches the name “Loughton Park”

plot(WARD[WARD@data$name == "Loughton Park",])

This is a useful technique to create a more limited extent. We can use this as follows with some of the previous plots to create a “zoomed in map”. Note how we are building the map up as a series of layers.

plot(WARD[WARD@data$name == "Loughton Park",],border=NA) #creates the extent, note that border = NA to make this polygon invisible
plot(LSOA,col=colours_to_map,border = NA,add=TRUE) #plot the choropleth for the IMD
plot(roads,add=TRUE, col="#6B6B6B",lwd=0.3) #plot the roads
plot(schools_SDF, pch = 19, cex = 0.4, col = "#442200",add=TRUE) #plot the schools
plot(WARD,border="#6B6B6B",add=TRUE)#Add the ward boundaries, however this time they have a colour assigned
text(coordinates(schools_SDF)[, 1], coordinates(schools_SDF)[, 2], labels = schools_SDF@data$SCHNAME, cex = 0.8,col="#442200",pos=2)

Choropleth mapping in ggplot2

In the previous practical we introduce ggplot2 (see 5. Data Viz 1: Charts and Graphs), which can also be used to plot maps. The first stage is the extract the polygon boundaries from the spatial polygons object. We do this using the fortify() function; the “region” parameter is the attribute used to split the polygon - in this case the unique ID for each LSOA.

# Load the ggplot2 package
library(ggplot2)
# Fortify
LSOA_FF <- fortify(LSOA, region="LSOA11CD")

We can now have a look at the new data frame object this created.

head(LSOA_FF)
##       long      lat order  hole piece        id       group
## 1 487797.2 233640.4     1 FALSE     1 E01016710 E01016710.1
## 2 487795.9 233642.7     2 FALSE     1 E01016710 E01016710.1
## 3 487794.6 233644.9     3 FALSE     1 E01016710 E01016710.1
## 4 487794.4 233645.5     4 FALSE     1 E01016710 E01016710.1
## 5 487780.5 233672.9     5 FALSE     1 E01016710 E01016710.1
## 6 487779.6 233675.2     6 FALSE     1 E01016710 E01016710.1

You will see that the polygons have been split up into “groups” which refer to each of the LSOA codes - i.e. what you specified in the “region” attribute. The long and lat are unfortunately named as they are in fact Easting and Northings of the co-ordinates making up the polygon. However, by using the fortify() function we have lost the attribute information, which we can add back onto the object using the merge() function.

LSOA_FF <- merge(LSOA_FF, LSOA@data, by.x = "id", by.y = "LSOA11CD")

We can now use these attributes to create a choropleth map. First we setup the map using ggplot(). We can then tell ggplot how the map should look; firstly stating that the objects are polygons (“+ geom_polygon()”), second that the coordinate system is scaled equally, thus, one coordinate unit north is the same as one unit east for example (“+ coord_equal()”), and then we add some adjustment to the x and y labels, and alter the legend title (“+labs”).

Map <- ggplot(LSOA_FF, aes(long, lat, group = group, fill = imd_score)) +  geom_polygon() + coord_equal() + labs(x = "Easting (m)", y = "Northing (m)", fill = "IMD")
Map

We can also add layers to the plot as we did in the previous example. First we will create a fortified version of the roads object.

roads_FF <- fortify(roads)

We can then add this to the “Map” object we just created, however we will build this up as layers.

# Create choropleth
plot1<- c(geom_polygon(data=LSOA_FF, aes(long, lat, group = group, fill = imd_score)))
# Create road plot
plot2<-c(geom_path(data=roads_FF,aes(x=long, y=lat, group=group),size=0.1))
# Combine the plots
ggplot()+plot1+plot2+coord_equal()

We can add a further layer for the locations of the schools, and also adjust the color ramp.

# Create school plot
plot3 <- c(geom_point(data=schools, aes(Easting, Northing,colour='school')))
# Create combined plot
ggplot()+plot1+plot2+plot3 + coord_equal() + scale_fill_gradient( low="#473B31", high="#FFFFFF")

It is also possible to control the other elements of the plot using “theme_bw()” which removes many of the visible elements.

#Create a clean plot
ggplot()+plot1+plot2+plot3 +coord_equal() + scale_fill_gradient( low="#473B31", high="#FFFFFF")  + theme_bw()

However, the plot is still a little cluttered and we can turn off many of the elements using “theme()”

ggplot()+plot1+plot2+plot3 +coord_equal() + scale_fill_gradient( low="#473B31", high="#FFFFFF")  + theme_bw() +
  theme(axis.line = element_blank(),
    axis.text = element_blank(),
    axis.title=element_blank(),
    axis.ticks = element_blank(),
    legend.key = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank()) + labs(fill = "IMD Rank",colour="") 

Making basic maps using tmap

A recent addition to R for creating maps is the tmap package. This is quite flexible and can be used to create various types of maps and geographic representations with relative simplicity. For someone starting with R, this is probably the simplest entry point. First we will install and load the package:

install.packages("tmap")
library("tmap")

We will first load a shapefile for the Leeds, UK.

#Read Shapefile
Leeds <- readOGR("./data","E08000035",verbose = FALSE)

# Have a look at the attributes
head(Leeds@data)
##    LSOA11CD imd_rank imd_score income employment education health crime
## 0 E01011264    18951     14.69   0.09       0.09     23.87   0.09 -0.48
## 1 E01011265    22149     11.71   0.08       0.06      7.89   0.12  0.05
## 2 E01011266    29793      5.46   0.02       0.03      1.01  -1.54 -0.35
## 3 E01011267     9622     27.18   0.16       0.13     31.45   0.91  0.48
## 4 E01011268     5862     35.74   0.19       0.19     49.35   1.03  0.51
## 5 E01011269    15025     19.03   0.12       0.10     22.35   0.22  0.30
##   housing living_env idaci idaopi
## 0   16.54      22.99  0.14   0.08
## 1    9.92      29.35  0.08   0.14
## 2   27.38      17.77  0.04   0.03
## 3   23.35      24.62  0.21   0.18
## 4   22.10      32.85  0.19   0.26
## 5   13.25      30.23  0.12   0.18

We will store the map in the “M” object - first we setup the map and projection - note that we use the CRS 27700 which is the code for the British National Grid:

m <- tm_shape(Leeds, projection=27700)

Next we can add a style to the map - this includes the variable to use for coloring (“col=”), in this case, the rank of the area by the Index of Multiple Deprivation. The style is set as “equal” which relates to how the IMD ranks are broken up into color bins; the border color (“border.col=”) and transparency (" border.alpha =“); title and also pallet for the choropleth (”palette = “) which uses the colorbrewer pallets. Finally, the”showNA" option can be set as true or false and relates to whether missing values are shown in the legend.

m <- tm_shape(Leeds, projection=27700) +
    tm_polygons(col="imd_score", style="equal",n=5,border.col = "grey50",  border.alpha = .5, title="IMD Quintile", showNA=FALSE,palette="Blues")
#Print plot
m

We can then add some final options with tm_layout() to remove the frame around the plot:

m <- tm_shape(Leeds, projection=27700) +
    tm_polygons(col="imd_score", style="equal",n=5,border.col = "grey50",  border.alpha = .5, title="IMD Quintile", showNA=FALSE,palette="Blues") +
  tm_layout(legend.position = c("left", "bottom"), frame = FALSE)
#Print plot
m

As with the ggplot2 and base R maps, it is also possible to build up layers and use lines or points to add context to the maps. Here we replicate the map shown earlier:

# Note, we don't store the results in m this time so the result returned is a plot.
# Add areas and style
tm_shape(LSOA, projection=27700) +
    tm_polygons(col="imd_score", style="cont",n=5,border.col = "grey50",  border.alpha = .5, showNA=FALSE,palette="Greys",title="IMD Rank") +
# Add roads
tm_shape(roads) +
    tm_lines(lwd=0.1, col="#E8E3E2", scale=5)  +
    tm_add_legend(type="line", col="#E8E3E2", title="Roads") +
#Add schools
tm_shape(schools_SDF) +
    tm_symbols(size=0.5,shape=20, col="#f7756d") +
    tm_add_legend(type="symbol", shape=20, col="#f7756d", title="Schools") +
#Remove the frame and set legend position
    tm_layout(legend.position = c("right", "bottom"), frame = FALSE, legend.outside=TRUE)

Alternative types of map

So far we have only considered creating contextual choropleth maps, however, tmap is also useful to create a number of alternative map types including proportional symbol maps and cartograms.

First we will import a file of ward centroids which has a “Crimes” attribute that records the number of recorded crimes within the ward during August 2016.

# Read in crime data; and set crimes as a numeric
Crimes_SP <- readOGR("./data/","crimes",verbose = FALSE)
Crimes_SP@data$Crimes <- as.numeric(as.character(Crimes_SP@data$Crimes))

The following code first creates the choropleth with tm_polygons(), and then appends the bubbles using tm_bubbles() based on the “Crimes” attribute.

# Plot
m<- tm_shape(Leeds, projection=27700) +
    tm_polygons(col="imd_score", style="equal",n=5,border.col = "grey50",  border.alpha = .5, title="IMD Quintile", showNA=FALSE,palette="Greys") +
  
  # Add scaled bubbles
  tm_shape(Crimes_SP) +
    tm_bubbles("Crimes", title.size = "Crimes",col="#0EBFE9") +

  # Add legend and remove borders
    tm_layout(legend.position = c("left", "bottom"), 
              frame = FALSE,legend.text.size = 0.65)

# Plot the map
m

A further type of map that tmap can create is a cartogram; which is actually imported from the cartogram package. This type of representation scales each of the zones by an absolute measure such as population. Thus, zones with low counts become smaller and high counts larger. There are a lot of algorithms to make cartograms, however, the implementation within tmap is the commonly used continuous area cartogram.

We will therefore import some population data and append these to the spatial polygons data frame created earlier called “Leeds”:

#Read population data
Leeds_pop <- read.csv("./data/Leeds_Census_2011.csv")

#Join to spatial polygon dataframe
Leeds <- merge(Leeds, Leeds_pop, by.x = "LSOA11CD", by.y = "Code")

The following code creates the cartogram, however, takes a long time to run. If using this function, then this time increases the more complex and large the set of polygons. An alternative but free non R solution is ScapeToad.

Leeds_Cartogram <- cartogram(Leeds, "Usual_Residents_2011", itermax = 15, maxSizeError = 1.0001,prepare = "adjust", threshold = 0.05) # Takes a long time to process

We can now create a map as we did a basic choropleth, however, you will see that the zones are distorted:

#Map
m <- tm_shape(Leeds_Cartogram) +
    tm_polygons(col="imd_scr", style="equal",n=5,border.col = "grey50",  border.alpha = .5, title="IMD Quintile", showNA=FALSE,palette="Blues") +
        tm_layout(legend.position = c("left", "bottom"), 
              frame = FALSE)

# Plot map
m

Further resources / training