Category: R

2011 Census Open Atlas Project

This month has seen the release of the 2011 Census data for England and Wales at Output Area level.

This offers the possibility to map various attributes about people and places for very small geographic areas. Output Areas represent the most detailed geography for which Census data are released and are the building blocks for many popular products such as geodemographic classifications.

Because the data and boundaries are available under an open government licence, and because these data have been usefully placed online as direct downloads (data, boundaries), it is possible to create maps for England and Wales in a highly automated way.

As such, since the launch of the Output Area level data I have been busy writing (and then running, for around 4 days!) R code that maps every Key Statistics variable for all local authority districts. The code for doing this is fully reproducible, and I have posted it on my Rpubs blog.
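The general shape of such a batch job can be sketched in a few lines of base R. This is only an illustration and not the code from the Rpubs post: the district codes are placeholders, the percentages are made up, and unit squares stand in for real Output Area boundaries.

```r
# Hypothetical toy data: one row per Output Area, with a placeholder
# district code, a percentage to map, and the lower-left corner of a
# unit square standing in for the real boundary polygon.
oa <- data.frame(
  district = c("D001", "D001", "D002"),
  pct      = c(12.5, 48.0, 73.2),
  xmin     = c(0, 1, 0),
  ymin     = c(0, 0, 0),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()      # assumption: write the atlases to a temp folder
written <- character(0)

for (d in unique(oa$district)) {
  sub <- oa[oa$district == d, ]
  pdf_file <- file.path(out_dir, paste0(d, ".pdf"))

  pdf(pdf_file)           # open a vector PDF device, one file per district
  plot.new()
  plot.window(xlim = c(min(sub$xmin), max(sub$xmin) + 1),
              ylim = c(0, 1), asp = 1)
  # Darker grey means a higher percentage
  rect(sub$xmin, sub$ymin, sub$xmin + 1, sub$ymin + 1,
       col = grey(1 - sub$pct / 100))
  title(main = d)
  dev.off()               # close the device so the file is written

  written <- c(written, pdf_file)
}
```

In a real run the inner plotting step would draw the actual boundary polygons, and an extra loop over variables would add one page per map to each district's PDF.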

I have generated a PDF atlas for each local authority district, for example:

IF YOU THINK ANY OF THE INFORMATION I HAVE CREATED IS USEFUL, INTERESTING OR OF VALUE, THEN PLEASE READ THIS BLOG POST AND HELP PROTECT THE NEXT CENSUS!

Why have I created these atlases?

  1. To demonstrate the value of the 2011 census
  2. To provide a free 2011 static Census atlas to anyone who wants one
  3. Because I do not believe web maps should necessarily be the default way of distributing geographic data
  4. To illustrate how open data and software can be used in creative ways to generate insight
  5. To try to save money for local authorities who might be thinking of doing these types of analyses themselves
  6. To provide reproducible code that enables others to generate similar maps at Output Area level
  7. For fun!
  8. Because R is awesome!
  9. Because R really is awesome!

What is in each atlas?

Each atlas contains a series of vector PDF maps for each Key Statistics variable. The following is a map from the Liverpool Atlas and shows the percentage of “White: English/Welsh/Scottish/Northern Irish/British” for each Output Area in Liverpool.


About the data and maps

Almost every non-count variable (apart from Hectares) in the Key Statistics data disseminated by Nomis was mapped; these are either percentage scores or some type of ratio / average. Maps were excluded where there were only a few scores within a local authority district; you can see further explanation of this on the Rpubs page accompanying the analysis. A couple of further points…

  • The variables mapped were based on the calculations that were part of the Nomis data.
  • I have always been a fan of blue choropleth maps, which is why this particular colour scheme was chosen.
  • The cartography was automated for all the maps, which means it is more successful for some local authority districts than for others. Some issues I have noted:
    • Those local authorities with many wards appear a little busy with labels (e.g. Cornwall)
    • Cardiff appears to have a rogue polygon, which may be an issue with the OA to higher geography lookup table. I will investigate this in a future release… [Power of the crowd reveals that this is in fact Flat Holm island - thanks to @geospacedman]
    • It would be nice to add scale bars and north arrows to the maps; however, this proved problematic when outputting to PDF. Again, I will try to fix this in a future release.
  • The boundaries used are the generalised files, to increase mapping speed and reduce file size; these could be replaced with the full resolution boundaries in the future
  • These maps are without guarantee or warranty / feel free to fix my code!
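As an illustration of how a blue choropleth scheme can be assigned automatically, here is a minimal base R sketch. The percentages are made up, and both the quantile classification and the two end colours are assumptions for the example; the method actually used for the atlases is documented on the Rpubs page.

```r
# Hypothetical percentages for the Output Areas in one district
pct <- c(5.2, 17.8, 33.1, 48.9, 61.4, 72.0, 88.5, 93.3, 25.0, 55.5)

n_classes <- 5

# Quantile breaks put roughly equal numbers of areas in each class
breaks <- quantile(pct, probs = seq(0, 1, length.out = n_classes + 1))

# Assign each area to a class (1 = lowest, n_classes = highest)
class_id <- cut(pct, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# A light-to-dark blue ramp; the two end colours are arbitrary choices
blues <- colorRampPalette(c("#EFF3FF", "#08519C"))(n_classes)

# One fill colour per Output Area, ready to pass to a plotting function
fill <- blues[class_id]
```

Quantile breaks fail when a variable has only a handful of distinct values (the breaks are no longer unique), which is one practical reason to skip maps with very few scores.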

View the maps

All maps are available after clicking continue reading….


Creating 2011 Census Output Area Change Maps Using R

The 2001 Census used a different set of Output Areas (OA) from the current 2011 boundaries, reflecting changes in the spatial distribution of the underlying population. For example, if an area has become more heavily populated since 2001, it makes sense that a previous OA might be split into multiple new segments.

The ONS have provided both the Shapefiles and lookup tables for these changes; however, as yet, I haven’t seen any maps of them.
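To show how a lookup table of this kind can be turned into a change category for mapping, here is a minimal sketch in base R. The table, its column names (`oa01`, `oa11`) and the area codes are all made up for illustration; the real ONS lookup has its own layout, which is described alongside the code on the Rpubs page.

```r
# Hypothetical extract of a 2001-to-2011 Output Area lookup table.
# Column names and codes are invented for this example.
lookup <- data.frame(
  oa01 = c("A1", "B1", "B1", "B1", "C1", "C2"),
  oa11 = c("A1", "B2", "B3", "B4", "C3", "C3"),
  stringsAsFactors = FALSE
)

# For every row: how many 2011 areas does its 2001 area map to,
# and how many 2001 areas feed into its 2011 area?
n_new <- as.vector(table(lookup$oa01)[lookup$oa01])
n_old <- as.vector(table(lookup$oa11)[lookup$oa11])

# Classify each row of the lookup
lookup$change <- ifelse(n_new > 1, "split",
                 ifelse(n_old > 1, "merged", "unchanged"))
```

The resulting `change` column could then be joined to the 2011 boundaries and used as the fill variable in a map.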

I have had a go at creating these in a reproducible way using R – the code with links to all the data (which is public domain) can be found on my Rpubs page. At the base of the Rpubs post are links to downloadable PDF maps of all local authority districts in England and Wales.

A recurring pattern, which will become clearer when the high resolution census data become available in 2013, is the splitting of OAs in the centre of many large urban areas, typically as a result of increased population density. A couple of direct links to maps are as follows:

For the remaining maps and R code, see the Rpubs page.

Using R with Routino to provide road network paths between random Tweets and an iconic Smiths landmark

A couple of days ago I posted about how you can go about installing Routino on OSX, and now I have just finished writing a quick post over on my Rpubs blog about how to use it from within R. I also wanted to know a bit more about how R and Twitter play together, so this is woven in too. Oh, and I was also listening to the Smiths back catalogue today; thus, you end up with:

Using R with Routino to provide road network paths between random Tweets and an iconic Smiths landmark

For those who don’t know what the connection between the Salford Lads Club and the Smiths is, have a look at this video:

How Scenic is the HS2 Route?

It is fairly clear from the duration between this and my last post that various other things have been getting in the way of updates. Anyway, I shall try and post a few updates on news and things I have been working on recently in the coming weeks before getting back to regular posting!

Back in January I had a student working on a dissertation about the High Speed 2 railway. This got me thinking about what sort of data could be used to characterise the route. As it transpired, there wasn’t a publicly available Shapefile of the route at the time; however, an ex-colleague (Daryl Lloyd), who by chance now works for the Department for Transport, had realised the same thing almost in unison and, on the day I contacted him, was negotiating with HS2 Ltd to release the file. This is now available to download from here.

One unusual dataset that I thought would provide interesting context is the My Society project ScenicOrNot. This application enables users to rate the level of “scenic”[ness] of a series of random georeferenced photographs taken from the Geograph project. The raw scores are available to download here. For each picture's lat / lon, multiple votes were concatenated in a single line. As such, the records were split up so that each vote appeared as a single line in the exported CSV. This was done using the following R code.

# Read in the Scenic data from http://scenic.mysociety.org/
# read.delim() expects "." as the decimal mark, so Lat and Lon stay numeric
Scenic <- read.delim("http://scenic.mysociety.org/votes.tsv", header = TRUE,
                     sep = "\t", quote = "\"", stringsAsFactors = FALSE)

# Split each comma-separated Votes string into a vector of single votes
VoteList <- strsplit(as.character(Scenic$Votes), ",")
nVotes <- lengths(VoteList) # number of votes for each photograph

# Build one row per vote, repeating the photograph's Lat, Lon and ID
AllVotes <- data.frame(
  Votes = as.numeric(unlist(VoteList)),
  Lat = as.numeric(rep(Scenic$Lat, nVotes)),
  Lon = as.numeric(rep(Scenic$Lon, nVotes)),
  ID = rep(Scenic$ID, nVotes)
)

# Export with one vote per line
write.csv(AllVotes, file = "scenic_final_out.csv", row.names = FALSE)

The resulting CSV can be downloaded here. This relates to an extract from January 22nd 2012.

These data were then converted into OSGB and imported into a PostGIS database. A point in polygon operation was used to create average scores for a 5km grid over England. The shapefile with average votes can be downloaded here.
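For a regular square grid, the averaging step can also be approximated without a spatial database. The following is a minimal base R sketch, not the PostGIS workflow described above: it assumes the votes have already been projected to OSGB eastings and northings in metres, and the coordinates and votes shown are made up.

```r
# Hypothetical votes already projected to OSGB (metres)
votes <- data.frame(
  easting  = c(400100, 402500, 404900, 405100, 409999, 400000),
  northing = c(300200, 301000, 304999, 300000, 304000, 305000),
  vote     = c(4, 6, 8, 3, 5, 9)
)

cell_size <- 5000  # 5km grid cells

# Snap each point to the south-west corner of the cell that contains it
votes$cell_e <- floor(votes$easting / cell_size) * cell_size
votes$cell_n <- floor(votes$northing / cell_size) * cell_size

# Average vote per grid cell
grid_means <- aggregate(vote ~ cell_e + cell_n, data = votes, FUN = mean)
```

This shortcut only works because the cells are axis-aligned squares; for irregular polygons a true point in polygon operation, as used here with PostGIS, is still needed.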

Created using QGIS, the following maps show the output of these analyses…




When we overlay the HS2 route onto these data we can see that this passes through areas with varying degrees of “scenic”ness.




Although these data are interesting in themselves, there would be obvious utility in combining this sort of information with other indicators such as population density and characteristics. The assumption is that, all other things being equal, people may object to disruption in those areas which they consider more “scenic”… perhaps something for further work!

Extracting all Crime Data for England and Wales using R and MYSQL

Last week I started creating some data extraction code for the new England and Wales crime maps website using the R software / language. Although there is an API, a more efficient way of accessing all of the data (and without causing stress to their API server) is to download the CSV files located here for each police force. Downloading these manually, extracting the data and processing it in R would take a very long time, not to mention being very dull. BUT….

With some R magic, all is not lost, and the data can even be imported into a MYSQL database with ease using a relatively small amount of code.

You can use the code to download data by street, or by “neighbourhood” (I am still not sure what these are?). And, with luck, if the server / naming conventions do not change, the code should be re-usable each time new data is released.

You need both R and MYSQL installed – see here and here.

The only things which you need to specify in the code are:

con <- dbConnect(MySQL(), user="root", password="password", dbname="Police", host="localhost")
#and
ym <- '2010-12' #yyyy-mm
level <- 'street' #'street or neighbourhood'
downloaddir <- '/home/alex/Desktop/' #where you will download the files


Ubuntu – Installing R and RGedit Plugin

R is a fantastic bit of software which I have been using on and off for a number of years, since I gave up on SAS due to their annoying (free for academics) licenses and limited support for the things I wanted to do. R is infinitely flexible and totally free. Installing R on Ubuntu is very easy:

sudo apt-get install r-base

You can then start R from the terminal by typing R and pressing enter… or…

If you prefer, I have found the RGedit plugin for the pre-installed text editor Gedit to be very good. This can be installed as follows:

wget http://www.kaduk.net/~mateusz/gedit-r-plugin/kaduk.asc
cat kaduk.asc | sudo apt-key add -
echo "deb http://www.kaduk.net/~mateusz/gedit-r-plugin ./" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install gedit-r-plugin

After installation, the only thing you need to do is fire up Gedit and select “Preferences” from the “Edit” menu. Then, on the plugins tab, check the box “R integration”. This should open an R console at the base of Gedit and load R (Click on the picture to see how this looks).