Open Data companion
Last updated: 08 July 2020
Governments and other organisations often make open data available through Web service Application Programming Interfaces or APIs. The World Bank, UK Police, and Transport for London are just a few well-known examples. This document details the steps required to request data from these different Web service APIs using R.
Several R packages1 have been developed as clients for Web service APIs. These don’t assume any knowledge of API endpoints, HTTP requests, or data formats like XML and JSON. These are really convenient but sometimes you want to break into the ‘black box’ of APIs because you want to learn more or because there isn’t an API wrapper package available.
A quick introduction to APIs
APIs or Application Programming Interfaces are a set of rules that allow one software application to interact with another either in the same location or over a network. Inputs and outputs will vary between APIs but the process is the same: a ‘request’ that follows certain programmatic rules is submitted and a ‘response’ containing content in an expected format is returned.
There are many types of API including library-based (e.g. leafletJS) and class-based (e.g. Java) but one of the most common are Web service APIs. A client (browser) submits a Hypertext Transfer Protocol (HTTP) request to a server and the server returns a response to the client. The response contains status information about the request and may also contain the requested content.
The parameters of an HTTP request are typically contained in the URL. For example, to return a map of Manchester using the Google Maps Static API we would submit the following request:
The request contains:
- a URL to the API endpoint (https://maps.googleapis.com/maps/api/staticmap?) and;
- a query containing the parameters of the request (center=Manchester,England&zoom=13&size=600x300&maptype=roadmap). In this case, we have specified the location, zoom level, size and type of map.
Web service APIs use two key HTTP verbs to enable data requests: GET and POST. A GET request is submitted within the URL with each parameter separated by an ampersand (&
). A POST request is submitted in the message body which is separate from the URL. The advantage of using POST over GET requests is that there are no character limits and the request is more secure because it is not stored in the browser’s cache.
There are several types of Web service APIs (e.g. XML-RPC, JSON-RPC and SOAP) but the most popular is Representational State Transfer or REST. RESTful APIs can return output as XML, JSON, CSV and several other data formats.
Each API has documentation and specifications which determine how data can be transferred. Unfortunately, the specifications tend to be different and the documentation can be hard to follow.
An example API request
Querying a Web service API typically involves the following steps:
- submit the request
- check for any server error
- parse the response
- convert to a data frame
In the following example we will submit a request for police reported crime data from the UK Police API. The API uses both HTTP GET and POST requests and provides content in JSON data format.
The two key R packages for submitting HTTP requests to Web service APIs and parsing the content of the response are httr and jsonlite. Let’s load them into our R session. The tidyverse package is also loaded because it contains a suite of useful functions.
library(tidyverse) ; library(httr) ; library(jsonlite)
We would like to retrieve street level crimes within a mile radius of a specific location so we need to use https://data.police.uk/api/crimes-street/all-crime? as our API endpoint (see API documentation). Rather than retrieving ‘all-crime’ lets narrow our request to retrieve only reports of burglary. This will change our path to: https://data.police.uk/api/crimes-street/burglary?
path <- "https://data.police.uk/api/crimes-street/burglary?"
Next we need to build our API request and submit it. We will use the GET
function from the httr package. First we supply the path to the API endpoint and provide search parameters in the form of a list to the query
argument. There are three parameters available to us:
lat
= latitudelng
= longitudedate
= and optional date in YYYY-MM format
request <- GET(url = path,
query = list(
lat = 53.421813,
lng = -2.330251,
date = "2018-05")
)
Let’s check if the API returned an error. If the request fails the API will return a non-200 status code.
request$status_code
Next we parse the content returned from the server as text using the content
function.
response <- content(request, as = "text", encoding = "UTF-8")
Then we’ll parse the JSON content and and convert it to a data frame.
df <- fromJSON(response, flatten = TRUE) %>%
data.frame()
Finally, we might strip out some of the variables and rename the remaining.
df <- select(df,
month, category,
location = location.street.name,
long = location.longitude,
lat = location.latitude)
month | category | location | long | lat |
---|---|---|---|---|
2018-05 | burglary | On or near Washway Road | -2.328411 | 53.422967 |
2018-05 | burglary | On or near Glenthorn Grove | -2.323696 | 53.418287 |
2018-05 | burglary | On or near Waterside | -2.317188 | 53.427410 |
2018-05 | burglary | On or near Leicester Road | -2.322199 | 53.427361 |
That’s it. We’ve submitted a request to the Police UK API and parsed the response into a data frame ready for use in R.
Setup
The following API requests all rely on the tidyverse, httr, and jsonlite R packages. Make sure that you have installed them.
install.packages("tidyverse", "httr", "jsonlite")
Food Standards Agency
The Food Standards Agency provide food hygiene rating data for the United Kingdom.
Example: fast food outlets in Trafford
- HTTP verb: GET
- API endpoint URL: http://api.ratings.food.gov.uk/Establishments
- Selected parameters: name, address, longitude, latitude, businessTypeId, ratingKey, localAuthorityId
- Headers: “x-api-version”, 2
- Data format(s): JSON, XML
- Documentation: http://api.ratings.food.gov.uk/help
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)
# submit the request
path <- "http://api.ratings.food.gov.uk/Establishments"
request <- GET(url = path,
query = list(
localAuthorityId = 188,
BusinessTypeId = 7844,
pageNumber = 1,
pageSize = 5000),
add_headers("x-api-version" = "2"))
# check for any server error
# request$status_code
# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
fromJSON(flatten = TRUE) %>%
pluck("establishments") %>%
as_tibble()
# tidy the data
df <- response %>%
mutate_all(funs(replace(., . == '', NA))) %>%
select(name = BusinessName,
type = BusinessType,
address1 = AddressLine1,
address2 = AddressLine2,
address3 = AddressLine3,
address4 = AddressLine4,
postcode = PostCode,
long = geocode.longitude,
lat = geocode.latitude) %>%
unite(address, address1, address2, address3, address4, remove = TRUE, sep = ", ") %>%
mutate(address = str_replace_all(address, "NA,", ""),
address = str_replace_all(address, ", NA", ""),
long = as.numeric(long),
lat = as.numeric(lat))
name | type | address | postcode | long | lat |
---|---|---|---|---|---|
Aminah Tandoori | Takeaway/sandwich shop | 2 Peter Street, Altrincham | WA14 2DS | -2.352011 | 53.38352 |
Barburrito | Takeaway/sandwich shop | 2 The Orient, Trafford Park, Manchester | M17 8EH | -2.349020 | 53.46619 |
Bei Jing | Takeaway/sandwich shop | 79 Great Stone Road, Stretford, Manchester | M32 8GR | -2.286284 | 53.45302 |
Canadian Charcoal Pit | Takeaway/sandwich shop | 6 Church Street, Altrincham, Cheshire | WA14 4DW | -2.349928 | 53.38981 |
Canadian Charcoal Pit | Takeaway/sandwich shop | 143 Barton Road, Stretford, Manchester | M32 8DN | -2.314259 | 53.44645 |
Chinese Kitchen | Takeaway/sandwich shop | 12 Deansgate Lane, Timperley, Altrincham | WA15 6SB | -2.340111 | 53.39759 |
Nomis
Nomis provide labour market, benefit and census data for the United Kingdom.
Example: Claimant count in Trafford for the last 13 months
- HTTP verb: GET and POST
- API endpoint URL: https://www.nomisweb.co.uk/api/v01/dataset/
- Selected parameters: date, geography, gender, age, measure and measures
- Headers: NA
- Data format(s): JSON, CSV, xls
- Documentation: https://www.nomisweb.co.uk/api/v01/help
- R package: nomisr
# load the necessary packages
library(tidyverse) ; library(httr) ; library(jsonlite)
# retrieve the name, id and available parameters for all Nomis datasets with 'claimant' as a keyword
datasets <- fromJSON("https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json", flatten = TRUE) %>%
map("keyfamilies") %>%
map_df(bind_rows) %>%
unnest(components.dimension) %>%
select(id, name = name.value, description = description.value, parameter = conceptref) %>%
filter(stringr::str_detect(name, regex('^.*?\\b(claimant(s)*)\\b.*?\\bage\\b.*?$', ignore_case = T))) %>%
distinct(id, name, parameter)
# alternatively try the following link in your browser: https://www.nomisweb.co.uk/api/v01/dataset/def.htm?search=*claimant*
# retrieve the codelists for all the available parameters for the 'NM_162_1' dataset
parameters <- c("https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/gender/def.sdmx.json",
"https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/age/def.sdmx.json",
"https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measure/def.sdmx.json",
"https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measures/def.sdmx.json") %>%
map_df(~fromJSON(., flatten = TRUE) %>%
as.data.frame() %>%
unnest() %>%
select(parameter = structure.codelists.codelist.name.value,
description = description.value, value))
# submit the request
# note that .data.json is appended to the path because we want data in JSON format
path <- "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1.data.json?"
request <- GET(url = path,
query = list(
date = "latestMINUS12-latest",
geography = "E08000009",
gender = 0,
age = 0,
measure = 1,
measures = 20100))
# check for any server error
# request$status_code
# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
fromJSON(flatten = TRUE) %>%
pluck("obs") %>%
as_tibble()
# tidy the data
df <- response %>%
mutate(date = as.Date(paste0(time.value, '-01'), format = '%Y-%m-%d')) %>%
select(date,
area_name = geography.description,
area_code = geography.geogcode,
measure = measure.description,
n = obs_value.value)
date | area_name | area_code | measure | n |
---|---|---|---|---|
2020-05-01 | Trafford | E08000009 | Claimant count | 7785 |
2020-04-01 | Trafford | E08000009 | Claimant count | 6590 |
2020-03-01 | Trafford | E08000009 | Claimant count | 3615 |
2020-02-01 | Trafford | E08000009 | Claimant count | 3600 |
2020-01-01 | Trafford | E08000009 | Claimant count | 3415 |
2019-12-01 | Trafford | E08000009 | Claimant count | 3380 |
2019-11-01 | Trafford | E08000009 | Claimant count | 3510 |
2019-10-01 | Trafford | E08000009 | Claimant count | 3620 |
2019-09-01 | Trafford | E08000009 | Claimant count | 3470 |
2019-08-01 | Trafford | E08000009 | Claimant count | 3505 |
2019-07-01 | Trafford | E08000009 | Claimant count | 3455 |
2019-06-01 | Trafford | E08000009 | Claimant count | 3440 |
2019-05-01 | Trafford | E08000009 | Claimant count | 3385 |
Stat-Xplore
Stat-Xplore provides access to a wide variety of statistics published by the Department of Work and Pensions including Universal Credit, Carers Allowance and Disability Living Allowance.
Example: Housing benefit claimants in Trafford’s LSOAs in August 2018.
- HTTP verb: POST
- API endpoint URL: https://stat-xplore.dwp.gov.uk/webapi/rest/v1/table
- Selected parameters: database, measures, dimensions
- Headers: API key
- Data format(s): JSON
- Documentation: Stat-Xplore : Open Data API
- R package: stat-xplore-R
You will need an API key to access the Stat-Xplore API. Just create a free account and copy the API key from your Account information. For more details visit: https://stat-xplore.dwp.gov.uk/webapi/online-help/Open-Data-API.html
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)
# add your API key
api_key <- ""
# identify the API endpoint
path <- "https://stat-xplore.dwp.gov.uk/webapi/rest/v1/table"
# build your query
query <- list(database = unbox("str:database:hb_new"),
measures = "str:count:hb_new:V_F_HB_NEW",
dimensions = c("str:field:hb_new:V_F_HB_NEW:COA_CODE",
"str:field:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME") %>% matrix(),
recodes = list(
`str:field:hb_new:V_F_HB_NEW:COA_CODE` = list(
map = as.list(paste0("str:value:hb_new:V_F_HB_NEW:COA_CODE:V_C_MASTERGEOG11_LSOA_TO_MSOA:E0", seq(1006074, 1006211, 1)))),
`str:field:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME` = list(
map = list("str:value:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME:C_HB_NEW_DATE:201808"))
)) %>% toJSON()
# submit the API request
request <- POST(
url = path,
body = query,
config = add_headers(APIKey = api_key),
encode = "json")
# check for any server error
# request$status_code
# parse the response
response <- fromJSON(content(request, as = "text"), flatten = TRUE)
# extract list items and convert to a dataframe
dimnames <- response$fields$items %>% map(~.$labels %>% unlist)
values <- response$cubes[[1]]$values
dimnames(values) <- dimnames
df <- as.data.frame.table(values, stringsAsFactors = FALSE) %>%
as_tibble() %>%
set_names(c(response$fields$label,"value"))
National - Regional - LA - OAs | Month | value |
---|---|---|
Trafford 025A | 201808 (Aug-18) | 137 |
Trafford 025B | 201808 (Aug-18) | 36 |
Trafford 025C | 201808 (Aug-18) | 26 |
Trafford 027A | 201808 (Aug-18) | 77 |
Trafford 024A | 201808 (Aug-18) | 72 |
Trafford 025D | 201808 (Aug-18) | 36 |
UK Police
The data.police.uk website provides incidents of police recorded crime and anti-social behaviour in England, Wales and Northern Ireland.
Example: Robberies within the borough of Trafford
- HTTP verb: GET or POST. However, use POST for large / high resolution polygons because of a 4094 character limit with GET requests.
- API endpoint URL: https://data.police.uk/api/crimes-street/all-crime?
- Selected parameters: poly, date
- Headers: NA
- Data format(s): JSON
- Documentation: https/://data.police.uk/docs/
- R package: ukpolice
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite) ; library(sf)
# download a vector boundary of Trafford
bdy <- st_read("https://opendata.arcgis.com/datasets/fab4feab211c4899b602ecfbfbc420a3_3.geojson", quiet = TRUE) %>%
filter(lad17nm == "Trafford")
# extract the coordinates and format for inclusion in the API request parameter
coords <- bdy %>%
st_coordinates() %>%
as.data.frame() %>%
select(X, Y) %>%
unite(coords, Y, X, sep = ',') %>%
mutate(coords = sub("$", ":", coords)) %>%
.[["coords"]] %>%
paste(collapse = "") %>%
str_sub(., 1, str_length(.)-1)
# sumbit the API request
path <- "https://data.police.uk/api/crimes-street/robbery"
request <- POST(url = path,
query = list(poly = coords, date = "2018-04"))
# check for any server error
# request$status_code
# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
fromJSON(flatten = TRUE) %>%
as_tibble()
# convert to a data frame
df <- data.frame(
month = response$month,
category = response$category,
location = response$location.street.name,
long = as.numeric(as.character(response$location.longitude)),
lat = as.numeric(as.character(response$location.latitude)),
stringsAsFactors = FALSE
)
month | category | location | long | lat |
---|---|---|---|---|
2018-04 | robbery | On or near New William Close | -2.424867 | 53.41983 |
2018-04 | robbery | On or near Parking Area | -2.352642 | 53.38522 |
2018-04 | robbery | On or near Crescent Road | -2.343565 | 53.37457 |
2018-04 | robbery | On or near Padbury Close | -2.396574 | 53.45163 |
2018-04 | robbery | On or near Stretford Road | -2.346899 | 53.44621 |
2018-04 | robbery | On or near Petrol Station | -2.352497 | 53.46765 |
Examples include eurostat, fingertipsR, and WHO ↩︎