Trial #35: Querying the gov.uk website for COVID-19 Tier by PostCode


Problem:

The UK Government has a special COVID-19 Data Website.

It has a handy tool to pull up local information for a given postcode.

Figure: close-up of the lookup tool on the https://coronavirus.data.gov.uk/ website. Look up a COVID-19 summary by postcode.
Figure: close-up of the custom dashboard on the https://coronavirus.data.gov.uk/ website. You are presented with a custom dashboard for the region the postcode falls within.

However, if you want to look up a collection of postcodes, doing this one at a time is time-consuming. There is an API with documentation, but I could not see an endpoint that takes a postcode.

Solution:

Fortunately, the custom postcode dashboards are accessible via a query string, so we can work through a set of postcodes by making a series of GET requests.

Figure: close-up of the custom dashboard on the https://coronavirus.data.gov.uk/ website. Note the end of the URL: ?postcode=DL10+6DN
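As a rough sketch of the idea, the snippet below turns a list of postcodes into dashboard URLs. The function name Get-PostcodeDashboardUrl and the $BaseUrl default are my own placeholders: only the ?postcode=DL10+6DN ending is shown above, so copy the exact page path from your browser's address bar before relying on it.

```powershell
# Hypothetical helper: builds a dashboard URL for one postcode.
# ASSUMPTION: $BaseUrl is a placeholder; the real dashboard path may differ.
function Get-PostcodeDashboardUrl {
    param(
        [string] $Postcode,
        [string] $BaseUrl = 'https://coronavirus.data.gov.uk/'
    )
    # Trim, upper-case, and replace inner whitespace with '+' as in the URL above
    $normalised = $Postcode.Trim().ToUpper() -replace '\s+', '+'
    '{0}?postcode={1}' -f $BaseUrl, $normalised
}

# Example input with a duplicate that differs only by case and spacing
$postcodes = 'DL10 6DN', 'dl10 6dn ', 'SW1A 1AA'
$urls = $postcodes | ForEach-Object { Get-PostcodeDashboardUrl $_ } | Select-Object -Unique

# Each URL could then be fetched in turn, for example:
# $pages = $urls | ForEach-Object { Invoke-WebRequest -Uri $_ }
```

Normalising before building the URL means that differently typed versions of the same postcode collapse into a single request.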

I put together a minimum viable PowerShell script. I will not explain every step here, but it uses the Invoke-WebRequest cmdlet to GET and parse the webpage, drills into the required page elements and converts the human-readable text into structured data. There is quite a lot of regex for validation and data scraping, which I plan to cover in another post.
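To give a flavour of the regex side, here is a minimal sketch rather than the actual script. The function names, the simplified postcode pattern and the sample sentence are all assumptions of mine; the real page wording will differ and the pattern does not cover every valid UK postcode format.

```powershell
# Hypothetical validator using a deliberately simplified postcode pattern.
function Test-UkPostcode {
    param([string] $Postcode)
    # -match is case-insensitive, so 'dl10 6dn' passes as well
    $Postcode.Trim() -match '^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$'
}

# Hypothetical scraper step: pull a tier number out of human-readable text.
# ASSUMPTION: the text contains a phrase like 'Tier 2'; adjust to the real page.
function ConvertFrom-TierText {
    param([string] $Text)
    $tier = $null
    if ($Text -match 'Tier\s+(?<tier>\d)') {
        $tier = [int] $Matches['tier']
    }
    # covidTier stays $null when nothing matched, so gaps are easy to find later
    [pscustomobject]@{ covidTier = $tier }
}

# Invented sample sentence, for illustration only
$sample = ConvertFrom-TierText 'This area is in Tier 2: High alert'
```

Returning an object with a null covidTier, rather than throwing, keeps a failed scrape visible in the results without stopping the whole run.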

Pitfalls:

Invoke-WebRequest and the automatic parser are not especially quick or optimised. For heavy use it would be better to write a custom tool using a common .NET HttpClient with parallel execution and an optimised scraper.
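For what it is worth, a starting point for the HttpClient route might look like the sketch below. Get-PageBatch is a name I made up, it expects a list of unique URLs, and it starts every request at once with no throttling, so a real tool would want batching and some politeness towards the server. It only downloads the raw HTML; the scraping would still need to be done separately.

```powershell
# Windows PowerShell 5.1 needs the assembly loaded explicitly; newer versions already have it
if (-not ('System.Net.Http.HttpClient' -as [type])) {
    Add-Type -AssemblyName System.Net.Http
}

# Hypothetical helper: downloads several pages concurrently with one shared HttpClient.
function Get-PageBatch {
    param([string[]] $Url)

    $client = [System.Net.Http.HttpClient]::new()
    try {
        # Start every request before waiting on any of them
        $pending = @{}
        foreach ($u in $Url) { $pending[$u] = $client.GetStringAsync($u) }

        # Collect the results; a failed request is recorded as $null
        $result = @{}
        foreach ($u in $pending.Keys) {
            try { $result[$u] = $pending[$u].GetAwaiter().GetResult() }
            catch { $result[$u] = $null }
        }
        $result
    }
    finally {
        $client.Dispose()
    }
}

# Usage (needs network access):
# $html = Get-PageBatch -Url $urls
```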

As my address list contained duplicate postcodes, I first made a list of unique postcodes and then built a Hashtable, keyed by postcode, holding the result of the above script for each one. I then joined this Hashtable back to the original set, which avoids repeated lookups of the same postcode.
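The de-duplicate, look up, join back pattern can be sketched roughly as follows. Add-LookupResult, the Postcode property name and the stub lookup are assumptions for illustration; in practice the script block would call the scraping script instead.

```powershell
# Hypothetical helper: looks up each unique postcode once, then joins the result back.
function Add-LookupResult {
    param(
        [object[]] $Address,      # rows with a Postcode property (assumed name)
        [scriptblock] $Lookup     # called once per unique postcode
    )

    # Unique postcodes only (Sort-Object -Unique ignores case by default)
    $unique = $Address.Postcode | Sort-Object -Unique

    # One lookup per unique postcode, cached in a hashtable
    $dict = @{}
    foreach ($pc in $unique) { $dict[$pc] = & $Lookup $pc }

    # Join the cached result back onto every original row
    foreach ($row in $Address) {
        $row | Select-Object *, @{ Name = 'covidTier'; Expression = { $dict[$_.Postcode].covidTier } }
    }
}

# Stub lookup standing in for the real script, with invented data
$addresses = @(
    [pscustomobject]@{ Name = 'A'; Postcode = 'DL10 6DN' }
    [pscustomobject]@{ Name = 'B'; Postcode = 'DL10 6DN' }
    [pscustomobject]@{ Name = 'C'; Postcode = 'SW1A 1AA' }
)
$joined = Add-LookupResult -Address $addresses -Lookup { param($pc) [pscustomobject]@{ covidTier = 2 } }
```

With three addresses and two distinct postcodes, the lookup runs twice rather than three times, and the saving grows with the amount of duplication in the list.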

Another advantage is that you can investigate missing values without iterating over the entire set of postcodes.

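# Postcodes where the lookup returned nothing at all
$problemPostCode = $dict.GetEnumerator() | Where-Object { $null -eq $_.Value }
# Postcodes with no tier scraped (with strict mode off, this also includes the entries above)
$problemTier = $dict.GetEnumerator() | Where-Object { $null -eq $_.Value.covidTier }

# Attempt to correct missing values by re-running the lookup for those postcodes only
$problemTier | ForEach-Object { $dict[$_.Key] = .\Get-CoronovirusGovData.ps1 $_.Key }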

There are other data points on the page that could be scraped. However, at this point we have a geographic area for each postcode. We would be far better served using this to query the API and obtain the structured data directly.
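As a pointer in that direction, the sketch below only builds a query URL and does not call anything. It assumes the v1 data endpoint with its filters and structure parameters as described in the API developer guide at the time of writing; the function name, the area values and the chosen metric are illustrative, so check the docs for the exact names before using it.

```powershell
# Hypothetical helper: builds an API query URL for one geographic area.
# ASSUMPTION: endpoint, parameter names and the metric name follow the v1 developer guide.
function Get-CovidApiUrl {
    param(
        [string] $AreaType,
        [string] $AreaName
    )
    # Filters are joined with ';', and structure says which fields to return
    $filters = "areaType=$AreaType;areaName=$AreaName"
    $structure = '{"date":"date","areaName":"areaName","newCases":"newCasesByPublishDate"}'

    $f = [uri]::EscapeDataString($filters)
    $s = [uri]::EscapeDataString($structure)
    "https://api.coronavirus.data.gov.uk/v1/data?filters=$f&structure=$s"
}

# Illustrative area only; use whichever area the dashboard reported for the postcode
$apiUrl = Get-CovidApiUrl -AreaType 'ltla' -AreaName 'Richmondshire'
# $response = Invoke-RestMethod -Uri $apiUrl   # needs network access
```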
