Doing Science Using Open Data – Part 1

If you’re a budding scientist without the resources to run a big study, there is another way to do science: use readily available datasets. Many datasets are published online and are easy to access. Data.gov.uk is the UK Government’s portal for publishing government data under the Open Government Licence, with the aim of making government more transparent. This is a very flexible licence that allows the data to be reused for many purposes, subject to the licence conditions. To get you started, I will walk through an example step by step (you will need either Excel or an Excel viewer).

1. First of all go to data.gov.uk

2. Click on the link to the data

3. The Office for National Statistics (ONS) appears at the top with 847 datasets at the time of writing

4. We’re going to go directly to the ONS site

5. Now click on the data link tab

6. Scanning through the ONS datasets on the 9th October, there is a dataset titled ‘Weekly provisional figures on deaths registered in England and Wales, week ending 28/9/12’ (Excel sheet, 111 KB)

7. Right click on this link and save the file

8. Go to the file and open it

9. In the tabs at the bottom of the window, select the worksheet ‘Figures for week 39’

10. Under Persons, select the deaths by age group for all ages and generate a graph. I’ve used the 3-D Clustered Column chart style. Here are the results.
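If you would rather not use Excel for the chart, steps 9 and 10 can be roughly reproduced in Python. This is only a sketch. It assumes you have saved the ‘Figures for week 39’ worksheet as a CSV file, that each age band label sits in the first cell of its row, and that the Persons rows come before any other breakdown. These are guesses about the layout, so check them against the file you downloaded. The function names and the `value_column` parameter are my own.

```python
import csv

# Age bands named in this post; the worksheet may label them differently.
AGE_BANDS = ("15-44", "45-64", "65-74", "75-84", "85+")


def read_age_band_counts(lines, value_column, bands=AGE_BANDS):
    """Return {band: deaths}, keeping the first row found for each band.

    `lines` is an open CSV file or any iterable of CSV text lines.
    `value_column` is the index of the cell holding the week's count
    (an assumption about the layout, not something the sheet tells us).
    """
    counts = {}
    for row in csv.reader(lines):
        if not row:
            continue
        label = row[0].strip()
        if label in bands and label not in counts and len(row) > value_column:
            # Strip thousands separators such as "1,234" before converting.
            counts[label] = int(row[value_column].replace(",", ""))
    return counts


def text_bar_chart(counts, width=40):
    """Render the counts as one line of '#' characters per age band."""
    if not counts:
        return ""
    largest = max(counts.values()) or 1  # avoid dividing by zero
    rows = []
    for band, deaths in counts.items():
        bar = "#" * round(width * deaths / largest)
        rows.append(f"{band:>6} {bar} {deaths}")
    return "\n".join(rows)


# Example use ("week39.csv" is a hypothetical file name):
# with open("week39.csv", newline="") as f:
#     print(text_bar_chart(read_age_band_counts(f, value_column=1)))
```

The text chart is a stand-in for the Excel column chart. It shows the same relative heights, which is all we need for spotting the transitions between age groups.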

This is where the hypothesis generation begins. A cursory glance at the data reveals three large transitions in the number of deaths. The first is from age 15-44 to 45-64. The second is from age 75-84 to 85+. The third is from age 65-74 to 75-84. The second transition is the simplest to understand: the 85+ group is open-ended and extends to the upper limit of the human lifespan, so it potentially spans 35 years or more. However, there is also a noticeable increase from 15-44 to 45-64, which is not as easy to understand. We are moving from a 30-year age range to a 20-year age range, yet the number of deaths increases several fold. The data suggest a simple hypothesis:

H1: In England and Wales, the number of deaths in the age group 45-64 years of age is several times higher than the number of deaths in the age group 15-44 years of age.
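The age groups are not the same width, which makes the raw counts harder to compare. Counting both end years, 15-44 covers 30 years of age and 45-64 covers 20. The sketch below works out the width of a band from its label and divides each count by it, giving deaths per year of age. The function names are my own, and the width given to an open-ended band such as 85+ is something you have to assume.

```python
def band_width(band, open_ended_years=None):
    """Number of whole years of age covered by a label such as '15-44'.

    Both ends are counted, so '15-44' covers 30 years. Open-ended bands
    such as '85+' have no fixed width, so the caller must supply an
    assumed one through `open_ended_years`.
    """
    if band.endswith("+"):
        if open_ended_years is None:
            raise ValueError(f"{band!r} is open-ended; supply open_ended_years")
        return open_ended_years
    low, high = (int(part) for part in band.split("-"))
    return high - low + 1


def deaths_per_year_of_age(counts, open_ended_years=None):
    """Divide each band's death count by the width of the band."""
    return {
        band: deaths / band_width(band, open_ended_years)
        for band, deaths in counts.items()
    }


print(band_width("15-44"))  # 30
print(band_width("45-64"))  # 20
```

If the older group has more deaths in total despite being the narrower band, the difference per year of age is even larger than the raw counts suggest.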

We can generate a second hypothesis:

H2: The increase in deaths described in H1 results from a larger population in the age group 45-64 than in the age group 15-44 years of age.

Both hypotheses need to be tested and we can do this through the use of additional datasets as well as statistical analyses.
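One way to set up the comparison, once population figures for each age group are to hand, is to look at three ratios side by side. This is a minimal sketch rather than a full statistical analysis. The function names are my own, and the population figures would have to come from a separate dataset that is not part of the weekly deaths file.

```python
def death_rate_per_100k(deaths, population):
    """Crude death rate: deaths per 100,000 people in the group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


def compare_age_groups(deaths_young, pop_young, deaths_old, pop_old):
    """Return (count_ratio, population_ratio, rate_ratio), older vs younger.

    count_ratio      : how many times more deaths in the older group (H1)
    population_ratio : how many times larger the older group is (H2)
    rate_ratio       : how many times higher the death rate is per person
    """
    count_ratio = deaths_old / deaths_young
    population_ratio = pop_old / pop_young
    rate_ratio = (
        death_rate_per_100k(deaths_old, pop_old)
        / death_rate_per_100k(deaths_young, pop_young)
    )
    return count_ratio, population_ratio, rate_ratio
```

H1 is about the count ratio. If H2 were the whole explanation, the population ratio would be about as large as the count ratio and the rate ratio would be close to 1. A rate ratio well above 1 would mean that population size alone does not account for the increase.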

(To be continued)

Responses: If you have any comments, you can e-mail justinmarley17@yahoo.co.uk. Disclaimer: The comments made here represent the opinions of the author and do not represent the profession or any body/organisation. The comments made here are not meant as a source of medical advice and those seeking medical advice are advised to consult with their own doctor. The author is not responsible for the contents of any external sites that are linked to in this blog.
