Doing Science Using Open Data – Part 4: Is the UK Population Normally Distributed According to Age? (No)

In the first three parts of this series, I looked at freely available UK mortality data in conjunction with data from the UK census. I generated two hypotheses:

H1: In the UK, deaths in the age group 45-64 years of age are 4 times higher than deaths in the age group 15-44 years of age.

H2: The increase in deaths described in H1 results from a larger population in the age group 45-64 than in the age group 15-44 years of age

H1 was generated from UK data and wasn’t tested any further. H2, however, was partially tested (the dataset was incomplete) and appeared to be incorrect. Intuitively it seems quite obvious that H2 is false, but a more convincing result would be obtained from statistical testing.

For ages 16-44 we got the following population figures, one for each year of age:

680,979
706,234
711,491
741,667
765,895
757,901
757,295
771,297
756,449
768,415
774,921
759,889
768,860
770,810
778,986
782,510
751,251
700,825
690,775
702,024
716,419
729,013
761,347
794,300
820,805
800,550
821,037
819,650
832,297

For ages 45-65 we got the following population figures, one for each year of age:

832,727
838,064
831,041
813,798
797,077
770,066
739,859
723,861
708,371
682,824
659,795
637,073
641,145
634,399
618,132
623,508
638,118
655,668
694,644
754,834
583,734
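As a rough check on H2, the two lists above can simply be totalled. The sketch below is only a descriptive comparison, not a statistical test; the variable names and the `summarise` helper are assumptions made for illustration. Note also that the bands listed here (16-44 and 45-65) differ by a year at each end from the bands named in H1 and H2.

```python
# Population per single year of age, copied from the two lists above.
ages_16_44 = [
    680979, 706234, 711491, 741667, 765895, 757901, 757295, 771297,
    756449, 768415, 774921, 759889, 768860, 770810, 778986, 782510,
    751251, 700825, 690775, 702024, 716419, 729013, 761347, 794300,
    820805, 800550, 821037, 819650, 832297,
]
ages_45_65 = [
    832727, 838064, 831041, 813798, 797077, 770066, 739859, 723861,
    708371, 682824, 659795, 637073, 641145, 634399, 618132, 623508,
    638118, 655668, 694644, 754834, 583734,
]


def summarise(counts):
    """Return (total population, mean population per year of age)."""
    total = sum(counts)
    return total, total / len(counts)


total_young, mean_young = summarise(ages_16_44)
total_old, mean_old = summarise(ages_45_65)

print(f"16-44: {len(ages_16_44)} ages, total {total_young:,}, mean per age {mean_young:,.0f}")
print(f"45-65: {len(ages_45_65)} ages, total {total_old:,}, mean per age {mean_old:,.0f}")

# H2 needs the older band to hold MORE people than the younger band.
print("Older band larger in total:   ", total_old > total_young)
print("Older band larger per age year:", mean_old > mean_young)
```

Comparing the mean per year of age as well as the total guards against the fact that the two bands cover different numbers of years.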

When comparing these populations, it is useful to get an understanding of what they look like. I’ve graphed the two populations below. The red bars show the population in the age group 45-64 with increasing years (i.e. 45, 46, etc.). The blue bars show the age group 16-44, again with increasing years for successive bars.

When comparing two populations we usually make assumptions about how they are distributed. The most commonly discussed population distribution in statistics is the normal distribution.

A selection of Normal Distribution Probability Density Functions (PDFs), Author InductiveLoad, Public Domain
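For reference, each curve in a figure of this kind is the density of a normal distribution with mean μ and standard deviation σ. These are the usual textbook symbols and are not notation used elsewhere in this series:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The density is symmetric about μ and tapers away on both sides, which is the shape to look for when judging whether data are normally distributed.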

Clearly, when we look at population against increasing age, these two populations are not normally distributed. If they were, there would be a central peak with tapering on either side. Eyeballing the data reveals relative homogeneity in the age group 16-44, whilst in the 45-65 age group the data are weighted slightly towards the left, that is, towards the younger ages. However, if we look at the original population data from the census we get the following:

This graph was discussed briefly in the previous post. What is clear is that this is not a normal distribution. What is even more interesting is that this is not even a sample. This is the population based on the Census. In other words, according to age, the UK population is not normally distributed. The Census doesn’t yet contain the data for the over 90 age group but there is a clear trend in the over 80 group even without this data. What is clear from the above is that the bulk of the population sits to the left of the graph, with a tail towards the older ages (statisticians conventionally call this shape right-skewed, or positively skewed, after the side of the tail). This is hardly surprising as the lifespan is finite and age is a risk factor for mortality. What is also interesting about this graph, though, is that it’s not the nice idealised skewed graph we might expect. Rather than beginning at a peak or trough, the graph begins at an intermediate level before passing through troughs and peaks, followed by a steady decline from around age 45. This decline is broken up by a sharp increase in the mid-sixties. The graph might almost be described by the superposition of several distinct graphs.

Returning to our original question of comparing the two populations, we can see that they do not come from a normally distributed population. The first group (16-44) comes from a part of the graph which varies between 300,000 and 400,000 people per age (in years). The second group (45-65) comes from the part of the graph which begins to show the downward trend in population per age (in years). The spike complicates matters slightly. Nevertheless, we can see that they are not similar populations when we consider population as a function of age.
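The eyeballed description of the 45-65 group can be given rough numbers. The sketch below treats each age as a single point and each count as its weight, then computes the weighted mean, standard deviation and skewness coefficient, along with the share of people lying within one standard deviation of the mean (about 68% for a normal distribution). The function name `weighted_moments` and the point-age treatment are assumptions made for illustration, and because the band is cut off at 45 and 65 this is a descriptive check only, not a formal test of normality.

```python
import math

# Population per single year of age for ages 45..65, copied from the list above.
FIRST_AGE = 45
counts_45_65 = [
    832727, 838064, 831041, 813798, 797077, 770066, 739859, 723861,
    708371, 682824, 659795, 637073, 641145, 634399, 618132, 623508,
    638118, 655668, 694644, 754834, 583734,
]


def weighted_moments(first_age, counts):
    """Weighted mean, sd, skewness and share within one sd of the mean.

    Assumption: every person of a given age is placed exactly at that
    integer age, so the counts act as weights on the points 45, 46, ...
    """
    ages = range(first_age, first_age + len(counts))
    total = sum(counts)
    mean = sum(a * c for a, c in zip(ages, counts)) / total
    variance = sum(c * (a - mean) ** 2 for a, c in zip(ages, counts)) / total
    sd = math.sqrt(variance)
    third = sum(c * (a - mean) ** 3 for a, c in zip(ages, counts)) / total
    skewness = third / sd ** 3
    within_one_sd = sum(
        c for a, c in zip(ages, counts) if abs(a - mean) <= sd
    ) / total
    return mean, sd, skewness, within_one_sd


mean, sd, skewness, within_one_sd = weighted_moments(FIRST_AGE, counts_45_65)

print(f"mean age:            {mean:.2f}")
print(f"standard deviation:  {sd:.2f}")
print(f"skewness:            {skewness:.3f}")
print(f"share within 1 sd:   {within_one_sd:.1%} (normal: about 68.3%)")
```

On these figures the mean age falls a little below the midpoint of the band and the skewness coefficient is positive, which is the conventional sign for data bunched at the lower end with a tail to the right. The share within one standard deviation also falls short of the normal benchmark.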

Appendix

Doing Science Using Open Data – Part 1

Doing Science Using Open Data – Part 2

Doing Science Using Open Data – Part 3

