# Doing Science Using Open Data – Part 8: Modelling Populations Part 3

In the seventh part of the Open Science series I looked at the UK mid-2011 Census and obtained the data below, which are the summed male and female figures for each year of age from 16 through to 44*.

680,979
706,234
711,491
741,667
765,895
757,901
757,295
771,297
756,449
768,415
774,921
759,889
768,860
770,810
778,986
782,510
751,251
700,825
690,775
702,024
716,419
729,013
761,347
794,300
820,805
800,550
821,037
819,650
832,297

Data for this group and the 45-65 year age group are graphed in Figure 1.

Figure 1: Summation of male and female figures for each age from mid-2011 Census. Red bars represent the age group 45-65 and the blue bars represent the age group 16-44

The next stage is to try to model these data. Looking at the blue bars in the graph above, there appears to be some periodicity. This isn’t a great approximation, but the bimodal distribution can be seen below, where the peaks and troughs of the data are marked by the horizontal black lines.

Figure 2

In the previous post I looked at a discontinuous function to describe the data:

1. For X = 16-18, Y = 700,000

2. For X = 19-32, Y = 770,000

3. For X = 33-35, Y = 700,000

4. For X = 36-45, Y = 770,000
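The four rules above can be written as a single piecewise function. This is only a minimal sketch: the name `step_model` is my own label, and the choice to raise an error outside the 16-45 range is an assumption, since the rules say nothing about other ages.

```python
def step_model(age: int) -> int:
    """Piecewise-constant population estimate for a single year of age."""
    # Rules 1 and 3: the two troughs
    if 16 <= age <= 18 or 33 <= age <= 35:
        return 700_000
    # Rules 2 and 4: the two plateaus
    if 19 <= age <= 32 or 36 <= age <= 45:
        return 770_000
    # Assumption: ages outside the listed ranges are treated as an error
    raise ValueError(f"age {age} is outside the modelled range 16-45")


for age in (16, 25, 34, 40):
    print(age, step_model(age))
```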

I was originally looking for sine or cosine functions to describe the data but didn’t come up with any solutions, so this time I turned to Wolfram Alpha. I subscribed to the Pro account, pasted the data in, and with the click of a button the program performed several analyses. The software completed a linear regression and came up with the values below. Note that the +/- figures are the standard errors of the least-squares estimates rather than the half-widths of a 99% confidence interval; with 27 degrees of freedom, the 99% interval would be about 2.8 times wider.

α = 683649 +/- 24641

β = 2492 +/- 791

where y = β x + α
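Wolfram Alpha does not show its working, so as a cross-check here is a rough sketch of an ordinary least squares fit in plain Python. It assumes that x is the age in years (16 to 44) and that the figures are listed in age order; the variable names are my own.

```python
from math import sqrt

# Census figures for ages 16 to 44, copied from the list above.
population = [
    680979, 706234, 711491, 741667, 765895, 757901, 757295, 771297,
    756449, 768415, 774921, 759889, 768860, 770810, 778986, 782510,
    751251, 700825, 690775, 702024, 716419, 729013, 761347, 794300,
    820805, 800550, 821037, 819650, 832297,
]
ages = list(range(16, 16 + len(population)))  # assumption: 16, 17, ..., 44

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(population) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, population))
syy = sum((y - mean_y) ** 2 for y in population)

beta = sxy / sxx                  # slope
alpha = mean_y - beta * mean_x    # intercept

# Residual variance on n - 2 degrees of freedom, then the standard errors
rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(ages, population))
s2 = rss / (n - 2)
se_beta = sqrt(s2 / sxx)
se_alpha = sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
r_squared = 1 - rss / syy         # share of the variance explained by the line

print(f"beta  = {beta:.0f} +/- {se_beta:.0f}")
print(f"alpha = {alpha:.0f} +/- {se_alpha:.0f}")
print(f"R^2   = {r_squared:.2f}")
```

Run against the figures above, this should give values close to those quoted from Wolfram Alpha. The R² line is an extra I have added to show how much of the variation the straight line accounts for.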

So in summary, for the UK mid-2011 census data the population can be modelled with the equation

y = 2492 x + 683649

where

x = age in years

y = population for each age
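As a small illustration of how the equation might be used, the sketch below evaluates it for a few ages. The function name `linear_model` is just a label I have chosen.

```python
def linear_model(age: float) -> float:
    """Population estimate for one year of age from the fitted straight line."""
    return 2492 * age + 683649


for age in (16, 30, 44):
    print(age, linear_model(age))
```

For age 30 this gives 758,409, compared with a census figure of 778,986.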

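There are different ways to model the data, ranging from discontinuous functions such as the previous model through to polynomial equations, of which the straight line above is the simplest case. The Wolfram Alpha analysis replaces the previous piecewise model with a single equation that captures the overall upward trend with age, although a straight line cannot reproduce the dip around ages 33 to 35. By using this equation we can do some further useful analysis.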

*Part of the 7th post is reproduced here, including the data.

Appendix

Doing Science Using Open Data – Part 1

Doing Science Using Open Data – Part 2

Doing Science Using Open Data – Part 3

Doing Science Using Open Data – Part 4

Doing Science Using Open Data – Part 5

Doing Science Using Open Data – Part 6

Doing Science Using Open Data – Part 7
