# Doing Science Using Open Data – Part 8: Modelling Populations Part 3

In the seventh part of the Open Science series I looked at the UK mid-2011 Census and obtained the data below, which represent the summed male and female figures for the UK population at each age from 16 through to 44*.

680,979
706,234
711,491
741,667
765,895
757,901
757,295
771,297
756,449
768,415
774,921
759,889
768,860
770,810
778,986
782,510
751,251
700,825
690,775
702,024
716,419
729,013
761,347
794,300
820,805
800,550
821,037
819,650
832,297

Data for this group and the 45-65 year age group are graphed in Figure 1.

Figure 1: Summation of male and female figures for each age from the mid-2011 Census. Red bars represent the age group 45-65 and blue bars represent the age group 16-44.

The next stage is to try to model these data. The blue bars in the graph above suggest some periodicity. This is only a rough approximation, but the bimodal pattern can be seen in Figure 2, where the horizontal black lines mark the peaks and troughs of the data.

Figure 2

In the previous post I looked at a discontinuous function to describe the data:

1. For X = 16-18, Y = 700,000

2. For X = 19-32, Y = 770,000

3. For X = 33-35, Y = 700,000

4. For X = 36-45, Y = 770,000
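The four rules above can be written as a single piecewise-constant function. The following is a minimal sketch; the function name `step_model` is my own label, and the choice to raise an error outside the listed age ranges is an assumption, since the rules say nothing about other ages.

```python
def step_model(age: int) -> int:
    """Piecewise-constant population model from the previous post.

    Returns the modelled population for a single year of age,
    using the four age bands listed above.
    """
    # Bands 1 and 3: the two troughs.
    if 16 <= age <= 18 or 33 <= age <= 35:
        return 700_000
    # Bands 2 and 4: the two plateaus.
    if 19 <= age <= 32 or 36 <= age <= 45:
        return 770_000
    # Assumption: ages outside the listed bands are not modelled.
    raise ValueError(f"age {age} is outside the modelled range 16-45")


if __name__ == "__main__":
    for age in (16, 25, 34, 40):
        print(age, step_model(age))
```

The function jumps between only two values, which is what makes it discontinuous at ages 18/19, 32/33 and 35/36.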

I was originally looking for sine or cosine functions to describe the data but didn’t come up with any solutions. So this time I turned to Wolfram Alpha. I subscribed to the Pro account, pasted the data in, and with the click of a button the program performed several analyses. Its linear regression analysis returned the following values, quoted with a 99% confidence interval:

α = 683649 +/- 24641

β = 2492 +/- 791

where y = β x + α
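The central values of this fit can be checked without Wolfram Alpha using ordinary least squares. The sketch below assumes that x is the age in years (16 to 44) and that the 29 figures listed at the top of the post are in age order; the helper name `least_squares` is my own. It reproduces only α and β, not the confidence intervals.

```python
# Population at each single year of age, 16 through 44 (mid-2011 Census).
population = [
    680979, 706234, 711491, 741667, 765895, 757901, 757295, 771297,
    756449, 768415, 774921, 759889, 768860, 770810, 778986, 782510,
    751251, 700825, 690775, 702024, 716419, 729013, 761347, 794300,
    820805, 800550, 821037, 819650, 832297,
]
ages = list(range(16, 45))  # assumption: first figure is age 16, last is age 44


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


beta, alpha = least_squares(ages, population)
print(f"beta  = {beta:.0f}")
print(f"alpha = {alpha:.0f}")
```

Run against the data above, this should give a slope and intercept that round to the β and α reported by Wolfram Alpha.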

So in summary, for the UK mid-2011 Census data the population at each age from 16 to 44 can be modelled with the equation

y = 2492 x + 683649

where

x = age in years

y = population for each age
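As a quick illustration of what the equation predicts, it can be evaluated at the two ends and the middle of the age range. The observed values in brackets are taken from the data listed at the top of the post, assuming the first figure corresponds to age 16.

```latex
\begin{align*}
y(16) &= 2492 \times 16 + 683649 = 723521 && \text{(observed: 680,979)} \\
y(30) &= 2492 \times 30 + 683649 = 758409 && \text{(observed: 778,986)} \\
y(44) &= 2492 \times 44 + 683649 = 793297 && \text{(observed: 832,297)}
\end{align*}
```

The straight line follows the overall upward trend, overestimating at age 16 and underestimating at age 44, but it cannot follow the dip around ages 33 to 35, where the observed figures fall to roughly 700,000.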

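There are different ways to model the data, ranging from polynomial equations (of which the straight line above is the simplest case) through to discontinuous functions. The Wolfram Alpha analysis improves on the previous model by replacing the two fixed plateau values with coefficients estimated from the data, together with a confidence interval. Using this equation we can do some further useful analysis.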

*Part of the 7th post, including the data, is reproduced here.

Appendix

Doing Science Using Open Data – Part 1

Doing Science Using Open Data – Part 2

Doing Science Using Open Data – Part 3

Doing Science Using Open Data – Part 4

Doing Science Using Open Data – Part 5

Doing Science Using Open Data – Part 6

Doing Science Using Open Data – Part 7