Doing Science Using Open Data – Part 7: Modelling Populations 2


 

In the third part of this series I looked at the UK mid-2011 Census and obtained the data below, which give the summed male and female figures for the UK population at each age from 16 through to 44.

Age   Population
16    680,979
17    706,234
18    711,491
19    741,667
20    765,895
21    757,901
22    757,295
23    771,297
24    756,449
25    768,415
26    774,921
27    759,889
28    768,860
29    770,810
30    778,986
31    782,510
32    751,251
33    700,825
34    690,775
35    702,024
36    716,419
37    729,013
38    761,347
39    794,300
40    820,805
41    800,550
42    821,037
43    819,650
44    832,297

Data for this group and the 45-65 year age group are graphed in Figure 1.

 


Figure 1: Sum of male and female figures for each age from the mid-2011 Census. Red bars represent the 45-65 age group and blue bars represent the 16-44 age group.

The next stage is to try to model this data. Looking at the blue bars in the graph above, there appears to be some periodicity in the data. It isn't a great approximation, but the two-level (bimodal) pattern can be seen below, where the horizontal black lines mark the peaks and troughs of the data.


Figure 2
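The two levels marked in Figure 2 can also be picked out programmatically. The sketch below is a minimal Python illustration rather than part of the original analysis: it pairs the 29 figures listed above with ages 16 to 44 and labels each bar as a trough or a peak using a cut-off of 715,000. That cut-off, and the names `POPULATION`, `THRESHOLD` and `label_runs`, are assumptions made for illustration; the threshold was simply chosen by eye.

```python
from itertools import groupby

# Summed male and female figures for ages 16 to 44 (copied from the list above).
POPULATION = [
    680979, 706234, 711491, 741667, 765895, 757901, 757295, 771297,
    756449, 768415, 774921, 759889, 768860, 770810, 778986, 782510,
    751251, 700825, 690775, 702024, 716419, 729013, 761347, 794300,
    820805, 800550, 821037, 819650, 832297,
]
FIRST_AGE = 16
THRESHOLD = 715_000  # assumed cut-off between "trough" and "peak" bars


def label_runs(values, first_age=FIRST_AGE, threshold=THRESHOLD):
    """Group consecutive ages into runs of 'trough' or 'peak' bars."""
    labelled = [
        (first_age + i, "trough" if v < threshold else "peak")
        for i, v in enumerate(values)
    ]
    runs = []
    for label, group in groupby(labelled, key=lambda pair: pair[1]):
        ages = [age for age, _ in group]
        runs.append((label, ages[0], ages[-1]))
    return runs


for label, start, end in label_runs(POPULATION):
    print(f"{label:6s} ages {start}-{end} ({end - start + 1} bars)")
```

With this cut-off the runs are ages 16-18 (trough), 19-32 (peak), 33-35 (trough) and 36-44 (peak). A different threshold would move the boundaries, so the segmentation should be read as approximate.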

In this approximation of the data there are 3 trough bars for every 14 peak bars (counting the bars). Initially I thought of representing this as a rectangular pulse and using a Fourier transform, but this was slightly tricky (for me). As there is periodicity in the data, I thought it might be possible to use sine and cosine functions. The graph below shows a light wave divided into its electric and magnetic components along different axes. Focusing on one axis, it's possible to see that the wave can be described by an amplitude (i.e. the height that the wave reaches) and a wavelength (the distance along the x-axis covered by one cycle, denoted here by lambda).


Figure 3: A light wave

Returning to the data above, one approximation that looked promising was y = cos(theta) + sin(theta/2), as this produces wide peaks and narrow troughs. It should then be a simple matter of scaling the amplitude by introducing a coefficient and shifting the curve by adding a constant. The problem, however, is that the ratio of peak width to trough width needs to be 14/3, or approximately 4.7. The sine/cosine graph has a ratio which is much lower than this, which means it doesn't fit the data well.
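The mismatch can be checked numerically. The sketch below samples y = cos(theta) + sin(theta/2) over one full period (4π, since sin(theta/2) repeats every 4π) and compares how much of the period lies above the midpoint between the curve's maximum and minimum with how much lies below it. Treating "above the midpoint" as peak width and "below the midpoint" as trough width is an assumption made for illustration, as are the function names; other definitions of width would give somewhat different numbers.

```python
import math


def candidate(theta):
    """The sine/cosine combination discussed in the text."""
    return math.cos(theta) + math.sin(theta / 2)


def peak_to_trough_ratio(f, period, samples=100_000):
    """Ratio of time spent above the mid-level to time spent below it."""
    ys = [f(period * i / samples) for i in range(samples)]
    mid = (max(ys) + min(ys)) / 2  # assumed dividing line between peak and trough
    above = sum(1 for y in ys if y > mid)
    below = samples - above
    return above / below


ratio = peak_to_trough_ratio(candidate, 4 * math.pi)
print(f"peak/trough width ratio: {ratio:.2f}")
print(f"target ratio (14/3):     {14 / 3:.2f}")
```

Under this definition the ratio comes out at roughly 2.5, well short of 14/3, which is in line with the poor fit described above.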

For the moment, then, the graph can be described by a discontinuous (piecewise constant) function:

1. For X = 16-18, Y = 700,000

2. For X = 19-32, Y = 770,000

3. For X = 33-35, Y = 700,000

4. For X = 36-44, Y = 770,000
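The four cases above can be written as a small Python function and compared with the census figures. This is a sketch for illustration only: the name `rectangle_model` and the choice of mean absolute error as the measure of fit are assumptions, and the function is defined only for the ages covered by the listed data (16 to 44).

```python
# Summed male and female figures for ages 16 to 44 (copied from the list above).
POPULATION = [
    680979, 706234, 711491, 741667, 765895, 757901, 757295, 771297,
    756449, 768415, 774921, 759889, 768860, 770810, 778986, 782510,
    751251, 700825, 690775, 702024, 716419, 729013, 761347, 794300,
    820805, 800550, 821037, 819650, 832297,
]
FIRST_AGE = 16


def rectangle_model(age):
    """Two-level approximation: 700,000 in the troughs, 770,000 elsewhere."""
    if not 16 <= age <= 44:
        raise ValueError("model is only defined for ages 16 to 44")
    if 16 <= age <= 18 or 33 <= age <= 35:
        return 700_000
    return 770_000


# Absolute difference between the model and the census figure at each age.
errors = [abs(rectangle_model(FIRST_AGE + i) - v) for i, v in enumerate(POPULATION)]
mean_abs_error = sum(errors) / len(errors)
worst_age = FIRST_AGE + errors.index(max(errors))

print(f"mean absolute error: {mean_abs_error:,.0f}")
print(f"largest error at age {worst_age}: {max(errors):,}")
```

The largest errors all fall in the last group, ages 36 and over, where the figures start below 770,000 and then climb above 800,000.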

This is a simple rectangle function. For the last age group, the Y value could be a little higher, but I am only aiming for a simple fit to the data. As soon as I've figured out a function which incorporates periodicity I'll replace this rectangle function (or if any readers work it out, please let me know). For the time being, however, the function above describes the UK mid-2011 census data for the age group 16-44.
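One candidate for a function that incorporates periodicity is the Fourier series of the rectangle function itself, which ties back to the rectangular-pulse idea mentioned earlier. The sketch below is only one possibility and rests on several assumptions: that the pattern repeats every 17 ages (14 peak bars plus 3 trough bars), that each bar spans half a year either side of its age so the first trough is centred on age 17, and that the standard cosine series for a pulse train with duty cycle 3/17 is a reasonable stand-in. The names `periodic_model`, `PERIOD`, `TROUGH_WIDTH` and `TROUGH_CENTRE` are hypothetical. The listed data cover less than two full periods, so the period itself is only weakly supported.

```python
import math

HIGH = 770_000       # peak level from the rectangle function
LOW = 700_000        # trough level from the rectangle function
PERIOD = 17          # assumed: 14 peak bars + 3 trough bars
TROUGH_WIDTH = 3     # number of bars in each trough
TROUGH_CENTRE = 17   # assumed: middle of the 16-18 trough


def periodic_model(age, terms=50):
    """Truncated Fourier series of a periodic rectangular trough."""
    duty = TROUGH_WIDTH / PERIOD
    # Cosine series for a unit pulse train centred on TROUGH_CENTRE:
    # duty + sum over n of (2 / (n*pi)) * sin(n*pi*duty) * cos(2*pi*n*x / PERIOD)
    pulse = duty
    for n in range(1, terms + 1):
        coeff = 2 / (n * math.pi) * math.sin(n * math.pi * duty)
        pulse += coeff * math.cos(2 * math.pi * n * (age - TROUGH_CENTRE) / PERIOD)
    # pulse is close to 1 inside a trough and close to 0 elsewhere
    return HIGH - (HIGH - LOW) * pulse


for age in (17, 25, 34, 40):
    print(age, round(periodic_model(age)))
```

Raising `terms` sharpens the edges of each trough, at the cost of small overshoots near the jumps (the Gibbs phenomenon). Like the rectangle function, this curve still sits below the census figures for the early forties.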

Appendix

Doing Science Using Open Data – Part 1

Doing Science Using Open Data – Part 2

Doing Science Using Open Data – Part 3

Doing Science Using Open Data – Part 4

Doing Science Using Open Data – Part 5

Doing Science Using Open Data – Part 6


 
