Exercise concerning shoe sizes
This notebook is an element of the free risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.
Author: Eric Marsden eric.marsden@risk-engineering.org.
This notebook provides a simple exercise on analyzing shoe size data, fitting probability distributions and estimating quantiles. See the associated course materials for background information and to download this content as a Jupyter notebook.
We start by importing the numpy and scipy.stats libraries.
import numpy
import scipy.stats
Suppose we have been provided with the following foot size measurements (in mm) for adult males living in the USA and in China (please note that this is fake data).
sizes_usa = numpy.array([264.2606055 , 250.28558746, 261.34085849, 276.91813199, 270.89224811, 259.69859412, 271.24273057, 256.96583946,
272.53843509, 275.52722462, 253.67193351, 247.42186985, 271.73723597, 264.74759079, 270.57292362, 264.32454015, 265.78411038, 261.15839417, 252.70825706, 262.4551271, 266.64645336, 250.7710261 , 253.87963901, 283.9402588, 272.64784862, 267.09165008, 281.14164651, 248.6559977, 262.05368542, 264.15872161, 246.91588005, 267.63400138, 280.38863184, 249.72411677, 261.15086103, 274.82415135, 245.55151293, 255.86574174, 241.36452565, 242.82584408, 268.71765451, 268.25825254, 259.22225528, 276.80013577, 255.39710866, 248.90573902, 248.45854734, 262.60836491, 264.75120702, 279.93344374, 250.51985647, 268.18252613, 275.20961862, 262.37195741, 292.61857599, 246.25634414, 284.45979738, 272.59209998, 275.8836923 , 277.83485973, 272.78072024, 261.68725196, 259.08599666, 239.1756052 , 268.92738639, 278.89888931, 257.84888231, 248.57604218, 257.48344131, 267.26825904, 265.11001094, 291.98506469, 246.20451664, 289.77042382, 263.82922424, 270.60694715, 241.89420674, 272.98139756, 275.77924077, 273.81431896, 292.55644276, 270.61977562, 266.6790913 , 260.35612357, 272.17669851, 256.25978458, 249.98121499, 270.33487481, 246.2563045, 278.41950923, 276.76359193, 255.34659063, 260.56288729, 264.05571348, 247.0062789 , 273.35519756, 280.91068237, 245.54960975, 257.30132308, 251.36965034, 258.80330942, 272.2852862 , 252.91541058, 275.03411616, 259.8764344 ])
sizes_china = numpy.array([221.4022082 , 217.34358652, 233.6904864 , 223.67444491, 217.32876757, 207.28692833, 224.88240488, 235.85034937, 230.83961692, 231.15609745, 217.02018576, 225.62759641,
222.5095129 , 209.78435321, 230.28998288, 220.5971132 , 212.96629567, 216.89236823, 217.33462201, 207.41853359, 222.33495019, 223.0784021 , 224.29873411, 219.67104836, 224.93711638, 227.96766418, 214.64492136, 206.93977596,
226.81487205, 228.83213796, 221.77887899, 224.97203337, 218.65200224, 211.56270384, 223.19415137, 218.01472723, 217.99232987, 225.21723444, 225.32914666, 228.16028171, 223.83825115, 214.27191808, 209.14362017, 210.77464168,
213.59052403, 210.6949903 , 217.65555397, 225.40776962, 233.37833834, 227.09848991])
What is the mean shoe size in each of the two countries, and how do the two means compare?
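A minimal sketch of one way to approach this question. The helper name `compare_means` is hypothetical, and the use of Welch's t-test (`equal_var=False`, which does not assume equal variances) is an assumption added for illustration; the exercise itself only asks for the means.

```python
import numpy
import scipy.stats

def compare_means(a, b):
    """Return the two sample means, their difference and a Welch t-test p-value.

    Hypothetical helper: Welch's test is one common choice when the two
    samples may have different variances; it is not prescribed by the exercise.
    """
    mean_a = numpy.mean(a)
    mean_b = numpy.mean(b)
    # equal_var=False selects Welch's t-test rather than Student's pooled-variance test
    result = scipy.stats.ttest_ind(a, b, equal_var=False)
    return mean_a, mean_b, mean_a - mean_b, result.pvalue
```

In the notebook this would be called as `compare_means(sizes_usa, sizes_china)`.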
Is the variability in the measured sizes larger in one country than the other?
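A minimal sketch for comparing the spread of the two samples. The helper name `compare_spread` is hypothetical; it uses the sample standard deviation (`ddof=1`) and, as an optional formal check assumed here, Levene's test for equality of variances.

```python
import numpy
import scipy.stats

def compare_spread(a, b):
    """Return both sample standard deviations, their ratio and a Levene test p-value."""
    # ddof=1 gives the usual (unbiased-variance) sample standard deviation
    sd_a = numpy.std(a, ddof=1)
    sd_b = numpy.std(b, ddof=1)
    # Levene's test: null hypothesis is that both samples have equal variances
    result = scipy.stats.levene(a, b)
    return sd_a, sd_b, sd_a / sd_b, result.pvalue
```

A ratio well above or below 1, together with a small p-value, would suggest that one population is more variable than the other.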
Plot a histogram of each of the two datasets. What probability distribution seems appropriate to model the data?
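A minimal plotting sketch, assuming matplotlib is available (the notebook does not import it). The helper name `plot_histograms` and the default of 15 bins are illustrative choices. Normalising each histogram to unit area makes samples of different sizes easier to compare on the same axes.

```python
import matplotlib.pyplot as plt

def plot_histograms(samples, labels, bins=15):
    """Overlay density-normalised histograms of several samples on one set of axes."""
    fig, ax = plt.subplots()
    for sample, label in zip(samples, labels):
        # density=True rescales each histogram so that its total area is 1
        ax.hist(sample, bins=bins, density=True, alpha=0.5, label=label)
    ax.set_xlabel("Foot size (mm)")
    ax.set_ylabel("Density")
    ax.legend()
    return fig, ax
```

Usage: `plot_histograms([sizes_usa, sizes_china], ["USA", "China"])`, followed by `plt.show()` outside a notebook.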
Fit an appropriate probability distribution to each of the two datasets, plot it over a histogram of the data, and check the quality of the fit.
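# Maximum-likelihood estimates of the normal distribution's mean and standard deviation
mu, sigma = scipy.stats.norm.fit(sizes_usa)
# Frozen distribution with the fitted parameters (provides pdf, cdf, ppf, ...)
model_usa = scipy.stats.norm(mu, sigma)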
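For the normal distribution, `scipy.stats.norm.fit` returns the maximum-likelihood estimates, which are the sample mean and the standard deviation computed with divisor n (`ddof=0`). The minimal sketch below (hypothetical helper `fit_normal_and_check`) fits the model and returns two rough goodness-of-fit indicators: the Kolmogorov-Smirnov statistic and the correlation coefficient of a normal probability plot. Note that because the parameters are estimated from the same data, the KS p-value is optimistic and should only be read as an indication.

```python
import numpy
import scipy.stats

def fit_normal_and_check(sample):
    """Fit a normal distribution and return simple goodness-of-fit indicators."""
    mu, sigma = scipy.stats.norm.fit(sample)
    model = scipy.stats.norm(mu, sigma)
    # KS statistic: largest gap between the empirical CDF and the fitted CDF.
    # The p-value is too large here because mu and sigma come from the same data.
    ks = scipy.stats.kstest(sample, model.cdf)
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r));
    # r close to 1 means the points lie near a straight line on a normal probability plot
    (_, _), (_, _, r) = scipy.stats.probplot(sample, dist="norm")
    return model, ks.statistic, ks.pvalue, r
```

For a visual check, `scipy.stats.probplot(sizes_usa, dist="norm", plot=plt)` draws the probability plot with matplotlib, and the same function can be applied to `sizes_china`.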
A US shoe manufacturer produces shoes in sizes corresponding to 244 mm to 284 mm. What proportion of the adult male population are they targeting?
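With a fitted model whose cumulative distribution function is F, the targeted proportion is P(244 ≤ X ≤ 284) = F(284) − F(244). A minimal sketch, with the hypothetical helper name `proportion_in_range`, that works for any frozen `scipy.stats` distribution:

```python
import scipy.stats

def proportion_in_range(model, low, high):
    """Probability that a value drawn from the frozen distribution lies in [low, high]."""
    # difference of the cumulative distribution function at the two bounds
    return model.cdf(high) - model.cdf(low)
```

Usage in the notebook: `proportion_in_range(model_usa, 244, 284)`.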
If the manufacturer were to target the Chinese market, which range of sizes should they produce to cover the same proportion of the adult male population there?
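One possible interpretation, assumed here, is that the manufacturer keeps the same fraction of the population excluded below and above the size range. The bounds then follow from the quantile function (`ppf`) and inverse survival function (`isf`) of the Chinese model. Other choices, such as an interval centred on the Chinese median, also cover the same total proportion. The helper name `matching_range` is hypothetical.

```python
import scipy.stats

def matching_range(model_from, model_to, low, high):
    """Map [low, high] under model_from to a range under model_to with the same tail fractions."""
    p_below = model_from.cdf(low)   # fraction with sizes below the smallest size produced
    p_above = model_from.sf(high)   # fraction with sizes above the largest size produced
    new_low = model_to.ppf(p_below)    # quantile leaving p_below in the lower tail
    new_high = model_to.isf(p_above)   # quantile leaving p_above in the upper tail
    return new_low, new_high
```

Usage, assuming a `model_china` fitted in the same way as `model_usa`: `matching_range(model_usa, model_china, 244, 284)`.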