This month’s simulation isn’t very interactive. You can restart it by clicking the box but that’s about it. To make up for lack of interactivity it’s very festive. This is a ‘quincunx’ or a Galton board, if you spend a lot of time in British seaside towns you’ll recognise a penny fall machine. Whatever its name, it is a physical device that illustrates one of the most important facts in science and maths – the Central Limit Theorem. Its inventor Sir Francis Galton (cousin of Charles Darwin) had one of the most interesting careers of any scientist: he discovered the ‘wisdom of crowds’, made a beauty map of Great Britain and pioneered the use of questionnaires. For a long and still incomplete list look here

Watch the animation above run for a long time and eventually the histogram below, which counts the relative frequency of the ball’s final positions, looks like a bell shaped curve, known as a normal or Gaussian distribution. An bit of analysis of the Galton board lets us figure out the exact distribution of balls.

Let’s say there are $n$ rows. On the way down the ball bounces right $k$ times and left the rest of the $n-k$ times. This means the ball ends up in the $k^{th}$ bin from the left. The number of paths with $k$ bounces to the right is $n(n-1)…(n-k)$ (imagine choosing the row in which to bounce right, we have n choices initially, then $n-1$ choices for the next row and so on). This number is just the binomial coefficient ${n \choose k}$. Each bounce right has probability $p$ and each bounce left has probability $(1-p)$ thus the probability to end up in bin $k$ is

\[

{n \choose k} p^k (1-p)^{n-k}

\]

So in fact, with our finite number of pegs, the distribution is binomial. However as the parameter $n$ gets larger the binomial approaches closer and closer to the Gaussian distribution. This is a special case of the Central Limit Theorem (CLT), which states that when we take enough independent random samples we get a normal distribution of results, regardless of the underlying distribution we are sampling from. The “Central” in “Central Limit Theorem” refers to the importance of this theorem in probability theory. Galton waxed rhapsodic about it:

“The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion.”

The Central Limit Theorem allows the whole of science to proceed confidently. If it wasn’t the case that the underlying distribution was irrelevant then, since we usually don’t know what the underlying distribution is, we could measure averages but we would have no idea what they meant nor how likely deviation from the average was. In reality thanks to CLT and to the very rapid decay of a Gaussian away from the centre we know that for almost all the things we measure most (>99%) of observations will be within $3 \sigma$ of the average and there is a simple way to estimate $\sigma$. Of course there are some important exceptions (in economics for example) but we almost always expect the distributions of real experimental observables to behave well enough that we can use the CLT.

For the mathematically inclined there is a very slick proof of the CLT on wikipedia using characteristic functions, here we will present a more plodding one. Take the sum of $N$ independent identically distributed variables, $x_1, x_2, …, x_N$,

\[

X = \sum_{i=1}^{N} x_i

\]

The probability to get the value $x_1$ is $P(x_1)$ and so on with the same $P$ for $x_2$ and the rest. The probability distribution of $X$ is given by integrating over all possible values of $x_i$ with the constraint that their sum is equal to $X$

\[

P(X) = \int_{-\infty}^\infty P(x_1) dx_1 \ldots \int_{-\infty}^\infty P(x_N) dx_N \delta(x_1 + \ldots x_N – X)

\]

Using the delta function to impose a constraint is a useful trick in probability theory. The next step is to write the delta function as a Fourier integral and seperate everything out. Writing the Fourier transform of $P(x)$ as $\tilde{P}(k)$ this gives, after a little rearranging,

\[

P(X) = \frac{1}{2 \pi} \int_{-\infty}^\infty e^{ikX} ( \tilde{P}(k) )^N

\]

the Fourier transform lets us turn multiple integration into simple multiplication. We now do a trick where we expand and contract an exponential (this is complicated),

\[

\tilde{P}(k) = \int_{-\infty}^\infty P(x) e^{ikx} dx = \sum_{n=0}^\infty \frac{(ik)^n}{n!} \int_{-\infty}^\infty P(x) x^n dx = \sum_{n=0}^\infty \frac{(ik)^n}{n!} \langle x^n \rangle = exp\left[ \sum_{n=0}^\infty \frac{(ik)^n}{n!} \langle x^n \rangle_c \right]

\]

The second last term defines $\int_{-\infty}^\infty P(x) x^n dx = \langle x^n \rangle$ and the last one defines the $\langle x^n\rangle_c$. The trick is in the last part is that the cumulant averages $\langle\rangle_c$ cancel the unwanted terms when we expand the last exponential. Explicitly they are

\begin{align}

\langle x \rangle_c &= \langle x \rangle = \mu \\

\langle x^2 \rangle_c &= \langle (x-mu)^2 \rangle = \sigma^2 \\

\langle x^3 \rangle_c &= \langle (x-mu)^3 \rangle \\

\ldots

\end{align}

You can show that adding a constant only changes the value of the first cumulant, the mean. Also you can put a Gaussian probability distribution in for $P(x)$,

\[

P(x) = \frac{1}{\sqrt{2 \pi} \sigma} exp[ (x-\mu)^2/2 \sigma^2]

\]

and see that for a Gaussian only the first and second cumulants are non-zero.

Now, going back to the expression for $P(X)$, we have to raise $\tilde{P}(k)$ to the power $N$. This has the effect of multiplying the cumulants by $N$,

\begin{align}

\langle X \rangle_c &= N \mu \\

\langle X^2 \rangle_c &=N \langle x^2 \rangle_c = N \sigma^2 \\

\langle X^3 \rangle_c &=N \langle x^3 \rangle_c \\

\ldots

\end{align}

Now define

\[

Y = \frac{(X – N\mu)}{\sqrt{N}}

\]

From the results above this variable has mean $0$ and standard deviation $1$. Its higher moments like

\[

\langle Y^3 \rangle_c = N^{-3/2} \langle X \rangle_c = N^{-1/2} \langle x^3\rangle

\]

tend to zero as $N$ tends to infinity. Because a distribution with only the first two moments non-zero is a Gaussian, as long as $N$ is large $Y$ has a Gaussian distribution with mean $0$ and standard deviation $1$ and hence $X$ also has a Gaussian distribution with mean $N \mu$ and variance $N \sigma^2$.

That is a proof of the Central Limit Theorem, showing that whatever the initial $P(x)$s were, take enough measurements and the sum/average will have a Gaussian distribution. Also note that first moment increases like $N$ while the ‘width’, $\sqrt{\langle x^2 \rangle_c}$ increases like $\sqrt{N}$ thus the more measurements the narrower the distribution becomes: bigger samples = better measurements.

You can see here that higher order cumulants like $\langle x^3 \rangle_c$ affect the distribution introducing, skew, ‘bumps’ or something else entirely. There are hundreds or researchers currently staring at the cosmic microwave background, searching for signatures like this (have a look at this for a pretty technical introduction). The idea is that there are quantum fluctuations in a small region of space. This is a very active research area but there are many good reasons to think that in the early universe something called inflation happened where this small region rapidly expanded. If the initial fluctuations were independent then their distribution was Gaussian and this is stretched across space. However if there are some correlations between fluctuations (so things like $\langle x^3 \rangle_c$ are non-zero) then this leads to non-Gaussian signatures. This is measured by looking at the cosmic microwave background, which is the cleanest record of what happened just after the big bang but before inflation occured. The amount of non-Gaussanity would tell us which theories of the very early universe were actually correct.

Gaussians, whether there or not, are always important.