Saturday, December 22, 2007

The Normal Bound

*Warning: Post contains pseudo-philosophical ramblings. Read at your own intellectual risk.

Most of us are only too familiar with crushing disappointments, meaningless jobs, indifferent love lives. Trapped in the monotony of everyday life, we make peace with regrets even as they slowly take the place of our dreams. We yearn for something more, and we continue to hope and dream and be disappointed and unhappy in an unyielding cycle. Sages have often hinted at the key to these life problems. "Seek material comforts, and you become enslaved by them." "A simple life is a contented life." So say the Zen masters. Even Dilbert got into the act with "Tell me what you need, and I'll tell you how to get along without it." They all mean the same thing: Please lower your expectations.

There is even a mathematical construct for this. We are all bounded by the normal curve, and we aren't even aware of it. In math-speak, our hopes are often dashed because we aim at results which lie outside of the standard deviations that our lives are bounded by.

99.7% of [just about everything] is bounded by 3 standard deviations (sigmas) on either side of the mean (= 6 sigmas in total). Six Sigma isn't the cult or martial arts movement that its fanciful name suggests, but a set of management practices implemented to improve manufacturing yield.
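The 99.7% figure can be checked directly. The sketch below assumes an exactly normal quantity and uses the error function from Python's standard library; the helper name `within_k_sigma` is made up for illustration.

```python
import math

def within_k_sigma(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Print the share of the population inside 1, 2 and 3 sigmas.
for k in (1, 2, 3):
    print(f"within {k} sigma(s): {within_k_sigma(k):.2%}")
```

The three printed values are the familiar "68, 95, 99.7" rule of thumb.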

You wish to be a millionaire? Wealth is most likely a zero-sum game...so....
You yearn to be good-looking? Even the biological genes adhere to the normal curve. Just thank your lucky stars you (most of us anyway) are born without major defects.
You hope to have genius level IQ? Sorry to disappoint, but the normal curve doesn't agree.
You pray to be extremely talented? Consult with the normal curve first.
You want to be unique? But you already are...just like everyone else. We live in a wonderful world of predictable uniqueness and conventional individuality.

So what exactly is this normal curve that imprisons us so? Here's a Q-and-A session (with knowledge of high-school statistics assumed) to aid your understanding of your fate.

Q: What is the normal curve?
A: It's a probability distribution that is an approximation to the binomial distribution, which is the distribution of HAVES against HAVE-NOTS, the BLACKS against WHITES...or anything with only two possible outcomes (Bernoulli trials).

It is also the approximation to a lot of other distributions, including those of sample means and sample variances, in which case the Central Limit Theorem applies, but I shall focus only on the special case of the Binomial (De Moivre-Laplace Theorem).

Q: The equation please.

n(x) is the normal equation, and R(x) is its integral or area under the curve with bounds given as -infinity to x.
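Assuming n(x) denotes the standard normal density (which is consistent with the Exp[-0.5x^2] and 1/root(2pi) terms discussed further down), the pair can be written as:

```latex
n(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2},
\qquad
R(x) = \int_{-\infty}^{x} n(t)\, dt
```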

Q: Why this equation?
A: It could be seen as a conscious sculpting of what is essentially an exponential curve into a bell-shaped curve, scaled so that if you integrate from -infinity to infinity, you get a probability of 1. Everybody in the world must belong under the umbrella.

But there is a rigorous mathematical proof which is based on the binomial distribution. First use Stirling's Formula to strip away the binomial coefficients, then approximate k/n ~ p and (n-k)/n ~ q as n gets large (here, we are treating k as the random variable Number of Successes, and the Law of Large Numbers dictates that k/n stays near p). Substituting x = (k - np)/sqrt(npq), and finally applying Taylor's Formula, one arrives at the beautifully stark Exp[-0.5x^2]. This is a real statistical workhorse found in many applications.
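The end result of that proof can be sanity-checked numerically. The sketch below compares the exact binomial probability with the local normal approximation exp(-x^2/2) / sqrt(2*pi*n*p*q), where x = (k - np)/sqrt(npq). The values n = 400 and p = 0.3 and the function names are arbitrary choices for illustration, not taken from Sinai's text.

```python
import math

def binom_pmf(n: int, k: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def local_normal(n: int, k: int, p: float) -> float:
    """Local De Moivre-Laplace approximation to the same probability."""
    q = 1 - p
    x = (k - n * p) / math.sqrt(n * p * q)  # standardised number of successes
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi * n * p * q)

n, p = 400, 0.3  # illustrative values only
for k in (110, 120, 130):
    print(k, round(binom_pmf(n, k, p), 5), round(local_normal(n, k, p), 5))
```

The two columns should agree closely near the mean (k = 120) and drift apart slowly further out in the tails.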

A very clear proof is given in Yakov Sinai's Probability Theory: An Introductory Course.

Q: How about the term 1/ root (2pi)?
A: From the constant root (2pi) in Stirling's formula, which in turn can be derived from the Wallis product (see Wikipedia on the Wallis product).
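Both facts are easy to watch converge numerically. This is a rough sketch rather than a proof: `wallis_product` multiplies the first few terms of the Wallis product, which tends to pi/2, and `stirling_constant` computes n! / (sqrt(n) * (n/e)^n), which tends to root (2pi). Both function names are invented for this example.

```python
import math

def wallis_product(terms: int) -> float:
    """Partial Wallis product: (2*2)/(1*3) * (4*4)/(3*5) * ... ; tends to pi/2."""
    prod = 1.0
    for i in range(1, terms + 1):
        prod *= (2 * i) ** 2 / ((2 * i - 1) * (2 * i + 1))
    return prod

def stirling_constant(n: int) -> float:
    """n! / (sqrt(n) * (n/e)**n), computed with logarithms to avoid overflow."""
    log_ratio = math.lgamma(n + 1) - (0.5 * math.log(n) + n * (math.log(n) - 1))
    return math.exp(log_ratio)

print(wallis_product(100_000), math.pi / 2)
print(stirling_constant(1000), math.sqrt(2 * math.pi))
```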

Q: The binomial distribution is discrete, varies according to p and n, and comes in different shapes and sizes, so how can it be approximated by the normal curve?
A: Ultimately, they all take on the shape of the bell curve. The local asymmetries are overshadowed by the general symmetry as n grows larger.

Take, for example, a binomial distribution with n=10 and probability p=0.1, which has an obvious skew (they call it right skew, even though the graph sort of leans left).



If I increase n to 100, the skew all but disappears--as if by magic. No, it is magic.
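The fading skew can be measured rather than eyeballed. The sketch below computes the skewness (third standardised moment) straight from the binomial probabilities for the two cases above; the function names are illustrative.

```python
import math

def binom_pmf(n: int, k: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def skewness(n: int, p: float) -> float:
    """Third standardised moment of the binomial, summed over all outcomes."""
    probs = [binom_pmf(n, k, p) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(probs))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(probs))
    third = sum((k - mean) ** 3 * w for k, w in enumerate(probs))
    return third / var**1.5

for n in (10, 100, 1000):
    print(n, round(skewness(n, 0.1), 4))
```

The skewness falls from about 0.84 at n=10 to about 0.27 at n=100, matching the closed form (1 - 2p)/sqrt(npq). It shrinks like 1/sqrt(n), so it never quite reaches zero.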



Q: How do they make sure that the normal curve, despite coming in all shapes and sizes, has a total area of 1?
A: Realise that the area under f(x) = the area under hf(hx), where h is a positive scaling factor. It's that simple. Intuitively, for h>1, the curve becomes taller, but narrower. For h<1, the curve becomes shorter, but broader. The area simply never changes.

In the plot below, the taller plot has h=1, while the flatter plot has h=0.5. The areas under both curves are...you guessed it, 1.
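A quick numerical check of the claim, as a sketch: integrate h*n(h*x) with a simple midpoint rule for a few values of h. The integration range and step count are arbitrary choices that are wide and fine enough for these particular h values.

```python
import math

def n_pdf(x: float) -> float:
    """Standard normal density n(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def area(h: float, lo: float = -40.0, hi: float = 40.0, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the area under h * n(h * x) between lo and hi."""
    dx = (hi - lo) / steps
    return sum(h * n_pdf(h * (lo + (i + 0.5) * dx)) for i in range(steps)) * dx

for h in (0.5, 1.0, 2.0):
    print(h, round(area(h), 6))
```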



In math-speak,



Letting:



We get:



which is of the form hn(hx).
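A sketch of how these steps can be written out, assuming a normal density centred at 0 with standard deviation sigma, and taking h = 1/sigma (both are assumptions made here for illustration):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})},
\qquad
h = \frac{1}{\sigma}
\quad\Longrightarrow\quad
f(x) = \frac{h}{\sqrt{2\pi}}\, e^{-(hx)^{2}/2} = h\, n(hx)
```

With this choice, the flatter plot with h=0.5 corresponds to sigma = 2.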

Q: How do we convert every single normal curve to its standard form?

Substituting:


We re-evaluate the normal integral with a given range k1 to k2 as:


In doing so, we have also scaled the discrete k-axis (of the binomial) to the continuous z-axis. Some people call this the z-transform. But this terminology conflicts with the actual z-transform used in signal processing. I stay away from the term.
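Written out, and assuming the substitution is the usual standardisation of the binomial count k (with R as defined earlier, and without any continuity correction), the step looks like this:

```latex
z = \frac{k - np}{\sqrt{npq}},
\qquad
P(k_1 \le X \le k_2) \approx \int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz
= R(z_2) - R(z_1),
\qquad
z_i = \frac{k_i - np}{\sqrt{npq}}
```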

Q: Is the substitution arbitrary?
A: No. We know the number of successes X (the random variable of the binomial distribution) has mean np and variance npq.

In general,



We are turning all binomial distros to normals with mean 0 (centred at 0) and variance 1.
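That the standardised variable really has mean 0 and variance 1 can be confirmed exactly from the binomial probabilities. A minimal sketch, with n = 50 and p = 0.3 picked arbitrarily and function names invented for the example:

```python
import math

def binom_pmf(n: int, k: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def standardized_moments(n: int, p: float):
    """Mean and variance of Z = (X - np) / sqrt(npq) for binomial X."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    pairs = [((k - mu) / sigma, binom_pmf(n, k, p)) for k in range(n + 1)]
    mean_z = sum(z * w for z, w in pairs)
    var_z = sum((z - mean_z) ** 2 * w for z, w in pairs)
    return mean_z, var_z

print(standardized_moments(50, 0.3))
```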

Q: If it wasn't arbitrary, then how was it derived?
A: I am not aware of any derivation, but the remarkable coincidence (in the fact that this substitution in the proof reduces the binomial to an exponential curve and at the same time, reduces the mean to 0 and variance to 1) seems like a natural phenomenon just waiting to be discovered.

Q: How can one take a discrete scale and turn it into a continuous scale?
A: This is only an approximation. The area under the discrete distribution is formed by blocks of rectangles (or Riemann Sums, since everybody loves Riemann), each of width 1 and centred on a whole number k, while the continuous area is formed by passing a curve through the centres of the rectangle tops. Integrating the curve from k1 to k2 leaves out half a rectangle at each end, so it will naturally tend to underestimate the true area. To compensate for the error, we enlarge the range of the integral by 0.5 on each side, so that it runs from k1 - 0.5 to k2 + 0.5. They call this the continuity correction.
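To see how much the correction helps, here is a small sketch comparing the exact binomial probability with the normal approximation, with and without the extra 0.5 on each side. The example values (n = 100, p = 0.5, k from 45 to 55) and the function names are illustrative choices.

```python
import math

def R(x: float) -> float:
    """Area under the standard normal curve from -infinity to x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def exact_prob(n: int, p: float, k1: int, k2: int) -> float:
    """Exact P(k1 <= X <= k2) for binomial X."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k1, k2 + 1))

def normal_prob(n: int, p: float, k1: int, k2: int, correction: bool = True) -> float:
    """Normal approximation to P(k1 <= X <= k2), optionally continuity-corrected."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    shift = 0.5 if correction else 0.0  # widen the range by half a unit per side
    return R((k2 + shift - mu) / sigma) - R((k1 - shift - mu) / sigma)

n, p, k1, k2 = 100, 0.5, 45, 55
print("exact      :", round(exact_prob(n, p, k1, k2), 4))
print("corrected  :", round(normal_prob(n, p, k1, k2, True), 4))
print("uncorrected:", round(normal_prob(n, p, k1, k2, False), 4))
```

In this example the uncorrected integral falls visibly short of the exact value, while the corrected one lands much closer.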



Q: Why do we use the standard normal statistic table?
A: The integration of the statistical workhorse is difficult: it has no closed form in terms of elementary functions. Only the total area from -infinity to infinity comes out neatly, by squaring the integral and changing the integration from xy-coordinates to polar, which brings in the Jacobian. For everything in between, someone came up with the bright idea to tabulate the answers for a whole range of x values into what is known as the Statistical Tables.
Anyway, in those pre-Casio days, they had tables for everything.
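These days a few lines of code can stand in for the table. The sketch below rebuilds a small slice of a standard normal table using the error function from Python's standard library; the name `std_normal_cdf` and the table layout are choices made for this example.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Area under the standard normal curve from -infinity to x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Rows step through z in tenths; columns add the second decimal place.
print("z     " + "  ".join(f"{c:.2f}  " for c in (0.00, 0.05)))
for tenth in range(0, 31, 5):
    z = tenth / 10
    row = "  ".join(f"{std_normal_cdf(z + c):.4f}" for c in (0.00, 0.05))
    print(f"{z:.1f}   {row}")
```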

Q: What is the name of this theorem again?
A: De Moivre-Laplace Theorem. It first appeared in 1733, published by Abraham de Moivre. At that time, most of humanity was eking out a living not much better than that of the Dark Ages, and yet the math was already, compared with* our present world, graduate-level stuff.

*not compared to

Q: How is it related to the Central Limit Theorem?
A: The CLT is the generalisation of the De Moivre-Laplace Theorem. As long as we have the mean and the variance of any one single trial of the distribution (e.g. for the binomial distro, a single trial is the Bernoulli trial, and hence mean = p and variance = pq), we can approximate the distribution of the sum of n trials using the transformation below:
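In its usual form, the transformation standardises the sum S of n independent trials as Z = (S - n*mu) / (sigma*sqrt(n)), where mu and sigma are the mean and standard deviation of a single trial. The simulation below is a rough sketch of that idea using uniform(0, 1) draws, which are not bell-shaped at all; the seed, the sample sizes and the function name are arbitrary choices.

```python
import math
import random

def standardized_sum(n, mu, sigma, draw):
    """Z = (S - n*mu) / (sigma*sqrt(n)), where S is the sum of n independent draws."""
    total = sum(draw() for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

rng = random.Random(2007)             # fixed seed so runs are repeatable
mu, sigma = 0.5, math.sqrt(1 / 12)    # mean and sd of a single uniform(0, 1) draw
zs = [standardized_sum(30, mu, sigma, rng.random) for _ in range(5000)]

mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
within_one = sum(abs(z) <= 1 for z in zs) / len(zs)
print(round(mean_z, 3), round(var_z, 3), round(within_one, 3))
```

If the approximation is doing its job, the printed mean should be near 0, the variance near 1, and roughly 68% of the simulated values should fall within one sigma.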



Q: What do I get out of the normal curve?
A: What seems at first to be asymmetry in a collection of data is revealed to be a beautiful (if austere) symmetry. The good news is, you can apply this theorem to practically anything, simplifying many otherwise computationally tedious tasks. The bad news is, you are part of it.

Q: How do I get out of the normal curve?
A: Best of luck.
