Monday, December 31, 2007

New Year Resolutions

...No more Maximators, but I'll have problems with pouring boiling water on ants. It's the only time I get to be a napalm pilot.

Saturday, December 29, 2007

Flight Of The Conchords - Jenny

Bret: Hello.

Jemaine: Hi.

Bret: Hello man sitting in the park.

Jemaine: I just said hi, woman in the park.

Bret: How you doin'?

Jemaine: Mmm'good thanks.

Bret: Your looking good.

Jemaine: Pardon?

Bret: I said you're looking good.

Jemaine: Fair enough.

Bret: 'Jenny

Jemaine: Pardon?

Bret: Jenny

Jemaine: No I am sorry I think you've mistaken me for somebody else

Bret: No it's me, I'm jenny, my name is Jenny

Jemaine: Oh You're'oh' Ha ha ha ha' I thought' oh' what a hilarious misunderstanding.
Nice to meet you Jenny

Bret: We've met before - quite a few times actually.

Jemaine: Yes of course we have. I meant it was nice to meet you that time that I met you. Where was it that we met that time that I met you when I met you?

Bret: At a party.

Jemaine: That's right! Wasn't it one of those boring work parties?

Bret: No.

Jemaine: That's why I said wasn't it. It was the party of a mutual friend. - Was it? - Wasn't it? - Was it? - Wasn't it?

Bret: Yes it was.

Jemaine: Yeah, I thought so. Oh'Bobby's.

Bret: No

Jemaine: Doug's?

Bret: No

Jemaine: D-dog's?

Bret: No

Jemaine: Maxwell's?

Bret: No

Jemaine: Andy's?

Bret: Yes Andy's

Jemaine: Yeah Andy's party, ooh that's right. Ooh, Andy knows how to throw a party, doesn't he Jenny?

Bret: Yeah, I love Andy's parties!

Jemaine: I love Andy's parties. What crazy parties. How is that guy anyway?

Bret: She's good

Jemaine: Ooh that's right, Andy hates it when I forget that.

Bret: We watched a movie.

Jemaine: Yeah'it was something like but not necessarily Schindler's List. We watched it and we wept

Bret: It was Police Academy 4. We went for a walk

Jemaine: On our feet if I remember correctly.

Bret: We walked to the top of the hill and we ate sandwiches.

Jemaine: Oh, We'd just grab a sandwich and put it in our mouths. Oh, that's the only way to have sandwiches. Oh Jenny, tell me do you still walk? Do you still get into sandwiches in a big way?

Bret: Still walk a lot but I am not eating as many sandwiches as back then

Jemaine: Uh'

Bret: Do you remember what we did up there at the top of the hill?

Jemaine: Kind of'

Bret: We were standing at the look out

Jemaine: Oh, I remember exactly what we did at the look out. We just looked out' across the city from our little spot on the hilltop. Oh, It is so pretty from way up there. We talked about how the lights from the buildings and cars seemed like reflections of the stars that shined out so pretty and bright, that night.

Bret: It was daytime.

Jemaine: The daytime of the night.

Bret: Do you remember what you said to me?

Jemaine: Not word for word actually Jenny, but I remember there was some verbs.

Bret: Well you said meet me here in one year. You just needed some time to clear your head, and you seem to have done that.

Jemaine: La la la la la la la la la la la la la.

Bret: We have a child.

Jemaine: Pardon?

Bret: We have a child.

Jemaine: Why didn't you tell me, Jenny? Why didn't you tell me that day when we went to the top of the hill and we made sweet, oh how we made such sweet, sweet sandwiches. Does it have my eyes, my way with words? Does it look like me at all?

Bret: No, not at all 'cause we adopted him. I can't believe you don't remember, it was a very difficult process!

Jemaine: Oh'uh, oh'are you sure that was me Jenny?

Bret: Yes I am pretty sure that it was you, John.

Jemaine: I'm Brian

Bret: Oh my god! I'm so sorry!

Jemaine: Don't worry.

Bret: Now that's terrible.

Jemaine: Oh, don't worry.

Bret: Oh, how embarrassing!

Jemaine: Don't worry Jenny, I'm actually quite relieved. *That kind of thing just happens all the time, I just got one of those faces I suppose

Bret: So does John, ha, he's got one of those faces as well'

Bret and Jemaine: *awkward laugher*

*Note from blogger: I got kicked playfully once while sitting in a public place. Of course I was pleasantly surprised when I turned around and saw this pretty girl with long tresses and big eyes, only to be disappointed that I wasn't the "I" she had meant to kick (playfully I should add). I just got one of those generic faces I suppose.

Flight Of The Conchords...

...are the best thing since Monty Python. Check out all their vids on youtube, and be in awe of their mastery of guitar licks and comedic timing.

Getting serious on social issues...

and having fun with randomness.

The duo, who are obviously very comfortable with the ladies, have a deadly refrain for the perennially clueless(and the nocturnally prematured). Try these lines next time you find yourself tongue-tied for words:
"I want to make love to you, it's the least I can do. In the bedroom, I'm a gentleman, the ladies come before me....but 2 minutes...2 minutes is all you need--cos I'm so intense..."

Wednesday, December 26, 2007

The Snowman

Flying through the moonlit sky and dancing under the Northern Lights are what childhood dreams are made of. The girl in the attic must have been Norwegian girl Cecilia from Through the Looking Glass, Darkly. There are no penguins in the Artic though. And the haunting minor song belies what seems at first to be a Christmas tale full of innocent fun and glee.

Saturday, December 22, 2007

The Normal Bound

*Warning:Post contains pseudo-philosophical ramblings. Read at your own intellectual risk.

Most of us are only too familiar with crushing disappointments, meaningless jobs, indifferent love lives. Trapped in the monotony of everyday life, we make peace with regrets even as they slowly take the place of our dreams. We yearn for something more, and we continue to hope and dream and be disappointed and unhappy in an unyielding cycle. Sages have often hinted at the key to these life problems. "Seek material comforts, and you become enslaved by them.", "A simple life is a contented life", so says the Zen masters. Even Dilbert got into the act with "Tell me what you need, and I'll tell you how to get along without it". They all mean the same thing: Please lower your expectations.

There is even a mathematical construct for this. We are all bounded by the normal curve, and we aren't even aware of it. In math-speak, our hopes are often dashed because we aim at results which lie outside of the standard deviations that our lives are bounded by.

99.7% of [just about everything] are bounded by 3 standard deviations (sigmas) on either side(=6 sigmas in total). Six Sigma isn't the cult or martial arts movement that its fanciful name suggests, but a set of management practices implemented to improve manufacturing yield.

You wish to be a millionaire? Wealth is most likely a zero-sum
You yearn to be good-looking? Even the biological genes adhere to the normal curve. Just thank your lucky stars you (most of us anyway) are born without major defects.
You hope to have genius level IQ? Sorry to disappoint, but the normal curve doesn't agree.
You pray to be extremely talented? Consult with the normal curve first.
You want to be unique? But you already are...just like everyone else. We live in a wonderful world of predictable uniqueness and conventional individuality.

So what exactly is this normal curve that imprisons us so? Here's a Q-and-A session (with knowledge of high-school statistics assumed) to aid your understanding of your fate.

Q: What is the normal curve?
A: It's a probability distribution that is an approximation to the binomial distribution, which is the distribution of HAVES against HAVE-NOTS, the BLACKS against WHITES...or anything with bipolar outcomes (or Bernoulli trials).

It is also the approximation to a lot of other distributions, including that of sample mean, sample variances, in which case, the Central Limit Theorem applies, but I shall focus only on the special case of the Binomial (De Moivre-Laplace Theorem).

Q: The equation please.

n(x) is the normal equation, and R(x) is its integral or area under the curve with bounds given as -infinity to x.

Q: Why this equation?
A: It could be seen as a conscious sculpturing of what is essentially an exponential curve to a bell-shaped curve. And because if you integrate from -infinity to infinity, you get a probability of 1. Everybody in the world must belong under the umbrella.

But there is a rigorous mathematical proof which is based on the binomial distribution. Using firstly Stirling's Formula to strip away at the binomial coefficients, then approximating (k/n) ~ p and (n-k/n)~ q as n gets large,
(here, we are treating k as the random variable Number of Success, and the law of Large Numbers dictate that the rv goes near the mean), and substituting:
and finally applying Taylor's Formula, one arrives at the beautifully stark Exp[-0.5x^2]. This is a real statistical workhorse found in many applications.

A very clear proof is given in Yakov Sinai's Probability Theory:An Introductory Course.
Google on Sinai

Q: How about the term 1/ root (2pi)?
A: From the constant of Stirling's formula, which in turn is derived from the Wallis product. Wiki on Wallis

Q: The binomial distribution is discrete, varies according to p and n, and comes in different shapes and sizes, so how can it be approximated by the normal curve?
A: Ultimately, they all take on the shape of the bell-curve. The local asymmetries are overshadowed by the general symmetry as n goes larger.

Take for example binomial distribution with n=10, and probability p=0.1, with an obvious skew (they call it right skew, even though the graphs sort of leans left).

If I increase n to 100, the skew disappears--as if by magic. No, it's magic.

Q: How do they make sure that the normal curve, despite coming in all shapes and sizes, have a total area of 1?
A: Realise that area under f(x) = area under hf(hx), where h is scaling factor. It's that simple. Intuitively, for h>1, the height becomes taller, but the width becomes smaller.
For h<1, the height becomes shorter, but the width broadens. The area simply never changes.

In the plot below, the taller plot has h =1, while the flatter plot has h=0.5. The areas under both curves guessed it, 1.

In math-speak,


We get:

which is of the form hn(hx).

Q: How do we convert every single normal curve to their standard form?


We re-evaluate the normal integral with a given range k1 to k2 as:

In doing so, we have also scaled the discrete k-axis (of the binomial) to the continuous z-axis. Some ppl call this the z-transform. But this terminology conflicts with the actual z-transform used in signal processing. I stay away from the term.

Q: Is the substitution arbitrary?
A: No. We know the number of success X (random variable of the binomial distribution) has mean and variance.

In general,

We are turning all binomial distros to normals with mean 0 (centred at 0 ) and variance 1.

Q: If it wasn't arbitrary, then how was it derived?
A: I am not aware of any derivation, but the remarkable coincidence (in the fact that this substitution in the proof reduces the binomial to an exponential curve and at the same time, reduces the mean to 0 and variance to 1) seems like a natural phenomenon just waiting to be discovered.

Q: How can one take a discrete scale and turn it into a continuous scale?
A: This is only an approximation. The area under the discrete scales are formed by blocks of rectangles (or Riemann Sums, since everybody loves Riemann), while the continuous area is formed by passing a curve through the centre of the rectangles. The curve will naturally tend to underestimate the true area. To compensate for the error, we enlarge the range of the integral to include the areas at the extreme sides. They call this the continuity correction.

Q: Why do we use the standard normal statistic table?
A: The integration of the statistical workhorse is difficult. There is a fancy term for the integration: Jacobi elliptic integral, when we change the integration from xy-coordinates to polar. Someone came up with the bright idea to tabulate all the possible permutations of the answers into what is known as the Statistical Tables.
Anyway, in those pre-Casio days, they had tables for everything.

Q: What is the name of this theorem again?
A:De Moivre-Laplace Theorem. It first appeared in 1733, published by Abraham de Moivre. At that time, most of humanity was ekeing out a living not much better off than that of the Dark Ages, and yet math was already, compared with* our present world, graduate level stuffs.

*not compared to

Q: How is it related to the Central Limit Theorem?
A: The CLT is the generalisation of the De Moivre-Laplace Theorem. As long as we have the mean and the variance of any one single trial of the distribution (e.g. for binomial distro, a single trial is the Bernoulli Trial, and hence mean = p, and variance=pq), we can approximate them using the transformation below:

Q:What do I get out of the normal curve?
A: What seems at first to be asymmetry in a collection of data is revealed to be a beautiful(if austere)symmetry. The good news is, you can apply this theorem to practically anything, simplifying many otherwise computationally tedious tasks. The bad news is, you are part of it.

Q:How do I get out of the normal curve?
A:Best of luck.

Thursday, December 20, 2007

Things I learnt from Nassim Taleb

(in order of priority)

1. Beware David Hume's black swan.
No amount of observations of white swans can allow the inference that all swans are white, but the observation of a single black swan is sufficient to refute that conclusion.

Chickens pay the ultimate price by haemorrhaging like elephants.

2. Heed Solon's warning: It ain't over till it's over.

3. Learn from Odysseus who stuffed wax in his ears in the Sea of the Sirens:
I am not intelligent enough, nor strong enough, to fight my emotions.

4. Do not get married to yourself. Change your ideas and opinions as often as is necessary.
(In a parallel, Arthur Rubinstein was known as a pianist who changed his musical ideas and thoughts on a whim, much like George Soros and his investment principles.)

5. Children only learn from their own mistakes, no possible warning by others can prevent them from touching the hot stove. Likewise, book knowledge never lasts compared to life's (bitter)experience.

6. Yiddish saying: If I am forced to eat pork, it better be of the best kind.

7. Maths is a way of thinking, (much) more than a way of computing.

8. Your life expectancy increases each year you get older. If national life expectancy is 75, and you are 74, you are NOT LIKELY to die the next year. Remember that 50% of the population have lower than national life expectancy.
Other similar sayings: 50% of the population has lower than average IQ.
If you are not doing more than the average, you are pulling the average down.

9. Survivorship bias:
Imagine 1,000,000 analysts (or monkeys) making completely random predictions on 2 outcomes (market go up/market go down). 1 epoch (whatever length of time you may choose to define) later, only 500,000 analysts get their predictions correct.
Another epoch later, 250 000 analysts remain standing. After 15 epochs, there are still at least 30 analysts (1mil/ 2^15) surviving with 100% records. They are the fools of randomness we worship as Gurus.

10. Gurus usually develop the unfortunate habit of writing books about their random success.

Sunday, November 25, 2007

End of the road for an Addict

It's not often that humour can be extracted from a subject matter as heavy as World War 2, but there you go...

Saturday, November 10, 2007

4D combinations

Ok. You see the number plate flashes by, and the gambler in you immediately wonders how many permutations of 4D can you buy with that 6 digit combinations - 378953.

Easy. It's 6P4 ( order matters here), and then to account for the double occurence of 3, we divide by 2. 6P4 being 360, the answer is 180.

Right? Wrong. Though the actual answer, 192, is close, the thought process is more tedious than is first suspected. What's more, should any of the 12 more numbers which you miss out strike top prize, you will be left cursing your high school math teacher.

Let's examine what happens with 6P4. It's fundamentally similar to 6C4 (where ordering does not count), and then multiply by 4! ways of reordering the 4 numbers you choose.

6C4 = 15. Let's list down all 15 cases.
Since there are 2 occurence of 3, I shall denote them by 3 and 3.


Let's denote the 15 occurences into 3 groups:
Group 1: No occurence of 3.
Only 7895 here.
4C4 x 4! = 1 x 24 = 24 ways

Group 2: 1 occurence of 3.
8 groups here, but since 3 and 3 are similar, we shall deal only with 4 groups.
4 groups, in which there are 4! ways to order each group:
4C3 x 4! = 4 x 24 = 96 ways

*why 4C3? Since 3 is already selected, we have 4 more numbers 7,8,9,5 vying for 3 places.

Group 3: 2 occurence of 3.
6 groups here.
Since the 3 and 3 are similar, it is no longer 4! to order with each group.
(Remember, 3378 and 3378 are exactly similar.) We do 4!/2! = 12 ways.
4C2 x 4!/2! = 6 x 12 = 72 ways.

*Why 4C2? Since 33 is already selected, we have 4 more numbers 7,8,9,5 vying for 2 places.

Total number of ways = 24 + 96 + 72 = 192 ways.

What if now the 6 sets of numbers is changed to 337789?

We have to split our tasks into 9 groups:(think of trinary system 3^2).
Luckily for us, some groups are mathematically impossible.

Total number of ways = 78 ways.

Imagine if you had used 6P4, and then divide by 2(to account for 3 presumably) and then divide by 2 again (to account for 7), ending up with 360/2/2 = 90. With more careful math, you can actually save yourself from buying 12 repeat combinations!

Monday, November 05, 2007

Great Songs that met their Video Match

Celebrating the simplicity of it all...

...or contemplating the fragility of it all.

Friday, November 02, 2007

Binary Model and Inclusion-Exclusion Theory

A dice is rolled t times. What is the probability that all sides appear at least 2 times?

Sounds like a simple question. We have to be cautious though, for "what is the meaning of life?" is too a simple question that has stumped humanity.

First up, we search for an opening to the problem that is at once accessible to us.
Consider this: if all sides were to appear at least 2 times, the smallest value for t has to be 12. One possible set for t=12 would be {1,1,2,2,3,3,4,4,5,5,6,6}. In fact, this is the only set possible.

We see that for t<12, P = 0. And we can calculate the probability for t=12 by means of simple combinatorics:

This approach may not be useful, but the answer comes in handy to allow us to verify whatever complicated equations we may derive later.

(Or we can build from this approach, evaluating 13 throws as having 1 extra throw to distribute among the 6 sides. 6 choose 1 = 6. Hence, ans = 6*P12.
For t=14, we have extra 2 throws to distribute among 6 sides = 2 balls to distribute among 6 urns. 7 choose 5 = 21. But it is not so simple to evaluate now, as the number of ways is not all the same probablity.)

Where do we go from here? Introduce the binary system! And one that is symmetric in event space. Let each bit represent one side of the dice: A1 A2 A3 A4 A5 A6.

Now, how shall we define 1 and 0?
Consider that we want to find probability of "at least 2 times". In other words, we can find 1 - P(0) - P(1). With this in mind, we define bit 1 as occurence of 1 or 0 times, and bit 0 as occurence of 2 and above times, so that we can do something like 1 - P(...).

Now we can construct the Binary Table!

Let us inspect the meaning of each row.

The first row means all the numbers must appear at least 2 times. Hey! This is the case we want to find! Now things becomes apparent. To find the probability of this row, all we have to do is to sum up the probability of the rest of the rows, and do a 1 - P(prob of the rest of all the rows). Note that all the rows of the Binary Table are mutually exclusive.

The second row. It would mean Number 1 appears once or 0 times, and the rest all appear at least 2 times.

The last row. It means that all sides appear at most 1 time, hence giving a maximum of t=6 (or a minumim of 0). As earlier established, t has to be >=12, so the probability of this happening is 0. We shall not consider this case.
So there are a total of 2^6= 64 cases for us to sweat on.
(actually 63, as we discount 111111....)

Let's start evaluating the probability of Row 2. (Row 1 is 000000. There is very little information we can get from that.)

How can we find the probability that the Number 1 appears only once or none WHILE ensuring that the rest appear at least 2 times? Unfortunately it is too difficult.

Imagine t=12.
For 1 to appear 0 times(NONE): we need to find the number of ways to solve A2+A3+A4+A5+A6 = 12...a case of 12 balls distributed among 6 urns.
For 1 to appear 1 time: we need to solve for the number of ways to solve
A2+A3+A4+A5+A6 = 11...

Mind boggling stuff. And that's only for the 1 out of 62 cases we need to establish.

Instead, for that row, I am forced to consider a variation of it:

Here, I introduce a 3rd state x to mean Don't Care. I do not care what happens to the rest of the number as long as Number 1 appears 0 or 1 time in t throws. The probability of the case I just mentioned is actually humanely possible:

Why the term

Because we still need to account for the fact that this appearance of Dice Number 1 can occur at the first throw, or at the second throw,...up till the t throw. For lack of a better term, let's call this the t Loop.

The formula can now be combined:

Now we shall generate the full table, which I shall call the Don't Care Table.

One very important point to note is that the table is a truncated form. I left out the rows xxxx1x, xxx1xx, xx1xxx, x1xxxx, 1xxxxx, whose calculation is similar to the one we did for xxxxx1. Please bear this in mind! Also note that the rows of this table aren't mutually exclusive!

But how does the Don't Care table help when all we want is to sum up all the 62 rows of the Binary table. Ah....Poincare has this figured out. He found that to calculate the probability of the whole table minus the case of 000000 (we cannot include 000000, else the Probabilty is unity), for example, we must do:

As mentioned, the rows are not mutually exclusive. xxxxx1 and its 5 cousins (xxxx1x, xxx1xx,xx1xxx,x1xxxx,1xxxxx) contain all of what we want from the Binary Table...and more. We have some subtraction on our hands.

Consider the last row 111111. This row is exactly the same whether in the Don't Care or Binary Table. It is the simplest element in a recursive algorithm.
Now consider the second last row x111111. It can either be 0111111 or 1111111.
Hence, to get 0111111, all we have to do is to compute x111111 - 111111.
Therein lies the recursive algorithm, which we use to work back all the way to the top.

If you are not convinced, work out the next row 001111 (which is going to be a bit more tedious.) First we evaluate xx1111:
Hence, 001111 = xx1111-(011111+111111)-101111
Knowing x111111=011111+111111, the equation becomes 001111 = xx1111-x11111-101111

We now need to break down 101111 into its elements.
We consider 1x1111 = 111111 + 101111
101111 is then equal to 1x1111 - 111111.
Since 1x1111 = x11111 in absolute value, we get 101111 = x11111 - 111111

We get 001111 = xx1111 - x11111 - 101111 = xx1111 - x11111 - x11111 + 111111
= (2C0)xx1111 - (2C1)x11111 + (2C2)111111

Finally, we can set up this equation:
(6C1)000001 + (6C2)000011 + (6C3)000111 + (6C4)001111 + (6C5)011111 + (6C6)111111
= (6C1)xxxxx1 - (6C2)xxxx11 + (6C3)xxx111 - (6C4)xx1111 + (6C5)x11111 - (6C6)111111

This is basically the inclusion-exclusion theory, now cast in a different light with my binary model. Bear in mind that it is EXACTLY of the form:

Let's take a look at the next row.

If you realise here, we are going to introduce another binary system. Take stock for a moment what is happening: A binary system and a t Loop inside another binary system which has been modified to a Don't Care system. That's rhetoric you can ignore.

The internal binary arises from having 2 Numbers now combining to give 00, 01, 10 and 11. The equation is now:

Why the term

The t Loop that was mentioned a while earlier has gotten a bit sticky. When there was only 1 event, now there are 2 events A1 and A2 to account for, resulting in a total of 3 distinct groups to arrange: A1,A2 and the rest (t-2). Rearranging the term, we get:

We now revise the Row 2 equation:

We do the same for all rows, and come to a general ROW equation:

Now comes the defining moment.
We are going to loop through all the rows, executing
xxxxx1 - xxxx11 + xxx111-xx1111+x11111
(ignoring 111111 of course),
which is equal to the summation of the probability of all the rows in the Binary Table except 000000.

Summation of the probability of all the rows in Binary Table except 000000=
Alternating inclusion exclusion of Dun Care Table =

Hence, our answer to the question posed right at the beginning is:

I have not verified the answers yet. Remember at the beginning, we solved for t=12. This is a good start to verify the formula.

Note that the internal binary loop would become Trinary (0,1,2) if the question is rephrased:
A dice is rolled t times. What is the probability that all sides appear at least 3 times?

Thursday, November 01, 2007

Analogy of binary systems in probability

Probability is a tricky subject. Often we confuse the sub-branch of counting, which is concerned with the number of ways to perform a certain task with the science of probability. We often forget that each number of ways to perform a certain task does not necessarily share the same probability, or in math-speak, they do not share a symmetric event space.

For example, if you are tasked to find the number of ways to arrange 5 identical balls in 2 urns. The formula for such a scenario is given as below:

As an aside, an ingenious way (unfortunately not mine) of interpreting the formula goes as follows:
I line up all 5 balls in a row. There are 2 urns, so I utilise an imaginary stick and place it between the 5 balls, creating effectively 2 partitions (one left of the stick, and one right). Each partition denotes an urn, so 1 stick is enough to denote 2 partitions. If there are 3 urns, 2 sticks are needed...and so on, giving:

No of sticks needed = No of urns - 1.

Then I simply find the number of ways to order all the sticks and balls, giving me:

(since the balls are themselves indistinguishable and the stick is indistinguishable, so we divide by 5! and 1! respectively.)

What if you are now told that each urn must now contain at least 1 ball each?
Simple. Using the same values, 1 ball is first distributed to each urn to ensure that each urn must now contain at least 1 ball each. That leaves us with 5-2=3 balls left to arrange with the same number of sticks (r-1), giving:

Now comes THE question:
Let the no. of ways of arranging 5 balls in 2 urns given each urn must have at least 1 ball = P1(5,2).
Let the no. of ways of arranging 5 balls in 2 urns given each urn has no restrictions on the number of balls = P0(5,2).

Is the probability of having each urn containing at least 1 ball ( which can be rephrased as the probability of having no empty urn) equal to:

The number of ways to arrange 5 balls in 2 urns is not all equal in probability.

A better way to view the problem uses the analogy of states (like binary states, tri-states that electrical engineers are comfortable with).
The 2 urns represent 2 states, giving us the familiar binary system. Each ball represent one bit. Hence there are a total of:

2^5=32 possible states.

(Of course, we are now simply viewing the problem as putting n distinguishable balls into r distinguishable urns, which is also the number of ways to draw n balls from r unique types(each unique type is an urn!) with replacement, where order counts. Just like a mountaineer who always tries to find new ways to climb the same mountain, mathematicians are always trying to view the same problem from different angles, hoping to gain further insights or discover something which was previously hidden. Back to the problem:)

Breaking up the problem:

The probability of distribution of balls such that we end up with no empty urns becomes relatively clear. It is simply:

Note that 00000 and 11111 all denote 1 empty urn.

The analogy of binary system helps bring probability problem into sharp focus. Another question illustrates its power.

Cards were dealt face down from a deck of 52. What is the probability that the 1st ace encounterd occurs on the nth draw?

We now imagine a binary system of 1 (Ace) and 0 (no ace) with a total of 52 bits.
Picture the table with headers as below:

The meaning of the curly braces, which divide up the table into 2 groups, will be apparent later. But first, we define the sample space, noting that each way has equal probability of occurence.There are a total of 4 aces, to be scattered among all 52 bits. So for every row, there are 4 1s, and 48 0s. So how many rows are there?
Total number of rows possible:

Now, we introduce the restrictions. The 1st ace must occur on the nth draw.
We divide up the group into A1 to An, and An+1 to A52.
In the first group, only 1 ace resides in it, specifically in the An position.
In the second group, 3 aces are distributed among An+1 to A52.

The problem is now reduced to a matter of throwing out unwanted rows and retaining those that we want. So which are the rows that we have to throw out?
1) All those with more than 1 Ace in the group A1 to An.
2) All those with only 1 Ace in the group A1 to An, but whose 1 Ace does not reside in the nth position.

There are quite a lot of rows to throw out here. Instead of throwing out rows, we can think of the rows that we must keep. There is only one group we must keep, which is:
1) All those with 1 Ace in the group A1 to An, and whose 1 Ace resides in the nth position.

For this group, how many ways are there to arrange them? The answer lies in the other group An+1 to A52. Considering that 3 aces are to be found among An+1 to A52, and that there are 52 - n bits from An+1 to A52, we get the final answer:

Now what is the significance of multiplying the answer found above by n? Essentially it gives us the probability of finding 3 Aces in the group An+1 to A52. Without the binary system, these angles may not be so readily apparent.

Lastly, the binary system has made light of an unnecessarily complicated problem: The issue of drawing 4 Venn circle diagrams. I believe most of us were introduced to the Venn diagrams from young, first with 2 circles, then 3 circles. It is a very elegant and simple solution to many probability questions involving many intersections. Put it simply, without John Venn and his marvellous invention, my head would spin like a gyroscope everytime I encounter probability. So if 3 circle Venns have been useful, then 4 circle Venns must be even more useful. I chanced upon this diagram at Wikipedia:

The diagram is flawed as soon as I counted the number of states in it. Using the binary system, for each circle we denote it as ONE bit. There are 4 circles, hence 4 bits. In a 4 bit system, there are supposed to be 2^4 = 16 mutually exclusive regions marked out by the intersections of the 4 circles. (The last region is 0000, representing the area outside of all 4 circles.) But I counted only 13.

So I present my own model in which care was taken to include all 15 different regions as spelt out by the binary system.

A brief explanation:
The 4 events are depicted by their colours:
Red, Blue, Black and Green.
While Red, Blue and Black are the usual 3 circle Venn,
I added Green...modified to resemble a doughnut with a hole in the middle and an extension outside of the 3 circles.

This model is useful for anyone who wishes to understand the inclusion-exclusion theories and other theories based on this, e.g. Waring's Theorem, and also to visualise the De Montmort's Problem of Coincidences better. Now I wonder how did the Wikipedia poster manage to get inclusion-exclusion theory correct with a flawed model.

An alternative model was offered by my brother. (See below). Basically all you have to do is to make sure the new region(Green)splits every other existing region into two...a truly intuitive illustration of the Binary Model indeed! Maybe not too symmetrical though...

Now the problem comes alive when you try to add more and more regions. In adding the regions, we are supposed to cross the boundaries of each existing region one and only one time. Now, is this problem vaguely similar to the famous Bridge of Konigsberg Problem as proposed by Euler?