Main Sequence Page 5

A Little Probability: Casting Out Dice


Cut to the chase: My current best solution
Cut to the chase: C Program for Empirical Values
Cut to the chase: C Program for Theoretical Values

Contents

Introduction
Methods
First Guess for <T(N)>
Some Notation
PDF(1)
CDF(1)
<T(1)>
<T(1)> != <Tbad(N)>
<T(N)>
Loose Ends
Spontaneous Processes
Expectation in Physics
Where Next?

Introduction
If you've ever played a dice game (say Risk for example, or D&D) where rolling high numbers is desirable, you may feel a small twinge of happiness every time you throw a die and it comes up 6. I do, and so one day I found myself playing a silly game with a big handful of standard dice. I would throw them all at once and set aside any dice that came up 6 from the group; those were the good dice. Then I would scoop up the remainder (those that weren't showing 6 pips) and throw these, repeating the remove-the-6's procedure after each throw. Of course the idea is to try and get all the dice to read 6 in as few throws as possible. Clearly this is a game of skill.

After playing this game for a little while I wondered how many throws of the dice it ought to take in order to complete the game. This led to a study of probability written up on this page. Since I'm re-discovering these ideas and trying to stay reasonably concise, this page will refer to some background concepts listed here.

To clarify the statement of the game: What we seek is an expectation value <T> which is the number of times we expect to throw a handful of N dice such that each die comes up a '6' at least once. Another way to think of <T> is in terms of an experiment conducted many times. Suppose we have 17 dice and we throw them repeatedly, pulling out the ones that come up 6 until we have none left. (We could also paint them blue and keep throwing them until all the dice are blue. Since they are assumed to operate independently it doesn't matter if we keep throwing the ones that have already come up six.)

Suppose this first trial with the 17 dice took 12 throws to eliminate all the dice from the game; this is the result for the first experiment. We repeat this experiment many times, each time recording how many throws were necessary to finish the game. Finally after we've done the experiment many times we average the number of throws we counted for each game to get <T(17)>. We'd like to arrive at a general formula for any number of dice, and we can call this <T(N)>.

Here's the thing: When first looking at this problem I found a solution to the basic question that worked pretty well for a large initial number of dice. In fact I imagined at the time that I had solved the problem. But strangely the solution failed miserably for one or two or three dice. And furthermore the rationale used to produce the formula was very suspicious! Refining my original guess to one that actually works for any number of dice was then my new objective, and this effort produced dozens of pages of mistakes and blind alleys and incorrect results... but I think I finally got it.

So: The question again:

Beginning with N dice, how many throws <T(N)> will be necessary (on average) to produce all 6's?
This is the same as asking what is <T(N)> where each die has come up 6 at least once?


The failure of my original hypothesis (for small N) strikes me as an amusing microcosm of an aspect of science. Reading science history it seems the rule rather than an exception that the early answers to any question are found wanting and it takes some time before a deeper and more complete explanation is found.

Methods
Let's face it, the novelty of counting the number of throws necessary to reduce 30 dice to zero wears thin in a hurry, and that's just one data point.  I have used a computer program that rolls many dice very quickly to arrive at my empirical or experimental values for <T(N)>. I can then compare these results to theoretical tries to see how my theory looks. The theoretical values I can calculate using a different computer program since I wish to compare <T(N)>theory with <T(N)>experiment for at least N = 1, 2, 3, ..., 150.

First Guess for <T(N)>
There are proper rules for probability, and then there are made-up hokey rules. I'll start with made-up hokey rules to arrive at that initial incorrect solution. Here is a first question leading toward the incorrect solution:

What is <T(N)> if the dice behaved perfectly but were not discrete?

This is a really bizarre and unclear statement. For one thing, perfect behavior implies the dice are discrete; a die can come up a 6 or not; it can not come up 30% 6 and 20% 5 and 50% 1, 2, 3, or 4.  So is this question nonsense, a contradiction in terms? Actually what I mean by 'not discrete' is this: Suppose the dice are capable of being cut into pieces, 'fractional dice'. Suppose further that of any number of dice (like say 19.4 of them) can be thrown and exactly 1/6 of them will come up 6 and will therefore be eliminated from the game. This is what I mean by perfectly behaved but non-discrete dice: They distribute their rolls evenly among all 6 possible outcomes, even though there is a non-integral number of dice.

As an aside: I am essentially using a large-numbers approximation here by assuming 1/6th of my dice can be eliminated. Small wonder this solution does not work well for small numbers of dice! Here is a digression on the definitions of discrete and continuous types of probability.

To continue with this derivation: Suppose I start with N dice and 1/6 of them turn up 6's. Then after 1 throw I have R(1) = (5/6)xN dice left. After 2 throws I have R(2) = (5/6)x(5/6)xN dice remaining. After t throws I have R(t) = N x (5/6)t dice remaining. If I set R(t) = 0 and solve for t then t will be my answer for <T(N)>. Simple!

(5/6)t x N = 0

This unfortunately has the solution t = +infinity. But let's not give up hope. We could fudge a little and say "Let's see how many throws it will take to wind up with 1 die remaining." Now my equation will have a finite solution:

(5/6)t x N = 1 and t = ln(N) / ln(6/5) = 0.182 ln N.

Then I say "Hah! I have tricked the problem into telling me how many rolls it takes to eliminate N-1 dice, since I began with N and I now have 1 left. So I can say

<T(N-1)> = 0.182 ln N

or:

<T(N)> = 0.182 ln (N+1)

Of course this is not quite correct because I had that extra die riding along on my way to eliminating N dice, and it might have come up 6 in there somewhere in which case... I'm done before I finish... but this is very confusing! Another thing I could do is take <T(N-1)> = 0.182 ln N and add 6 more throws to get rid of that last die. I should expect to have to throw a die six times in order to expect it to come up six at least once. Or maybe only three times since it could come up in the first 3 throws or the second three throws with equal probability.

You see how muddled my reasoning is becoming here?  This is all just casting about for a clue. It's not correct!

What if I use this fractional-dice idea and start with N dice and roll until I have just 1/2 a die left? In this case since I can't really have half a die left, maybe it goes poof and becomes zero dice precisely at that point. So using the same equation I'll suppose that I'm finished as soon as I have 1/2 a die left:

(5/6)t x N = 1/2 and t = ln(2N) / ln(6/5) = 0.182 ln (2N).

This is my phony answer which seems to work quite well for N = 20, 21, 22, ... and so on for larger values of N, the number of dice I begin with. Is that weird or what?

To reiterate: I start with a large number of dice, I assume that every time I roll them precisely 1/6 of them go away even if this is a non-integer value, and once I have 0.5 dice left it collapses to zero dice and I'm done and that is my value of <T(N)>. Very very suspicious argument!

As I mentioned this formula does not work for smaller numbers of dice, like for N = 1 it gives <T(1)> = 3.8. What is <T(1)> really? This I can calculate precisely by other means, so I'll go on to that next. Once more to summarize the bad answer:

<Tbad(N)> = ln(2N) / ln(6/5).
Some Notation
I do not have the means at this time to render the math in a pretty way. Therefore I will resort to three pieces of notational subsitution.

Sum (i = 1 to N) { f(i) } <==> Summation as index i goes from 1 to N of some function f(i).
Lim (t -> +inf) { g(t) } <==> The limit as t goes to +infinity of some function g(t).
(a b) <==> 'a choose b' or Combination(a  b), equal to a! / ((a-b)! b!).

What is pdf1(t)?
Note: In the above question, 't' means 'upon failing to throw a 6 t-1 times in a row, throw t comes up 6'.
Note: In the above question, the subscript 1 means "using one die".

From here I am jumping into the use of a probability distribution function for the discrete case of fair dice. There is a note on this here. Now I'm trying to proceed properly to eventually arrive at an expression for <T(N)> that is correct for all N. I'll start with <T(1)> which is easy enough to get and to do that I proceed in four steps.

[1]
The first step is easy: Throw away the rubbish from the previous section and just start with P(6) = 1/6. This is the assumption that the dice are all fair, that the probability or rolling a 6 is one in six.

[2]
The expression for <T(1)> depends on the discrete probability density function pdf1(t). Step 2 then is calculating pdf1(t).

[3]
Then I should verify it sums to 1. This sum is called the cumulative distribution function cdf, so showing cdf1(t -> +infinity) = 1 is step 3.

[4]
In step 4 I'll derive <T(1)>.



Step 2 Question: What is pdf1(t), the probability of rolling a 6 on throw t?

(By the rules of the game I must take into account that
I might already have rolled a 6 on an earlier throw
in which case the probability on throw
t is zero
because the die has been eliminated from play.)


To do this in the "counting ways" method I can imagine that a die is thrown t-1 times without ever coming up 6. Here are some spaces for writing down the results of these t-1 throws:

__ __ __ __ __ __ __ ... __
1  2  3  4  5  6  7      t-1

What rolls can go in these spaces? On throw 1 I could have got a 1 or a 2 or a 3 or a 4 or a 5, 5 ways. Same thing for roll 2 and for roll 3 and so on. Since these throws are independent of one another, citing Idea 4 about independent events, the total number of ways of producing these desired t-1 throws is obtained by multiplying the ways for each throw: 5 x 5 x 5 x 5 x 5 x ... x 5 = 5t-1 = ways to reach the desired result (no 6's came up).

To count the total number of ways of throwing a die t-1 times I do exactly the same thing only this time allowing 6's as well, to arrive at total ways = 6t-1. Finally once we've established the probability of arriving at throw t without having rolled a six, which is the number of possible ways divided by the total number of ways, we ask: What is the probability of now finally rolling a 6 on throw t? Answer: 1/6. So the aggregate probability on t throws of:

Not rolling 6 on throw 1 and Not rolling 6 on throw 2 and ... and Not rolling 6 on throw t-1 and Rolling 6 on throw t is:

P(6 on throw t) = (5/6)t-1 x (1/6) = pdf1(t) for one die.

Great. We have a probability distribution function for a single die, and this is the cornerstone of the calculation of <T(1)>.

Step [3]: Does cdf1(t -> +infinity) = 1?
Bearing in mind that cdf(t) = SUM(i=1 to t) { pdf(t) }.

Step 3 Question: Does the cumulative distribution function or cdf for the pdf derived in Step 2 indeed sum to 1?

In full I want to calculate LIM(tau -> +infinity) { SUM(t=1 to tau) { pdf1(t) = (1/6) x (5/6)t-1 } }.

The solution is found from a very pretty idea in algebra. The idea is that the series

S = 1 + a + a2 + a3 + a4 + ... + at

can be multiplied by -a to produce a new series

T = -a - a2 - a3 - a4 - a5 - ... - at - at+1.

When these two series S and T are added together, only the first and last terms remain: S + T = 1 - at+1. The rest of the terms 'telescope' away to nothing. Hence writing T as -aS:

S (1 - a) = 1 - at+1

or

S = (1 - a
t+1
) / (1 - a)

(I got this nice idea from Apostol's Calculus-I.) As a result the sum S for a=(5/6) is:

S = (1/6) x (1 - (5/6)t+1) / (1 - (5/6)).

In the limit t -> +infinity (where it is vital to note that 5/6 is less than one) this becomes

S = (1/6) x (1) / (1 - (5/6)) = 1.

That is, the cdf corresponding to
pdf1(t) converges nicely to one as t goes to infinity, meaning that if I throw a single die for a really really long time, the probability that I will sooner or later roll a six is very good. So far so good.

What is <T(1)>?

<T(1)> = 6. But I should add some text here.

Notice that <T(1)> and <Tbad(N)> do not agree
Perhaps there is a conceptual framework that permits us to take the "dropping out 1/6th" approach in the limit of large N that requires a different way of finishing. But this is taking a false result and trying to shove it into a round hole. Better is to start over from the ground up and derive a correct answer.

What is <T(N)>?
Going from <T(1)> to <T(N)> is not simple, but it's not too horrible either. As for T<1>: I figured out a PDF, made sure it was normalized (summed to 1), and from there wrote the expression for <T(N)>, this time as the double sum where t = 1,..., inf and i = 1, ..., N.

<T(N)> = Sum (t = 1, inf) { t x Sum (i = 1, inf) { F1 x F2 x F3 x F4 } }

The internal Sum over the index i is the expression for the pdf. It remains to specify the four factors F1, F2, F3, F4. They represent the probability of 'finishing the game' on throw t. That is, assuming 'i' dice remain after t-1 throws, what is the probabiity they all come up '6' on throw t? This is a compounded probability, hence four factors; it must not only deal with the last question but should also include the probability of arriving at that state, having precisely 'i' dice remaining in the game.

F1 = (1/6)^i        
This is the probability of rolling 'i' sixes on throw t.

F2 = N choose i 
This is the number of ways of selecting i dice from the original pool of N dice.

F3 = ((5/6)^(t-1))^i
This is the probability that i dice were thrown t-1 times and never came up 6.

F4 = (1-(5/6)^(t-1))^(N-i)
This is the probability that the remaining (N-i) dice were thrown (t-1) times and came up 6 at least once.

So that's it, and it agrees well with experiment. Can it be simplified algebraically? Approximated? I don't know just yet.

Here is code for the experimental calculation of <T(N> and here is code for the theoretical calculation.


Some Loose Ends
Plot <T>experimental versus <T>theoretical.
Does <T> match <Tbad> for large N?
Does <T> match the cheater's-solution that uses N+1?
BThm and Fibonacci
Radioactive Decay

How does this relate to spontaneous processes?

Since our dice behave a little like unstable atoms waiting around to fall apart (come up 6) there is a nice connectivity between the dice problem and radioactive decay. Is the early "wrong" guess more applicable when we consider huge numbers of atoms or dice?

How does one approach time expectation in physics?
The various stages in the proton chain discussed on Page 2 each have associated times, i.e. how long it takes for them to occur in the center of a star. Each process or reaction happens at a characteristic rate. WHY??? What makes one particle likely to change structure in a matter of minutes while another requires a million years on average?  (A partial answer: It depends upon the energy structure of the process in question. I think that many reactions require tunneling through a potential barrier and the harder this is the longer it will take to occur.)

The connection of course is that the dice problem considered here, though discrete, employs ideas useful in probability theory for physical processes. Or so I hope. If I can think about rolling dice and make the connection from problem to pdf to expectation, then the hope is that the thoughts will carry through to a derivation of the Planck radiation law (for example). 

Where next?
So far in this sequence of web pages I've thought about cosmic rays, photoelectrons, synthesis of elements in stars, and cellular automata (in addition to rolling a few handfuls of dice). At this point I think it's important to try to solve a more difficult problem from physical theory. I have an idea of what that problem should be and it comes from this realization: I don't understand something properly if I can use my poor understanding to make up things that don't exist. That is, the things I make up that don't exist don't exist for a reason, so if I don't know the reason then I'm missing something.

Example: The photon is a little quantum of energy that flies along, and as it moves it produces or is a little electric field that oscillates, plus it produces or is a little magnetic field at right angles that also oscillates. Furthermore electrical charges that accelerate produce magnetic fields and magnetic fields that change with time produce what? Electric fields? We have light, electrical fields, charges, magnetic fields all bound together so I figure I ought to be able to construct a magnetic field that can bend the path of a particle of light, or an electric field that does this. Let's call these devices magnetic prisms and electric prisms. The difficult problem I would like to solve is: How to build an electric and/or a magnetic prism capable of bending visible light.

Because this idea is (a) difficult and worse (b) probably next to impossible, I expect to exert some serious effort to fail at it properly. That is, this proposition of understanding magnetic prisms will require a degree of competence and sophistication. So before starting that I'll need a warm-up in the form of two intermediate steps, the next two pages of this Main Sequence.

The first warm-up is easy: Calculate a lot of interesting physical results like the peak emission wavelength of the sun and that from the tip of my nose. The second step will be harder: Count how many photons there are in a given magnetic field (and determine their energies) and from that calculate how long it will take a permanent magnet to run out of energy if I put it to work.

So let's get on with some calculations.


Previous
Next
Home