Last night and some hours yesterday I spent my time playing the browser game 2048. It was great fun and hard to stop. But finally I reached the goal of the game,
and can leave it for good.
There is also a long discussion on this game on hackernews.
Last night and some hours yesterday I spent my time playing the browser game 2048. It was great fun and hard to stop. But finally I reached the goal of the game,
and can leave it for good.
There is also a long discussion on this game on hackernews.
A historian, a data scientist, a programmer, a mathematician, and a philosopher discuss the question, how likely it is that a lottery draw (6
out of 49
) contains two consecutive numbers.
The historian argues that from 1955 up to 2011, there were 5026 lottery draws in Germany, every Saturday, and from 2000 on, two draws every Wednesday. In 2557
cases, there were consecutive numbers among the drawn numbers. Hence, the likelihood is around 2557/5026=51%
.
By the way, he adds, did you know how Casanova got rich while he introduced the lottery in Paris?
The data scientist argues that a lottery draw is a repeatable experiment that can be easily simulated. He fires up his R console and creates first a function that checks if a draw contains consecutive numbers:
> has.consecutives <- function(draw) any(diff(sort(draw))==1)
The function first sorts the numbers drawn, and then calculates the differences between those numbers. If there are consecutive numbers, then there must be a 1
in these differences.
Now one lottery draw can be simulated with sample.int(49,6)
. To get an estimate for the probability of the occurrence of consecutive numbers, he repeats these lottery draws very often:
> N <- 100000
> counts <- replicate(N, has.consecutives(sample.int(49,6)))
> sum(counts)/N
[1] 0.49498
The simulation suggests the probability is slightly smaller than 0.5
. To rule out this result was produced by chance, he calculates a confidence interval for the wanted proportion. There is – and that is charming about the statistical computing environment R – a package for that, which implements several methods for this task:
> library(binom)
> with(
binom.confint(
sum(counts), N,
method="wilson"),
c(lower, upper)
)
[1] 0.4918814 0.4980790
Now the theory states that if the simulation is repeated very often, in 95%
of the cases the true parameter lies in the calculated intervals. Hence the data scientist is quite certain about the first two digits of the probability.
The programmer takes a recursive approach. If we find a function f(n,m)
that returns the number of ways to choose m out of n numbers without consecutive numbers, then we can calculate the wanted probability by
> 1 - f(49,6)/choose(49,6)
Some corner cases for f
are obvious: if m
is 1
, there are n
ways. For n=3
and m=2
, there is exactly one way. By the pigeonhole principle, when m>ceiling(n/2)
, there is no way.
In the general f(n,m)
case, we can distinguish between the ways where neither the first nor the last number is chosen (f(n-2,m)
) and the ways where either the first and/or the last element is chosen. In the latter case, by the inclusion/exclusion principle, the number of ways is 2*f(n-2, m-1) - f(n-4, m-2)
. In case m=2
, the latter term becomes 1
; this is another hard-coded return value f(n,0)=1
. Altogether, the function is defined as
> f <- function(n,m) {
if(m==0) return(1)
if(m==1) return(n)
if(n==3 && m==2) return(1)
# pigeonhole principle
if(m>ceiling(n/2)) return(0)
# inclusion/exclusion principle
return(
f(n-2,m)
+ 2*f(n-2, m-1)
- f(n-4, m-2)
)
}
> 1 - f(49,6)/choose(49,6)
[1] 0.4951984
After some thinking, the mathematician comes up with equivalent problem formulation. Instead of counting the number the draws without consecutive numbers, what if we count the number of ways to draw the blanks immediately after the drawn numbers?
Given a draw N1, ..., N6
without consecutive numbers, the numbers N1+1, ..., N6+1
after each drawn number are not drawn. We have to allow the number 50
to make this work, though.
Hence, the number of draws 6
out of 49
without consecutives corresponds to number of 6
blanks drawn from (50-6)
numbers. So the solution boils down to
> 1 - choose(50-6, 6)/choose(49,6)
[1] 0.4951984
Finally, the philosopher points out that it is not the result, but the solution approaches that are important. The objective truth about consecutive numbers in lottery draws is a fact. Now that it is known, it might be tempting to disregard the ways used to find it. But there is value in subjective reflection, in appreciating the individual thoughts and approaches, as this reflection deepens the understanding of reality.