A stochastic problem by Warren Buffet solved with simulation

This post presents a stochastic problem, with application to financial theory taken from this magazine article. Some say the problem goes back to Warren Buffett. Thanks to my colleague Norman Markgraf, who pointed it out to me.

Assume there are two coins. One is fair, one is loaded. The loaded coin has a bias of 60-40. Now, the question is: How many coin flips do you need to be “sure enough” (say, 95%) that you found the loaded coin?

See here for the original paper including a analytic solution, but somewhat differing from the solution presented here (not saying the solution presented hereis the right one).

Let’s simulate la chose.

library(mosaic)
library(tidyverse)
library(tictoc)

Here’s the proportion of successes from a coin flip:

{mosaic::do(10) * rflip(n = 10)}[["prop"]]
#>  [1] 0.5 0.4 0.4 0.4 0.4 0.5 0.4 0.6 0.5 0.7

Let’s define a function that does the flipping say a 1000 times with a prespecified (c_coins) number of coins with a given bias.

flip_loaded_coin <- function(n_coins, 
                             repeats = 1000, 
                             prob = .6) {
  
  distrib <- {mosaic::do(repeats) * 
      mosaic::rflip(n = n_coins, prob = prob)}[["prop"]]

}

Now let’s flip 1 coin, then 2 coins, …, then 200 coins, reporting the proportion of success each time.

tic()
distrib <- 1:200 %>% 
  map_dfc(~ flip_loaded_coin(.x))
toc()
#> 8.357 sec elapsed

Extract the confidence interval for each sample size (1 to 200 in our simulation).

distrib_ci <- distrib %>% 
  map_dfr(~ confint(.x)) %>% 
  add_column(ID = 1:nrow(.)) %>% 
  mutate(bias = ifelse(`2.5%` > .5, TRUE, FALSE))

Plot it:

distrib_ci %>% 
  ggplot(aes(x = ID)) +
  geom_hline(yintercept = .5, 
             linetype = "dashed", 
             color = "grey60") +
  geom_errorbar(aes(ymin = `2.5%`,
                    ymax = `97.5%`,
                    color = bias)) +
  theme(legend.position = c(.9, .1))

It appears that approx 100 coin flips are needed to quity confidently (for whatever that means) distinguish a baised coin (with a bias of .6) from a fair coin.

For comparison, let’s look at the interval sizes for the unbiased coin.

tic()
distrib_unbiased <- 1:200 %>% 
  map_dfc(~ flip_loaded_coin(.x, prob = .5))
toc()
#> 8.739 sec elapsed

Extract the confidence interval for each sample size (1 to 200 in our simulation).

distrib_unbiased_ci <- distrib_unbiased %>% 
  map_dfr(~ confint(.x)) %>% 
  add_column(ID = 1:nrow(.)) %>% 
  mutate(bias = ifelse(`2.5%` > .5, TRUE, FALSE))

Plot the CIs for the unbiased coin:

distrib_unbiased_ci %>% 
  ggplot(aes(x = ID)) +
  geom_hline(yintercept = .5, 
             linetype = "dashed", 
             color = "grey60") +
  geom_errorbar(aes(ymin = `2.5%`,
                    ymax = `97.5%`,
                    color = bias)) +
  theme(legend.position = c(.9, .1))