There is an interesting question β

There are 5 different types of toys at a KFC restaurant. If you go there, you will get one toy randomly. How many times do you need to go to KFC in order to get all 5 toys?

The question is about probabilistic analysis. Different professionals, such as a business analyst, a statistical programmer, a mathematician and a software developer, will have different thinking pathway to solve this problem. Let's see what they would think.

A business analyst will tend to do scenario analysis at the first step.

Assume I am so lucky that each time I visit KFC and get a different toy, then I only need 5 times to have all the five toys. The minimum number is 5.

To get a different toy I need to go to KFC five times. If there is not the toy I want, I will come back with empty hand. Thus, I will have to go to KFC 5*5 = 25 times. Of course, this scenario never happens.

OK. Then the mean times I need to go to KFC seems to be in a range [5, 25). Let's give the simplest try β use the (5+25)/2 to get **15 **times. The number is not accurate but at least we have an estimate.

As a brute-force tool, simulation is the instant thought for a statistical programmer. Let the computer randomly create 10,000 trials β say a person plays the game 10,000 times. After averaging the results, the computer eventually tells the expected times to get the 5 toys.

I modified the SAS code from a post by Rick Wicklin, and set the maximum number of visits to KFC as 32. After 10,000 runs, the mean is **11.37**. The hunch tells that this number should be quite close.

`************(1)Simulate 10000 trials**************************;`

proc iml;

K = 5; /* number of toys */

L = 32; /* max visits per trial */

/* generate NSim trials of L visits */

NSim = 10000;

x = j(Nsim, L);

call randseed(12345);

call randgen(x, "Uniform");

x = ceil(K*x); /* integers in [1,K] */

/* record the visit when 5 toys are taken */

c = j(NSim,1,0); /** allocate */

do i = 1 to NSim;

do j = 5 to L;

rowUnique = countunique(x[i, 1:j]);

if rowUnique = 5 then do;

c[i, 1] = j;

goto skip;

end;

end;

skip:

end;

/* output the result */

create _1 from c;

append from c;

close _1;

;quit;

data _2;

set _1;

/* remove the trials that didn't get 5 toys in 32 visits */

where col1 >= 5;

run;

************(2)Show the results*******************************;

proc sgplot;

density col1;

density col1 / type = kernel;

run;

proc means;

run;

Actually this question is a variant of Coupon collector's problem. The mean/expectation and standard deviation can be derived directly by the formulas. The final expectation should be 5*(1+1/2+1/3+1/4+1/5) = **11.41**. This is the answer.

A software developer considers time complexity and space complexity first. When N is approaching infinity, the question is similar to a merge sorting. Given a merge sort is O(nlog(n)), the expected times must be greater than 5*ln(5) = **8.05**. At least this number will be a lower bound for this question.