This simulation is designed to demonstrate the difference between subjective, empirical, and computational probabilities. The simulation records the result of flipping a coin ten times, concentrating on the number of heads obtained. It was run sixty times and the results were entered into an Excel spreadsheet for analysis. First, subjective probabilities were generated from educated guesswork. Then empirical probabilities were generated from the results of the simulations. Finally, computational probabilities were calculated using the binomial distribution formula. Both the subjective and the empirical probabilities were then compared with the computational probabilities in order to emphasize the differences.
To begin with, subjective probabilities had to be assigned to the eleven possible results (zero through ten heads). Since five heads will occur more frequently than any other result, its probability should be significantly greater than the average value of 9.1% (100% divided among eleven outcomes). The assigned value for obtaining five heads is 22%. Since the binomial distribution is symmetric when heads and tails are equally likely, the probability of getting three heads will be the same as the probability of getting seven heads. Knowing that the number of heads must be between zero and ten, and that the sum of all the percentages must equal exactly 100%, each of these two results was assigned a subjective probability of 12%. Again by symmetry, the probability of getting no heads will be the same as the probability of getting all heads. The subjective probability assigned to each of these results was 1%.
From this point the remainder of the subjective probability chart was filled out knowing that the distribution was symmetric, each value had to be between 0 and 1, and the sum of all the probabilities had to be exactly 1. After a little bit of experimentation the table at the top of the following page was generated:
|X = Heads||0||1||2||3||4||5||6||7||8||9||10|
|Subjective probability||0.01||0.02||0.05||0.12||0.19||0.22||0.19||0.12||0.05||0.02||0.01|
Examination by eye confirms that the distribution is symmetric and each value is within the allowed range. For the final requirement it is readily confirmed that
2 (0.01 + 0.02 + 0.05 + 0.12 + 0.19) + 0.22 = 1.00
and an acceptable subjective probability distribution has been generated.
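The three requirements checked above (range, symmetry, and total) can also be verified programmatically. The following is a minimal sketch rather than part of the original Excel work. The values for zero, three, and five heads are stated in the text; the placement of 0.02, 0.05, and 0.19 at one, two, and four heads is an assumption based on the order of the terms in the sum. The function name `is_valid_distribution` is hypothetical.

```python
# Subjective probabilities for 0..10 heads.
# Assumption: 0.02, 0.05 and 0.19 belong to 1, 2 and 4 heads respectively,
# following the order of the terms in the sum shown in the text.
half = [0.01, 0.02, 0.05, 0.12, 0.19]
subjective = half + [0.22] + half[::-1]


def is_valid_distribution(probs, tol=1e-9):
    """Return True if probs is symmetric, within [0, 1], and sums to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    symmetric = all(abs(a - b) <= tol for a, b in zip(probs, reversed(probs)))
    sums_to_one = abs(sum(probs) - 1.0) <= tol
    return in_range and symmetric and sums_to_one


print(is_valid_distribution(subjective))
```

A small tolerance is used instead of an exact comparison because decimal fractions such as 0.01 are not represented exactly in floating point.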
Assuming that a fair coin is being used, the probability of getting heads on any toss is 0.50, as is the probability of getting tails. This means that over an extended period of time half of the tosses are expected to be heads and half are expected to be tails. As a result, for the total of 600 tosses (sixty simulations of ten tosses each) the expected number of heads is 300. Also, since the probabilities of success (heads) and failure (tails) are equal, the probability of getting three heads is exactly equal to the probability of getting three tails. Finally, the probability of getting three heads is exactly equal to the probability of getting seven tails. This is because each toss in a binomial experiment has only two possible outcomes, heads or tails: if exactly three heads are generated, then the other seven tosses must have resulted in tails.
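These claims can be checked with the binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is an illustration in Python, not the Excel calculation used in the project, and the function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(k, n=10, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Expected number of heads in 60 simulations of 10 tosses each.
expected_heads = 60 * 10 * 0.5

# Three heads in ten tosses is the same event as seven tails,
# and for a fair coin it is as likely as three tails (seven heads).
p_three_heads = binomial_pmf(3)
p_seven_heads = binomial_pmf(7)

print(expected_heads)
print(p_three_heads, p_seven_heads)
```

With p = 0.5 every sequence of ten tosses is equally likely, so each probability is simply the number of favorable sequences divided by 1024.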
The first graph presented for discussion is a line graph showing the evolution of cumulative heads percentage over all 60 simulations. It would be expected that this cumulative percentage would approach the success probability over a long enough timeline. Since the probability of success in this case (assuming a fair coin) is 0.50 the line graph should approach this value with decreasing fluctuations as the number of simulations increases. This line graph is presented at the top of the following page:
The sharp jump at the beginning simply represents an abundance of heads after a small number of tosses. Around the 25th toss the cumulative percentage returned to its expected value and then dipped below 0.50, showing an overall abundance of tails. As expected, the line graph continued to fluctuate around 0.50, with the degree of fluctuation decreasing as the total number of simulations increased.
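The behavior described above can be reproduced with a short simulation. This is a minimal sketch that assumes a fair coin and uses Python's pseudo-random generator in place of the original simulation tool; the function name `cumulative_heads_fraction` and the seed are hypothetical, and the exact path of the line will differ from the graph in this report.

```python
import random


def cumulative_heads_fraction(num_runs=60, flips_per_run=10, seed=0):
    """Cumulative fraction of heads after each run of coin flips."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    fractions = []
    total_heads = 0
    for run in range(1, num_runs + 1):
        # Count heads in one run; each flip is heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_run))
        total_heads += heads
        fractions.append(total_heads / (run * flips_per_run))
    return fractions


fractions = cumulative_heads_fraction()
for run in (1, 5, 25, 60):
    print(run, round(fractions[run - 1], 3))
```

Values above 0.50 indicate an accumulated surplus of heads and values below 0.50 a surplus of tails. Because each new run is a smaller share of the growing total, later runs move the line less.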
The second graph is a comparison of the subjective probabilities to the computational probabilities calculated. The subjective probabilities can be thought of as a gut instinct while the computational probabilities are exact based on the binomial distribution formula. This graph is presented at the top of the following page:
The subjective probabilities are shown in blue and the computational probabilities are shown in red. For this particular subjective probability distribution the central values were underestimated while the outer values were overestimated. Notice that both distributions are symmetric as would be expected. The overall logic error in the subjective probability distribution appears to be a misunderstanding of how quickly the probability decreases as the number of successes deviates in either direction from the expected value of five. Also notice that the standard deviation of the subjective distribution is larger than that of the computational distribution.
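The comparison can be made concrete by computing both distributions and their standard deviations. The sketch below is an illustration only. It assumes the subjective values 0.02, 0.05, and 0.19 belong to one, two, and four heads, following the order of the terms in the sum shown earlier, and the helper name `std_dev` is hypothetical.

```python
from math import comb, sqrt

n = 10

# Exact binomial probabilities for a fair coin: C(10, k) / 2^10.
computational = [comb(n, k) / 2**n for k in range(n + 1)]

# Subjective probabilities; placement of 0.02, 0.05, 0.19 is assumed.
half = [0.01, 0.02, 0.05, 0.12, 0.19]
subjective = half + [0.22] + half[::-1]


def std_dev(probs):
    """Standard deviation of a distribution over the outcomes 0..len-1."""
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sqrt(variance)


for k in range(n + 1):
    print(k, subjective[k], round(computational[k], 4))

print(round(std_dev(subjective), 3), round(std_dev(computational), 3))
```

Under these assumptions the subjective values for four, five, and six heads fall below the exact ones while all other values sit above them, which is why the subjective standard deviation comes out larger.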
The final graph is a comparison of the empirical probabilities to the computational probabilities. The empirical probabilities are determined from the sixty simulations: empirical probability distributions are expected to approach the computational distribution as the number of trials increases. Again this graph is presented at the top of the following page:
Interestingly enough the empirical distribution also appears to underestimate the computational distribution for the central values and overestimate it for the outer values. Notice here that the empirical distribution is no longer symmetric, and there is no reason to expect that it would be. It appears that sixty simulations are enough for the empirical distribution to be recognizable compared with the computational distribution. In other words the empirical distribution is a reasonable approximation but is not exact.
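An empirical distribution of this kind can be generated and set beside the exact one as follows. This is a sketch assuming a fair coin and Python's pseudo-random generator, not the simulation tool used for this report; the function name `empirical_distribution` and the seed are hypothetical, so the numbers will not match the graph above.

```python
import random
from collections import Counter
from math import comb


def empirical_distribution(num_runs=60, flips_per_run=10, seed=1):
    """Relative frequency of each head count over num_runs simulated runs."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(flips_per_run))
        for _ in range(num_runs)
    )
    return [counts[k] / num_runs for k in range(flips_per_run + 1)]


empirical = empirical_distribution()
computational = [comb(10, k) / 2**10 for k in range(11)]

for k in range(11):
    print(k, round(empirical[k], 3), round(computational[k], 3))
```

With only sixty runs each empirical probability is a multiple of 1/60, so exact agreement with the binomial values is not possible and some asymmetry is to be expected. Raising `num_runs` should bring the two columns closer together.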
In conclusion, this experiment is very good for showing the differences between the three types of probability distributions. It is simple to execute, and the fact that the probabilities of success and failure are equal makes the subjective distribution easier to analyze. All three distributions are relatively similar and all of the results were as expected.
The first thing that I learned was that there is actually some reasoning that goes into determining a subjective distribution. Before this experiment I felt it was a mostly useless exercise, especially if the computational distribution was readily available. Now I realize that with a bit of logic it is possible, and actually not too difficult, to generate a reasonable subjective distribution. The advantage of this is that for situations where the computational distribution is more difficult to obtain, a ballpark idea can be gathered with relative simplicity.
The next thing that I gained from this project is a better interpretation of the line graph for cumulative percentage of heads. I knew that it would approach the expected value, but I never really thought about what the fluctuations meant. Now I understand that when the cumulative percentage is greater than the expected value there is an abundance of successes, while when it is less than the expected value there is an abundance of failures. The last thing that I learned was that a relatively small number of simulations can generate a reasonable empirical distribution. With eleven different possible outcomes I would have thought that considerably more than sixty simulations would have been needed.
On a slightly different note I also learned some new functions for the Excel spreadsheets. The most interesting one to me was the command to generate the exact probabilities for a binomial distribution. All the other functions I used were familiar to me, but it was good practice to use them again. I feel like this project helped me learn and reaffirm quite a bit of knowledge related to probability distributions and the use of Excel to analyze them.
Columns of the simulation data table: Simulation number, Total number of heads, Cumulative number of heads, Total number of coins tossed, Cumulative percentage of heads
Columns of the probability comparison table: Number of successes, Subjective probability, Actual successes, Empirical probability, Computational probability