Regression to the Mean & the Gambler's Fallacy - Simulated

Published 2021-01-26

There are two seemingly contradictory findings in statistics which apply to independent events:

Regression to the Mean
The Gambler's Fallacy

Regression to the mean is defined as:

The phenomenon that arises if a sample point of a random variable is extreme (nearly an outlier), a future point will be closer to the mean or average on further measurements. - Wikipedia

Example: If 4 coin tosses produced 4 heads, the next 4 coin tosses are more likely to produce 2 heads and 2 tails than another 4 heads.

Whereas the Gambler's Fallacy is defined as:

The erroneous belief that if a particular event occurs more frequently than normal during the past it is less likely to happen in the future. - Wikipedia

Example: The false belief that if you toss a coin 4 times and each toss is heads, the next toss is more likely to be tails.

These two findings seem to be contradictory.

In the first, we hear that after tossing 4 heads in a row, the next set of tosses is more likely to be evenly split between heads and tails than to be another streak.

In the second, we hear that the past doesn't influence the future for independent events like coin tosses or dice rolls.
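A quick way to see that both statements can hold at once is to compute the exact probabilities for a fair coin. The sketch below assumes a block of 4 flips and a fair, independent coin; the names `prob_heads`, `dist` and `p_closer` are illustrative only.

```python
from math import comb

def prob_heads(k: int, n: int = 4) -> float:
    """Probability of exactly k heads in n fair, independent flips."""
    return comb(n, k) / 2 ** n

# Distribution of heads in the NEXT 4 flips. The preceding streak does not
# appear anywhere in this calculation, because the flips are independent.
dist = {k: prob_heads(k) for k in range(5)}

p_single_heads = 0.5          # any single flip, streak or no streak
p_another_streak = dist[4]    # 4 more heads
p_even_split = dist[2]        # exactly 2 heads and 2 tails
# "Closer to the mean" than a 4-streak: anything except 0 or 4 heads.
p_closer = sum(p for k, p in dist.items() if k not in (0, 4))

print(p_another_streak, p_even_split, p_closer)
```

Every single flip stays at 50/50, yet a block of 4 flips is far more likely to be mixed than to be another run of 4 identical results.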

Prior Work

Googling "Regression to the Mean vs Gambler's Fallacy" brings up a number of places where people have been confused by these two findings:

I find the Medium article by Ku Lok Sun the most succinct and accessible explanation of the difference between the two, but none of the sources above provides a simulation of the phenomenon.

Simulating both findings

To get a practical sense that the findings are in fact not contradictory, let's perform a test.

First we'll flip coins and plot the results.

will represent heads.
will represent tails.
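For readers who want to reproduce the experiment offline, the flipping step can be sketched in a few lines of Python. This is not the demo's own code: the function name `flip_coins`, the `"H"`/`"T"` encoding, the seed and the count of 1000 flips are all assumptions made for illustration.

```python
import random

def flip_coins(n, seed=None):
    """Return a list of n fair coin flips, each "H" (heads) or "T" (tails)."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    return [rng.choice("HT") for _ in range(n)]

# 1000 flips is an arbitrary choice for this sketch.
flips = flip_coins(1000, seed=42)
print("".join(flips[:40]))
```

Leaving `seed` as `None` gives a fresh sequence on every run, which mirrors refreshing the page.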

Now let's take a look at streaks of 4 or more heads in a row as well as streaks of 4 or more tails in a row.

Heads streaks > are highlighted as
Tails streaks > are highlighted as
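Finding the streaks amounts to run-length encoding the sequence and keeping the long runs. A minimal sketch, assuming flips are stored as a list of `"H"`/`"T"` strings and that a streak means 4 or more identical results; `find_streaks` is a hypothetical helper, not part of the demo.

```python
from itertools import groupby

def find_streaks(flips, min_len=4):
    """Return (start_index, face, length) for every run of at least min_len."""
    streaks = []
    start = 0  # index where the current run begins
    for face, run in groupby(flips):
        length = len(list(run))
        if length >= min_len:
            streaks.append((start, face, length))
        start += length
    return streaks

print(find_streaks(list("HHHHTHTTTTT")))
# [(0, 'H', 4), (6, 'T', 5)]
```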

To explore regression to the mean and the gambler's fallacy, we're interested in knowing what happens immediately after a streak of 4. So let's pull out:

1. Streaks of 4. Streaks greater than 4 will be split into (streak length) / 4 parts.
2. The coin flip immediately after the streak.
3. The set of 4 flips immediately after the streak of 4.

We're interested in (2) so we can verify that the chances of heads/tails after a streak are still 50/50 (gambler's fallacy).

We're interested in (3) so we can verify that a series of flips immediately after an outlier (the streak) is, on average, closer to the mean (regression to the mean).
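The three pieces above can be pulled out in one pass. The sketch below is one possible reading of the procedure, under these assumptions: a streak is a block of 4 identical flips, a longer run is split into `length // 4` blocks, and a block is skipped if fewer than 4 flips follow it. The name `streak_windows` is made up for this example.

```python
from itertools import groupby

def streak_windows(flips, k=4):
    """Return (face, next_flip, next_k_flips) for each block of k identical flips.

    Runs longer than k are split into length // k blocks. Blocks too close
    to the end of the sequence to have k following flips are skipped.
    """
    windows = []
    start = 0  # index where the current run begins
    for face, run in groupby(flips):
        length = len(list(run))
        for part in range(1, length // k + 1):
            end = start + part * k  # index of the flip right after this block
            following = flips[end:end + k]
            if len(following) == k:
                windows.append((face, following[0], following))
        start += length
    return windows

for face, nxt, following in streak_windows(list("TTTTTTTTHTHT")):
    print(face * 4, nxt, "".join(following))
```

Note that for a run of 8 the first block is followed by more of the same face, while the second block is followed by the flip that ends the run. Counting both is what keeps the "next flip" statistic unbiased.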

Streak	Next Flip	Next Flips	Flips Sum
Streaks	/	/	of Closer to mean

Gambler's Fallacy - In the second column above, the probability of flipping heads or tails immediately after a streak is still close to 50/50. The prior streak doesn't impact the next flip.

[Live result: the percentage of next flips that came up heads and the percentage that came up tails.]
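The same check can be run offline. This is a sketch rather than the demo's code: it assumes 200,000 flips, a fixed seed, `"H"`/`"T"` encoding and streak blocks of 4, and `next_flips_after_streaks` is a hypothetical helper.

```python
import random
from itertools import groupby

def next_flips_after_streaks(flips, k=4):
    """Flip that follows each block of k identical flips (longer runs are split)."""
    out, start = [], 0
    for _, run in groupby(flips):
        length = len(list(run))
        for part in range(1, length // k + 1):
            end = start + part * k  # index of the flip right after this block
            if end < len(flips):
                out.append(flips[end])
        start += length
    return out

rng = random.Random(0)
flips = [rng.choice("HT") for _ in range(200_000)]
after = next_flips_after_streaks(flips)
pct_heads = 100 * after.count("H") / len(after)
print(f"{pct_heads:.1f}% heads, {100 - pct_heads:.1f}% tails after a streak")
```

With this many flips the two percentages should both land very close to 50.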

Note: refreshing this page re-runs the coin flips if you want to run the test multiple times.

Regression to the mean - The last column is a tally of column 3 where heads = -1 and tails = 1. A magnitude < 4 means that we're closer to the mean than the streak. In the last column, nearly all of the sequences of flips immediately following a streak are closer to the mean (i.e., have nearly even numbers of heads and tails).

This is consistent with regression to the mean: after an outlier, the next measurement is more likely to be closer to the mean.

[Live result: the percentage of runs that are closer to the mean.]
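The tally can be reproduced with the same scoring (heads = -1, tails = 1). The sketch below assumes blocks of 4, 200,000 flips and a fixed seed; `tallies_after_streaks` is a name invented for this example. For a fair coin the exact expectation is 14/16 = 87.5%, since only 4 heads or 4 tails give a magnitude of 4.

```python
import random
from itertools import groupby

SCORE = {"H": -1, "T": 1}  # same scoring as the "Flips Sum" column

def tallies_after_streaks(flips, k=4):
    """Sum of scores over the k flips that follow each block of k identical flips."""
    out, start = [], 0
    for _, run in groupby(flips):
        length = len(list(run))
        for part in range(1, length // k + 1):
            end = start + part * k
            window = flips[end:end + k]
            if len(window) == k:  # skip blocks too close to the end
                out.append(sum(SCORE[f] for f in window))
        start += length
    return out

rng = random.Random(0)
flips = [rng.choice("HT") for _ in range(200_000)]
tallies = tallies_after_streaks(flips)
closer = sum(abs(t) < 4 for t in tallies)
pct_closer = 100 * closer / len(tallies)
print(f"{closer} / {len(tallies)} * 100 = {pct_closer:.1f}% closer to the mean")
```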

But we're taking into consideration both heads streaks and tails streaks. Could they be evening one another out?

Heads Only

To be sure, let's re-do the analysis above but for streaks of heads only.

Streak	Next Flip	Flips After	Flips Sum
Streaks	/	/	of Closer to mean

Gambler's Fallacy:

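[Live result: the percentage of next flips that came up heads and the percentage that came up tails, for heads streaks only.]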

Regression to the Mean:

[Live result: the percentage of runs that are closer to the mean, for heads streaks only.]
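Restricting the analysis to heads streaks only needs one extra condition. As before, this is a sketch under assumptions (blocks of 4, 200,000 flips, fixed seed, `"H"`/`"T"` encoding), and `heads_streak_windows` is a hypothetical helper rather than the demo's code.

```python
import random
from itertools import groupby

K = 4  # streak length assumed throughout this sketch

def heads_streak_windows(flips, k=K):
    """The k flips that follow each block of k heads (tails streaks are ignored)."""
    out, start = [], 0
    for face, run in groupby(flips):
        length = len(list(run))
        if face == "H":
            for part in range(1, length // k + 1):
                end = start + part * k
                window = flips[end:end + k]
                if len(window) == k:
                    out.append(window)
        start += length
    return out

rng = random.Random(0)
flips = [rng.choice("HT") for _ in range(200_000)]
windows = heads_streak_windows(flips)

# Gambler's fallacy check: is the very next flip still about 50/50?
pct_next_heads = 100 * sum(w[0] == "H" for w in windows) / len(windows)
# Regression to the mean check: a window is "closer" unless it is all H or all T.
pct_closer = 100 * sum(w.count("H") not in (0, K) for w in windows) / len(windows)
print(f"next flip heads: {pct_next_heads:.1f}%, closer to mean: {pct_closer:.1f}%")
```

If heads and tails streaks were cancelling each other out, the heads-only percentages would drift away from 50% and 87.5%.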

These findings corroborate our original findings.

Conclusion

In accord with the gambler's fallacy, every flip is 50/50, regardless of the streak preceding the flip.

Since every flip is 50/50, a streak of heads or tails is a deviation from the mean and an outlier. Because the flips after a streak (an outlier) are still 50/50, they are more likely to be closer to the mean than to repeat the streak.

Regression to the mean, rather than being contrary to the gambler's fallacy, is really a restatement of it. The average of a sequence of independent random events tends towards the mean as the sequence grows. The chance of getting heads or tails on any given flip is the mean.

Demo source: