I’ve always wondered why high schools bury students in calculus instead of teaching them the beauty of statistics and probability. Only a tiny fraction of these students will actually use calculus in their lives, but statistics is for everyone. And without clear statistical principles in our heads, we get intimidated by numbers. I thought a small write-up on probability and statistics that touches upon some of their *arcane* concepts without sounding too technical was in order.

Randomized response

To reel you guys in, let me begin with a real world application of probability, nothing obscure. Just an interesting use for the concept.

Suppose you’re conducting a survey where you ask people whether they cheated on their spouse. In spite of repeated assurances of confidentiality, the participants can never be sure that their answers won’t be traced back to them. After all, the data sits on a piece of paper or in a file on some computer. Who’s to say some disgruntled employee won’t release it to the world?

The *randomized response* method, that’s who. It allows us to obtain our data without causing a rip tide of punitive alimonies. Here’s how it goes. When the participant comes to a yes/no question about a sensitive issue, he flips a coin. If the coin comes up *heads*, he answers *Yes* regardless of the truth. If it comes up *tails*, he answers truthfully. It’s that simple. No one watches him flip the coin, so his reason for answering *Yes* stays secret.

We know that if we flip a coin enough times, we’ll get *heads* roughly half the time. So let’s say 1000 people participated in the survey, and assume that 700 of them answered *Yes* to the damning question and 300 answered *No*. There is only one way to answer *No*: you got *tails* and you didn’t cheat on your spouse. About 500 of the 1000 participants got *heads* and answered *Yes* automatically, which leaves about 500 who got *tails* and answered truthfully. Of those 500, 300 said *No*, so the remaining 200 said *Yes* because they really did cheat. That’s 200 out of 500, or 40%. The coin doesn’t care who cheated, so the same 40% should hold for the *heads* group too, which puts the estimate at roughly 400 cheaters among the 1000 participants. We can’t point to a single one of them, so their spouses are none the wiser.
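For the curious, here is a minimal Python sketch of the survey arithmetic above. The function name `estimate_cheater_rate` and its parameters are made up for illustration, and the sketch assumes a fair coin and participants who follow the rules.

```python
def estimate_cheater_rate(yes_count, total, p_heads=0.5):
    """Estimate the true 'Yes' fraction from randomized-response answers.

    Assumes heads forces a 'Yes' and tails gets a truthful answer
    (p_heads is the assumed chance of heads, 0.5 for a fair coin).
    """
    observed_yes = yes_count / total
    # observed_yes = p_heads + (1 - p_heads) * true_rate, solved for true_rate
    return (observed_yes - p_heads) / (1 - p_heads)


rate = estimate_cheater_rate(yes_count=700, total=1000)
print(f"Estimated cheating rate: {rate:.0%}")
print(f"Estimated cheaters among 1000 participants: {rate * 1000:.0f}")
```

Keep in mind that this is an estimate for the group as a whole. With a real coin the *heads* group won’t be exactly half, so the answer comes with some wiggle room.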

Bayes’ theorem

Thomas Bayes blew our minds on conditional probability, you know, those icky questions like, “If it rains tomorrow, what’s the probability that the bus will be late?” Bayes’ theorem, if you’re unfamiliar with it, gives us some counter-intuitive answers to questions whose answers we would otherwise take for granted.

*Say 1% of women over forty have breast cancer. Assume that 95% of women with breast cancer will test positive for it. Also assume that 5% of those without breast cancer will also test positive (false alarms). If a woman tests positive, what’s the probability that she actually has breast cancer? 95%? 90%? It’s at least 50%, right? It’s actually about 16%, which, incidentally, is reportedly about the percentage of doctors who get this kind of question right.*

Whenever the thing we test for is present in only a small fraction of the population, even a fairly accurate test will see its true positives drowned out by the sheer number of false alarms. Welcome to the world of Bayesian probability. Simply put, if 10000 women were tested for breast cancer, and 100 of them actually have it, 95 of the 100 will test positive. And out of the 9900 who don’t have breast cancer, 5% or 495 will test positive. This means, for every 10000 tested, 590 will test positive, of which only 95 will actually have breast cancer. That’s 95 out of 590, or about 16%.
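If you’d rather let a computer do the counting, here is a small Python sketch of the same calculation. The function name `probability_given_positive` and its parameter names are my own labels, not standard terminology from any library.

```python
def probability_given_positive(prevalence, sensitivity, false_positive_rate):
    """Chance of actually having the condition after one positive test."""
    # Fraction of everyone tested who is sick AND tests positive
    true_positives = prevalence * sensitivity
    # Fraction of everyone tested who is healthy AND tests positive anyway
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


posterior = probability_given_positive(
    prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(f"Probability of cancer given a positive test: {posterior:.1%}")
```

Try raising `prevalence` to 0.5 and the answer jumps to 95%. The test didn’t get any better; the condition just stopped being rare.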

This is why doctors re-test the samples that test positive. In this case, if a sample tests positive twice, and we assume the two test results are independent of each other, the probability of cancer rises to about 78%. Fun, right?
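The re-test number can be checked by feeding the result of the first test back in as the starting point for the second. This is only a sketch: the helper `update_on_positive` is a hypothetical name, and the calculation assumes the second test is independent of the first, which real lab tests may not be.

```python
def update_on_positive(prior, sensitivity=0.95, false_positive_rate=0.05):
    """Revise the probability of disease after one more positive result."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


after_first = update_on_positive(prior=0.01)          # start from the 1% base rate
after_second = update_on_positive(prior=after_first)  # assumes independent tests

print(f"After one positive test:  {after_first:.0%}")
print(f"After two positive tests: {after_second:.0%}")
```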

Confidence limits and statistical significance

Whenever those of us in the science fields hear the word *significant*, we go, “Oh yeah? Prove it.” When we say ‘significant’ we mean statistically significant with a given probability value. Even those outside the sciences hear of *confidence limits* and statements like “We know this with 95% confidence…” So what does it mean to have statistically significant information or to have confidence in it?

If we conclude something from a study with 95% confidence, we mean, loosely, that if nothing were really going on, results as striking as ours would turn up by sheer dumb luck only 5% of the time. In other words, even though scientific research follows an *innocent until proven guilty* principle, if we kill 5 out of every 100 innocent people, we call it a good day.

To elucidate this, let’s say I gave you a coin and told you that it favors *heads*, i.e. if flipped enough times, it will give more *heads* than *tails*. It’s up to you, the skeptic, to test it instead of just believing me.

So you flip the coin once and get heads. Eureka! This coin favors heads! Not so fast… a perfectly fair coin would have come up heads 50% of the time anyway. At best, you can state with 50% confidence that this coin favors heads. So you ante up again and re-flip this coin. Another heads. Don’t call Stockholm just yet. A fair coin gives two heads in a row 50% of 50% of the time, i.e. there’s a 25% probability that these two results were pure chance. But your confidence has increased now. You can state with 75% surety that there’s some funny business with the coin.

You flip it again. Another heads. Now your confidence has gone up to 88%.

Flip again. Another heads? You’re now 94% confident that the coin is biased. With the next flip, your confidence rises to 97%, which is more than enough for most scientific experiments.
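The confidence figures in this little experiment are just one minus the chance that a fair coin would produce the same streak. Here is a rough Python sketch of that bookkeeping; `chance_of_all_heads` is a name I made up, and "confidence" is used in the loose, intuitive sense of this example rather than as a formal statistical definition.

```python
def chance_of_all_heads(flips, p_heads=0.5):
    """Probability that a coin with the given p_heads lands heads every time."""
    return p_heads ** flips


for flips in range(1, 6):
    fluke = chance_of_all_heads(flips)  # how often a fair coin does this
    confidence = 1 - fluke
    print(f"{flips} heads in a row: fluke chance {fluke:.1%}, confidence {confidence:.1%}")
```

Each extra head halves the fluke chance, which is why the confidence climbs quickly at first and then creeps toward 100% without ever getting there.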

Of course, I give this example to explain the intuition behind the % confidence concept. This experiment takes for granted a lot of things that change with every flip—how high you flip, air resistance, which side faces up when you flip, etc. In reality, you don’t accuse a coin of bias after five flips.

Expectation, Law of large numbers, and the Gambler’s Fallacy

Consider an unbiased six-faced die with the faces numbered 1 through 6. If you roll a 1, you get $1 and if you roll a 2, you get $2…you get the idea. We all know that the probability of landing any particular number is 1/6. If you threw enough times, what’s the average amount of money you’d make per roll?

Expectation simply means the probability of each outcome multiplied by the reward or punishment associated with that outcome, added up over all the possible outcomes. There’s a *one-in-six* chance of rolling any given number.

The law of large numbers says that if you roll this die enough times, your average winnings per roll wind up around $3.50, which is the expectation. Every number is equally likely, so the reward expected from any particular roll is the average of the rewards for each number:

(1/6 × 1) + (1/6 × 2) + (1/6 × 3) + (1/6 × 4) + (1/6 × 5) + (1/6 × 6)

= (1+2+3+4+5+6)/6

= 21/6 or $3.50
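The same sum can be written as a few lines of Python. This sketch uses exact fractions so that no rounding sneaks in; the `payouts` dictionary is just my way of writing down the "roll a 1, get $1" rule.

```python
from fractions import Fraction

# Dollar reward for each face: roll a 1 and get $1, roll a 2 and get $2, etc.
payouts = {face: face for face in range(1, 7)}
probability = Fraction(1, 6)  # every face of a fair die is equally likely

# Expectation: probability times reward, summed over all six outcomes
expected = sum(probability * payout for payout in payouts.values())

print(f"Expected winnings per roll: {expected} dollars, i.e. ${float(expected):.2f}")
```

Note that $3.50 is not something you can ever win on a single roll. It is what the winnings average out to over a long run.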

We must remember that this *averaging out* only happens over many, many rolls. If we expect it to happen in the short run, we commit what’s known as the *gambler’s fallacy*. Every number on the die is equally likely, and each roll is independent of every other. If you rolled 1, 2, and 3 in succession, it doesn’t mean that 4, 5, and 6 are due. Every roll has a 1/6 likelihood of yielding a particular number. Yes, if you rolled the die 60000 times, you’d most likely end up with roughly 10000 rolls of each number, but that’s because the sheer number of rolls swamps any early streak, not because the die makes up for it.
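You don’t have to take this on faith. Here is a quick simulation sketch using Python’s built-in `random` module. The seed of 42 and the helper name `roll_many` are arbitrary choices of mine, and a simulated die is of course only a stand-in for a real one.

```python
import random
from collections import Counter


def roll_many(n_rolls, seed=42):
    """Simulate n_rolls of a fair six-sided die (seeded so runs repeat)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]


rolls = roll_many(60000)
average = sum(rolls) / len(rolls)
counts = Counter(rolls)

# Gambler's fallacy check: how often is a six followed by another six?
after_six = [nxt for prev, nxt in zip(rolls, rolls[1:]) if prev == 6]
six_again = after_six.count(6) / len(after_six)

print(f"Average winnings per roll: ${average:.3f}")
for face in range(1, 7):
    print(f"Face {face}: {counts[face]} rolls")
print(f"Chance of a six right after a six: {six_again:.3f} (1/6 is about 0.167)")
```

The average should land close to $3.50 and each face should show up roughly 10000 times, while a six is no less likely right after another six. The die has no memory.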

People who buy lottery tickets based on numbers that are *due* are fooling themselves. Then again, people who expect to make a lot of money on lottery tickets wouldn’t be swayed by statistics and probability anyway.

So there it is. A small primer on statistics and probability with some real-world examples. Some of this is oversimplified here and more nuanced in actuality. Some of the intuitive explanations are based on how I understand them and subject to further exposition.
