Some statistical notes

Source: The Atlantic

However, The Atlantic article notes that “the significance of these figures may be hugely overblown. ‘Everybody who’s remotely professionally involved in this kind of stuff knows that beyond about 10, 15, 20 years, [population estimates] are basically useless,’ says Dr. Sean Fox of the University of Bristol in the U.K.”

Personally, I’d still be worried.



1. Rare events or traits are likely to be over-represented in survey results whenever the chance of randomly picking that option among the survey items is higher than its frequency in real life. For example, let’s suppose I hand out 1,000 surveys with three options to select from:

  1. Heterosexual
  2. Homosexual
  3. Asexual

Then chances are I will end up with an over-representation of asexuals. In real life, asexuality is rare: a British survey estimates it at about 1% of the British population, so I expect to get about 10 surveys marked asexual. But let’s suppose some people fill my survey out completely at random because they’re just here for the free M&Ms, or they’re not paying attention and mark the wrong box, or I make a mistake while tallying up the numbers. For any such response, the chance of “asexual” being ticked is 1 in 3, or about 33%. If 1% of responses (10 surveys) are randomly incorrect, then I will get an additional 3.3 or so asexuals; that is, I will over-estimate the asexual population by about 33%. If 3% of responses are incorrect, I get about 10 extra, and fully half of my reported asexuals aren’t asexual at all.
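The arithmetic above can be checked with a few lines of Python. This is a rough sketch, not a rigorous model: the function name `expected_marks` and its parameters are made up for illustration, and it assumes (as the text does) that careless responders tick each option with equal probability and that they do not noticeably reduce the count of genuine answers.

```python
def expected_marks(n_surveys, true_rate, error_rate, n_options):
    """Return (genuine, spurious) expected counts for one rare option.

    Simplifying assumptions, for illustration only:
    - careless responders tick each of the n_options with equal probability;
    - the genuine count is not reduced by the careless responders.
    """
    genuine = n_surveys * true_rate
    spurious = n_surveys * error_rate / n_options
    return genuine, spurious


# The two scenarios from the text: 1% and 3% of responses are random.
for error_rate in (0.01, 0.03):
    genuine, spurious = expected_marks(1000, 0.01, error_rate, 3)
    share_false = spurious / (genuine + spurious)
    print(f"error rate {error_rate:.0%}: {genuine:.1f} genuine, "
          f"{spurious:.1f} spurious, {share_false:.0%} of reported are false")
```

At a 3% error rate the spurious marks match the genuine ones, which is where the “fully half” figure comes from.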

This problem will only get worse if there are two rare categories you can select on my survey. Suppose you can also select your race:

  1. White
  2. Black
  3. Hispanic
  4. Anything else

And we’re doing this survey in Comanche, TX, where Whites are 80%, Blacks are 1%, Hispanics are about 17%, and everyone else is about 2%.

The statistical odds of a black asexual in Comanche, TX, assuming these are independent variables, are therefore around 1% × 1% = 0.01%. In other words, in a small sample we probably shouldn’t find any, so let’s hand out our survey to 10,000 people, which gives us an expected count of about one. (You know, pretending that Comanche has 10,000 people.)

If you’re filling this survey out randomly for the M&Ms, you’ve got a 25% chance of marking black and a 33% chance of marking asexual, for an 8.3% chance of marking both. If 1% of people (100 respondents) do this, then we should see about 8 spurious black asexuals, roughly 8 times as many as the one genuine case we expect.
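The same check extends to two questions at once. Again, this is only a sketch under the text’s assumptions (independent traits, uniformly random careless answers); `joint_counts` and its arguments are hypothetical names chosen for this example.

```python
def joint_counts(n_surveys, true_rates, options_per_question, error_rate):
    """Return (genuine, spurious) expected counts for a combination of answers.

    Assumes the traits are independent and that careless responders
    pick uniformly at random on every question.
    """
    p_true = 1.0
    for rate in true_rates:
        p_true *= rate          # e.g. 0.01 * 0.01 = 0.0001

    p_random = 1.0
    for n_options in options_per_question:
        p_random /= n_options   # e.g. 1/4 * 1/3 = 1/12, about 8.3%

    genuine = n_surveys * p_true
    spurious = n_surveys * error_rate * p_random
    return genuine, spurious


# 10,000 surveys; 1% black, 1% asexual; 4 race options, 3 orientation options;
# 1% of respondents answer at random.
genuine, spurious = joint_counts(10_000, [0.01, 0.01], [4, 3], 0.01)
print(f"{genuine:.1f} genuine, {spurious:.1f} spurious")
```

Each additional rare category multiplies the true rate down much faster than it multiplies the random rate down, so the spurious share grows with every question added.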

A prominent real-life demonstration of this effect was Pat Buchanan’s performance in the 2000 election in Florida. Voters who mis-poked the ballot had close to a 33% chance of randomly voting Buchanan, but only 0.4% of people nationwide voted for Buchanan. This resulted in a large over-counting of votes for Buchanan.

Palm Beach has a population of 1.135 million; at a 51.3% voting rate, that is 582,255 voters. 0.4% of that is 2,329 votes. But if 1% of those voters (5,822) vote randomly, that’s another 1,921 votes for Buchanan. If the difference between winning and losing in Palm Beach comes down to less than 2,000 votes, then random chance, not democracy, is casting the deciding vote.

If your error rate goes above 1%, things obviously get even worse.
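To see how quickly this grows, here is a small sketch that reruns the Palm Beach numbers for a few error rates. The constants come from the figures above; the names and the 2% and 5% error rates are my own additions for illustration, and the 33% random-vote chance is the rough figure used in the text, not a measured value.

```python
POPULATION = 1_135_000   # Palm Beach population, from the text
TURNOUT = 0.513          # voting rate, from the text
TRUE_SHARE = 0.004       # Buchanan's nationwide share, from the text
P_RANDOM = 0.33          # rough chance a random vote lands on Buchanan

voters = POPULATION * TURNOUT
genuine_votes = voters * TRUE_SHARE


def spurious_votes(error_rate):
    """Expected accidental Buchanan votes at a given error rate."""
    return voters * error_rate * P_RANDOM


print(f"voters: {voters:,.0f}, genuine Buchanan votes: {genuine_votes:,.0f}")
for error_rate in (0.01, 0.02, 0.05):
    print(f"error rate {error_rate:.0%}: "
          f"{spurious_votes(error_rate):,.0f} spurious votes")
```

Because the spurious count is linear in the error rate, doubling the share of mis-poked ballots doubles the accidental votes, while the genuine count stays fixed.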

(To his credit, Pat Buchanan freely admitted that his anomalously high numbers in Palm Beach were probably due to people getting mixed up about the ballot.)


2. The black (African American) IQ score distribution may be wider and/or less normal than claimed.

The number of high-scoring blacks does not line up with the number expected from IQ distribution estimates. Pumpkin Person does a good breakdown of the math on this one in their post, “Are too many U.S. blacks scoring high on IQ tests?”