A Little Review of Big Data Books

I recently finished three books on “big data”: Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier; Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, by Seth Stephens-Davidowitz; and Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, by Thomas H. Davenport.

None of these books was a whiz-bang thriller, but I enjoyed them.

Big Data was a very sensible introduction. What exactly is “big data”? It’s not just bigger data sets (though it is also that.) It’s the opportunity to get all the data.

Until now, the authors point out, we have lived in a data poor world. We have had to carefully design our surveys to avoid sampling bias because we just can’t sample that many people. There’s a whole bunch of math done over in statistics to calculate how certain we can be about a particular result, or whether it could just be the result of random chance biasing our samples. I could poll 10,000 people about their jobs, and that might be a pretty good sample, but if everyone I polled happens to live within walking distance of my house, is this a very representative sample of everyone in the country? Now think about all of those studies on the mechanics of sleep done on whatever college students or homeless guys a scientist could convince to sleep in a lab for a week. How representative are they?

Today, though, we suddenly live in a data rich world. An exponentially data rich world. A world in which we no longer need to correct for bias in our sample, because we don’t have to sample. We can just get… all the data. You can go to Google and find out how many people searched for “rabbit” on Tuesday, or how many people misspelled “rabbit” in various ways.

Data is being used in new and interesting (and sometimes creepy) ways. Many things that previously weren’t even considered data are now being quantified–like one researcher quantifying people’s backsides to determine whether a car is being driven by its owner or a stranger.

One application I find promising is using people’s searches for various disease symptoms to identify people who may have various diseases before they seek out a doctor. Catching cancer patients earlier could save millions of lives.

I don’t have the book in front of me anymore, so I am just going by memory, but it made a good companion to Auerswald’s The Code Economy, since the modern economy runs so much on data.

Everybody Lies was a much more lighthearted, anecdotal approach to the subject, discussing lots of different studies. Stephens-Davidowitz was inspired by Freakonomics, and he wants to use Big Data to uncover hidden truths of human behavior.

The book discusses, for example, people’s pornographic searches (as per the title, people routinely lie about how much porn they look at on the internet) and whether people’s pornographic preferences can be used to determine what percent of people in each state are gay. It turns out that we can get a breakdown of porn queries by state and variety, allowing a rough estimate of the gay and straight population of each state–and it appears that what people are willing to tell pollsters about their sexuality doesn’t match what they search for online. In more conservative states, people are less likely to admit to pollsters that they are gay, but plenty of supposedly “straight” people are searching for gay porn–about the same number of people as actually admit to being gay in more liberal states.

Stephens-Davidowitz uses similar data to determine that people have been lying to pollsters (or perhaps themselves) about whom they plan to vote for. For example, Donald Trump got anomalously high votes in some areas, and Obama got anomalously low votes, compared to what people in those areas told pollsters. Both anomalies correlated highly with areas of the country where people made a lot of racist Google searches.

Most of the studies discussed are amusing, like the discovery of the racehorse American Pharoah. Others are quite important, like a study that found that child abuse was probably actually going up at a time when official reports said it wasn’t–the reports probably weren’t showing abuse due to a decrease in funding for investigating abuse.

At times the author steps beyond the studies and offers interpretations of why the results are the way they are that I think go beyond what the data tells, like his conclusion that parents are biased against their daughters because they are more concerned with girls being fat than with boys, or because they are more likely to Google “is my son a genius?” than “is my daughter a genius?”

I can think of a variety of alternative explanations. For example, society itself is crueler to overweight women than to overweight men, so it is reasonable, in turn, for parents to worry more about a daughter who will face cruelty than a boy who will not. Girls are more likely to be in gifted programs than boys, but perhaps this means that giftedness in girls is simply less exceptional than giftedness in boys, who are more unusual. Or perhaps male giftedness is different from female giftedness in some way that makes parents need more information on the topic.

Now, here’s an interesting study. Google can track how many people make Islamophobic searches at any particular time. Compared against Obama’s speech that tried to calm outrage after the San Bernardino attack, this data reveals that the speech was massively unsuccessful. Islamophobic searches doubled during and after the speech. Negative searches about Syrian refugees rose 60%, while searches asking how to help dropped 35%.

In fact, just about every negative search we could think to test regarding Muslims shot up during and after Obama’s speech, and just about every positive search we could think to test declined. …

Instead of calming the angry mob, as everybody thought he was doing, the internet data tells us that Obama actually inflamed it.

However, Obama later gave another speech, on the same topic. This one was much more successful. As the author put it, this time, Obama spent little time insisting on the value of tolerance, which seems to have just made people less tolerant. Instead, “he focused overwhelmingly on provoking people’s curiosity and changing their perceptions of Muslim Americans.”

People tend to react positively toward people or things they regard as interesting, and invoking curiosity is a good way to get people interested.

The author points out that “big data” is most likely to be useful in fields where the current data is poor. In the case of American Pharoah, for example, people just plain weren’t getting a lot of data on racehorses before buying and selling them. It was a field based on people who “knew” horses and their pedigrees, not on people who x-rayed horses to see how big their hearts and lungs were. By contrast, hedge funds investing in the stock market are already up to their necks in data, trying to maximize every last penny. Horse racing was ripe for someone to become successful by unearthing previously unused data and making good predictions; the stock market is not.

And for those keeping track of how many people make it to the end of the book, I did. I even read the endnotes, because I do that.

Big Data at Work was very different. Rather than entertain us with the success of Google Flu or academic studies of human nature, BDAW discusses how to implement “big data” (the author admits it is a silly term) strategies at work. This is a good book if you own, run, or manage a business that could utilize data in some way. UPS, for example, uses driving data to shorten package delivery routes; even a small saving per package from optimizing routes leads to a large saving for the company as a whole, since they deliver so many packages.

The author points out that “big data” often isn’t big so much as unstructured. Photographs, call logs, Facebook posts, and Google searches may all be “data,” but you will need some way to quantify these before you can make much use of them. For example, companies may want to gather customer feedback reports, feed them into a program that recognizes positive or negative language, and then count how many people called to report that they liked Product X vs. how many called to report that they disliked it.
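To make the feedback example concrete, here is a toy sketch of the “recognize positive or negative language, then count” step. The word lists, the sample reports, and the function names are all made up for illustration; nothing here comes from the book, and a real system would use a proper sentiment model rather than keyword matching.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"love", "like", "liked", "great", "good", "excellent"}
NEGATIVE = {"hate", "dislike", "disliked", "bad", "broken", "terrible"}


def classify(report: str) -> str:
    """Label one feedback report as positive, negative, or neutral."""
    words = re.findall(r"[a-z']+", report.lower())
    # Score = positive word hits minus negative word hits.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(reports: list[str]) -> Counter:
    """Count how many reports fall into each label."""
    return Counter(classify(r) for r in reports)


# Made-up reports about a hypothetical "Product X".
sample_reports = [
    "I love Product X, it works great",
    "Product X arrived broken and support was terrible",
    "I called to ask about shipping times",
]

print(tally(sample_reports))
```

Keyword counting like this is easily fooled (“I didn’t like it” would score as positive), which is part of why turning unstructured text into usable numbers is the hard part.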

I think an area ripe for this kind of quantification is medical data, which currently languishes in doctors’ files, much of it on paper, protected by patient privacy laws. But people post a good deal of information about their medical conditions online, seeking help from other people who’ve dealt with the same diseases. Currently, there are a lot of diseases (take depression) where treatment is very hit-or-miss, and doctors basically have to try a bunch of drugs in a row until they find one that works. A program that could trawl through forum posts and assemble data on patients and medical treatments that worked or failed could help doctors refine treatment for various difficult conditions–“Oh, you look like the kind of patient who would respond well to melatonin,” or “Oh, you have the characteristics that make you a good candidate for Prozac.”

The author points out that most companies will not be able to keep the massive quantities of data they are amassing. A hospital, for example, collects a great deal of data about patients’ heart rates and blood oxygen levels every day. While it might be interesting to look back at 10 years’ worth of patient heart rate data, hospitals can’t really afford to invest in databanks to store all of this information. Rather, what companies need is real-time or continuous data processing that analyzes current data and makes predictions/recommendations for what the company (or doctor) should do now.

For example, one of the books (I believe it was “Big Data”) discussed a study of premature babies which found, counter-intuitively, that they were most likely to have emergencies soon after a lull in which they had seemed to be doing rather well–stable heart rate, good breathing, etc. Knowing this, a hospital could have a computer monitoring all of its premature babies and automatically updating their status (“stable” “improving” “critical” “likely to have a big problem in six hours”) and notifying doctors of potential problems.

The book goes into a fair amount of detail about how to implement “big data solutions” at your office (you may have to hire someone who knows how to code and may even have to tolerate their idiosyncrasies), which platforms are useful for data, the fact that “big data” is not all that different from standard analytics that most companies already run, etc. Once you’ve got the data pumping, actual humans may not need to be involved with it very often–for example, you may have a system that automatically updates drivers’ routes with traffic reports, or sprinklers that automatically turn on when the ground gets too dry.

It is easy to see how “big data” will become yet another facet of the algorithmization of work.

Overall, Big Data at Work is a good book, especially if you run a company, but not as amusing if you are just a lay reader. If you want something fun, read the first two.


Exciting Birth Data from 1919

Ex Libris

While searching for data on birth rates by profession, I came across Birth Statistics for the Birth Registration Area of the United States, 1919, which has tons of fascinating information.

The “birth registration area” is all of the states that sent in birth data for the survey–CA, CT, IN, KS, KY, ME, MD, MA, MN, MI, NH, NY, NC, OH, OR, PA, RI, SC, UT, VT, VA, WA, and WI. Missouri, that “den of outlawry,” shall not feature.

“In the birth registration area of the United States in 1919 there were 1,373,438 live births, which represent a birth rate of 22.3 per 1,000 of population… Of the 1919 births, 705,593 were males and 667,845 were females, or a proportion of 1,057 males to 1,000 females.

“There is a marked excess of births over death in every state in the birth registration area. In New Hampshire the figures are lowest… A marked excess is also shown for nearly every city, and wherever the deaths outnumber the births it is usually among the colored population. The mortality rate of infants under 1 year of age per 1,000 births … is 87, ranging in the states from 63 in Oregon and Washington to 113 in South Carolina.

“The birth rates for the registration states ranged from 16.8 in California to 29.3 in Utah, and the death rates ranged from 10.5 in Minnesota to 15.3 in Maryland. The greatest excess of births over deaths–18.3 per 1,000 population–appears for Utah, and the lowest–3.1 per 1,000–for California.”
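The figures quoted above can be sanity-checked with a little arithmetic. This is only a sketch: the inputs are the numbers quoted from the report, while the implied population and the implied Utah and California death rates are my own back-calculations, not figures the report states.

```python
# Figures quoted from the 1919 report.
live_births = 1_373_438
male_births = 705_593
female_births = 667_845
birth_rate_per_1000 = 22.3

# The male and female counts should add up to the total.
assert male_births + female_births == live_births

# Males per 1,000 females; the report rounds this to 1,057.
sex_ratio = male_births / female_births * 1000

# Population implied by the birth count and the (rounded) birth rate.
# This is a back-calculation, not a number given in the report.
implied_population = live_births / birth_rate_per_1000 * 1000

# "Excess of births over deaths" is birth rate minus death rate,
# so the implied death rate is birth rate minus the excess.
utah_death_rate = round(29.3 - 18.3, 1)
california_death_rate = round(16.8 - 3.1, 1)

print(f"Males per 1,000 females: {sex_ratio:.0f}")
print(f"Implied population of the registration area: {implied_population:,.0f}")
print(f"Implied death rate, Utah: {utah_death_rate}")
print(f"Implied death rate, California: {california_death_rate}")
```

Both implied death rates land inside the quoted range of 10.5 to 15.3, so the numbers hang together (assuming I have read the scan correctly).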

In 1919, most of the cities with the lowest birthrates were, predictably, in California, though a smattering of similarly-low cities existed elsewhere; Brookline, MA, though, had by far the lowest rate, at 8.1.

What’s up with Brookline? Was it full of priests? Shakers?

The highest birthrates were in Columbia, SC and Johnstown, PA, but several cities in Connecticut, RI, and MA had similarly high rates.

The highest death rates were in Lexington, KY (25.8) and Columbia, SC (32.5). At 9.6, Flint, MI and Quincy, MA had the lowest death rates, though several other cities were quite close, like Racine, WI (9.7).

This data is crying out for a map, so I made two, one showing just the per-state averages and one including the major cities + highest and lowest smaller cities:

Feel free to take and use as you please

 

The scan is not easy to read in places, so forgive me if I’ve confused a 4 and a 1 somewhere, or a 3 and a 2.

The town of Brookline, MA, kind of threw off the scale by having far fewer births (8.1) than everywhere else. (MA also had some very high birth rates.) Columbia, SC, has both the highest birth rate and highest death rate (I haven’t made a map of death rates, yet.) I think it is interesting how some cities are right in line with their state’s average, and some are very different.

We can pick out several trends: the West probably had more men than women, resulting in lower birthrates. Mormon Utah was serious about making babies. The Midwest and North East had overall moderate birth rates, though there are a few towns in there that look heavily Irish. Note:

“…it appears that far more births occur annually to white foreign-born married women aged from 15 to 44, proportionally to their number, than to native white married women of corresponding ages. In Connecticut in 1910 over 46 percent of white married women aged 15 to 44 were of foreign birth, but 57% of the children … were reported as children of mothers of foreign birth.”

The South, like Utah, has very high fertility rates–possibly due to high fertility rates among the black population, though I wouldn’t be surprised if Southern whites were having more babies, too.

That’s all for now, though I hope to make some more graphs/maps based on this book’s data soon.

Terrorists are getting better at Terrorism

Courtesy of Saul Montes-Bradley‏ @Debradelai

Saul Montes-Bradley is the author of Gander: Terrorism, Incompetence, and the Rise of Islamic National Socialism

Data from 1981-2015 is from the State Department; 2016’s data is from Homeland Security. Note that this is global, not limited to the US or Europe; it’s also specifically terrorism, not guerrilla warfare or similar war-related acts.

Let’s assume the data is accurate and not biased by something like “we couldn’t get into this area to count how many attacks there were before 2000,” nor, “Well, before this was a ‘war’ and 3,000 people were dying from ‘warfare’ every year but now we’re calling it ‘terrorism’.”

Montes-Bradley attributes the massive, recent rise to Obama/Obama’s policies, but I note that the rise began in 2004–when Bush II was still in power–and had a local maximum in 2007–also when Bush II was still in power. Things improved during Bush’s final year in office, and continued improving (slowly) for Obama’s first four years in office, before jumping back to Bush-levels in 2013.

So: clearly something has changed, and I’m going to say it changed in 2004, though we might say 2001. But what? And why? I’m going to go out on a limb and say that the terrorists got serious about killing people. A lot of bombs and even airplane hijackers back in the 70s and 80s didn’t actually kill anyone, or if they did, casualties were fairly low. 9-11 marked a big departure from previous terrorism in that it actually killed a huge number of people, especially relative to the number of terrorists involved.

Terrorists are getting better at what they do because terrorists change their tactics much faster than governments change theirs. Terrorism mutates faster than governments can respond.

Data on a Variety of Topics

Since yesterday’s post actually took about a week to write, today I’m just posting some of the data/graphs I ran across in the process but didn’t utilize.

[Image: graph]

Since anyone can make a graph and claim to have used X source, and even honest people sometimes make mistakes, I try to double-check graphs before I use them. I never did manage to double-check this one, so it didn’t get used. If anyone can vouch for or against it, I’d be grateful.

[Images: assorted graphs and screenshots]

edited to remove poster’s name

Oh, and just in case anyone wants it, here is the data I used to construct the graphs on lynching/lynching rates:

[Images: data tables]

 

Two Graphs

It’s one of those “you need a graph, you gotta make it yourself” kind of days.

[Image: graph]

Feel free to take and use these graphs for your own essays.

The data for these graphs came from the Wikipedia page on lynchings in the US and the Tuskegee Institute. The Tuskegee Institute may not have counted 100% of lynchings (I don’t think anyone really could,) but these are the ones they documented. “White” in the original dataset I recoded as “non-black” because Tuskegee included Asians, Hispanics, and Indians as “white.”

[Image: graph]

I couldn’t find stats on what % of blacks were victims of lynching, so I used the demographic data from the Wikipedia page on US blacks (which I assume gets its data from the census) to calculate the rate per 100k black people.

Since the census is only conducted once every decade and the lynching data was reported for each year, I used the average per-year change between censuses to estimate the population on non-census years.

(I suppose this still does not give us an aggregate total percent.)

I chose “rate per 100k” because that’s how homicide data is normally presented. For example, a rate of just over 2, at the peak of black lynching, means that about 2 out of every 100,000 black people were lynched that year, or 0.002% of the black population.
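For anyone who wants to redo the calculation, here is a rough sketch of the method described above: estimate the population in non-census years from the average per-year change between censuses, then divide. The census figures and the event count below are round placeholder numbers I made up so the arithmetic is easy to follow; they are NOT the real census or Tuskegee figures, and the function names are my own.

```python
def interpolate_population(census: dict[int, int], year: int) -> float:
    """Estimate population for a year using the average per-year change
    between the two surrounding census years."""
    if year in census:
        return float(census[year])
    # Raises ValueError if the year falls outside the census range.
    earlier = max(y for y in census if y < year)
    later = min(y for y in census if y > year)
    per_year_change = (census[later] - census[earlier]) / (later - earlier)
    return census[earlier] + per_year_change * (year - earlier)


def rate_per_100k(events: int, population: float) -> float:
    """Convert a raw count into a rate per 100,000 people."""
    return events / population * 100_000


# Placeholder values for illustration only (not real data).
census = {1890: 7_500_000, 1900: 8_800_000}
events_1895 = 163

population_1895 = interpolate_population(census, 1895)  # 8,150,000
rate = rate_per_100k(events_1895, population_1895)

print(f"Estimated 1895 population: {population_1895:,.0f}")
print(f"Rate per 100k: {rate:.2f}")
print(f"As a percent of the population: {rate / 1000:.4f}%")
```

Dividing the per-100k rate by 1,000 gives the percentage, which is how a rate of about 2 turns into 0.002%.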

By comparison, the United States today has an overall homicide rate of 3.9 per 100k people (meaning that any random person walking around the US today is more likely to get murdered than a black person was to get lynched, though of course this does not count non-lynching forms of racially motivated murder.)

I read a book and it’s Friday: Homicide, by Daly and Wilson

Today’s selection, Homicide, is ev psych with a side of anthropology; I am excerpting the chapter on people-who-murder-children. (You are officially forewarned.)

Way back in middle school, I happened across (I forget how) my first university-level textbook, on historical European families and family law. I got through the chapter on infanticide before giving up, horrified that enough Germans were smushing their infants under mattresses or tossing them into the family hearth that the Holy Roman Empire needed laws specifically on the subject.

It was a disillusioning moment.

Daly and Wilson’s Homicide, 1988, contributes some (slightly) more recent data to the subject (though of course it would be nice to have even more recent data).

[Images: six graphs]

(I think some of the oddities in # of incidents per year may be due to ages being estimated when the child’s true age isn’t known, eg, “headless torso of a boy about 6 years old found floating in the Thames.”)

We begin with a conversation on the subject of which child parents would favor in an emergency:

If parental motives are such as to promote the parent’s own fitness, then we should expect that parents will often be inclined to act so that neither sibling’s interests prevail completely. Typically, parental imposition of equity will involve supporting the younger, weaker competitor, even when the parent would favor the older if forced to choose between the two. It is this latter sort of situation–“Which do you save when one must be sacrificed?”–in which parents’ differential valuation of their children really comes to the fore. Recall that there were 11 societies in the ethnographic review of Chapter 3 for which it was reported that a newborn might be killed if the birth interval were too short or the brood too numerous. It should come as no surprise that there were no societies in which the prescribed solution to such a dilemma was said to be the death of an older child. … this reaction merely illustrates that one takes for granted the phenomenon under discussion, namely the gradual deepening of parental commitment and love.

*Thinks about question for a while* *flails* “BUT MY CHILDREN ARE ALL WONDERFUL HOW COULD I CHOOSE?” *flails some more*

That said, I think there’s an alternative possibility besides just affection growing over time: the eldest child has already proven their ability to survive; an infant has not. The harsher the conditions of life (and thus, the greater the likelihood of actually facing a real situation in which you genuinely don’t have enough food for all of your children), the higher the infant mortality rate. The eldest children have already run the infant mortality gauntlet and so are reasonably likely to make it to adulthood; the infants still stand a high chance of dying. Sacrificing the child you know is healthy and strong for the one with a high chance of dying is just stupid.

Whereas infant mortality is not one of my personal concerns.

Figure 4.4 shows that the risk of parental homicide is indeed a declining function of the child’s age. As we would anticipate, the most dramatic decrease occurs between infants and 1-year-old children. One reason for expecting this is that the lion’s share of the prepubertal increase in reproductive value in natural environments occurs within the first year.

(I think “prepubertal increase in reproductive value” means “decreased likelihood of dying.”)

Moreover, if parental disinclination reflects any sort of assessment of the child’s quality or the mother’s situation, then an evolved assessment mechanism should be such as to terminate any hopeless reproductive episode as early as possible, rather than to squander parental effort in an enterprise that will eventually be abandoned. … Mothers killed 61 in the first 6 months compared to just 27 in the second 6 months. For fathers, the corresponding numbers are 24 vs. 14. [See figure 4.4] … This pattern of victimization contrasts dramatically with the risk of homicide at the hands of nonrelatives (Figure 4.5)…

I would like to propose an alternative possibility: just as a child who attempts to drive a car is much more likely to crash immediately than to successfully navigate onto the highway and then crash, so a murderous person who gets their hands onto a child is more likely to kill it immediately than to wait a few years.

A similar mechanism may be at play in the apparent increase and then decrease in homicides of children by nonrelatives during toddlerhood. Without knowing anything about these cases, I can only speculate, but 1-4 are the ages when children are most commonly put into daycares or left with sitters while their moms return to work. The homicidally-minded among these caretakers, then, are likely to kill their charges sooner rather than later. (School-aged children, by contrast, are both better at running away from attackers and highly unlikely to be killed by their teachers.)

Teenagers are highly conflictual creatures, and the rate at which nonrelatives kill them explodes after puberty. When we consider the conspicuous, tempestuous conflicts that occur between teenagers and their parents–conflicts that apparently dwarf those of the preadolescent period–it is all the more remarkable that the risk of parental homicide continues its relentless decline to near zero.

… When mothers killed infants, the victims had been born to them at a mean age of 22.7 years, whereas older victims had been born at a mean maternal age of 24.5. This is a significant difference, but both means are significantly below the 25.8 years that was the average age of all new Canadian mothers during the same period, according to Canadian Vital Statistics.

In other words, impulsive fuckups who get accidentally pregnant are likely to be violent impulsive fuckups.

We find a similar result with respect to marital status: Mothers who killed older children are again intermediate between infanticidal women and the population-at-large. Whereas 51% of mothers committing infanticide were unmarried, the same was true of just 34% of those killing older children. This is still substantially above the 12% of Canadian births in which the new mother was unmarried …

Killing of an older child is often associated with maternal depression. Of the 95 mothers who killed a child beyond its infancy, 15.8% also committed suicide. … By contrast, only 2 of 88 infanticidal mothers committed suicide (and even this meager 2.3% probably overestimates the association of infanticide with suicide, since infanticides are the only category of homicides in which a significant incidence of undetected cases is likely.) … one of these 2 killed three older children as well.

Anyone else thinking of Andrea Yates and her idiot husband?

In the Canadian data, it is also noteworthy that 35% of maternal infanticides were attributed by the investigating police force … [as] “mentally ill or mentally retarded (insane),” versus 58% of maternal homicides of older children. Here and elsewhere, it seems that the sorts of cases that are simultaneously rare and seemingly contrary to the actor’s interests–in both the Darwinian and the commonsense meaning of interest–also happen to be the sorts of cases most likely to be attributed to some sort of mental incompetence. … We identify as mad those people who lack a species-typical nepotistic perception of their interests or who no longer care to pursue them. …

Violent people go ahead and kill their kids; people who go crazy later kill theirs later.

We do at least know the ages of the 38 men who killed their infant children: the mean was 26.3 years. Moreover, we know that fathers averaged 4 years older than mothers for that substantial majority of Canadian births that occurred within marriages… . Since the mean age for all new Canadian mothers during the relevant period… was 25.8, it seems clear that infanticidal fathers are indeed relatively young. And as was the case with mothers, infanticidal fathers were significantly younger than those fathers who killed older offspring (mean age at the victim’s birth = 29.2 years). …

As with mothers, fathers who killed older children killed themselves as well significantly more often (43.6% of 101) than did those who killed their infant children (10.5% of 38). Also like mothers is the fact that those infanticidal fathers who did commit suicide were significantly older (mean age = 30.5 years) than those who did not (mean = 25.8). Likewise, the paternal age at which older victims had been born was also significantly greater for suicidal (mean = 31.1 years; N = 71) than for nonsuicidal (mean = 27.5; N = 67) homicidal fathers. And men who killed their older children were a little more likely to be deemed mentally incompetent (20.8%) than those who killed their infants (15.8%). …

Fathers, however, were significantly less likely to commit suicide after killing an adult offspring (19% of 21 men) than a child (50% of 80 men.) … 20 of the 22 adult victims of their father were sons… three of the four adult victims of mothers were daughters. … There is no hint of such a same-sex bias in the killings of either infants… or older children. …

An infrequent but regular variety of homicide is that in which a man destroys his wife and children. A corresponding act of familicide by the wife is almost unheard of. …

No big surprises in this section.

Perhaps the most obvious prediction from a Darwinian view of parental motives is this: Substitute parents will generally tend to care less profoundly for their children than natural parents, with the result that children reared by people other than their natural parents will be more often exploited and otherwise at risk. Parental investment is a precious resource, and selection must favor those parental psyches that do not squander it on nonrelatives.

Disclaimer: obviously there are good stepparents who care deeply for their stepchildren. I’ve known quite a few. But I’ve also met some horrible stepparents. Given the inherent vulnerability of children, I find distasteful our society’s pushing of stepparenting as normal without cautions against its dangers. In most cases, remarriage seems to be undertaken to satisfy the parent, not the child.

In an interview study of stepparents in Cleveland, Ohio, for example–a study of a predominantly middle-class group suffering no particular distress or dysfunction–Lucile Duberman (1975) found that only 53% of stepfathers and 25% of stepmothers could claim to have “parental feeling” toward their stepchildren, and still fewer to “love” them.

Some of this may be influenced by the kinds of people who are likely to become stepparents–people with strong family instincts probably have better luck getting married to people like themselves and staying that way than people who are bad at relationships.

In an observational study of Trinidadian villagers, Mark Flinn (1988) found that stepfathers interacted less with “their” children than did natural fathers; that interactions were more likely to be aggressive within steprelationships than within the corresponding natural relationships; and that stepchildren left home at an earlier age.

Pop psychology and how-to manuals for stepfamilies have become a growth industry. Serious study of “reconstituted” families is also burgeoning. Virtually all of this literature is dominated by a single theme: coping with the antagonisms…

Here the authors stop to differentiate between stepparenting and adoption, which they suspect is more functional due to adoptive parents actually wanting to be parents in the first place. However,

such children have sometimes been found to suffer when natural children are subsequently born to the adopting couple, a result that has led some professionals to counsel against adoption by childless couples until infertility is definitely established. …

Continuing on with stepparents:

The negative characterization of stepparents is by no means peculiar to our culture. … From Eskimos to Indonesians, through dozens of tales, the stepparent is the villain of every piece. … We have already encountered the Tikopia or Yanomamo husband who demands the death of his new wife’s prior children. Other solutions have included leaving the children with postmenopausal matrilineal relatives, and the levirate, a wide-spread custom by which a widow and her children are inherited by the dead man’s brother or other near relative. …

Social scientists have turned this scenario on its head. The difficulties attending steprelationships–insofar as they are acknowledged at all–are presumed to be caused by the “myth of the cruel stepparent” and the child’s fears.

See: Freud.

Why this bizarre counterintuitive view is the conventional wisdom would be a topic for a longer book than this; suffice to say that the answer surely has more to do with ideology than with evidence. In any event, social scientists have staunchly ignored the question of the factual basis for the negative “stereotyping” of stepparents.

Under Freud’s logic, all sorts of people who’d been genuinely hurt by others were summarily dismissed, told that they were the ones who actually harbored ill-will against others and were just “projecting” their emotions onto their desired victims.

Freudianism is a crock of shit, but in this case, it helped social “reformers” (who of course don’t believe in silly ideas like evolution) discredit people’s perfectly reasonable fears in order to push the notion that “family” doesn’t need to follow traditional (i.e., biological) forms, but can be reinvented in all sorts of novel ways.

So are children at risk in stepparent homes in contemporary North America? [see Figures 4.7 and 4.8.] … There is … no appreciable statistical confounding between steprelationships and poverty in North America. … Stepparenthood per se remains the single most powerful risk factor for child abuse that has yet been identified. (here and throughout this discussion “stepparents” include both legal and common-law spouses of the natural parent.) …

Speaking of Figures 4.7 and 4.8, I must say that the kinds of people who get divorced (or were never married) and remarried within a year of their kid’s birth are likely to be unstable people who tend to pick particularly bad partners, and the kinds of people willing to enter into a relationship with someone who has a newborn are also likely to be, well, unusual. Apparently homicidal.

By contrast, the people who are willing to marry someone who already has, say, a ten year old, may be relatively normal folks.

Just how great an elevation of risk are we talking about? Our efforts to answer that question have been bedeviled by a lack of good information on the living arrangements of children in the general population. … there are no official statistics [as of when this was written] on the numbers of children of each age who live in each household type. There is no question that the 43% of murdered American child abuse victims who dwelt with substitute parents is far more than would be expected by chance, but estimates of that expected percentage can only be derived from surveys that were designed to answer other questions. For a random sample of American children in 1976, … the best available national survey… indicates that only about 1% or fewer would be expected to have dwelt with a substitute parent. An American child living with one or more substitute parents in 1976 was therefore approximately 100 times as likely to be fatally abused as a child living with natural parents only…

Results for Canada are similar. In Hamilton, Ontario in 1983, for example, 16% of child abuse victims under 5 years of age lived with a natural parent and a stepparent… Since small children very rarely have stepparents–less than 1% of preschoolers in Hamilton in 1983, for example–that 16% represents forty times the abuse rate for children of the same age living with natural parents. … 147 Canadian children between the ages of 1 and 4 were killed by someone in loco parentis between 1974 and 1983; 37 of those children (25.2%) were the victims of their stepparents, and another 5 (3.4%) were killed by unrelated foster parents.

…The survey shows, for example, that 0.4% of 2,852 Canadian children, aged 1-4 in 1984, lived with a stepparent. … For the youngest age group in Figure 4.9, those 2 years of age and younger, the risk from a stepparent is approximately 70 times that from a natural parent (even though the latter category includes all infanticides by natural mothers.)
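For anyone who wants to check this kind of arithmetic, here is a minimal sketch of how a "times as likely" figure can be derived from two percentages: the share of victims who lived with a substitute parent, and the share of all children who did. The function name `relative_risk` and the choice of a simple ratio of rates are my own assumptions for illustration; the quoted passages do not spell out the authors' exact calculation.

```python
def relative_risk(share_of_victims: float, share_of_population: float) -> float:
    """Ratio of the victimization rate in a group to the rate outside it.

    share_of_victims:    fraction of all victims who belong to the group
    share_of_population: fraction of all children who belong to the group

    Both inputs are hypothetical parameters for this sketch. The total
    numbers of victims and children cancel out, so only the shares matter.
    """
    for share in (share_of_victims, share_of_population):
        if not 0.0 < share < 1.0:
            raise ValueError("shares must be strictly between 0 and 1")
    rate_in_group = share_of_victims / share_of_population
    rate_outside_group = (1.0 - share_of_victims) / (1.0 - share_of_population)
    return rate_in_group / rate_outside_group


# The quoted US figures: 43% of victims, "about 1% or fewer" of children.
print(round(relative_risk(0.43, 0.01), 1))
print(round(relative_risk(0.43, 0.0075), 1))
```

Under this reading, a population share of exactly 1% gives a ratio of roughly 75, while a share of 0.75% gives one close to 100, which is consistent with the quoted "1% or fewer." The result is very sensitive to the population share, and that is exactly the number the authors say was poorly measured.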

Now we need updated data. I wonder if abortion has had any effect on the rates of infanticide and if increased public acceptance of stepfamilies has led to more abused children or higher quality people being willing to become stepparents.

History is meaningless without narrative

What is a story? Events set into a pattern.

Patterns–narratives–are how we understand the world.

Look away from the screen. What do you see? A collection of lines and colors? Or objects?

You can make sense of the light entering your eyes because your brain organizes it into patterns. You recognize that a colon and a parenthesis are a face :) You recognize that orange and black stripes mean a tiger is nearby. Sounds coalesce into words and marks scratched in wet clay into epics.

If you spot a tiger every time you go to the watering hole, you notice a pattern–and if you’re lucky, find a new watering hole. If you can’t recognize patterns, chances are good you’ll be eaten by a tiger.

Brains love patterns so much, you can trigger a state of bliss just by repeating patterns to yourself. Former schizophrenics have related to me just how nice schizophrenia can feel, which I admit seems kind of counter-intuitive, but then, I found a pattern in some data today and was so happy as a result, that I can see how that might be so.

Suppose I read you numbers at random from some dataset–say, daily rainfall in Helsinki for the past 2000 years. Each number would tell you something about that particular day, but the dataset as a whole would tell you nothing. Random data is just noise. Even if I read the numbers in order, you’d probably hear little more than noise, though if you paid attention, you might start to hear a pattern after a year or two of rainfalls.

But if I made a graph, Helsinki’s rainy Octobers and Novembers–and dry Aprils–would suddenly stand out. We could make graphs of rainfall over years, months, or centuries. We could look for all kinds of patterns–and interesting outliers.
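To make the point concrete, here is a small sketch of going from a stream of daily numbers to a pattern you can see. The data is synthetic: `synthetic_daily_rain` is a made-up generator that adds random noise to an invented seasonal curve (wettest around October, driest around April, loosely echoing the description above). It is not real Helsinki rainfall, and all the names and constants are assumptions for illustration.

```python
import math
import random
from collections import defaultdict
from datetime import date, timedelta


def synthetic_daily_rain(start, days, seed=0):
    """Return (date, mm) pairs: an invented seasonal curve plus random noise."""
    rng = random.Random(seed)
    records = []
    for i in range(days):
        d = start + timedelta(days=i)
        # Invented seasonal mean: peaks in month 10, bottoms out in month 4.
        seasonal = 2.0 + 1.0 * math.cos(2 * math.pi * (d.month - 10) / 12)
        # Noise is large relative to the signal; rainfall cannot be negative.
        records.append((d, max(0.0, rng.gauss(seasonal, 2.0))))
    return records


def monthly_means(records):
    """Average the daily values by calendar month (1-12)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for d, mm in records:
        totals[d.month] += mm
        counts[d.month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


records = synthetic_daily_rain(date(2000, 1, 1), 365 * 20)
means = monthly_means(records)

# A crude text "graph": one bar per month, bar length proportional to the mean.
for month, mean in means.items():
    print(f"{month:2d} {'#' * round(mean * 10)}")
```

Read one at a time, the daily values look like noise. Averaged by month, the seasonal shape that was built into the fake data shows up in the bars.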

Once we see patterns, we find meaning.

History is the study of change. An accounting of history without patterns soon devolves into random noise. Names, dates. Names, dates. The narratives give it meaning.

I first really discovered this while trying to research the French Revolution via Wikipedia. Wikipedia tries its darndest not to impart any particular bias to its historical articles, resulting in a lot of names and dates and places, without much that ties it all together. This actually makes them hard to read; after a while my eyes glaze over and my brain starts refusing to process any more. By contrast, pick up any book on the French Revolution, and you’ll probably discover the author’s central thesis: “The peasants made them do it!” or “Crop failures drove them to revolt!” or “System breakdown!” The author takes care to marshal evidence in favor of his thesis, drawing out the patterns for you.

It took only one small book on the French Revolution for it to suddenly make sense. There was a stark difference between my brain’s willingness to follow this author’s train of thought (“The peasants made them do it!”) and my brain’s willingness to follow the Wikipedia’s N-POV articles, even though I did not necessarily agree with the author’s thesis.

To be clear, the Wikipedia is not bad for avoiding POV; many, many theses are completely wrong. You could not even begin to write an article on the French Revolution if you wanted to make an accurate presentation of all the theses people have had on the subject, or even just the major ones. The best thing for the Wikipedia is to try to present factual information, and leave it up to the readers to find their own patterns.

(The “badly written” Wikipedia articles have bias and POV-issues and actually make sense, even if I often disagree with the author’s thesis.)

Much of what I do here on this blog is look for patterns in the data. “Here’s something interesting,” I say. “Can I find any patterns? Anything that might fit this data?” It is all very speculative. I know it is speculative. I hope that you know that I know that I am speculating, and not proclaiming to know the One True Truth.

Take the post, “Why do Native Americans have so much Neanderthal DNA?” The starting datum is that Native Americans appear to have more Neanderthal DNA than other people; from there I try to marshal up some patterns that might explain things. Same with, say, “Adulterations in the Feed.”

Ultimately, I wager that a lot of my theories will turn out to be wrong. The real world does not care about patterns nearly so much as our little brains do, and we are prone to seeking out patterns in data even when there really aren’t any. Sometimes shit just happens and it doesn’t really mean anything bigger than the shit that is happening right now. Maybe there is no master plan. But we can’t live without meaning. We must have our patterns to make sense of the world, so our patterns we will have.

Remember, you are a braid in spacetime:

from Life is a Braid in Spacetime by Max Tegmark, Illustration by Chad Hagen

Doh

I just remembered an essay I wrote back in my school days, comparing rates of Behavior X in the US to various European countries, and recommending that we should, as a public policy matter, adopt legal standards on the matter closer to the European ones. I forgot to control for ethnicity.

In retrospect, it seems like such an obvious thing I should have controlled for when presenting the data. :(