
Sci-fi in aid of Science

I was a pretty big fan of science fiction in my younger days, and still read some from time to time. I think Frank Herbert’s Dune is a great novel (the sequels not so much), and I enjoyed reading works by Heinlein, Le Guin and Asimov.

One of the genre’s leading lights back then was Arthur C Clarke, who wrote the novel 2001: A Space Odyssey (in 1968), on which the film was based. I was not a Clarke fan; I don’t remember reading any of his stuff. However, he made an interesting contribution to the culture beyond his books themselves when he formulated three ‘laws’ regarding technology that have come to be known as Clarke’s Laws. He didn’t proclaim them all at once, and in any case it is the third law that is most cited, which so far as I can determine first appeared in a letter he wrote to Science in 1968. [If anyone has better info on the third law’s original appearance and antecedents, I’d love to hear it.]

Clarke’s Third Law is: ‘Any sufficiently advanced technology is indistinguishable from magic.’

That strikes me – and many others, apparently – as a perceptive statement. Think of how someone living in 1682 anywhere in the world would regard television or radio. 

As with any perceptive and oft-repeated assertion, this prompted others to lay down similar edicts, such as Grey’s Law: “Any sufficiently egregious incompetence is indistinguishable from malice.”

[I cannot trace Grey’s law back to anyone named Grey – if you can, let me know.]

Note that there is a difference: Clarke’s law speaks to how something will be perceived, whereas Grey’s points to the consequences of incompetence versus malice. If you are denied a mortgage by a bank despite your stellar credit rating, the impact of that decision on you does not depend on whether it is attributable to the credit officer’s incompetence or to his dislike of you. 

On to Science, then, and what I will call Gelman’s Law (although Gelman himself does not refer to it that way). 

Most non-academics I know view academics and their research with a somewhat rosy glow. If someone with letters after their name writes something, and particularly if they write it in an academic journal, they believe it. 

It does nothing to increase my popularity with my friends to repeatedly tell them: it ain’t so. There is a lot of crappy (a technical academic term, I will elaborate in future posts) research being done, and a lot of crappy research being published, even in peer-reviewed journals. What is worse is that as far as I can tell, the credible research is almost never the stuff that gets written up in the media. Some version of Gresham’s Law [‘bad money drives out good money’] seems to be at work here. 

A blog that I read regularly is titled Statistical Modeling, Causal Inference, and Social Science (gripping title, eh?), written by Andrew Gelman, a Political Science and Stats prof at Columbia U. I recommend it to you, but warn that you had better have your geek hard-hat on for many of the posts. 

Although I often disagree with Gelman, he generally writes well and I have learned tons from his blog. One of the things that has endeared it to me is his ongoing campaign against academic fraud and incompetent research. 

He has formulated a Law of his own, which he modestly attributes to Clarke, but which I will here dub Gelman’s Third Law: 

“Any sufficiently crappy research is indistinguishable from fraud.”

I think this law combines the insights of Clarke’s and Grey’s. The consequences of believing the results from crappy research do not differ from the consequences of believing the results from fraudulent research, as with Grey. However, it is also true that there is no reason to see the two things as different. If you are so incompetent at research as to produce crap, then you should be seen as a fraud, as with Clarke. 

I will be writing about crappy/fraudulent research often here, in hopes of convincing readers that they should be very skeptical the next time they read those deadly words: “Studies show…”

I will close this by referring you, for your reading pleasure, to a post by Gelman titled:    

 It’s bezzle time: The Dean of Engineering at the University of Nevada gets paid $372,127 a year and wrote a paper that’s so bad, you can’t believe it.

It’s a long post, but non-geeky, and quite illuminating. (Aside: I interviewed for an academic position at U of Nevada in Reno a hundred years ago. They put me up in a casino during my visit. Didn’t gamble, didn’t get a job offer.) You can read more about this intrepid and highly paid Dean here. His story is really making the (academic) rounds these days. 

You’re welcome, and stay tuned. I got a million of ‘em….

p.s. I discovered this after writing the above, but before posting. One of many reasons this stuff matters, from Nevada Today:

University receives largest individual gift in its history to create the George W. Gillemot Aerospace Engineering Department 

The $36 million gift is the largest individual cash gift the University has received in its 149-year history 

Anyone care to bet on whether this Dean gets canned?

Tags: Research, Education, Standards

 

Uses and Abuses of Statistics – MLB Edition

If you watch a lot of sports as I do, you cannot fail to be aware of the so-called ‘Analytics Revolution’, a phenomenon that has wormed its way into sports broadcasting. Whatever professional teams may be doing with the reams of game and performance data they now collect, one cannot miss how much sportscasters talk about it, before, during and after each broadcast. 

As someone whose happiness would greatly increase if said sportscasters would just shut up, I cannot say all this statistic-centric chattering is welcome, but sometimes it is interesting. A frequent use of stats in a broadcast is when one of the commentators cites a statistic that they think is directly relevant to what is happening in the game. For example –  

Hockey team x scores the first goal of the game, and the commentator says ‘The team that scores first wins the game z% of the time.’

Baseball team y goes into the 6th inning trailing by 2 runs and the commentator says ‘Teams that trail by 2 or more runs in the second half of a game have only a w% chance of winning.’

Now, there is no mystery as to where these statements come from. For the first one, you just look at the last 10 years (say) of all NHL games and see which team scored first and which team won. The percentage of the games in which it is the same team gives you z in the statement. 
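For concreteness, the bookkeeping behind such a statistic can be sketched in a few lines of Python (the game records here are toy, made-up data; real data would cover, say, ten seasons of NHL games):

```python
# Toy records: (team that scored first, team that won) for each game.
games = [
    ("BOS", "BOS"),
    ("NYR", "MTL"),
    ("TOR", "TOR"),
    ("EDM", "EDM"),
]

# z = percentage of games in which the first scorer also won
same = sum(1 for first, winner in games if first == winner)
z = 100 * same / len(games)
print(z)  # 75.0 -- the first scorer won 3 of these 4 toy games
```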

A particular example of this occurred during the second round of last season’s MLB playoffs, when two teams were playing the third game of a best-of-five series, tied at one win each. The commentator said ‘The team that wins the third game in this situation goes on to win the series 70% of the time.’

Once again, it’s clear this statement comes from looking back at previous MLB best-of-five series in which the teams split the first two games, but in this case I had an immediate reaction to this stat: that seems too low. 

My immediate, no-pencil-and-paper reaction was not that he was quoting a mistaken statistic, but rather that I thought the third-game winner would win the series more often than that. I got out my pad and pen, and here is what I came up with. 

What would simple probability calculations predict for the probability in question? Imagine team A has won the third game against team B, so it is leading the series 2 to 1 with two possible games to go. Assume also, just as a starting point, that because this is the playoffs, these are two evenly matched teams. Thus, absent any specific information about each team in each game (who is pitching, injuries, weather, etc) one would expect the probability that either team wins is ½. 

Given that, you can calculate the probability of team A going on to win the series (having won game three) by noting that the series after game three can go only one of three ways:

  i. A wins game four and the series.
  ii. A loses game four but wins game five and the series.
  iii. A loses both games four and five, and loses the series. 

This is the whole universe of possibilities, and it is easy to calculate the probability of each one.

  i. The probability is ½, under our assumption that this is the probability A wins any single game.
  ii. The probability A loses game four is ½ and the probability A wins game five is also ½, so the probability of those two events happening is ½ times ½, which is ¼. (The probability-aware out there will note that I have assumed that the probability of winning each game is independent of what happens in the other games. I will come back to that below.)
  iii. The probability A loses each game is again ½, so again the probability it loses both is ¼. 

Note that these three probabilities do add up to 1, so we have covered everything. But we have also found that the probabilities of i and ii – the two cases in which A wins the series – add up to ¾, or 75%. 
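This enumeration is easy to check in a few lines of Python (a minimal sketch, assuming each game is an independent coin flip):

```python
p = 0.5  # assumed probability A wins any single game

prob_i = p                    # i:   A wins game four
prob_ii = (1 - p) * p         # ii:  A loses game four, wins game five
prob_iii = (1 - p) * (1 - p)  # iii: A loses both remaining games

# the three outcomes exhaust all possibilities
assert abs(prob_i + prob_ii + prob_iii - 1) < 1e-12

print(prob_i + prob_ii)  # 0.75 -- probability A wins the series
```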

So, on this account, my instincts were right, 70% is lower than 75%. 

However, when a calculation comes out differently than an actual number from the world, it is the calculation that must be re-thought. My first thought along those lines was the following: if A wins game three, then it has won two of three games against B, and although that is a small sample, it does point to the possibility that maybe team A is somewhat better than B, and that should be taken into account. 

For example, maybe in this scenario the probability A wins either of games four or five should be 0.55 and the probability B wins only 0.45. 

This is not helpful in reconciling the data with the calculations, however, as if one re-does the calculations for the probability of each of the three outcomes above, one now gets:

  i. 0.55
  ii. 0.45 x 0.55 = 0.2475
  iii. 0.45 x 0.45 = 0.2025

and the predicted probability of A winning the series is now up to 79.75%, even further away from the empirical 70%. 
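Redoing the arithmetic in Python, with the 0.55 per-game edge (a hypothetical number, remember), gives the same answer via the two-term formula:

```python
p = 0.55  # hypothetical per-game edge for the game-three winner

# A wins the series by winning game four, or by losing it and winning game five
series = p + (1 - p) * p
print(round(series, 4))  # 0.7975
```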

Huh. 

So, one has to look at something else, and my preferred culprit would be the assumption built into all these calculations that the outcome of each game is independent of what happened in previous games. In particular, I suspect that a team that has won two of the first three games takes it a little easy in the fourth game. Not just that Team A’s players might ‘relax’ a bit, but also that team A’s manager might save his best pitcher for game five in case he is needed, hoping that if they win game four his ace will be available for game one of the next round. In that scenario, the probability team A wins game four is less than 0.5, not more. You all can probably think of other explanations. 
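To see how much of a ‘letting up in game four’ effect it would take, here is a quick Monte Carlo sketch. The 0.40 figure for game four is purely illustrative, chosen because 0.40 + 0.60 × 0.50 = 0.70 lands exactly on the broadcaster’s number:

```python
import random

random.seed(1)

def simulate_series(p_game4, p_game5, trials=200_000):
    """A leads the series 2-1; estimate P(A wins the series)
    when A's win probability differs between games four and five."""
    wins = 0
    for _ in range(trials):
        if random.random() < p_game4:    # A takes game four
            wins += 1
        elif random.random() < p_game5:  # A drops game four, takes game five
            wins += 1
    return wins / trials

# Illustrative: game-three winner eases off in game four (0.40),
# back to a coin flip in game five. Estimate should print near 0.70.
print(round(simulate_series(0.40, 0.50), 2))
```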

In any case, it was clear that what the sportscaster who said this last autumn wanted us fans to think is ‘whoa, winning game three is really important’, when in fact there is something more interesting to be said: why don’t winners of game three do better than they do in a five-game series? 

Tags: news media, culture