One of my pet peeves is that numerical ideas often (too often) get misused in a bid to convey the wrong impression. It is done by those who should know better, and very often by those who do indeed know better. Recently, Buttonwood had a decent article on this. The financial sector is the biggest culprit, especially in the willfully misleading category.

When I was in my previous avatar as a flunky in a global investment advisory, the Economist at the firm in charge of global asset allocation released a report whose key insight was "Semiconductors lead the rest of technology in the recovery cycle in 75% of the recessions" (or some such tripe). Now, this godforsaken sector was one I was in charge of, and therefore I had to read more on it.

Turns out our Global Guru had looked at the last 4 major global recessions over the last 80 years and found that in 3 out of those 4, semiconductor stocks bounced back sharper and sooner than the rest of tech.

This is tripe. When you are doing empirical research and are looking at 4 cycles, you have no business describing anything in percentage terms. I stopped reading anything else published by the "Guru". But who am I to say anything? He was the top ranked Economist in the investment industry, and I was the guy who wasn't good enough to get fired when I desperately wanted to.
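To put a number on why 3 out of 4 does not deserve a percentage, here is a minimal sketch. It assumes, purely for illustration, that whether semis lead in any given recession is an independent coin toss (the 50% is my assumption, not something from the report), and asks how often plain luck would produce 3 or more "leads" in 4 recessions.

```python
from math import comb

def chance_of_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: each recession is an independent 50/50 coin toss.
print(chance_of_at_least(3, 4))  # 0.3125
```

Under that assumption, a coin would hand you the same "75%" headline nearly one time in three.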

In the Indian context, Outlook published a cover story putting the total scam amount in India at Rs. 1.75 lakh crores or some such. They detailed many scams in it.

The list roughly goes like this:

900 crores - Fodder scam

600 crores - Taj Corridor scam

23 crores - Railway placements scam

3 crores - Perhaps accepted in bribe by the first cousin once removed of some central Govt employee

etc etc.

The last scam number listed was

Total black money stashed abroad = 1.72 lakh crores (estimated).

So, this ginormous number, which forms more than 95% of the amount put on the cover page, is plucked out of a hat. So, why the $#*k should you go into the details of the other scams?

There is this beautiful idea of significant digits in the art of measurement. I never really understood this while at school. Any measurement, be it with Vernier callipers or a screw gauge, comes with a built-in error factor (the least count?). So, whenever you give any measurement, the number of significant digits must be determined keeping this in mind.

So, if the built-in error is 1 cm, we cannot give a measurement that says 223.5 cm (even if we take 10 measurements and average them out). We can at best say 223 cm or 224 cm. The 223.5 suggests that we have confidence over that 0.5, which we technically cannot have. Simple idea really. It is like saying that if your measurement has some built-in error, do not convey more accuracy than there is. So, if you measure some length 18 times with Vernier callipers and the average comes out to be 32.222 cm, you should bite the bullet and say roughly 32 cm. Conveying confidence beyond what the numbers tell you is a crime (or at least should be considered one). Finance and sports are the two fields where this gets done the most.
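The rule above can be sketched in a few lines. This is only an illustration: the function name `report` and the readings are made up, and rounding the average to the nearest multiple of the least count is one simple way of applying the idea, not a formal treatment of measurement error.

```python
from statistics import mean

def report(measurements, least_count):
    """Average the readings, then round to the nearest multiple of the least count."""
    avg = mean(measurements)
    return round(avg / least_count) * least_count

readings_cm = [223, 224, 224]   # made-up readings from a scale that resolves 1 cm
print(mean(readings_cm))        # raw average, with digits the instrument cannot support
print(report(readings_cm, 1))   # 224
```

The raw average spits out a long tail of decimals; the reported figure stays at the resolution the instrument actually has.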

You might have seen something along these lines frequently

1. The best stock returns are seen from Thursday to Monday.

2. Left-arm bowlers have seen the most success in ODIs conducted since the 1990s.

3. Ricky Ponting really struggles against India as he has an average that is 6 runs lower than his overall average.

**Why are these absurd?**

If you analysed stock returns over three-day windows, some three-day window would have the best returns. This does not mean that that three-day window has something special going for it. This means something else. Something very special. Something that every statistician worth his salt must have the courage to say 90% of the time he attempts some statistical analysis. This means

*nothing.*

*Ditto the other two inferences.*
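Here is a minimal sketch of the point about three-day windows. Everything in it is made up: the "returns" are pure random noise with no day-of-week effect built in, and the 52 weeks and the seed are arbitrary choices. Even so, some window has to come out on top.

```python
import random

def best_window(returns_by_day, window=3):
    """Return (start_day, average_return) of the best `window`-day stretch in the week."""
    days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
    results = {}
    for start in range(len(days)):
        # Wrap around the week, so a window starting on Thursday runs Thu, Fri, Mon.
        picked = [days[(start + i) % len(days)] for i in range(window)]
        values = [r for d in picked for r in returns_by_day[d]]
        results[days[start]] = sum(values) / len(values)
    winner = max(results, key=results.get)
    return winner, results[winner]

random.seed(1)  # arbitrary seed, only so that the run is repeatable
# Assumption: 52 weeks of pure noise for each trading day.
noise = {d: [random.gauss(0, 1) for _ in range(52)]
         for d in ["Mon", "Tue", "Wed", "Thu", "Fri"]}
print(best_window(noise))
```

Change the seed and the "best" window moves around, which is exactly what you would expect when the winner means nothing.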

**The idea of statistical significance**

So, are all statistical inferences absurd? Of course not. This is where the term statistical significance comes into the picture. If the observation is statistically significant, the flag must be raised. And only then must the flag be raised. So, how do we wrap our heads around statistical significance? Statisticians have fancy terms for this. But let us see if we can find an intuitive approach. Let us have a go at this with an example.

Let us say we want to test whether Ricky Ponting's underperformance against a particular team, say, India, is statistically significant. Let us further say his average against India is less than his overall average by about 15% (this looks significant).

Now, let us take his average against his nemesis, India, and keep it as a benchmark metric. Next, let us revisit the original sample (all his innings) and randomly extract a sub-sample of the same size as his set of innings against India. If we repeat this many times and the benchmark metric is lower than that observed in the sub-sample, say, 90% of the time, then let us say that the underperformance is statistically significant.

Let us build on this with numbers. Let us say Ricky Ponting has scores of {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} in 10 innings. Further, let us say he has played 2 matches each against 5 teams. His overall average is 5.5. Now, let us say he has an average against India that is more than 15% below his overall average; in other words, an average of 4.5 or less. Is this statistically significant?

If we extract two scores that have to average 4.5 or less, we can have {1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}, {1, 7}, {1, 8}, {2, 3}, {2, 4}, {2, 5}, {2, 6}, {2, 7}, {3, 4}, {3, 5}, {3, 6}, {4, 5} - 16 possibilities. In total, there are 45 possible pairs. So, there is a roughly 36% chance (16 out of 45) that a random sample of 2 out of these 10 will have an average that is more than 15% below the overall average. So, this 15% below average number means nothing. It is not statistically significant.
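If you do not feel like counting pairs by hand, the same toy example can be checked by brute force. This is just a sketch of the enumeration described above, using the made-up scores of 1 to 10; the variable names are mine.

```python
from itertools import combinations

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
overall = sum(scores) / len(scores)     # 5.5
threshold = 0.85 * overall              # 15% below the overall average

pairs = list(combinations(scores, 2))   # every possible pair of innings
low = [p for p in pairs if sum(p) / 2 <= threshold]

print(len(low), len(pairs))             # 16 45
print(round(len(low) / len(pairs), 3))  # 0.356
```

More than a third of all possible pairs look like "a struggle", so seeing one such pair tells us nothing about India in particular.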

In reality, a great many quoted inferences derived from numbers are not statistically significant. If you are handed any statistical inference on a platter, you have hajaar (a thousand) grounds to suspect it is false. And we have not even come to the idea of bias. There are many ways in which we can bias a sample. Some biases creep in, while some others are introduced. Let me give a few examples.

The 2015 World Cup stats counter states rather gleefully that the Indian batting unit has one of the highest strike rates in powerplays. Now, we need to remember that this is largely because 70% of India's cricket is played on subcontinental wickets, where the par score is 330-ish. England might play 50% of its matches on English wickets, where the par score might be 260-ish. So, unless the powerplay strike rates are at least 30% apart, we have no business making any inferences. These are the biases that the samples naturally carry.

There are some other biases that data-presenters can bring in. The most beautiful one, and the one that most fudge-statisticians introduce with an unbearable holier-than-thou air, is selection bias. Let me deal with this with an example.

Let us say there are two stocks, Alpha and Beta, that, as an analyst, I want to suggest are heavily correlated. I will draw the stock charts for Alpha and Beta and compute correlation numbers. But here is where I will be smart. I will choose the end date to be today's date and the start date to be any date from 2010 to 2013. I will find the correlation numbers for all 1000 or so possibilities and pick the start date from which the correlation is the highest. If it is a Friday afternoon and I want to be really intellectually dishonest before my weekend, I will 'float' my end date also.

For any two stocks, if you have a large enough database, about 40 minutes of time on your hands, and a moral compass pointing towards "bonus", there is a 50% chance of finding one set of dates where the correlation is more than 90%. You can even wear your best "Why are you looking at me like that? This is what the numbers are telling us" expression. If you want to be thorough, you should find some pseudo-intellectual justification for having picked the date range that you did indeed pick. In case you are wondering how I know this scam with this much clarity, you should look for Business Objects Cognos 91% correlation on some research database. (In my defence, I am not proud of this).
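The start-date trick is easy to reproduce. The sketch below is not real market data: both price series are random walks that I generate independently of each other, and the 1500 days, the 1000 candidate start dates and the seed are arbitrary assumptions. It compares the honest full-window correlation with the best one you can find by shopping for a start date.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def random_walk(n, rng):
    """A made-up price series: start at 100 and add independent noise each day."""
    price, series = 100.0, []
    for _ in range(n):
        price += rng.gauss(0, 1)
        series.append(price)
    return series

rng = random.Random(7)          # arbitrary seed, only for repeatability
alpha = random_walk(1500, rng)
beta = random_walk(1500, rng)   # generated independently of alpha

# End date fixed at "today"; float the start date over the first 1000 days.
candidates = {s: pearson(alpha[s:], beta[s:]) for s in range(1000)}
best_start = max(candidates, key=candidates.get)

print("honest, full window:", round(candidates[0], 2))
print("cherry-picked start:", best_start, round(candidates[best_start], 2))
```

The cherry-picked number can never be worse than the honest one, because the honest window is just one of the candidates. That is the whole scam in one line.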

A good statistician is one who can look at a lot of data and tell us why it does not mean much, and then pick out the one nugget that actually means something. The statistician who cannot say "this means nothing" should be kicked out of his job. A lot of the stats that we see online and on television are the ones that should be binned.

Brilliant one Rajesh :)
