Thursday 28 February 2013

Top tip for today ...

Irwin Greenberg (http://clicks.robertgenn.com/irwin-greenberg.php) apparently said:

“Find the artists who are on your wavelength and continuously increase that list. Learn from the masters, learn from artists alive today whether it's someone you may never meet in person or it’s a close artist friend.  

Visit museums and galleries, buy books and magazines, take classes. 

Embrace the life of a student, no matter your age or ability, and you will become a better artist”.

[Courtesy:  Ray Frisken: artnewsflash@gmail.com]

What's it got to do with us?  Substitute 'analyst' for 'artist' and substitute 'Go to conferences and workshops' for 'Visit museums and galleries' :-)


Tuesday 26 February 2013

Does Excel™ really excel ?


A good friend of mine, when asked exactly what it is I do for a living, sometimes replies “he does sums”.

There is a fair amount of truth in this;  indeed much of the quantitative analysis that anyone does can involve doing lots of ‘sums’.

And to help us, we use various software tools such as SPSS, Q, R, Minitab, Stata, Excel, etc.

But occasionally we can notice something slightly weird, with an unexpected result, or a computation that doesn’t seem to lead to where it should.

Very often, this is simply a data cleaning issue (or, rather, a lack of data cleaning issue).  For the heavy quant people, there are well-known mantras such as “Step 1:  clean your data; Step 2:  clean your data again; Step 3:  repeat Steps 1 and 2”. 

Or: “95% of advanced analysis is getting the datafile into shape”. 

Or even, as lamented by Sherlock Holmes in the Conan Doyle story “The Adventure of the Copper Beeches”:   ‘“Data! Data! Data!” he cried impatiently.  “I can’t make bricks without clay!”’

Another good quant analysis rubric is:  “If it looks unusual, it’s probably wrong.”

In the above vein, I recently came across a rather disturbing article (a), published only a couple of years ago, that deals with a claimed plethora of computational errors that are literally built into Excel™.

After conducting a large number of tests (admittedly, some of them using datasets that might be described as ‘slightly esoteric’), the authors nonetheless conclude that “…it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer.  Persons who wish to conduct statistical analyses should use some other package.”

I discussed this with a senior statistical consultant, who replied:

“This paper criticising Excel freaked me out when I first read it ...  However, over time, I have become less concerned.  I looked into some of the tests … a few general conclusions:

a)         It is disappointing that Microsoft doesn’t fix these things.

b)         The errors are at the margins.  That is, we are talking about inaccuracies that tend to occur when the techniques are unreliable anyway (e.g., severe multicollinearity). 

c)         There is more than a degree of unfairness in the critique.  For example, in the case of Solver, I have found it repeatedly to do a better job than the various optimisers in R.”

So, given all the above, and all other things being equal, it is probably best to take the bad news concerning Excel with a grain of statistical salt.  Nonetheless, it may sometimes be wise to use two alternative computational means when working with something really critical, just to be sure.
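
For anyone who wants to see what “two alternative computational means” might look like in practice, here is a minimal Python sketch. It rests on some loud assumptions: the function names and the data are made up for illustration, and the one-pass “shortcut” formula is used purely as a stand-in for a numerically fragile method. It is not a claim about how Excel, or any other package, actually calculates a variance.

```python
import math

def variance_shortcut(xs):
    """Sample variance via the one-pass 'sum of squares' shortcut (numerically fragile)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)

def variance_two_pass(xs):
    """Sample variance via squared deviations from the mean (much more stable)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def cross_check(a, b, rel_tol=1e-9):
    """True if two independently computed results agree to within rel_tol."""
    return math.isclose(a, b, rel_tol=rel_tol)

# Hypothetical data: a small spread sitting on top of a very large offset.
data = [1_000_000_004.0, 1_000_000_007.0, 1_000_000_013.0, 1_000_000_016.0]

shortcut = variance_shortcut(data)
two_pass = variance_two_pass(data)

print("shortcut:", shortcut)
print("two-pass:", two_pass)  # the true sample variance of this data is 30
print("results agree:", cross_check(shortcut, two_pass))
```

When the two routes disagree, as they do here, that is the cue to stop and investigate before trusting either number.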


Tuesday 19 February 2013

What data can't (and can) do ...



I’m indebted to Julie Houston, from Nitty Gritty Research http://www.nittygritty.net.au/ for this very recent item (at least, very recent at the time of writing this post):


The subject of the article is “the strengths and limitations of data analysis”.

Bit of a dry topic, you might think?

But the really interesting bit (are?) is the 250+ detailed comments posted in response to the article.

Worth a slow read over a glass of red, I think :-)



Thursday 14 February 2013

Amazing statistical fact #2


In any gathering of people, how many must there be to be 50% sure that at least two people will share the same birthday?

Answer:  At least 23 people.

How many people at that gathering must there be to be virtually certain that two of them will share the same birthday?

Answer:  At least 57 people.

As people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room?

Answer:  The 20th person to arrive.

What is the average number of people (selected at random) required to find two with the same birthday?

Answer:  On average, 25 people are required.
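
If you would rather check these figures than take them on trust, the sums are short. The sketch below assumes 365 equally likely birthdays (no leap years, no seasonal effects) and takes “virtually certain” to mean 99%; the function names are simply ones chosen for this illustration.

```python
DAYS = 365  # assumption: 365 equally likely birthdays, leap years ignored

def p_shared(n, days=DAYS):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

def smallest_group(target, days=DAYS):
    """Smallest group size whose chance of a shared birthday reaches target."""
    n = 1
    while p_shared(n, days) < target:
        n += 1
    return n

def p_first_match_at(n, days=DAYS):
    """Probability that the n-th arrival is the first to match someone already present."""
    # The first n-1 arrivals are all distinct, and the n-th matches one of them.
    return (1.0 - p_shared(n - 1, days)) * (n - 1) / days

def expected_people(days=DAYS):
    """Average number of people needed to get the first shared birthday."""
    # E[N] = sum over n >= 0 of P(first n people all have distinct birthdays).
    return sum(1.0 - p_shared(n, days) for n in range(days + 1))

print("50% sure:", smallest_group(0.50))
print("99% sure:", smallest_group(0.99))
print("most likely first match:", max(range(2, DAYS + 1), key=p_first_match_at))
print("average needed:", round(expected_people(), 1))
```

Under these assumptions the sketch gives 23 people for 50%, 57 for 99%, the 20th arrival as the most likely first match, and an average of about 24.6 people.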

[Example: There have been 27 Prime Ministers of Australia.   Paul Keating, the 24th Prime Minister, and Edmund Barton, the first Prime Minister, share the same birthday, 18 January.]

Source:  All this and more can be found at: http://en.wikipedia.org/wiki/Birthday_problem


Monday 11 February 2013

A quick word ...


A few years ago, Wordles™ were a popular means of presenting lots of open-ended text.  Here’s an example developed using the text on my own website.


I still think Wordles can be pretty cool.

But now, along has come the new, even cooler, version known as Wordyup™. 

Developed by Garreth Chandler and his team at Twist of Lime www.twistoflime.com.au, Wordyup “ … turns the usual 1000's of open ended responses on a survey into real insights with dynamic key word analysis quickly, easily and what's more ... it's fun!”

And more to the point, as Garreth says, he “… can't stop playing with it...”

Have a look for yourselves: https://www.wordyup.com/ and let me know what you think.  Better still, let Garreth know what you think. 




Thursday 7 February 2013

Amazing statistical fact ...

2013 is the International Year of Statistics.

So, to recognise that, I thought you might be interested in the following:


Suppose there is a medical test that is designed to detect whether you have an illness/infection/whatever.

Suppose the chance of anyone actually having that illness/infection/whatever is 5%.

Suppose that if you do have that illness, then the chance of that test detecting that you have it is 95%.

That sounds pretty good, doesn’t it?

Suppose the chance that the same test will indicate you have that illness, if you actually don’t, is just 5%.

That sounds pretty good too.

Fairly straightforward statistical analysis will show, irrefutably, that if that apparently reliable test indicates you have that illness, the chance that you actually do have it is only 50%.

Scary.  But it’s true.

Suppose the chance of anyone having that illness is actually much lower, say 1%.

Then if the test indicates you have that illness, the chance that you actually do have it is only 16% !
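
The “fairly straightforward statistical analysis” is Bayes’ theorem, and it fits in a few lines of Python. This is just a sketch using the figures quoted above; the function and parameter names are chosen for this illustration.

```python
def chance_actually_ill(prevalence, sensitivity, false_positive_rate):
    """P(ill | positive test), by Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

for prevalence in (0.05, 0.01):
    p = chance_actually_ill(prevalence, sensitivity=0.95, false_positive_rate=0.05)
    print(f"prevalence {prevalence:.0%}: chance you actually have it = {p:.0%}")
```

The point to notice is the denominator: when the illness is rare, the false positives from the large healthy group swamp the true positives from the small ill group.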

I learned about the above from Kerry Mengersen, whose course “Bayes for Beginners” I undertook back in 2006:  http://www.statsoc.org.au/CPD16


Monday 4 February 2013

Statistical goldmine !


Some years ago, it was possible to download (for free) a comprehensive statistical text from www.statsoft.com .  It was great; it just sat on my desktop until I needed it.

These days, you can get the same thing as an online resource, again for free http://www.statsoft.com/textbook/ .

I’ve just been made aware of something that is arguably as good, if not better http://surveyanalysis.org .

Whilst still under development, it already contains a massive amount of information, of interest to anyone who works in the advanced analytics area.

I recommend it strongly, and I can assure you that it is written by one of the smartest people I know.





Sunday 3 February 2013

Round the twist

Weirdly, I have found this post http://alandgraf.blogspot.com.au/2012/06/rounding-in-r.html to be fascinating. Maybe I need to get out more?

That blog post deals with how to round up (or down) when you have a number that isn’t a whole number.

For example, if you have an observation or data point that is equal to 4.5 and you want to use only whole numbers in your analysis, should you round down to 4 or up to 5?

It appears that there is no hard and fast rule for doing this; some argue for down and some for up. Similarly, some software rounds down in this instance and some up.

There is actually an international standard that applies:  ISO/IEC/IEEE 60559:2011, which is identical in content to IEEE 754-2008, the 2008 revision of the IEEE Standard for Floating-Point Arithmetic (IEEE 754) first established in 1985. [ISO/IEC/IEEE 60559:2011 covers a zillion other aspects of numerical computing and took seven years to produce.]

In relation to the above rounding issue, ISO/IEC/IEEE 60559:2011 apparently says, in effect:

“… round numbers ending in "1, 2, 3, and 4" down, and numbers that end in "6, 7, 8, 9" up. Then, specifically regarding "5", if the preceding digit is odd, round up and if the preceding digit is even, round down.”

The advantage of this is that 50% of the numbers will be rounded up, and 50% rounded down, instead of rounding up 5/9ths of the time, and so introducing a bias.

As one statistician I asked about this (and a much better one than I am) confirmed:  “…the clever thing about rounding to evens is that the average is not biased when this is done.”
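
The “average is not biased” claim is easy to try out. Here is a small Python sketch that rounds the same made-up set of numbers (0.0, 0.1, ..., 99.9) to whole numbers in two ways, always rounding a 5 up versus rounding a 5 to the even neighbour, and compares the averages. It uses Python’s standard decimal module so that values like 4.5 are held exactly; the helper name `to_whole` is just for this illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def to_whole(value, mode):
    """Round a Decimal to a whole number using the given tie-breaking rule."""
    return int(value.quantize(Decimal("1"), rounding=mode))

# Illustrative data: 0.0, 0.1, 0.2, ..., 99.9 (exact decimals, no float noise).
values = [Decimal(n) / 10 for n in range(1000)]

true_total = sum(values)
half_up_total = sum(to_whole(v, ROUND_HALF_UP) for v in values)
half_even_total = sum(to_whole(v, ROUND_HALF_EVEN) for v in values)

print("true mean:     ", true_total / len(values))
print("half-up mean:  ", half_up_total / len(values))
print("half-even mean:", half_even_total / len(values))

# Python's built-in round() also breaks ties towards the even neighbour.
print(round(4.5), round(5.5))
```

On this data the round-to-even average matches the true average of 49.95, while always rounding 5 up drifts to 50. Python’s own `round()` follows the round-to-even rule, so `round(4.5)` gives 4 and `round(5.5)` gives 6.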


Friday 1 February 2013

I don't want your help, I just need your advice ...

Some of you will know of the hilarious late 80’s spoken piece by Fred Dagg (aka John Clarke), in which he ‘translated’ real-estate agent speak.  Here’s a short extract:

‘Owner transferred - reluctantly instructs us to sell’ means that the house is for sale.
‘Genuine reason for selling’ means that the house is for sale.
‘Rarely can we offer …’ means that the house is for sale.
‘Superbly presented delightful charmer’ doesn’t mean anything really, but it’s probably still for sale.
‘Most attractive immaculate home of character in prime dress-circle position’ means that the thing that’s for sale is a house.
There is lots more from Dagg/Clarke, of the same ilk.
My experience is that there are almost direct parallels in our own businesses, e.g.
‘I just need to pick your brain’ means that I want you to give/tell me something for free, on the basis of your extensive experience and knowledge that you have spent many, many years acquiring.
‘I have an interesting challenge for you’  means that I want you to give/tell me something for free etc.
‘I wonder if you could come in for a meeting and help us sort out what we need to do’ means that I want you to take 2 or 3 hours out of your day and give/tell me something for free etc.
‘Quick question’ means that I am going to ask you something complicated that you will need to think about for a while, and I want you to do it for free.
‘Exciting new project’ means that I need your input for our proposal, and I don’t want to pay for it.
And probably the best of all (and I am not making this up) ….
‘I don't want your help, I just need your advice’ !!

Now, before you write me off as just another grumpy researcher (which isn’t actually too far from the truth), please be assured that I almost invariably do respond positively to requests like the above.  I estimate that it takes me up to around an hour per week.  That’s around 50 hours per year, or a week-and-a-half per year that I could otherwise spend on project work/going to the beach/walking in the park/<insert your own words here>.

And I am the first to admit that I can myself be guilty of exactly the same sin, that is, phoning up a contact and asking for some free advice/input … but maybe that’s the payback: I help others, and someone else helps me?

It’s a fine line … and I know this general area has also been a hot topic amongst the IRG’ers (AMSRS’s Independent Research Group) in the recent past, admittedly more in the context of “We’ve commissioned you to do X, and we now want to add Y to the project, but we don’t want to pay any more, because there will be more work for you down the track”.

So where does one draw the line?  Indeed, should one draw the line?  Or is it simply a case of ‘you scratch my back, and I’ll scratch yours’?