In the Wizard of Oz, Toto pulls back the green curtain to expose that the Wizard of Oz is a fraud. We can peep behind the ‘green curtain’ of the data visualisation to learn how to ‘poke holes’ in the data that you are given, both in business and in everyday news headlines.
In order to explode the myths in the data that surrounds us every day, it is a little known secret that there are hidden patterns in the data chaos that surrounds us. Deviations from these patterns highlight invention, bias, anomalies and even deliberate fraud.
You can use both R and Power BI data visualisation combined with timeless data analysis and patterns such as Benford’s Law to reveal or conceal efforts to distort the numbers, and question the veracity of the data.
You’ll need courage, heart and wisdom to analyse data, since truthful data doesn’t necessarily give easy answers! My slides are here, but they are pretty bare because I don’t read off slides, but I talk around them. So here is some context.
What is Benford’s Law? Here is the Wikipedia definition: Benford’s law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than 5% of the time.
To detect manipulations or fraud in accounting data, people such as Mark Nigrini have successfully used Benford’s law as part of their fraud detection processes. It’s not foolproof however: Bernard Madoff filed accounts that were consistent with Benford’s Law, for example, It’s a start, and like other stats, we are dealing with probabilities rather than certainties.
It has come into light with respect to the European Union because countries need to meet the Stability and Growth Pact criteria before they can join the European Union. Therefore, countries are ‘incentivized’ to make sure that they look stable. The data regarding Greece looks strange indeed, particularly just before they joined the Euro. You can see the FT article here.
Academic evidence has found that Greece, for example, shows the greatest deviation from Benford’s Law among all the European states. Again, probabilities rather than certainties. As always, the debate rages on.
How can you use R and Power BI to look at this? Well, first of all you need a data set. If you are interested in playing with economic data, you can download it here.
R comes with a handy Benford’s Law packages for you to try as a demo:
You can visualise the data in R, of course. Alternatively, you could take the data as a csv dump, and load your data into PowerPIvot.
Once the data is in PowerPivot, you simply need the leftmost number. In my example, I took the Deficit Surplus data from Eurostat, and renamed the surplus column to Deficit Surplus. Then, we get the left hand digit. All this is straightforward enough:
=Left(ABS(‘Deficit Surplus'[Deficit Surplus]), 1)
Now, all you need to do is make this data into a Pivot table in Excel.
You can then graph this data quite simply:
Here is the same chart in Power View:
As a next step, next time we will look at this data for different countries and years. I hope you’ll enjoy this journey with me. In the meantime, my slides are below.