How do you evaluate the performance of a Neural Network? Focus on AzureML

I read the Microsoft blog post entitled ‘How to evaluate model performance in Azure Machine Learning’. It’s a nice piece of work, and it got me thinking: the post didn’t cover neural network evaluation, so that topic is covered here.

How do you evaluate the performance of a Neural Network? This blog focuses on Neural Networks in AzureML, to help you understand what the evaluation charts and scores mean.

What are Neural Networks?

Would you like to know how to make predictions from a dataset? Alternatively, would you like to find exceptions, or outliers, that you need to watch out for? These are the kinds of business questions that neural networks answer: they make predictions from a dataset, or find unusual patterns, and they are best suited to regression and classification problems.

What are the different types of Neural Networks?

I’m going to credit the Asimov Institute with this amazing diagram:

Neural Network Types

In AzureML, we can review the output from a neural network experiment that we created previously. We can see the results by clicking on the Evaluate Model module, and then clicking on the Visualise option.

Once we click on Visualise, we can see a number of charts, which are described here:

  • Receiver Operating Characteristic (ROC) curve
  • Precision / Recall
  • Lift visualisation

The Receiver Operating Characteristic (ROC) Curve

Here is an example:

ROC Curve

In our example, we can see that the ROC curve rises well up into the top left-hand corner. When we look at the precision and recall curve, we can see that both precision and recall are high, which leads to a high F1 score. This means that the model is effective in terms of how precisely it classifies the data, and that it covers a good proportion of the cases that it should have classified correctly.
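To make this concrete, here is a minimal sketch, in Python rather than AzureML, of how the points on a ROC curve are computed. The labels and scores are invented for illustration; walking down the ranked scores, each case adds either a true positive or a false positive.

```python
# Minimal sketch of how ROC curve points are computed.
# labels: 1 = positive case, 0 = negative; scores: model confidence.

def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs,
    walking thresholds from strictest to loosest."""
    ranked = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1   # correctly caught a positive
        else:
            fp += 1   # falsely flagged a negative
        points.append((fp / neg, tp / pos))
    return points

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(labels, scores))
```

A good model’s points hug the left and top edges of the chart; a random guess lies along the diagonal from (0, 0) to (1, 1).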

Precision and Recall

Precision and recall are very useful for assessing models in terms of business questions. They offer more detail and insights into the model’s performance. Here is an example:

Precision can be described as the fraction of cases that the model classifies correctly. It can be considered a measure of confirmation, and it indicates how often the model is correct. Recall is a measure of utility: it indicates how much the model finds of all that there is to find within the search space. The two scores combine to make the F1 score, which is the harmonic mean of precision and recall; if either precision or recall is small, then the F1 score will be small.
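The relationship between the three scores can be shown in a few lines of Python; the true positive, false positive and false negative counts below are invented, not taken from the AzureML experiment.

```python
# Precision, recall and F1 from confusion-matrix counts.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)  # of the cases we flagged, how many were right
    recall = tp / (tp + fn)     # of the cases out there, how many we found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.842
```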

Lift Visualisation

A lift chart visually represents the improvement that a model provides when compared against a random guess; this improvement is called the lift score. With a lift chart, you can compare the accuracy of predictions for models that have the same predictable attribute.
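The lift idea can also be sketched quickly: compare the positive rate among the model’s top-scored cases with the positive rate overall. The labels and scores here are invented for illustration.

```python
# Minimal lift calculation: how much better the model's top picks
# are than choosing cases at random.

def lift_at(labels, scores, fraction):
    ranked = [label for _, label in sorted(zip(scores, labels), reverse=True)]
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(ranked[:k]) / k         # positive rate in the top slice
    base_rate = sum(labels) / len(labels)  # positive rate overall
    return top_rate / base_rate

labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.1, 0.9, 0.2, 0.3, 0.15, 0.85, 0.4, 0.25, 0.05]
print(lift_at(labels, scores, 0.3))
```

A lift of 1 means the model is no better than a random guess; here the top 30% of scores contains all three positives, so the lift is 1 / 0.3 ≈ 3.33.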

Summary

To summarise, we have examined various key metrics for evaluating a neural network in AzureML. These scores also apply to other technologies, such as R.

These criteria can help us to evaluate our models, which, in turn, can help us to answer our business questions. Understanding the numbers helps to drive the business forward, and visualising those numbers helps to convey their message.

In my next blog, I’ll talk a little about how we can make the Neural Network perform better.

See you at Techorama?

Techorama

Why should you go to Techorama?

Techorama is a yearly international technology conference which takes place at Metropolis Antwerp. We welcome about 1500 attendees, a healthy mix between developers, IT Professionals, Data Professionals and SharePoint professionals.

I’m delighted to announce I’m speaking, and I’d like to take this opportunity to thank the Techorama team for all of their hard work and effort in putting on a great show.


First off, there will be a keynote by Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation – now this is BIG NEWS.

Scott Guthrie, EVP of Cloud + Enterprise, Microsoft Corporation will be keynoting at Techorama 2017 (May 23). In his keynote, “Azure, The Intelligent Cloud”, Scott will open the event with a strategic vision on the Microsoft cloud.

Scott Guthrie will also give another breakout session on May 23 which will be a Q & A session. Come with your questions!


The event itself will offer top-notch content for Developers, IT Professionals, Data Professionals and SharePoint Professionals: 11 parallel breakout sessions with top speakers from all over the world, experts in their field, offering meaningful networking opportunities with partners and like-minded people.

There will also be a unique conference experience in a movie theatre with lots of surprises!

What will I be talking about? You can find out more on my dedicated Techorama page.

Data Visualisation Lies and How to Spot them

During the acrimonious US election, both sides used a combination of cherry-picked polls and misleading data visualization to paint different pictures with data. In this session, we will use a range of Microsoft Power BI and SSRS technologies in order to examine how people can mislead with data and how to fix it. We will also look at best practices with data visualisation. We will examine the data with Microsoft SSRS and Power BI so that you can see the differences and similarities in these reporting tools when selecting your own Data Visualisation toolkit.

Whether you are a Trump supporter, a Clinton supporter or you don’t really care, join this session to spot data lies better in order to make up your own mind.

We hope to welcome you at Techorama 2017!


Simple explanation of a t-test and its meaning using Excel 2013

For those of you who say that stats are ‘dry’ – you are clearly being shown the wrong numbers! Statistics are everywhere, and it’s well worth understanding what you’re talking about when you discuss data, numbers, or languages such as R or Python for data and number crunching. Statistics knowledge is extremely useful, and it is accessible when you put your mind to it!

So, for example, what does a pint of Guinness teach us about statistics? On a visit to Ireland, Barack Obama declared that the Guinness tasted better in Ireland, and that the Irish keep the ‘good stuff’ to themselves.
Can we use science to identify whether there is a difference between the enjoyment of a pint of Guinness consumed in Ireland, and pints consumed outside of Ireland?
A Journal of Food Science investigation detailed research in which four investigators travelled around different countries to taste-test the Guinness poured inside and outside of Ireland. To do this, the researchers compared the average score of pints poured inside Ireland versus pints poured outside of Ireland.

How did they do the comparison? They used a t-test, which was devised by William Sealy Gosset, who worked at the Guinness brewery as a scientist, with the objective of using science to produce the perfect pint.
The t-test helps us to work out whether two sets of data are actually different.
It takes two sets of data, and calculates:

  • The count – the number of data points
  • The mean, also known as the average, i.e. the sum total of the data, divided by the number of data points
  • The standard deviation – this tells you roughly how far, on average, each number in your list of data varies from the average value of the list itself

The t-test is a more sophisticated test that tells us whether the means of those two groups of data are different.

In the video, I go ahead and try it, using Excel formulas:

  • COUNT – counts up the number of data points.
  • AVERAGE – calculates the mean, using the AVERAGE Excel formula.
  • STDEV – another Excel calculation, which tells us the standard deviation.
  • TTEST – the Excel calculation, which wants to know:
    • Array 1 – your first group of data
    • Array 2 – your second group of data
    • Tail – do you know whether the mean of the second group will definitely be higher OR lower than the first group, and it’s only likely to go in that direction? If so, use 1. If you are not sure whether it’s higher or lower, use 2.
    • Type –
      • if your data points are related to one another in each group, use 1
      • if the data points in each group are unrelated to each other, and there are equal variances, use 2
      • if the data points in each group are unrelated to each other, and you are unsure whether there are equal variances, use 3
And that’s your result. But how do you know what it means? The formula produces a number, called p, which is simply a probability.

The t-test is a simple way of establishing whether there are significant differences between two groups of data. It uses the null hypothesis: this is the devil’s advocate position, which says that there is no difference between the two groups. It uses the sample size, the mean, and the standard deviation to produce a p value.
The lower the p value, the more likely it is that there is a difference between the two groups, i.e. that something happened to make a difference between them.

In social science, the threshold for p is usually set to 5%, i.e. only 1 time in 20 would the difference be due to chance.
In the video, the first comparison does not have a difference ‘proven’, but the second comparison does.
So next time you have a good pint of Guinness, raise your glass to statistics!

Love,
Jen

The Data Analysts Toolkit Day 3: Introduction to running and understanding commands in R

I have located a number of R Scripts over at my OneDrive account and you can head over there to download them. There are some other goodies over there too, so why not take a look?

How do you open files? To create a new file, you use the File -> New menu.
To open an existing file, you use the File -> Open menu.
Alternatively, you can use the Open Recent menu to select from recently opened files.
If you open several files within RStudio, you can see them as tabs to facilitate quick switching between open documents. 
If you have a large number of open documents, you can also navigate between them using the >> icon.
You could also use the tab bar to navigate between files, or the View -> Switch to Tab menu item.
Let’s open the Women file and take a look at the commands.
Comments in R are preceded with a hash symbol, so that is what we are using here.
# this loads the dataset
data(women)
# You can see what is in the dataset
women
# This allows you to see the column names
names(women)
# You can see the output of the height column here, in different ways
women$height
women[,1]
women[seq(1,2),]
women[1:5,1]
women[,2]
# we can start to have a little fun!
# we are going to tell R that we are going to build a model of the data
attach(women)
# build a linear model of weight as a function of height
model <- lm(weight ~ height)
# print the model in two ways
model
print(model)
# mode() shows the storage mode of the model object
mode(model)
# predicted values, with prediction intervals
predict(model, women, interval="predict")
# predict the weight for a new height value
newdata = data.frame(height=60)
newdata
predict(model, newdata, interval="predict")
women
# summarise the model fit
summary(model)
Now we can see some strange output at the bottom of the page:
Call:
lm(formula = weight ~ height)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7333 -1.1333 -0.3833  0.7417  3.1167 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
height        3.45000    0.09114   37.85 1.09e-14 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.525 on 13 degrees of freedom
Multiple R-squared:  0.991, Adjusted R-squared:  0.9903 
F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
What does this actually mean? As part of my course, I normally discuss this in detail. For those of you reading online, here is the potted summary:

Basically, at the 30,000-foot helicopter level, you want a big F and a small p. We will start from there, and move down as we go.
The stars are shorthand for significance levels, with the number of asterisks displayed according to the p-value computed: *** for high significance and * for low significance. In this case, *** indicates that there is likely to be a relationship.
Pro Tip
Is the model significant or insignificant? This is the purpose of the F statistic.
Check the F statistic first because if it is not significant, then the model doesn’t matter.
On the next Day of the Data Analysts Toolkit, we will look at more commands in R.